elm package
elm.elmk Module
This module contains the ELMKernel class and its methods.
- class elm.elmk.ELMKernel(params=[])
Bases: elm.mltools.MLTools
A Python implementation of the kernel-based ELM defined by Huang [1].
An Extreme Learning Machine (ELM) is a single-hidden-layer feedforward network (SLFN) proposed by Huang in 2006. In 2012 the author revisited that work and introduced the use of kernel functions.
This package implements both approaches from the 2012 paper, random hidden neurons (ELMRandom) and kernel functions (ELMKernel), to estimate classification and regression functions.
Let the dimensionality d of the problem be the number of targets per pattern, t, plus the number of features per pattern, f, so that d = t + f.
Each pattern is laid out as Pattern = (Target | Features), with the targets first.
A database of N patterns is therefore an Nxd matrix.
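As a minimal illustration of this layout, the NumPy snippet below builds a hypothetical database with N = 5 patterns, one target (t = 1) and three features (f = 3), then splits it back into targets and features. The random data and variable names are illustrative only.

```python
import numpy as np

# Hypothetical data: N = 5 patterns, t = 1 target, f = 3 features.
rng = np.random.default_rng(0)
targets = rng.normal(size=(5, 1))    # N x t
features = rng.normal(size=(5, 3))   # N x f

# Pattern = (Target | Features): the target occupies the first column.
database = np.hstack([targets, features])   # N x d, with d = t + f
print(database.shape)                       # (5, 4)

# Recover the two parts (t = 1 target column).
t = targets.shape[1]
expected = database[:, :t]
inputs = database[:, t:]
```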
Note
[1] Paper reference: Huang, 2012, “Extreme Learning Machine for Regression and Multiclass Classification”
Variables: - output_weight (numpy.ndarray) – a column vector (Nx1) computed during training; it represents β.
- training_patterns (numpy.ndarray) –
a matrix (Nxd) containing all patterns used for training.
All training patterns must be stored, because the kernel function is evaluated against them in the testing and prediction phases.
- param_kernel_function (str) – kernel function that will be used for training.
- param_c (float) – regularization coefficient (C) used for training.
- param_kernel_params (list of float) – kernel function parameters that will be used for training.
Other Parameters: - regressor_name (str) – The name of classifier/regressor.
- available_kernel_functions (list of str) – List with all available kernel functions.
- default_param_kernel_function (str) – Default kernel function if not set at class constructor.
- default_param_c (float) – Default parameter c value if not set at class constructor.
- default_param_kernel_params (list of float) – Default kernel function parameters if not set at class constructor.
Note
- regressor_name: defaults to “elmk”.
- default_param_kernel_function: defaults to “rbf”.
- default_param_c: defaults to 9.
- default_param_kernel_params: defaults to [-15].
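The kernel ELM in [1] computes the output weights in closed form as β = (I/C + Ω)^-1 T, where Ω is the NxN kernel matrix with Ω_ij = K(x_i, x_j) and T holds the training targets; a new input x is then predicted as [K(x, x_1), ..., K(x, x_N)] β. This is why output_weight is Nx1 and why training_patterns must be kept. The sketch below assumes an RBF kernel K(u, v) = exp(-γ ||u - v||^2) with γ and C passed directly; it does not reproduce how this package parameterizes or scales its defaults (for example C = 9 or kernel parameters [-15]), and the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: an RBF kernel with gamma and C given directly (assumed),
# not the package's own parameterization.
def rbf_kernel(a, b, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2) for every row pair of a and b."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))  # clip tiny negative rounding

def kernel_elm_train(training_matrix, c, gamma):
    """Return beta = (I/C + Omega)^-1 T; the target is in column 0."""
    targets = training_matrix[:, :1]                     # N x 1
    features = training_matrix[:, 1:]                    # N x (d-1)
    omega = rbf_kernel(features, features, gamma)        # N x N kernel matrix
    n = omega.shape[0]
    # Solve (I/C + Omega) beta = T rather than forming the inverse.
    return np.linalg.solve(np.eye(n) / c + omega, targets)   # N x 1

def kernel_elm_predict(training_matrix, beta, new_features, gamma):
    """f(x) = [K(x, x_1), ..., K(x, x_N)] beta for each row x of new_features."""
    k = rbf_kernel(new_features, training_matrix[:, 1:], gamma)  # M x N
    return k @ beta                                               # M x 1

# Usage: fit y = sin(x) from 20 noise-free samples.
x = np.linspace(0.0, 3.0, 20)[:, None]
train = np.hstack([np.sin(x), x])          # Pattern = (Target | Features)
beta = kernel_elm_train(train, c=1000.0, gamma=1.0)
fitted = kernel_elm_predict(train, beta, train[:, 1:], gamma=1.0)
```

Solving the linear system with numpy.linalg.solve is numerically safer than computing the inverse explicitly; the prediction step needs the stored training features, mirroring the role of training_patterns.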
- __init__(params=[])
Class constructor.
Parameters: params (list) – the first element (str) is an available kernel function, the second (float) is the regularization coefficient C, and the third and last is a list of parameters for the kernel function.
Example
>>> import elm
>>> params = ["linear", 5, []]
>>> elmk = elm.ELMKernel(params)
- predict(horizon=1)
Predict next targets based on previous training.
Parameters: horizon (int) – number of predictions.
Returns: numpy.ndarray: a column vector containing all predicted targets.
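The description above does not say how predict() builds the inputs for steps beyond the first. One common approach for time series, assumed here rather than taken from the package, is recursive forecasting: each pattern's features are the previous f target values, and every new prediction is fed back as the most recent lag. The helper recursive_forecast, the model_fn callable, and the layout of last_pattern below are hypothetical.

```python
import numpy as np

def recursive_forecast(model_fn, last_pattern, horizon=1):
    """Hypothetical helper: forecast `horizon` steps, feeding predictions back as lags.

    Assumed layout: last_pattern = (y_t | y_{t-1}, ..., y_{t-f}),
    i.e. the target followed by its f most recent lags, newest first.
    """
    last_pattern = np.asarray(last_pattern, dtype=float)
    # Features for y_{t+1}: (y_t, y_{t-1}, ..., y_{t-f+1}).
    lags = last_pattern[:-1].copy()
    predictions = []
    for _ in range(horizon):
        y_hat = float(model_fn(lags))
        predictions.append(y_hat)
        # Shift the window: the newest prediction becomes the first lag.
        lags = np.concatenate([[y_hat], lags[:-1]])
    return np.array(predictions).reshape(-1, 1)   # column vector, like predict()

# Usage with a toy model that extrapolates a straight line: y_hat = 2*lag_1 - lag_2.
forecast = recursive_forecast(lambda lags: 2 * lags[0] - lags[1], [3.0, 2.0, 1.0], horizon=3)
# forecast -> [[4.], [5.], [6.]]
```

Errors compound under this strategy, because later steps consume earlier predictions instead of observed values.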
- search_param(database, dataprocess=None, path_filename=(u'', u''), save=False, cv=u'ts', of=u'rmse', kf=None, eval=50)
Search for the best hyperparameters of the classifier/regressor using optunity's optimization algorithms.
Parameters: - database (numpy.ndarray) – a matrix containing all patterns that will be used for training and testing by the chosen cross-validation method.
- dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
- path_filename (tuple) – TODO.
- save (bool) – TODO.
- cv (str) – Cross-validation method. Defaults to “ts”.
- of (str) – Objective function to be minimized at optunity.minimize. Defaults to “rmse”.
- kf (list of str) – a list of kernel functions to be used by the search. Defaults to None, which uses all available kernel functions.
- eval (int) – number of evaluations performed by the optunity algorithm. Defaults to 50.
Each candidate set of hyperparameters is evaluated with the cross-validation method selected by cv.
- Available cv methods:
- “ts” mltools.time_series_cross_validation()
Perform a time-series cross-validation as suggested by Hyndman.
- “kfold” mltools.kfold_cross_validation()
Perform a k-fold cross-validation.
- Available objective functions (of):
- “accuracy”, “rmse”, “mape”, “me”.
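optunity explores the hyperparameter space by repeatedly evaluating an objective, here a cross-validated error such as RMSE, and keeping the best candidate. The sketch below swaps optunity's solvers for plain random search to show that loop; the objective, the parameter names log2_c and log2_gamma, and the search bounds are hypothetical, and in practice the objective would run the chosen cross-validation and return the metric selected by of.

```python
import numpy as np

def random_search(objective, bounds, evaluations=50, seed=0):
    """Hypothetical helper: minimize objective(**params) by uniform sampling in bounds.

    Plain random search, used here as a stand-in for optunity's solvers.
    """
    rng = np.random.default_rng(seed)
    best_params, best_value = None, np.inf
    for _ in range(evaluations):
        params = {name: rng.uniform(low, high) for name, (low, high) in bounds.items()}
        value = objective(**params)
        if value < best_value:
            best_params, best_value = params, value
    return best_params, best_value

# Hypothetical objective standing in for a cross-validated RMSE,
# smallest at log2_c = 3, log2_gamma = -5.
def toy_objective(log2_c, log2_gamma):
    return (log2_c - 3.0) ** 2 + (log2_gamma + 5.0) ** 2

best, value = random_search(
    toy_objective,
    bounds={"log2_c": (-10.0, 10.0), "log2_gamma": (-20.0, 5.0)},
    evaluations=500,
)
```

Searching over log2-scaled values is a common choice for C and kernel widths, since useful values span several orders of magnitude.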
- test(testing_matrix, predicting=False)
Compute predicted values for the testing patterns using the previously trained model.
Parameters: - testing_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for testing.
- predicting (bool) – for internal use; do not set.
Returns: Error: testing error object containing expected, predicted targets and all error metrics.
Note
Testing matrix must have target variables as the first column.
- train(training_matrix, params=[])
Compute the output_weight values needed to test and predict data.
If params is provided, it is used for this training run; otherwise, the parameters set at object initialization are used.
Parameters: - training_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for training.
- params (list) – a list of parameters defined at ELMKernel.__init__()
Returns: Error: training error object containing expected, predicted targets and all error metrics.
Note
Training matrix must have target variables as the first column.
elm.elmr Module
This module contains the ELMRandom class and its methods.
- class elm.elmr.ELMRandom(params=[])
Bases: elm.mltools.MLTools
A Python implementation of the random-neuron ELM defined by Huang [1].
An Extreme Learning Machine (ELM) is a single-hidden-layer feedforward network (SLFN) proposed by Huang in 2006. In 2012 the author revisited that work and introduced the use of kernel functions.
This package implements both approaches from the 2012 paper, random hidden neurons (ELMRandom) and kernel functions (ELMKernel), to estimate classification and regression functions.
Let the dimensionality d of the problem be the number of targets per pattern, t, plus the number of features per pattern, f, so that d = t + f.
Each pattern is laid out as Pattern = (Target | Features), with the targets first.
A database of N patterns is therefore an Nxd matrix.
Note
[1] Paper reference: Huang, 2012, “Extreme Learning Machine for Regression and Multiclass Classification”
Variables: - input_weight (numpy.ndarray) – a random matrix (Lx(d-1)) needed to calculate the hidden-layer output H(x).
- output_weight (numpy.ndarray) – a column vector (Lx1) computed during training; it represents β.
- bias_of_hidden_neurons (numpy.ndarray) – a random column vector (Lx1) needed to calculate H(x).
- param_function (str) – activation function of the hidden neurons used for training.
- param_c (float) – regularization coefficient (C) used for training.
- param_l (int) – number of hidden neurons (L) used for training.
- param_opt (bool) – a flag that enables an optimized solution for the case where the number of training patterns is much larger than the number of neurons (N >> L).
Other Parameters: - regressor_name (str) – The name of classifier/regressor.
- available_functions (list of str) – List with all available functions.
- default_param_function (str) – Default function if not set at class constructor.
- default_param_c (float) – Default parameter c value if not set at class constructor.
- default_param_l (integer) – Default number of neurons if not set at class constructor.
- default_param_opt (bool) – Default boolean optimization flag.
Note
- regressor_name: defaults to “elmr”.
- default_param_function: defaults to “sigmoid”.
- default_param_c: defaults to 2 ** -6.
- default_param_l: defaults to 500.
- default_param_opt: defaults to False.
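For random hidden neurons, [1] gives two equivalent closed-form solutions for the output weights: β = H^T (I/C + H H^T)^-1 T, which solves an NxN system, and β = (I/C + H^T H)^-1 H^T T, which solves an LxL system and is cheaper when N >> L. The sketch below assumes param_opt selects between them in this way, uses the sigmoid activation (the documented default), and draws input weights and biases uniformly from [-1, 1]; that range and the helper names are assumptions, not the package's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def elm_random_train(training_matrix, c, n_neurons, opt=False, seed=0):
    """Hypothetical helper: random-neuron ELM with the target in column 0."""
    rng = np.random.default_rng(seed)
    targets = training_matrix[:, :1]                                    # N x 1
    features = training_matrix[:, 1:]                                   # N x (d-1)
    # Assumed initialization: uniform in [-1, 1].
    input_weight = rng.uniform(-1.0, 1.0, size=(n_neurons, features.shape[1]))  # L x (d-1)
    bias = rng.uniform(-1.0, 1.0, size=(n_neurons, 1))                  # L x 1
    h = sigmoid(features @ input_weight.T + bias.T)                     # N x L
    if opt:
        # N >> L: solve the L x L system (I/C + H^T H) beta = H^T T.
        beta = np.linalg.solve(np.eye(n_neurons) / c + h.T @ h, h.T @ targets)
    else:
        # General form: beta = H^T (I/C + H H^T)^-1 T, via an N x N system.
        beta = h.T @ np.linalg.solve(np.eye(h.shape[0]) / c + h @ h.T, targets)
    return input_weight, bias, beta                                     # beta is L x 1

def elm_random_predict(model, features):
    """Compute H(x) beta for each row of features."""
    input_weight, bias, beta = model
    return sigmoid(features @ input_weight.T + bias.T) @ beta

# Usage: with the same seed, both branches give the same output weights.
x = np.linspace(0.0, 3.0, 50)[:, None]
train = np.hstack([np.sin(x), x])
model_full = elm_random_train(train, c=10.0, n_neurons=20, opt=False)
model_opt = elm_random_train(train, c=10.0, n_neurons=20, opt=True)
```

The two branches agree because H^T (I/C + H H^T)^-1 = (I/C + H^T H)^-1 H^T; only the size of the system being solved differs.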
- __init__(params=[])
Class constructor.
Parameters: params (list) – the first element (str) is an available activation function, the second (float) is the regularization coefficient C, the third (int) is the number of hidden neurons, and the last (bool) is the optimization flag.
Example
>>> import elm
>>> params = ["sigmoid", 1, 500, False]
>>> elmr = elm.ELMRandom(params)
- predict(horizon=1)
Predict next targets based on previous training.
Parameters: horizon (int) – number of predictions.
Returns: numpy.ndarray: a column vector containing all predicted targets.
- search_param(database, dataprocess=None, path_filename=(u'', u''), save=False, cv=u'ts', of=u'rmse', f=None, eval=50)
Search for the best hyperparameters of the classifier/regressor using optunity's optimization algorithms.
Parameters: - database (numpy.ndarray) – a matrix containing all patterns that will be used for training and testing by the chosen cross-validation method.
- dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
- path_filename (tuple) – TODO.
- save (bool) – TODO.
- cv (str) – Cross-validation method. Defaults to “ts”.
- of (str) – Objective function to be minimized at optunity.minimize. Defaults to “rmse”.
- f (list of str) – a list of activation functions to be used by the search. Defaults to None, which uses all available functions.
- eval (int) – number of evaluations performed by the optunity algorithm. Defaults to 50.
Each candidate set of hyperparameters is evaluated with the cross-validation method selected by cv.
- Available cv methods:
- “ts” mltools.time_series_cross_validation()
Perform a time-series cross-validation as suggested by Hyndman.
- “kfold” mltools.kfold_cross_validation()
Perform a k-fold cross-validation.
- Available objective functions (of):
- “accuracy”, “rmse”, “mape”, “me”.
- test(testing_matrix, predicting=False)
Compute predicted values for the testing patterns using the previously trained model.
Parameters: - testing_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for testing.
- predicting (bool) – for internal use; do not set.
Returns: Error: testing error object containing expected, predicted targets and all error metrics.
Note
Testing matrix must have target variables as the first column.
- train(training_matrix, params=[])
Compute the output_weight values needed to test and predict data.
If params is provided, it is used for this training run; otherwise, the parameters set at object initialization are used.
Parameters: - training_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for training.
- params (list) – a list of parameters in the format defined by ELMRandom.__init__()
Returns: Error: training error object containing expected, predicted targets and all error metrics.
Note
Training matrix must have target variables as the first column.
elm.mltools Module
This module contains the MLTools, Error, and CVError classes, together with helper functions for reading, splitting, and cross-validating data.
- class elm.mltools.CVError(fold_errors)
Bases: object
CVError stores the Error objects from all folds of a cross-validation run.
Variables: - fold_errors (list of Error) – a list of all Error objects created through cross-validation process.
- all_fold_errors (dict) – a dictionary containing lists of error values of all folds.
- all_fold_mean_errors (dict) – a dictionary containing the mean of all_fold_errors lists.
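The relationship between all_fold_errors and all_fold_mean_errors can be sketched as follows, assuming each fold contributes a dictionary of metric values like Error.dict_errors. The helper aggregate_folds and the fold values are hypothetical.

```python
import numpy as np

def aggregate_folds(fold_error_dicts):
    """Hypothetical helper: group metric values by name across folds, then average."""
    all_fold_errors = {}
    for errors in fold_error_dicts:
        for name, value in errors.items():
            all_fold_errors.setdefault(name, []).append(value)
    all_fold_mean_errors = {
        name: float(np.mean(values)) for name, values in all_fold_errors.items()
    }
    return all_fold_errors, all_fold_mean_errors

# Usage with two hypothetical folds.
per_fold, means = aggregate_folds([{"rmse": 1.0, "mae": 0.5}, {"rmse": 3.0, "mae": 1.5}])
# means -> {"rmse": 2.0, "mae": 1.0}
```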
- class elm.mltools.Error(expected, predicted, regressor_name=u'')
Bases: object
Error stores expected and predicted values and computes error metrics from them.
Variables: - regressor_name (str) – Deprecated.
- expected_targets (numpy.ndarray) – array of expected values.
- predicted_targets (numpy.ndarray) – array of predicted values.
- dict_errors (dict) – a dictionary containing all calculated errors and their values.
- calc_metrics()
Calculate all error metrics.
Available error metrics are “rmse”, “mse”, “mae”, “me”, “mpe”, “mape”, “std”, “hr”, “hr+”, “hr-” and “accuracy”.
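Several metrics in this list have standard definitions, sketched below. Two conventions are assumptions rather than documented behavior: the error is taken as expected minus predicted (which fixes the sign of me and mpe), and mpe and mape are reported in percent. The remaining metrics (std, hr, hr+, hr-, accuracy) are omitted because their exact definitions are not given here; basic_metrics is a hypothetical helper.

```python
import numpy as np

def basic_metrics(expected, predicted):
    """Hypothetical helper: standard metrics, assuming error = expected - predicted."""
    expected = np.asarray(expected, dtype=float).ravel()
    predicted = np.asarray(predicted, dtype=float).ravel()
    error = expected - predicted
    mse = float(np.mean(error ** 2))
    return {
        "me": float(np.mean(error)),
        "mae": float(np.mean(np.abs(error))),
        "mse": mse,
        "rmse": float(np.sqrt(mse)),
        # Percentage errors divide by expected, so zero targets are not allowed.
        "mpe": float(100.0 * np.mean(error / expected)),
        "mape": float(100.0 * np.mean(np.abs(error / expected))),
    }

metrics = basic_metrics([1.0, 2.0, 4.0], [1.0, 1.0, 2.0])
```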
- get(error)
Calculate and return the value of an error metric.
Parameters: error (str) – name of the error metric to calculate.
Returns: value of the requested error metric.
Return type: float
- get_anderson()
Perform the Anderson-Darling test for data coming from a particular distribution.
Returns: statistic value, critical values and significance levels.
Return type: tuple
Note
Requires the scipy.stats module.
- get_shapiro()
Perform the Shapiro-Wilk test for normality.
Returns: statistic value and p-value.
Return type: tuple
Note
Requires the scipy.stats module.
- class elm.mltools.MLTools
Bases: object
A Python implementation of several methods needed for machine learning classification/regression.
Variables: - last_training_pattern (numpy.ndarray) – the last pattern of the training matrix.
- has_trained (bool) – whether the model has already been trained.
- cv_best_rmse (float) – the best RMSE obtained during cross-validation.
- elm.mltools.kfold_cross_validation(ml, database, params, number_folds=10, dataprocess=None)
Performs a k-fold cross-validation.
Parameters: - ml (ELMKernel or ELMRandom) – the classifier/regressor to evaluate.
- database (numpy.ndarray) – the data matrix used to perform the cross-validation.
- params (list) – list of parameters passed to ml for training and testing.
- number_folds (int) – number of folds into which the database is split. Defaults to 10.
- dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
Returns: a tuple of CVError objects, one for training and one for testing.
Return type: tuple
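A k-fold split like the one described above can be sketched as follows. This version assumes contiguous folds without shuffling and does not call a model or a DataProcess object; kfold_splits is a hypothetical helper that only yields the training and testing matrices for each fold.

```python
import numpy as np

def kfold_splits(database, number_folds=10):
    """Hypothetical helper: yield (training_matrix, testing_matrix) per contiguous fold."""
    n = database.shape[0]
    for test_idx in np.array_split(np.arange(n), number_folds):
        train_mask = np.ones(n, dtype=bool)
        train_mask[test_idx] = False          # every row outside the fold trains
        yield database[train_mask], database[test_idx]

data = np.arange(20).reshape(10, 2)
splits = list(kfold_splits(data, number_folds=5))
```

Each row appears in exactly one testing matrix, so averaging the per-fold errors uses every pattern once for evaluation.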
- elm.mltools.read(file_name)
Read data from a text file.
Parameters: file_name (str) – path and file name.
Returns: numpy.ndarray: a matrix containing all read data.
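If the text file holds one whitespace-separated pattern per line, with the target first, its contents can be loaded into the Nxd matrix described earlier with NumPy's loadtxt. The file format and the in-memory stand-in file below are assumptions; the package's read() may handle other formats.

```python
import io
import numpy as np

# Hypothetical file contents: one pattern per line, target in the first column.
text_file = io.StringIO("1.0 0.5 0.2\n2.0 0.6 0.1\n3.0 0.7 0.0\n")
data = np.loadtxt(text_file, ndmin=2)   # ndmin=2 keeps a 2-D result even for one row
print(data.shape)                       # (3, 3)
```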
- elm.mltools.split_sets(data, training_percent=None, n_test_samples=None, perm=False)
Split data matrix into training and test matrices.
The size of the training matrix is set by training_percent (or, alternatively, by n_test_samples): the training matrix takes the first samples of data, and the remaining samples form the testing matrix.
If neither training_percent nor n_test_samples is set, an error is raised; only one of them may be set at a time.
Parameters: - data (numpy.ndarray) – a matrix containing all patterns.
- training_percent (float) – an optional parameter used to calculate the number of patterns in the training matrix.
- n_test_samples (int) – an optional parameter used to set the number of patterns in the testing matrix.
- perm (bool) – a flag that chooses whether to permute (shuffle) the database before splitting it.
Returns: Both training and test matrices.
Return type: tuple
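A sketch of the documented splitting rules follows, assuming training_percent is a fraction between 0 and 1 and that the training size is rounded to the nearest integer (both assumptions). split_sets_sketch and its seed parameter are hypothetical and are not the package's function.

```python
import numpy as np

def split_sets_sketch(data, training_percent=None, n_test_samples=None, perm=False, seed=0):
    """Hypothetical helper: split rows into (training, testing); set exactly one size."""
    if (training_percent is None) == (n_test_samples is None):
        raise ValueError("set exactly one of training_percent or n_test_samples")
    if perm:
        data = np.random.default_rng(seed).permutation(data)  # shuffle rows
    n = data.shape[0]
    if training_percent is not None:
        n_train = int(round(n * training_percent))   # assumed: fraction in [0, 1]
    else:
        n_train = n - n_test_samples
    # The first n_train rows train; the rest test.
    return data[:n_train], data[n_train:]

data = np.arange(20).reshape(10, 2)
train, test = split_sets_sketch(data, training_percent=0.8)
```

Without perm, keeping the first rows for training preserves temporal order, which matters for time-series data.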
- elm.mltools.time_series_cross_validation(ml, database, params, number_folds=10, dataprocess=None)
Performs a k-fold cross-validation on a time series, as described by Rob Hyndman.
Parameters: - ml (ELMKernel or ELMRandom) – the classifier/regressor to evaluate.
- database (numpy.ndarray) – the data matrix used to perform the cross-validation.
- params (list) – list of parameters passed to ml for training and testing.
- number_folds (int) – number of folds created from the database. Defaults to 10.
- dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
Returns: a tuple of CVError objects, one for training and one for testing.
Return type: tuple
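Hyndman describes evaluating forecasts on a rolling forecasting origin, where each fold trains on all observations up to a point and tests on the observations that follow. The sketch below is one expanding-window variant of that idea, under assumed choices: the series is cut into number_folds + 1 contiguous blocks, and fold k trains on blocks 0 to k-1 and tests on block k. time_series_splits is hypothetical; the package's exact fold layout may differ.

```python
import numpy as np

def time_series_splits(database, number_folds=10):
    """Hypothetical helper: expanding-window folds that never train on future rows."""
    blocks = np.array_split(np.arange(database.shape[0]), number_folds + 1)
    for k in range(1, number_folds + 1):
        train_idx = np.concatenate(blocks[:k])   # blocks 0 .. k-1, in time order
        yield database[train_idx], database[blocks[k]]

data = np.arange(24).reshape(12, 2)
folds = list(time_series_splits(data, number_folds=3))
```

Unlike ordinary k-fold, every testing block lies strictly after its training rows, so no fold uses future observations to predict the past.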