elm package

`elm.elmk` Module

This file contains the ELMKernel class and all its methods.

class elm.elmk.ELMKernel(params=[])

Bases: elm.mltools.MLTools

A Python implementation of the ELM Kernel defined by Huang [1].

An ELM is a single-hidden-layer feedforward network (SLFN) proposed by Huang in 2006. In 2012 the author revised the earlier work and introduced the use of kernel functions.

This implementation currently accepts both methods proposed in 2012, random neurons and kernel functions, to estimate classifier/regression functions.

Let the dimensionality “d” of the problem be the sum of “t” (number of targets per pattern) and “f” (number of features per pattern), so d = t + f.

The data is laid out as Pattern = (Target | Features).

If the database has N patterns, its size is N x d.
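As a minimal sketch of this layout (the array values and variable names below are illustrative and not part of the library), a database with one target column followed by the feature columns can be built and sliced with NumPy:

```python
import numpy as np

# Hypothetical database: N = 4 patterns, t = 1 target, f = 3 features.
t = 1
database = np.array([
    [1.0, 0.1, 0.2, 0.3],
    [2.0, 0.4, 0.5, 0.6],
    [3.0, 0.7, 0.8, 0.9],
    [4.0, 1.0, 1.1, 1.2],
])

n_patterns, d = database.shape   # N x d, with d = t + f
targets = database[:, :t]        # first t columns hold the targets
features = database[:, t:]       # remaining f columns hold the features
f = features.shape[1]
```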

Note

[1] Paper reference: Huang, 2012, “Extreme Learning Machine for Regression and Multiclass Classification”

Variables:

output_weight (numpy.ndarray) – a column vector (N x 1) calculated after training; represents β.
training_patterns (numpy.ndarray) – a matrix (N x d) containing all patterns used for training. All training patterns must be kept because the kernel calculation needs them in the testing and prediction phases.
param_kernel_function (str) – kernel function that will be used for training.
param_c (float) – regularization coefficient (C) used for training.
param_kernel_params (list of float) – kernel function parameters that will be used for training.
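The role of these variables can be illustrated with a small NumPy sketch of the kernel formulation from the referenced paper, where β solves (I / C + K) β = T and K is the kernel matrix of the training features. This is not the library's code: the function names, the RBF parameterization through `gamma`, and the values of `c` and `gamma` are assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """Gaussian kernel matrix: K[i, j] = exp(-gamma * ||a_i - b_j||^2)."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kernel_elm_fit(features, targets, c, gamma):
    """Solve (I / C + K) beta = T for the output weights beta (N x t)."""
    n = features.shape[0]
    k = rbf_kernel(features, features, gamma)
    return np.linalg.solve(np.eye(n) / c + k, targets)

def kernel_elm_predict(new_features, training_features, beta, gamma):
    """Prediction needs the kernel between new and stored training patterns."""
    return rbf_kernel(new_features, training_features, gamma) @ beta

# Illustrative data: 50 patterns, 2 features, 1 target.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(50, 2))
y = np.sin(x[:, :1]) + x[:, 1:] ** 2
beta = kernel_elm_fit(x, y, c=10.0, gamma=1.0)
fitted = kernel_elm_predict(x, x, beta, gamma=1.0)
```

The prediction step shows why the training patterns have to be stored: every new pattern is compared against all of them through the kernel.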

Other Parameters:

regressor_name (str) – The name of classifier/regressor.
available_kernel_functions (list of str) – List with all available kernel functions.
default_param_kernel_function (str) – Default kernel function if not set at class constructor.
default_param_c (float) – Default parameter c value if not set at class constructor.
default_param_kernel_params (list of float) – Default kernel function parameters if not set at class constructor.

Note

regressor_name: defaults to “elmk”.
default_param_kernel_function: defaults to “rbf”.
default_param_c: defaults to 9.
default_param_kernel_params: defaults to [-15].

__init__(params=[])

Class constructor.

Parameters:	params (list) – first argument (str) is an available kernel function, second argument (float) is the coefficient C of regularization and the third and last argument is a list of arguments for the kernel function.

Example

>>> import elm
>>> params = ["linear", 5, []]
>>> elmk = elm.ELMKernel(params)

get_available_kernel_functions()

Return available kernel functions.

predict(horizon=1)

Predict next targets based on previous training.

Parameters:	horizon (int) – number of predictions.
Returns:	numpy.ndarray: a column vector containing all predicted targets.

print_parameters()

Print parameter values.

search_param(database, dataprocess=None, path_filename=(u'', u''), save=False, cv=u'ts', of=u'rmse', kf=None, eval=50)

Search for the best hyperparameters for the classifier/regressor using optunity algorithms.

Parameters:

database (numpy.ndarray) – a matrix containing all patterns that will be used for training/testing at some cross-validation method.
dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
path_filename (tuple) – TODO.
save (bool) – TODO.
cv (str) – Cross-validation method. Defaults to “ts”.
of (str) – Objective function to be minimized at optunity.minimize. Defaults to “rmse”.
kf (list of str) – a list of kernel functions to be used by the search. Defaults to None, which selects all available functions.
eval (int) – number of evaluations (steps) performed by the optunity algorithm.

Each set of hyperparameters is evaluated with the cross-validation method chosen by the cv parameter.

Available cv methods:

“ts” mltools.time_series_cross_validation() – performs a time-series cross-validation as suggested by Hyndman.
“kfold” mltools.kfold_cross_validation() – performs a k-fold cross-validation.

Available objective functions (of):

“accuracy”, “rmse”, “mape”, “me”.

`elm.elmr` Module

This file contains the ELMRandom class and all its methods.

class elm.elmr.ELMRandom(params=[])

Bases: elm.mltools.MLTools

A Python implementation of ELM Random Neurons defined by Huang [1].

An ELM is a single-hidden-layer feedforward network (SLFN) proposed by Huang in 2006. In 2012 the author revised the earlier work and introduced the use of kernel functions.

This implementation currently accepts both methods proposed in 2012, random neurons and kernel functions, to estimate classifier/regression functions.

Let the dimensionality “d” of the problem be the sum of “t” (number of targets per pattern) and “f” (number of features per pattern), so d = t + f.

The data is laid out as Pattern = (Target | Features).

If the database has N patterns, its size is N x d.

Note

[1] Paper reference: Huang, 2012, “Extreme Learning Machine for Regression and Multiclass Classification”

Variables:

input_weight (numpy.ndarray) – a random matrix (L x (d-1)) needed to calculate H(x).
output_weight (numpy.ndarray) – a column vector (Nx1) calculated after training; represents β.
bias_of_hidden_neurons (numpy.ndarray) – a random column vector (L x 1) needed to calculate H(x).
param_function (str) – function that will be used for training.
param_c (float) – regularization coefficient (C) used for training.
param_l (list of float) – number of neurons that will be used for training.
param_opt (bool) – a boolean that enables an optimized calculation when the number of training patterns is much larger than the number of neurons (N >> L).
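The sketch below illustrates how such variables fit together in the random-neuron formulation of the referenced paper. It is not the library's code: the function names, the uniform range of the random weights, and the mapping of the `opt` flag to the two closed-form solutions are assumptions. With `opt=True` an L x L system is solved, which is cheaper when N >> L; otherwise an N x N system is solved. Both give the same β, which in this sketch has one row per hidden neuron.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_layer(features, input_weight, bias):
    """H(x): hidden-layer outputs, one row per pattern (N x L)."""
    return sigmoid(features @ input_weight.T + bias.T)

def elm_random_fit(features, targets, n_neurons, c, opt, rng):
    n, f = features.shape
    # Random input weights (L x f) and biases (L x 1); the range is an assumption.
    input_weight = rng.uniform(-1.0, 1.0, size=(n_neurons, f))
    bias = rng.uniform(-1.0, 1.0, size=(n_neurons, 1))
    h = hidden_layer(features, input_weight, bias)
    if opt:
        # beta = (I / C + H^T H)^-1 H^T T, an L x L linear system.
        beta = np.linalg.solve(np.eye(n_neurons) / c + h.T @ h, h.T @ targets)
    else:
        # beta = H^T (I / C + H H^T)^-1 T, an N x N linear system.
        beta = h.T @ np.linalg.solve(np.eye(n) / c + h @ h.T, targets)
    return input_weight, bias, beta

def elm_random_predict(features, input_weight, bias, beta):
    return hidden_layer(features, input_weight, bias) @ beta

# Illustrative data: 200 patterns, 3 features, 1 target.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(x.sum(axis=1, keepdims=True))

# The same seed is used twice so both calls draw the same random hidden layer.
w_a, b_a, beta_a = elm_random_fit(x, y, 20, 1.0, True, np.random.default_rng(1))
w_b, b_b, beta_b = elm_random_fit(x, y, 20, 1.0, False, np.random.default_rng(1))
predicted = elm_random_predict(x, w_a, b_a, beta_a)
```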

Other Parameters:

regressor_name (str) – The name of classifier/regressor.
available_functions (list of str) – List with all available functions.
default_param_function (str) – Default function if not set at class constructor.
default_param_c (float) – Default parameter c value if not set at class constructor.
default_param_l (integer) – Default number of neurons if not set at class constructor.
default_param_opt (bool) – Default boolean optimization flag.

Note

regressor_name: defaults to “elmr”.
default_param_function: defaults to “sigmoid”.
default_param_c: defaults to 2 ** -6.
default_param_l: defaults to 500.
default_param_opt: defaults to False.

__init__(params=[])

Class constructor.

Parameters:	params (list) – first argument (str) is an available function, second argument (float) is the coefficient C of regularization, the third is the number of hidden neurons and the last argument is an optimization boolean.

Example

>>> import elm
>>> params = ["sigmoid", 1, 500, False]
>>> elmr = elm.ELMRandom(params)

get_available_functions()

Return available functions.

predict(horizon=1)

Predict next targets based on previous training.

Parameters:	horizon (int) – number of predictions.
Returns:	numpy.ndarray: a column vector containing all predicted targets.

print_parameters()

Print current parameters.

search_param(database, dataprocess=None, path_filename=(u'', u''), save=False, cv=u'ts', of=u'rmse', f=None, eval=50)

Search for the best hyperparameters for the classifier/regressor using optunity algorithms.

Parameters:

database (numpy.ndarray) – a matrix containing all patterns that will be used for training/testing at some cross-validation method.
dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
path_filename (tuple) – TODO.
save (bool) – TODO.
cv (str) – Cross-validation method. Defaults to “ts”.
of (str) – Objective function to be minimized at optunity.minimize. Defaults to “rmse”.
f (list of str) – a list of functions to be used by the search. Defaults to None, which selects all available functions.
eval (int) – number of evaluations (steps) performed by the optunity algorithm.

Each set of hyperparameters is evaluated with the cross-validation method chosen by the cv parameter.

Available cv methods:

“ts” mltools.time_series_cross_validation() – performs a time-series cross-validation as suggested by Hyndman.
“kfold” mltools.kfold_cross_validation() – performs a k-fold cross-validation.

Available objective functions (of):

“accuracy”, “rmse”, “mape”, “me”.

`elm.mltools` Module

This file contains the MLTools class and all its methods.

class elm.mltools.CVError(fold_errors)

Bases: object

CVError is a class that saves Error objects from all folds of a cross-validation method.

Variables:

fold_errors (list of `Error`) – a list of all Error objects created during the cross-validation process.
all_fold_errors (dict) – a dictionary containing lists of the error values of all folds.
all_fold_mean_errors (dict) – a dictionary containing the mean of each all_fold_errors list.

calc_metrics()

Calculate the mean of every error metric across all folds.

Available error metrics are “rmse”, “mse”, “mae”, “me”, “mpe”, “mape”, “std”, “hr”, “hr+”, “hr-” and “accuracy”.
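The aggregation can be pictured with plain dictionaries. This is a sketch only: the per-fold values are made up, and the real class works on `Error` objects rather than on dictionaries.

```python
# Hypothetical per-fold error values, one dictionary per fold.
fold_errors = [
    {"rmse": 0.50, "mae": 0.40},
    {"rmse": 0.70, "mae": 0.50},
    {"rmse": 0.60, "mae": 0.30},
]

# Collect the values of each metric over all folds ...
all_fold_errors = {
    metric: [fold[metric] for fold in fold_errors]
    for metric in fold_errors[0]
}

# ... and average them per metric.
all_fold_mean_errors = {
    metric: sum(values) / len(values)
    for metric, values in all_fold_errors.items()
}
```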

print_errors()

Print the mean of every error metric across all folds.

class elm.mltools.Error(expected, predicted, regressor_name=u'')

Bases: object

Error is a class that saves expected and predicted values to calculate error metrics.

Variables:

regressor_name (str) – Deprecated.
expected_targets (numpy.ndarray) – array of expected values.
predicted_targets (numpy.ndarray) – array of predicted values.
dict_errors (dict) – a dictionary containing all calculated errors and their values.

calc_metrics()

Calculate all error metrics.

Available error metrics are “rmse”, “mse”, “mae”, “me”, “mpe”, “mape”, “std”, “hr”, “hr+”, “hr-” and “accuracy”.
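For orientation, the sketch below computes a few of these metrics with their usual textbook definitions. The function name and the sign convention (error = expected - predicted) are assumptions, and “std”, the “hr” variants and “accuracy” are left out because their exact definitions are not given here.

```python
import numpy as np

def basic_error_metrics(expected, predicted):
    """Common error metrics; error is taken as expected - predicted (assumption)."""
    expected = np.asarray(expected, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = expected - predicted
    mse = float(np.mean(errors ** 2))
    return {
        "mse": mse,                                   # mean squared error
        "rmse": float(np.sqrt(mse)),                  # root mean squared error
        "mae": float(np.mean(np.abs(errors))),        # mean absolute error
        "me": float(np.mean(errors)),                 # mean error (bias)
        # Percentage metrics divide by the expected values, which must be non-zero.
        "mpe": float(np.mean(errors / expected) * 100.0),
        "mape": float(np.mean(np.abs(errors / expected)) * 100.0),
    }

metrics = basic_error_metrics([2.0, 4.0, 5.0], [1.0, 5.0, 5.0])
```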

get(error)

Calculate and return value of an error.

Parameters:	error (str) – Error to be calculated.
Returns:	value of desired error.
Return type:	float

get_anderson()

Anderson-Darling test for data coming from a particular distribution.

Returns:	statistic value, critical values and significance values.
Return type:	tuple

Note

Requires the scipy.stats module to perform the Anderson-Darling test.

get_shapiro()

Perform the Shapiro-Wilk test for normality.

Returns:	statistic value and p-value.
Return type:	tuple

Note

Requires the scipy.stats module to perform the Shapiro-Wilk test.
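Both tests are available directly in scipy.stats. The sketch below applies them to the residuals (expected minus predicted) of made-up data; which array the `Error` methods actually pass to SciPy is an assumption here.

```python
import numpy as np
from scipy import stats

# Illustrative data: predictions equal to the expected values plus Gaussian noise.
rng = np.random.default_rng(0)
expected = rng.normal(size=200)
predicted = expected + rng.normal(scale=0.1, size=200)
residuals = expected - predicted

# Shapiro-Wilk: returns the test statistic and the p-value.
shapiro_statistic, shapiro_p_value = stats.shapiro(residuals)

# Anderson-Darling against the normal distribution: the result carries the
# statistic, the critical values and the matching significance levels.
anderson_result = stats.anderson(residuals, dist="norm")
anderson_statistic = anderson_result.statistic
critical_values = anderson_result.critical_values
significance_levels = anderson_result.significance_level
```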

print_errors()

Print all error metrics.

Note

For better printing format, install prettytable.

print_values()

Print expected and predicted values.

class elm.mltools.MLTools

Bases: object

A Python implementation of several methods needed for machine learning classification/regression.

Variables:

last_training_pattern (numpy.ndarray) – Full path to the package to test.
has_trained (boolean) – package_name str
cv_best_rmse (float) – package_name str

load_regressor(file_name)

Load classifier/regressor to memory.

save_regressor(file_name)

Save current classifier/regressor to file_name file.

elm.mltools.kfold_cross_validation(ml, database, params, number_folds=10, dataprocess=None)

Performs a k-fold cross-validation.

Parameters:

ml (`ELMKernel` or `ELMRandom`) –
database (numpy.ndarray) – uses ‘data’ matrix to perform cross-validation.
params (list) – list of parameters from ml to train/test.
number_folds (int) – number of folds to be created from training and testing matrices.
dataprocess (`DataProcess`) – an object that will pre-process database before training. Defaults to None.
Returns:	tuple of `CVError` from training and testing.
Return type:	tuple
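The fold construction behind a k-fold cross-validation can be sketched as follows. The helper name is hypothetical, and whether the library shuffles the patterns or how it sizes uneven folds is not stated here, so contiguous folds are assumed.

```python
import numpy as np

def kfold_indices(n_patterns, number_folds):
    """Yield (train_indices, test_indices); each fold is the test set once."""
    folds = np.array_split(np.arange(n_patterns), number_folds)
    for i, test_idx in enumerate(folds):
        # All other folds together form the training set.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, test_idx

splits = list(kfold_indices(10, 5))
```

Each pair of index arrays would then select the rows of the database used to train and to test one model.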

elm.mltools.read(file_name)

Read data from txt file.

Parameters:	file_name (str) – path and file name.
Returns:	numpy.ndarray: a matrix containing all read data.

elm.mltools.split_sets(data, training_percent=None, n_test_samples=None, perm=False)

Split data matrix into training and test matrices.

The size of the training matrix is set by the training_percent parameter: its samples are the first samples found in the data matrix, and the remaining samples form the testing matrix.

If neither training_percent nor n_test_samples is set, an error is raised. Only one of the two parameters can be set at a time.

Parameters:

data (numpy.ndarray) – a matrix containing n x f pattern features.
training_percent (float) – an optional parameter used to calculate the number of patterns in the training matrix.
n_test_samples (int) – an optional parameter used to set the number of patterns in the testing matrix.
perm (bool) – a flag that chooses whether to permute (shuffle) the database before splitting the sets.
Returns:	Both training and test matrices.
Return type:	tuple
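The splitting logic described above could look roughly like the sketch below. It is not the library's implementation: the function name and the `seed` argument are hypothetical, `training_percent` is assumed to be a fraction between 0 and 1, and the rounding rule is a guess.

```python
import numpy as np

def split_sets_sketch(data, training_percent=None, n_test_samples=None,
                      perm=False, seed=None):
    """Split rows of data into (training, testing) matrices."""
    # Exactly one of the two sizing parameters must be given.
    if (training_percent is None) == (n_test_samples is None):
        raise ValueError("set exactly one of training_percent or n_test_samples")
    data = np.asarray(data)
    if perm:
        # Shuffle the rows (patterns) before splitting.
        data = np.random.default_rng(seed).permutation(data)
    n = data.shape[0]
    if training_percent is not None:
        n_train = int(round(n * training_percent))
    else:
        n_train = n - n_test_samples
    # The first rows go to training, the rest to testing.
    return data[:n_train], data[n_train:]

data = np.arange(20).reshape(10, 2)
train_a, test_a = split_sets_sketch(data, training_percent=0.8)
train_b, test_b = split_sets_sketch(data, n_test_samples=3)
```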

elm.mltools.time_series_cross_validation(ml, database, params, number_folds=10, dataprocess=None)

Performs a k-fold cross-validation on a Time Series as described by Rob Hyndman.
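A common way to cross-validate on a time series is an expanding-window scheme, where each fold trains on all patterns up to some point and tests on the block that follows, so the model never sees the future. The sketch below shows that scheme; the helper name and the block sizing are assumptions, and the library's exact fold construction may differ.

```python
def time_series_folds(n_patterns, number_folds):
    """Yield (train_indices, test_indices) with an expanding training window."""
    block = n_patterns // (number_folds + 1)
    if block == 0:
        raise ValueError("not enough patterns for the requested number of folds")
    for k in range(1, number_folds + 1):
        train_end = k * block
        # The last fold absorbs any leftover patterns.
        test_end = n_patterns if k == number_folds else train_end + block
        yield list(range(train_end)), list(range(train_end, test_end))

folds = list(time_series_folds(12, 3))
```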
