elm package

elm.elmk Module

This file contains ELMKernel classes and all developed methods.

class elm.elmk.ELMKernel(params=[])[source]

Bases: elm.mltools.MLTools

A Python implementation of ELM Kernel defined by Huang[1].

An ELM is a single-hidden-layer feedforward network (SLFN) proposed by Huang in 2006. In 2012 the author revised the method and introduced the use of kernel functions into his previous work.

This implementation currently accepts both methods proposed in 2012, random neurons and kernel functions, to estimate classifier/regression functions.

Let the dimensionality “d” of the problem be the sum of “t” size (number of targets per pattern) and “f” size (number of features per pattern). So, d = t + f

The data will be set as Pattern = (Target | Features).

If the database has N patterns, its size is Nxd.
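The layout described above can be sketched with NumPy as follows. This is a minimal illustration assuming one target (t = 1) and three features (f = 3); the array names are hypothetical and not part of the package.

```python
import numpy as np

# Hypothetical data: N = 4 patterns, t = 1 target, f = 3 features.
targets = np.array([[1.0], [0.0], [1.0], [0.0]])      # shape (N, t)
features = np.arange(12, dtype=float).reshape(4, 3)   # shape (N, f)

# Pattern = (Target | Features): the targets occupy the first column(s).
database = np.hstack((targets, features))

# The database is N x d, with d = t + f.
n_patterns, d = database.shape
```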

Note

[1] Paper reference: Huang, 2012, “Extreme Learning Machine for Regression and Multiclass Classification”

Variables:
  • output_weight (numpy.ndarray) – a column vector (Nx1) calculated after training; it represents β.
  • training_patterns (numpy.ndarray) –

    a matrix (Nxd) containing all patterns used for training.

    All training patterns must be saved to perform the kernel calculation in the testing and prediction phases.

  • param_kernel_function (str) – kernel function that will be used for training.
  • param_c (float) – regularization coefficient (C) used for training.
  • param_kernel_params (list of float) – kernel function parameters that will be used for training.
Other Parameters:
 
  • regressor_name (str) – The name of classifier/regressor.
  • available_kernel_functions (list of str) – List with all available kernel functions.
  • default_param_kernel_function (str) – Default kernel function if not set at class constructor.
  • default_param_c (float) – Default parameter c value if not set at class constructor.
  • default_param_kernel_params (list of float) – Default kernel function parameters if not set at class constructor.

Note

  • regressor_name: defaults to “elmk”.
  • default_param_kernel_function: defaults to “rbf”.
  • default_param_c: defaults to 9.
  • default_param_kernel_params: defaults to [-15].
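To illustrate why the training patterns must be kept after training, here is a minimal NumPy sketch of the kernel solution from Huang's 2012 paper, β = (I/C + K)⁻¹T, using an RBF kernel. The function names, the `gamma` parameterization, and the sample values are assumptions made for illustration; they are not the package's internals and say nothing about how param_c or param_kernel_params are scaled inside ELMKernel.

```python
import numpy as np

def rbf_kernel(a, b, gamma):
    """K[i, j] = exp(-gamma * ||a_i - b_j||^2) for the rows of a and b."""
    sq_dist = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-gamma * np.maximum(sq_dist, 0.0))

def kernel_elm_fit(features, targets, c, gamma):
    """Solve (I/C + K) beta = T for the output weights beta."""
    k = rbf_kernel(features, features, gamma)
    n = features.shape[0]
    return np.linalg.solve(np.eye(n) / c + k, targets)

def kernel_elm_predict(new_features, train_features, beta, gamma):
    """Prediction needs the kernel between new and stored training patterns."""
    return rbf_kernel(new_features, train_features, gamma) @ beta

# Hypothetical regression problem with N = 50 patterns and 2 features.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(50, 2))
t = np.sin(x[:, :1]) + x[:, 1:] ** 2

beta = kernel_elm_fit(x, t, c=1e3, gamma=1.0)
pred = kernel_elm_predict(x, x, beta, gamma=1.0)
train_rmse = float(np.sqrt(np.mean((pred - t) ** 2)))
```

Note that `kernel_elm_predict` takes the training features as an argument: without them the kernel between new and old patterns cannot be evaluated.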
__init__(params=[])[source]

Class constructor.

Parameters:params (list) – first argument (str) is an available kernel function, second argument (float) is the coefficient C of regularization and the third and last argument is a list of arguments for the kernel function.

Example

>>> import elm
>>> params = ["linear", 5, []]
>>> elmk = elm.ELMKernel(params)
get_available_kernel_functions()[source]

Return available kernel functions.

predict(horizon=1)[source]

Predict next targets based on previous training.

Parameters:horizon (int) – number of predictions.
Returns:numpy.ndarray: a column vector containing all predicted targets.
print_parameters()[source]

Print parameters values.

search_param(database, dataprocess=None, path_filename=(u'', u''), save=False, cv=u'ts', of=u'rmse', kf=None, eval=50)[source]

Search for the best hyperparameters for the classifier/regressor using optunity algorithms.

Parameters:
  • database (numpy.ndarray) – a matrix containing all patterns that will be used for training/testing at some cross-validation method.
  • dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
  • path_filename (tuple) – TODO.
  • save (bool) – TODO.
  • cv (str) – Cross-validation method. Defaults to “ts”.
  • of (str) – Objective function to be minimized at optunity.minimize. Defaults to “rmse”.
  • kf (list of str) – a list of kernel functions to be used by the search. Defaults to None, which selects all available functions.
  • eval (int) – Number of steps (evaluations) for the optunity algorithm.

Each set of hyperparameters will perform a cross-validation method chosen by param cv.

Available cv methods:
  • “ts” mltools.time_series_cross_validation()

    Perform a time-series cross-validation suggested by Hyndman.

  • “kfold” mltools.kfold_cross_validation()

    Perform a k-fold cross-validation.

Available objective functions (of):
  • “accuracy”, “rmse”, “mape”, “me”.
test(testing_matrix, predicting=False)[source]

Calculate test predicted values based on previous training.

Parameters:
  • testing_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for testing.
  • predicting (bool) – Don’t set.
Returns:

Error: testing error object containing expected, predicted targets and all error metrics.

Note

Testing matrix must have target variables as the first column.

train(training_matrix, params=[])[source]

Calculate output_weight values needed to test/predict data.

If params is provided, this method uses it in the training phase. Otherwise, it uses the values provided at object initialization.

Parameters:
  • training_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for training.
  • params (list) – a list of parameters defined at ELMKernel.__init__()
Returns:

Error: training error object containing expected, predicted targets and all error metrics.

Note

Training matrix must have target variables as the first column.

train_iterative(database_matrix, params=[], sliding_window=168, k=1)[source]

Training method used by Fred 09 paper.

elm.elmr Module

This file contains ELMRandom classes and all developed methods.

class elm.elmr.ELMRandom(params=[])[source]

Bases: elm.mltools.MLTools

A Python implementation of ELM Random Neurons defined by Huang[1].

An ELM is a single-hidden-layer feedforward network (SLFN) proposed by Huang in 2006. In 2012 the author revised the method and introduced the use of kernel functions into his previous work.

This implementation currently accepts both methods proposed in 2012, random neurons and kernel functions, to estimate classifier/regression functions.

Let the dimensionality “d” of the problem be the sum of “t” size (number of targets per pattern) and “f” size (number of features per pattern). So, d = t + f

The data will be set as Pattern = (Target | Features).

If the database has N patterns, its size is Nxd.

Note

[1] Paper reference: Huang, 2012, “Extreme Learning Machine for Regression and Multiclass Classification”

Variables:
  • input_weight (numpy.ndarray) – a random matrix (Lx(d-1)) needed to calculate H(x).
  • output_weight (numpy.ndarray) – a column vector (Nx1) calculated after training; it represents β.
  • bias_of_hidden_neurons (numpy.ndarray) – a random column vector (Lx1) needed to calculate H(x).
  • param_function (str) – function that will be used for training.
  • param_c (float) – regularization coefficient (C) used for training.
  • param_l (list of float) – number of neurons that will be used for training.
  • param_opt (bool) – a boolean that enables an optimization for the case where the number of training patterns is much larger than the number of neurons (N >> L).
Other Parameters:
 
  • regressor_name (str) – The name of classifier/regressor.
  • available_functions (list of str) – List with all available functions.
  • default_param_function (str) – Default function if not set at class constructor.
  • default_param_c (float) – Default parameter c value if not set at class constructor.
  • default_param_l (integer) – Default number of neurons if not set at class constructor.
  • default_param_opt (bool) – Default boolean optimization flag.

Note

  • regressor_name: defaults to “elmr”.
  • default_param_function: defaults to “sigmoid”.
  • default_param_c: defaults to 2 ** -6.
  • default_param_l: defaults to 500.
  • default_param_opt: defaults to False.
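The roles of input_weight, bias_of_hidden_neurons and the N >> L optimization can be sketched with NumPy. Huang's 2012 paper gives two algebraically equivalent solutions for the output weights, β = Hᵀ(I/C + HHᵀ)⁻¹T and β = (I/C + HᵀH)⁻¹HᵀT; the second only solves an LxL system, which is cheaper when N >> L. That param_opt switches between these two forms is an assumption here, and all names and values below are illustrative rather than the package's internals.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, f, l, c = 200, 3, 20, 2.0 ** 4   # N patterns, f features, L neurons, C

# Hypothetical regression data.
x = rng.uniform(-1.0, 1.0, size=(n, f))
t = np.sum(x, axis=1, keepdims=True)

# Random hidden layer: these values are drawn once and never trained.
input_weight = rng.uniform(-1.0, 1.0, size=(l, f))   # (L x f)
bias = rng.uniform(-1.0, 1.0, size=(l, 1))           # (L x 1)

# Hidden-layer output matrix H, one row per pattern: (N x L).
h = sigmoid(x @ input_weight.T + bias.T)

# Form 1: solves an N x N system.
beta_n = h.T @ np.linalg.solve(np.eye(n) / c + h @ h.T, t)

# Form 2: solves an L x L system, preferable when N >> L.
beta_l = np.linalg.solve(np.eye(l) / c + h.T @ h, h.T @ t)

predicted = h @ beta_l
```

Both forms give the same weights up to floating-point error, so the choice between them is purely a matter of cost.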
__init__(params=[])[source]

Class constructor.

Parameters:params (list) – first argument (str) is an available function, second argument (float) is the coefficient C of regularization, the third is the number of hidden neurons and the last argument is an optimization boolean.

Example

>>> import elm
>>> params = ["sigmoid", 1, 500, False]
>>> elmr = elm.ELMRandom(params)
get_available_functions()[source]

Return available functions.

predict(horizon=1)[source]

Predict next targets based on previous training.

Parameters:horizon (int) – number of predictions.
Returns:numpy.ndarray: a column vector containing all predicted targets.
print_parameters()[source]

Print current parameters.

search_param(database, dataprocess=None, path_filename=(u'', u''), save=False, cv=u'ts', of=u'rmse', f=None, eval=50)[source]

Search for the best hyperparameters for the classifier/regressor using optunity algorithms.

Parameters:
  • database (numpy.ndarray) – a matrix containing all patterns that will be used for training/testing at some cross-validation method.
  • dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
  • path_filename (tuple) – TODO.
  • save (bool) – TODO.
  • cv (str) – Cross-validation method. Defaults to “ts”.
  • of (str) – Objective function to be minimized at optunity.minimize. Defaults to “rmse”.
  • f (list of str) – a list of functions to be used by the search. Defaults to None, which selects all available functions.
  • eval (int) – Number of steps (evaluations) for the optunity algorithm.

Each set of hyperparameters will perform a cross-validation method chosen by param cv.

Available cv methods:
  • “ts” mltools.time_series_cross_validation()

    Perform a time-series cross-validation suggested by Hyndman.

  • “kfold” mltools.kfold_cross_validation()

    Perform a k-fold cross-validation.

Available objective functions (of):
  • “accuracy”, “rmse”, “mape”, “me”.
test(testing_matrix, predicting=False)[source]

Calculate test predicted values based on previous training.

Parameters:
  • testing_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for testing.
  • predicting (bool) – Don’t set.
Returns:

Error: testing error object containing expected, predicted targets and all error metrics.

Note

Testing matrix must have target variables as the first column.

train(training_matrix, params=[])[source]

Calculate output_weight values needed to test/predict data.

If params is provided, this method uses it in the training phase. Otherwise, it uses the values provided at object initialization.

Parameters:
  • training_matrix (numpy.ndarray) – a matrix containing all patterns that will be used for training.
  • params (list) – a list of parameters defined at ELMRandom.__init__()
Returns:

Error: training error object containing expected, predicted targets and all error metrics.

Note

Training matrix must have target variables as the first column.

train_iterative(database_matrix, params=[], sliding_window=168, k=1)[source]

Training method used by Fred 09 paper.

elm.mltools Module

This file contains MLTools class and all developed methods.

class elm.mltools.CVError(fold_errors)[source]

Bases: object

CVError is a class that saves Error objects from all folds of a cross-validation method.

Variables:
  • fold_errors (list of Error) – a list of all Error objects created through cross-validation process.
  • all_fold_errors (dict) – a dictionary containing lists of error values of all folds.
  • all_fold_mean_errors (dict) – a dictionary containing the mean of all_fold_errors lists.
calc_metrics()[source]

Calculate the mean of each error metric across all folds.

Available error metrics are “rmse”, “mse”, “mae”, “me”, “mpe”, “mape”, “std”, “hr”, “hr+”, “hr-” and “accuracy”.

print_errors()[source]

Print the mean of each error metric across all folds.

class elm.mltools.Error(expected, predicted, regressor_name=u'')[source]

Bases: object

Error is a class that saves expected and predicted values to calculate error metrics.

Variables:
  • regressor_name (str) – Deprecated.
  • expected_targets (numpy.ndarray) – array of expected values.
  • predicted_targets (numpy.ndarray) – array of predicted values.
  • dict_errors (dict) – a dictionary containing all calculated errors and their values.
calc_metrics()[source]

Calculate all error metrics.

Available error metrics are “rmse”, “mse”, “mae”, “me”, “mpe”, “mape”, “std”, “hr”, “hr+”, “hr-” and “accuracy”.
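A few of the listed metrics can be sketched with NumPy under their common textbook definitions. The sign convention (expected minus predicted), the percentage scaling of "mape", and the sample values are assumptions; the package may define them differently, and metrics such as "hr" are not covered here.

```python
import numpy as np

# Hypothetical expected and predicted targets.
expected = np.array([2.0, 4.0, 5.0, 10.0])
predicted = np.array([1.0, 5.0, 5.0, 8.0])

# Assumed sign convention: residual = expected - predicted.
residual = expected - predicted

errors = {
    "me": float(np.mean(residual)),                  # mean error
    "mae": float(np.mean(np.abs(residual))),         # mean absolute error
    "mse": float(np.mean(residual ** 2)),            # mean squared error
    "rmse": float(np.sqrt(np.mean(residual ** 2))),  # root mean squared error
    # Mean absolute percentage error; undefined if an expected value is 0.
    "mape": float(100.0 * np.mean(np.abs(residual / expected))),
}
```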

get(error)[source]

Calculate and return value of an error.

Parameters:error (str) – Error to be calculated.
Returns:value of desired error.
Return type:float
get_anderson()[source]

Anderson-Darling test for data coming from a particular distribution.

Returns:statistic value, critical values and significance values.
Return type:tuple

Note

Need scipy.stats module to perform Anderson-Darling test.

get_shapiro()[source]

Perform the Shapiro-Wilk test for normality.

Returns:statistic value and p-value.
Return type:tuple

Note

Need scipy.stats module to perform Shapiro-Wilk test.

print_errors()[source]

Print all error metrics.

Note

For better printing format, install prettytable.

print_values()[source]

Print expected and predicted values.

class elm.mltools.MLTools[source]

Bases: object

A Python implementation of several methods needed for machine learning classification/regression.

Variables:
  • last_training_pattern (numpy.ndarray) – Full path to the package to test.
  • has_trained (boolean) – package_name str
  • cv_best_rmse (float) – package_name str
load_regressor(file_name)[source]

Load classifier/regressor to memory.

save_regressor(file_name)[source]

Save current classifier/regressor to file_name file.

elm.mltools.kfold_cross_validation(ml, database, params, number_folds=10, dataprocess=None)[source]

Performs a k-fold cross-validation.

Parameters:
  • ml (ELMKernel or ELMRandom) –
  • database (numpy.ndarray) – uses ‘data’ matrix to perform cross-validation.
  • params (list) – list of parameters from ml to train/test.
  • number_folds (int) – number of folds to be created from training and testing matrices.
  • dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
Returns:

tuple of CVError from training and testing.

Return type:

tuple
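How a k-fold split partitions the patterns can be sketched in plain Python. This is a generic illustration of contiguous, unshuffled folds; `kfold_indices` is a hypothetical helper, and whether kfold_cross_validation shuffles or sizes its folds this way is not stated in this documentation.

```python
def kfold_indices(n_patterns, number_folds):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one."""
    base, extra = divmod(n_patterns, number_folds)
    start = 0
    for fold in range(number_folds):
        # The first `extra` folds receive one additional pattern.
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_patterns))
        yield train_idx, test_idx
        start += size

# Each pattern appears in exactly one test fold.
folds = list(kfold_indices(n_patterns=10, number_folds=3))
```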

elm.mltools.read(file_name)[source]

Read data from txt file.

Parameters:file_name (str) – path and file name.
Returns:numpy.ndarray: a matrix containing all read data.
elm.mltools.split_sets(data, training_percent=None, n_test_samples=None, perm=False)[source]

Split data matrix into training and test matrices.

The training matrix size is set by the training_percent parameter; its samples are the first samples in the data matrix, and the remaining samples form the testing matrix.

Exactly one of training_percent and n_test_samples must be set: an error is raised if neither is set, and only one of them can be set at a time.

Parameters:
  • data (numpy.ndarray) – A matrix (nxf) containing the patterns' features.
  • training_percent (float) – An optional parameter used to calculate the number of patterns of training matrix.
  • n_test_samples (int) – An optional parameter used to set the number of patterns of testing matrix.
  • perm (bool) – A flag to choose if should permute(shuffle) database before splitting sets.
Returns:

Both training and test matrices.

Return type:

tuple
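The behaviour described for split_sets can be sketched as follows. This is not the package's implementation: `split_sets_sketch` is a hypothetical stand-in, `training_percent` is assumed to be a fraction between 0 and 1, and the `seed` argument is added only to make the shuffle reproducible.

```python
import numpy as np

def split_sets_sketch(data, training_percent=None, n_test_samples=None,
                      perm=False, seed=None):
    """Split rows of `data` into (training, testing) matrices."""
    # Exactly one of the two sizing parameters must be given.
    if (training_percent is None) == (n_test_samples is None):
        raise ValueError("set exactly one of training_percent or n_test_samples")
    if perm:
        # Shuffle the rows (patterns) before splitting; returns a copy.
        data = np.random.default_rng(seed).permutation(data)
    n = data.shape[0]
    if training_percent is not None:
        n_train = int(round(n * training_percent))
    else:
        n_train = n - n_test_samples
    # The first rows form the training matrix, the rest the testing matrix.
    return data[:n_train], data[n_train:]

data = np.arange(20).reshape(10, 2)
training, testing = split_sets_sketch(data, training_percent=0.8)
```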

elm.mltools.time_series_cross_validation(ml, database, params, number_folds=10, dataprocess=None)[source]

Performs a k-fold cross-validation on a Time Series as described by Rob Hyndman.

Parameters:
  • ml (ELMKernel or ELMRandom) –
  • database (numpy.ndarray) – uses ‘data’ matrix to perform cross-validation.
  • params (list) – list of parameters from ml to train/test.
  • number_folds (int) – number of folds to be created from training and testing matrices.
  • dataprocess (DataProcess) – an object that will pre-process database before training. Defaults to None.
Returns:

tuple of CVError from training and testing.

Return type:

tuple
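The key property of time-series cross-validation is that every training set contains only patterns that come before its test set. A minimal plain-Python sketch of one such scheme (an expanding training window with a rolling origin) follows. The block sizing and the `time_series_folds` helper are assumptions for illustration; the package may build its folds differently.

```python
def time_series_folds(n_patterns, number_folds):
    """Yield (train_idx, test_idx) pairs with an expanding training window."""
    # Split the series into number_folds + 1 contiguous blocks.
    block = n_patterns // (number_folds + 1)
    for fold in range(1, number_folds + 1):
        train_end = fold * block
        # The last fold absorbs any leftover patterns at the end.
        test_end = n_patterns if fold == number_folds else train_end + block
        yield list(range(0, train_end)), list(range(train_end, test_end))

folds = list(time_series_folds(n_patterns=10, number_folds=4))
```

Unlike a shuffled k-fold split, no test pattern ever precedes a training pattern, so the model is never evaluated on data older than what it was trained on.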

elm.mltools.write(file_name, data)[source]

Write data to txt file.

Parameters:
  • file_name (str) – path and file name.
  • data (numpy.ndarray) – data to be written.
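A round trip of a matrix through a text file can be sketched with NumPy's own `savetxt` and `loadtxt`. The exact file format expected by read() and write() is not documented here, so this only illustrates the idea with whitespace-delimited text; the file name and data are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical database: target in the first column, features after it.
data = np.array([[1.0, 0.5, 0.25], [0.0, 1.5, 2.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    file_name = os.path.join(tmp_dir, "database.txt")
    np.savetxt(file_name, data)      # one pattern per line
    loaded = np.loadtxt(file_name)   # back to a numpy.ndarray
```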