Module: learning_curves

do_learning_curve(model, features_set, sub_sample_sizes_ratios_array, scoring=None, cv_n_folds=3, sampling_array=None)

Produces a learning curve using k-fold cross-validation, together with train/test scoring. For example, if sub_sample_sizes_ratios_array=np.array([0.1, 0.5, 0.8]), sub features sets are extracted with sizes sub_sizes_cv = sub_sample_sizes_ratios_array * features_set.features_N_rows, and cross-validation is performed on each sub features set. A train/test score is also computed for a train/test split extracted with sub_sample_sizes_ratios_array.
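As a rough illustration of the sizing rule described above, the sketch below computes the sub-set sizes from an array of ratios. The row count of 200 is a made-up value standing in for features_set.features_N_rows, and both the rounding to the nearest integer and the use of np.unique to drop duplicate sizes are assumptions of this sketch, not confirmed behaviour of do_learning_curve.

```python
import numpy as np

# Hypothetical stand-in for features_set.features_N_rows.
features_n_rows = 200

sub_sample_sizes_ratios_array = np.array([0.1, 0.5, 0.8])

# Size of each sub features set: ratio times the number of rows.
# Rounding to the nearest integer is an assumption of this sketch.
sub_sizes_cv = np.rint(sub_sample_sizes_ratios_array * features_n_rows).astype(int)

# Duplicate sizes would add no information, so each size is kept once.
sub_sizes_cv = np.unique(sub_sizes_cv)

print(sub_sizes_cv)
```

With very small features sets, several ratios can map to the same size, which is why the number of returned ticks may be smaller than the number of ratios passed in.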

Args:

model:

features_set:

sub_sample_sizes_ratios_array: array-like, shape (n_ticks,), dtype float or int.

Relative (for float) or absolute (for int) numbers of training examples that will be used to generate the learning curve.

scoring: A string (see the model evaluation documentation in scikit-learn) or a scorer callable object / function with signature scorer(estimator, X, y).

cv_n_folds: int, default 3

sampling_array: array-like, optional. If given, a stratified k-fold is performed, stratifying on sampling_array.
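To make the stratification on sampling_array more concrete, here is a minimal sketch using scikit-learn's StratifiedKFold directly. The toy data and the two-group sampling_array are invented for illustration, and whether do_learning_curve builds its folds exactly this way is an assumption.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Toy data: 12 rows and a hypothetical sampling_array with a 9:3 group split.
X = np.arange(24).reshape(12, 2)
sampling_array = np.array([0] * 9 + [1] * 3)

cv_n_folds = 3
skf = StratifiedKFold(n_splits=cv_n_folds)

# Count how many rows of each group land in every test fold.
fold_counts = []
for train_idx, test_idx in skf.split(X, sampling_array):
    counts = np.bincount(sampling_array[test_idx], minlength=2)
    fold_counts.append(counts.tolist())

print(fold_counts)
```

Each test fold keeps the 3:1 group proportion of the full set, which is the point of stratifying: no fold ends up without rows from the smaller group.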

Returns:

sub_sizes: array, shape = (n_unique_ticks,), dtype int. Numbers of training examples that have been used to generate the learning curve.

Note that the number of ticks might be less than n_ticks because duplicate entries will be removed.

cv_train_scores: array, shape (n_ticks, n_cv_folds). Scores on training sets.

cv_test_scores: array, shape (n_ticks, n_cv_folds). Scores on test sets.

train_score:

test_score:
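The shapes of the cross-validation outputs listed above match those returned by scikit-learn's own learning_curve helper, so a sketch with that function may help when reading them. The synthetic data and the LogisticRegression model are placeholders, and this is not a claim that do_learning_curve wraps learning_curve. Note also that scikit-learn interprets float sizes relative to the largest training fold, whereas the description above multiplies by the number of rows of the features set.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import learning_curve

# Synthetic data, used only to have something to fit.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

train_sizes, cv_train_scores, cv_test_scores = learning_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    train_sizes=np.array([0.1, 0.5, 0.8]),  # fractions of the largest training fold
    cv=3,
    scoring="accuracy",
)

# One row per tick, one column per cross-validation fold.
print(train_sizes.shape)
print(cv_train_scores.shape)
print(cv_test_scores.shape)
```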

do_validation_curve(model, features_set, param_name, param_values_array, scoring, cv_n_folds=3, sampling_array=None, train_test_split_ratio=0.5)

Parameters:

model

features_set

param_name

param_values_array

scoring

cv_n_folds

sampling_array

train_test_split_ratio
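Since the parameters above are undocumented, the following sketch shows what a validation curve over param_name and param_values_array typically looks like, using scikit-learn's validation_curve. The model, the data, and the choice of the regularisation parameter C are illustrative assumptions. The sketch does not cover sampling_array or train_test_split_ratio, whose exact roles are not described here.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import validation_curve

# Synthetic data, used only to have something to fit.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

# Hypothetical choice: sweep the inverse regularisation strength C.
param_name = "C"
param_values_array = np.array([0.01, 0.1, 1.0, 10.0])

cv_train_scores, cv_test_scores = validation_curve(
    LogisticRegression(max_iter=1000),
    X,
    y,
    param_name=param_name,
    param_range=param_values_array,
    cv=3,
    scoring="accuracy",
)

# One row per parameter value, one column per cross-validation fold.
print(cv_train_scores.shape)
print(cv_test_scores.shape)
```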

plot_learning_curve(cv_train_scores, cv_test_scores, cv_train_sizes, full_size, train_sizes=None, train_score=None, test_score=None, plot=False, scorer_name='score')

Plots the learning curves.

Args:

cv_train_scores: cross validation train scores

cv_test_scores: cross validation test scores

sub_sizes: array-like, shape (n_ticks,). Numbers of training examples that have been used to generate the learning curve.

full_size: Number of rows in the full features set, i.e. the number of training examples in the full features set.

train_score: optional, the array of train set scores

test_score: optional, the array of test set scores

plot: boolean, optional
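The arguments above are per-fold score arrays, so a plot usually needs them reduced to one value per tick first. The sketch below shows one common reduction (mean and standard deviation over folds) and one possible use of full_size (expressing sizes as fractions of the full features set). All numbers are made up, and whether plot_learning_curve aggregates in this way is an assumption.

```python
import numpy as np

# Made-up fold scores with shape (n_ticks, n_cv_folds) = (3, 3).
cv_train_scores = np.array([[0.95, 0.97, 0.96],
                            [0.90, 0.91, 0.92],
                            [0.88, 0.89, 0.87]])
cv_test_scores = np.array([[0.70, 0.72, 0.68],
                           [0.80, 0.79, 0.81],
                           [0.84, 0.85, 0.83]])
sub_sizes = np.array([20, 100, 160])
full_size = 200

# One point per tick: the mean over folds, with the spread as an error band.
train_mean = cv_train_scores.mean(axis=1)
train_std = cv_train_scores.std(axis=1)
test_mean = cv_test_scores.mean(axis=1)
test_std = cv_test_scores.std(axis=1)

# Hypothetical x axis: training size as a fraction of the full features set.
size_fractions = sub_sizes / full_size

print(size_fractions, train_mean, test_mean)
```

The means can then be drawn as lines and the standard deviations as shaded bands, for instance with matplotlib's plot and fill_between.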

Returns:

plot_validation_curve(cv_train_scores, cv_test_scores, score_test_array, param_values_array, par_name, scorer_name)