Module: learning_curves
-
do_learning_curve
(model, features_set, sub_sample_sizes_ratios_array, scoring=None, cv_n_folds=3, sampling_array=None)
Produces a learning curve using cross-validation, together with a train/test score. For example, if sub_sample_sizes_ratios_array=np.array([0.1, 0.5, 0.8]), sub features sets are extracted with sizes sub_sizes_cv = sub_sample_sizes_ratios_array * features_set.features_N_rows, and cross-validation is performed on each sub features set. A train/test score is also computed on a train/test split extracted with sub_sample_sizes_ratios_array.
- Args:
model:
features_set:
sub_sample_sizes_ratios_array: array-like, shape (n_ticks,), dtype float or int.
Relative (for float) or absolute (for int) numbers of training examples that will be used to generate the learning curve.
scoring: string, callable or None, optional, default: None.
A string (see the model evaluation documentation in scikit-learn) or a scorer callable object / function with signature scorer(estimator, X, y).
cv_n_folds: int, default 3
sampling_array: array-like, optional. A stratified KFold is performed, stratifying on sampling_array.
- Returns:
sub_sizes: array, shape (n_unique_ticks,), dtype int. Numbers of training examples that have been used to generate the learning curve.
Note that the number of ticks might be less than n_ticks because duplicate entries will be removed.
cv_train_scores: array, shape (n_ticks, n_cv_folds). Scores on training sets.
cv_test_scores: array, shape (n_ticks, n_cv_folds). Scores on test sets.
train_score:
test_score:
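As an illustration of how the size ticks described above could map to sub-sample sizes, here is a minimal sketch in plain Python. The helper name `sub_sizes_from_ratios` is hypothetical and not part of this module. Truncating toward zero and rejecting out-of-range sizes are assumptions; the module may round or validate differently.

```python
def sub_sizes_from_ratios(ratios, n_rows):
    """Hypothetical helper: turn size ticks into unique absolute row counts.

    Floats are treated as fractions of n_rows, ints as absolute counts.
    Truncation toward zero is an assumption, not confirmed by the module.
    """
    sizes = set()
    for tick in ratios:
        if isinstance(tick, float):
            size = int(tick * n_rows)   # relative tick -> absolute count
        else:
            size = int(tick)            # absolute tick, used as-is
        if not 1 <= size <= n_rows:
            raise ValueError(f"tick {tick!r} maps to {size}, outside 1..{n_rows}")
        sizes.add(size)                 # the set drops duplicate ticks
    return sorted(sizes)


print(sub_sizes_from_ratios([0.1, 0.5, 0.8], 1000))   # [100, 500, 800]
print(sub_sizes_from_ratios([0.1, 0.15, 0.5], 10))    # [1, 5]
```

The second call shows why the returned array can have fewer entries than n_ticks: with only 10 rows, the ticks 0.1 and 0.15 both truncate to a single row and are merged.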
-
do_validation_curve
(model, features_set, param_name, param_values_array, scoring, cv_n_folds=3, sampling_array=None, train_test_split_ratio=0.5)
Parameters:
model
features_set
param_name
param_values_array
scoring
cv_n_folds
sampling_array
train_test_split_ratio
-
plot_learning_curve
(cv_train_scores, cv_test_scores, cv_train_sizes, full_size, train_sizes=None, train_score=None, test_score=None, plot=False, scorer_name='score')
Plots the learning curves.
- Args:
cv_train_scores: cross validation train scores
cv_test_scores: cross validation test scores
cv_train_sizes: array-like, shape (n_ticks,). Numbers of training examples that have been used to generate the learning curve (the sub_sizes returned by do_learning_curve).
full_size: number of rows in the full features set, i.e. the number of training examples in the full features set
train_score: optional, the array of train set scores
test_score: optional, the array of test set scores
plot: boolean, optional
Returns:
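The cross-validation score arrays above have one row per training size and one column per fold. A common way to draw a learning curve from such arrays is to plot the per-size mean across folds with a band of one standard deviation. Whether plot_learning_curve does exactly this is not stated here, so the following is only a sketch: `summarize_fold_scores` is a hypothetical helper, and the score values are made up for illustration.

```python
from statistics import mean, pstdev


def summarize_fold_scores(scores):
    """Hypothetical helper: per-tick (mean, std) over the fold axis.

    scores is a nested sequence with shape (n_ticks, n_cv_folds).
    """
    return [(mean(row), pstdev(row)) for row in scores]


# Made-up scores for 3 training sizes and 3 folds, for illustration only.
cv_train_sizes = [100, 500, 800]
cv_train_scores = [[1.0, 1.0, 1.0], [0.9, 1.0, 0.8], [0.9, 0.9, 0.9]]
cv_test_scores = [[0.5, 0.75, 0.25], [0.5, 0.75, 0.75], [0.75, 0.75, 0.75]]

train_summary = summarize_fold_scores(cv_train_scores)
test_summary = summarize_fold_scores(cv_test_scores)

# One line per training size: the values a plot would draw as points and bands.
for size, (tr_m, tr_s), (te_m, te_s) in zip(cv_train_sizes, train_summary, test_summary):
    print(f"{size:>4} rows: train {tr_m:.3f} +/- {tr_s:.3f}, test {te_m:.3f} +/- {te_s:.3f}")
```

The resulting means and standard deviations can be handed to any plotting library, with cv_train_sizes on the x-axis.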