BioBB ML Command Line Help
Generic usage:
biobb_command [-h] --config CONFIG --input_file(s) <input_file(s)> --output_file <output_file>
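The generic usage above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; `build_command` is a hypothetical helper (not part of BioBB) that only builds the argument list and does not run anything.

```python
import shlex


def build_command(tool, config, **paths):
    """Return the argument list for a building block call.

    `tool` is the executable name (e.g. "decision_tree"); every keyword in
    `paths` becomes a `--<name> <value>` pair, in the order given.
    """
    args = [tool, "--config", config]
    for name, value in paths.items():
        args += [f"--{name}", str(value)]
    return args


cmd = build_command(
    "decision_tree",
    "config_decision_tree.yml",
    input_dataset_path="dataset_decision_tree.csv",
    output_model_path="ref_output_model_decision_tree.pkl",
)
# shlex.join quotes arguments where needed, so the result can be pasted into a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where the building block is installed.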
Wrapper of the scikit-learn DecisionTreeClassifier method.
Get help
decision_tree -h
usage: decision_tree [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn DecisionTreeClassifier method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes and the ROC curve. If the target is non-binary, it shows the confusion matrix. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
criterion (string): (gini) The function to measure the quality of a split.
max_depth (integer): (4) The maximum depth of the model. If None, then nodes are expanded until all leaves are pure or until all leaves contain fewer than min_samples_split samples.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
random_state_method (integer): (5) Controls the randomness of the estimator.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  criterion: entropy
  independent_vars:
    columns:
    - interest_rate
    - credit
    - march
    - previous
    - duration
  max_depth: 4
  normalize_cm: false
  scale: true
  target:
    column: y
  test_size: 0.2
Command line
decision_tree --config config_decision_tree.yml --input_dataset_path dataset_decision_tree.csv --output_model_path ref_output_model_decision_tree.pkl --output_test_table_path ref_output_test_decision_tree.csv --output_plot_path ref_output_plot_decision_tree.png
Common config file
{
    "properties": {
        "independent_vars": {
            "columns": [
                "interest_rate",
                "credit",
                "march",
                "previous",
                "duration"
            ]
        },
        "target": {
            "column": "y"
        },
        "criterion": "entropy",
        "max_depth": 4,
        "normalize_cm": false,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
decision_tree --config config_decision_tree.json --input_dataset_path dataset_decision_tree.csv --output_model_path ref_output_model_decision_tree.pkl --output_test_table_path ref_output_test_decision_tree.csv --output_plot_path ref_output_plot_decision_tree.png
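A configuration file of this kind can also be generated from Python. This is a minimal sketch using only the standard library; the property values are the ones from this decision_tree example, and nothing here is specific to BioBB itself.

```python
import json

# Properties taken from the decision_tree example configuration.
config = {
    "properties": {
        "independent_vars": {
            "columns": ["interest_rate", "credit", "march", "previous", "duration"]
        },
        "target": {"column": "y"},
        "criterion": "entropy",
        "max_depth": 4,
        "normalize_cm": False,
        "test_size": 0.2,
        "scale": True,
    }
}

# Serialise with indentation; Python booleans become JSON true/false.
text = json.dumps(config, indent=4)
print(text)
```

Writing `text` to config_decision_tree.json gives a file that can be passed to `--config`.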
Wrapper of the scikit-learn AgglomerativeClustering method.
Get help
agglomerative_clustering -h
usage: agglomerative_clustering [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn AgglomerativeClustering method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the clustering plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the clustered dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the clustered dataset. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the clustering plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
clusters (integer): (3) The number of clusters to form as well as the number of centroids to generate.
affinity (string): (euclidean) Metric used to compute the linkage. If linkage is "ward", only "euclidean" is accepted.
linkage (string): (ward) The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pairs of clusters that minimize this criterion.
plots (array): (None) List of dictionaries with all plots you want to generate. Only 2D or 3D plots accepted. Format: [ { 'title': 'Plot 1', 'features': ['feat1', 'feat2'] } ].
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
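Since only 2D or 3D plots are accepted, a `plots` list can be checked before it goes into the configuration. The sketch below is illustrative only; `check_plots` is a hypothetical helper, not a BioBB function, and it checks nothing beyond the number of features per plot.

```python
def check_plots(plots):
    """Raise ValueError unless every plot lists exactly 2 or 3 features."""
    for plot in plots:
        n_features = len(plot["features"])
        if n_features not in (2, 3):
            raise ValueError(
                f"{plot['title']}: expected 2 or 3 features, got {n_features}"
            )
    return True


plots = [
    {"title": "Plot 1", "features": ["sepal_length", "sepal_width"]},
    {"title": "Plot 3", "features": ["sepal_length", "sepal_width", "petal_length"]},
]
check_plots(plots)
```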
Common config file
properties:
  clusters: 3
  linkage: average
  plots:
  - features:
    - sepal_length
    - sepal_width
    title: Plot 1
  - features:
    - petal_length
    - petal_width
    title: Plot 2
  - features:
    - sepal_length
    - sepal_width
    - petal_length
    title: Plot 3
  - features:
    - petal_length
    - petal_width
    - sepal_width
    title: Plot 4
  - features:
    - sepal_length
    - petal_width
    title: Plot 5
  predictors:
    columns:
    - sepal_length
    - sepal_width
    - petal_length
    - petal_width
  scale: true
Command line
agglomerative_clustering --config config_agglomerative_clustering.yml --input_dataset_path dataset_agglomerative_clustering.csv --output_results_path ref_output_results_agglomerative_clustering.csv --output_plot_path ref_output_plot_agglomerative_clustering.png
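Common config file
{
    "properties": {
        "predictors": {
            "columns": [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ]
        },
        "clusters": 3,
        "linkage": "average",
        "plots": [
            {
                "title": "Plot 1",
                "features": [
                    "sepal_length",
                    "sepal_width"
                ]
            },
            {
                "title": "Plot 2",
                "features": [
                    "petal_length",
                    "petal_width"
                ]
            },
            {
                "title": "Plot 3",
                "features": [
                    "sepal_length",
                    "sepal_width",
                    "petal_length"
                ]
            },
            {
                "title": "Plot 4",
                "features": [
                    "petal_length",
                    "petal_width",
                    "sepal_width"
                ]
            },
            {
                "title": "Plot 5",
                "features": [
                    "sepal_length",
                    "petal_width"
                ]
            }
        ],
        "scale": true
    }
}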
Command line
agglomerative_clustering --config config_agglomerative_clustering.json --input_dataset_path dataset_agglomerative_clustering.csv --output_results_path ref_output_results_agglomerative_clustering.csv --output_plot_path ref_output_plot_agglomerative_clustering.png
Wrapper of the scikit-learn LinearRegression method.
Get help
linear_regression -h
usage: linear_regression [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn LinearRegression method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Residual plot checks the error between actual values and predicted values. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Residual plot checks the error between actual values and predicted values. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
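To get a feel for what `test_size` means in row counts, the sketch below computes the train and test sizes for a given dataset length. It assumes the split behaves like scikit-learn's `train_test_split`, which rounds the test share up; that the building block delegates to it is an assumption, not something stated here, and `split_sizes` is a hypothetical helper.

```python
import math


def split_sizes(n_samples, test_size=0.2):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded up and the remaining rows
    go to the training set.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size should be between 0.0 and 1.0")
    n_test = math.ceil(test_size * n_samples)
    return n_samples - n_test, n_test


# 103 rows with the default test_size of 0.2
n_train, n_test = split_sizes(103)
print(n_train, n_test)
```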
Common config file
properties:
  independent_vars:
    columns:
    - size
    - year
    - view
  scale: true
  target:
    column: price
  test_size: 0.2
Command line
linear_regression --config config_linear_regression.yml --input_dataset_path dataset_linear_regression.csv --output_model_path ref_output_model_linear_regression.pkl --output_test_table_path ref_output_test_linear_regression.csv --output_plot_path ref_output_plot_linear_regression.png
Common config file
{
    "properties": {
        "independent_vars": {
            "columns": [
                "size",
                "year",
                "view"
            ]
        },
        "target": {
            "column": "price"
        },
        "test_size": 0.2,
        "scale": true
    }
}
Command line
linear_regression --config config_linear_regression.json --input_dataset_path dataset_linear_regression.csv --output_model_path ref_output_model_linear_regression.pkl --output_test_table_path ref_output_test_linear_regression.csv --output_plot_path ref_output_plot_linear_regression.png
Wrapper of the scikit-learn RandomForestClassifier method.
Get help
random_forest_classifier -h
usage: random_forest_classifier [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn RandomForestClassifier method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes and the ROC curve. If the target is non-binary, it shows the confusion matrix. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
n_estimators (integer): (100) The number of trees in the forest.
bootstrap (boolean): (True) Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
random_state_method (integer): (5) Controls the randomness of the estimator.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  bootstrap: true
  independent_vars:
    indexes:
    - 0
    - 1
    - 2
    - 3
    - 4
  n_estimators: 100
  normalize_cm: false
  scale: true
  target:
    index: 5
  test_size: 0.2
Command line
random_forest_classifier --config config_random_forest_classifier.yml --input_dataset_path dataset_random_forest_classifier.csv --output_model_path ref_output_model_random_forest_classifier.pkl --output_test_table_path ref_output_test_random_forest_classifier.csv --output_plot_path ref_output_plot_random_forest_classifier.png
Common config file
{
    "properties": {
        "independent_vars": {
            "indexes": [
                0,
                1,
                2,
                3,
                4
            ]
        },
        "target": {
            "index": 5
        },
        "n_estimators": 100,
        "bootstrap": true,
        "normalize_cm": false,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
random_forest_classifier --config config_random_forest_classifier.json --input_dataset_path dataset_random_forest_classifier.csv --output_model_path ref_output_model_random_forest_classifier.pkl --output_test_table_path ref_output_test_random_forest_classifier.csv --output_plot_path ref_output_plot_random_forest_classifier.png
Wrapper of the imblearn.combine methods.
Get help
resampling -h
usage: resampling [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Wrapper of the imblearn.combine methods.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
method (string): (None) Resampling method. It's a mandatory property.
type (string): (None) Type of oversampling. It's a mandatory property.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
evaluate (boolean): (False) Whether or not to evaluate the dataset before and after applying the resampling.
evaluate_splits (integer): (3) Number of folds to be applied by the Repeated Stratified K-Fold evaluation method. Must be at least 2.
evaluate_repeats (integer): (3) Number of times the Repeated Stratified K-Fold cross validator needs to be repeated.
n_bins (integer): (5) Only for regression resampling. The number of classes that the user wants to generate with the target data.
balanced_binning (boolean): (False) Only for regression resampling. Decides whether samples are to be distributed roughly equally across all classes.
sampling_strategy_over (object): ({'target': 'auto'}) Sampling information applied in the dataset oversampling process. Formats: { "target": "auto" }, { "ratio": 0.3 } or { "dict": { 0: 300, 1: 200, 2: 100 } }. When "target", specify the class targeted by the resampling; the number of samples in the different classes will be equalized; possible choices are: minority (resample only the minority class), not minority (resample all classes but the minority class), not majority (resample all classes but the majority class), all (resample all classes), auto (equivalent to 'not majority'). When "ratio", it corresponds to the desired ratio of the number of samples in the minority class over the number of samples in the majority class after resampling (only for binary classification). When "dict", the keys correspond to the targeted classes and the values correspond to the desired number of samples for each targeted class.
sampling_strategy_under (object): ({'target': 'auto'}) Sampling information applied in the dataset cleaning process. Formats: { "target": "auto" } or { "list": [0, 2, 3] }. When "target", specify the class targeted by the resampling; the number of samples in the different classes will be equalized; possible choices are: majority (resample only the majority class), not minority (resample all classes but the minority class), not majority (resample all classes but the majority class), all (resample all classes), auto (equivalent to 'not minority'). When "list", the list contains the classes targeted by the resampling.
random_state_method (integer): (5) Controls the randomization of the algorithm.
random_state_evaluate (integer): (5) Controls the shuffling applied to the Repeated Stratified K-Fold evaluation method.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
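The "ratio" format of sampling_strategy_over is easiest to read as arithmetic: it fixes the minority class size relative to the majority class after resampling. The sketch below works through one case; the class counts are made up for illustration and `minority_after_oversampling` is a hypothetical helper, not part of BioBB or imbalanced-learn.

```python
def minority_after_oversampling(n_majority, ratio):
    """Minority class size such that minority / majority equals `ratio`.

    Binary classification only; the majority class is left unchanged
    by oversampling.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in the interval (0, 1]")
    return round(n_majority * ratio)


# Hypothetical dataset: 1000 majority samples, 120 minority samples.
n_majority, n_minority = 1000, 120
wanted = minority_after_oversampling(n_majority, 0.3)
to_generate = wanted - n_minority
print(wanted, to_generate)
```

With these counts, { "ratio": 0.3 } asks for 300 minority samples, so 180 synthetic samples would have to be generated.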
Common config file
properties:
  evaluate: true
  method: smotenn
  n_bins: 10
  sampling_strategy_over:
    dict:
      4: 1000
      5: 1000
      6: 1000
      7: 1000
  sampling_strategy_under:
    list:
    - 0
    - 1
  target:
    column: VALUE
  type: regression
Command line
resampling --config config_resampling.yml --input_dataset_path dataset_resampling.csv --output_dataset_path ref_output_resampling.csv
Common config file
{
    "properties": {
        "method": "smotenn",
        "type": "regression",
        "target": {
            "column": "VALUE"
        },
        "evaluate": true,
        "n_bins": 10,
        "sampling_strategy_over": {
            "dict": {
                "4": 1000,
                "5": 1000,
                "6": 1000,
                "7": 1000
            }
        },
        "sampling_strategy_under": {
            "list": [
                0,
                1
            ]
        }
    }
}
Command line
resampling --config config_resampling.json --input_dataset_path dataset_resampling.csv --output_dataset_path ref_output_resampling.csv
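Note that the class labels in the "dict" format appear as integers in the YAML example (4: 1000) but as strings in the JSON one ("4": 1000), because JSON object keys are always strings. The sketch below only illustrates that behaviour of the Python standard library; how the building block itself treats the string keys is not covered here.

```python
import json

# Class label -> desired number of samples, with integer labels.
over = {"dict": {4: 1000, 5: 1000, 6: 1000, 7: 1000}}

# json.dumps converts the integer keys to strings.
text = json.dumps(over)
print(text)

# Reading the JSON back yields string keys; convert them explicitly
# if integer class labels are needed again.
restored = {int(label): count for label, count in json.loads(text)["dict"].items()}
```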
Makes predictions from an input dataset and a given classification model.
Get help
classification_predict -h
usage: classification_predict [-h] [--config CONFIG] --input_model_path INPUT_MODEL_PATH --output_results_path OUTPUT_RESULTS_PATH [--input_dataset_path INPUT_DATASET_PATH]
Makes predictions from an input dataset and a given classification model.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--input_dataset_path INPUT_DATASET_PATH
Path to the dataset to predict. Accepted formats: csv.
required arguments:
--input_model_path INPUT_MODEL_PATH
Path to the input model. Accepted formats: pkl.
--output_results_path OUTPUT_RESULTS_PATH
Path to the output results file. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_model_path (string): Path to the input model. File type: input. Sample file. Accepted formats: PKL
input_dataset_path (string): Path to the dataset to predict. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the output results file. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictions (array): (None) List of dictionaries with all the values for which you want to predict targets. It will be taken into account only if input_dataset_path is not provided. Format: [{ 'var1': 1.0, 'var2': 2.0 }, { 'var1': 4.0, 'var2': 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
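The two `predictions` formats differ only in whether each row carries its column names. The following sketch builds either form from the same rows; `predictions_config` is a hypothetical helper, and the values are the ones used in the format description above.

```python
import json


def predictions_config(rows, header=None):
    """Build a config dictionary holding a `predictions` property.

    With `header`, each row becomes a {column: value} dictionary
    (datasets with headers); without it, rows stay plain lists.
    """
    if header is None:
        predictions = [list(row) for row in rows]
    else:
        predictions = [dict(zip(header, row)) for row in rows]
    return {"properties": {"predictions": predictions}}


rows = [[1.0, 2.0], [4.0, 2.7]]
with_headers = predictions_config(rows, header=["var1", "var2"])
without_headers = predictions_config(rows)
print(json.dumps(with_headers, indent=4))
```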
Common config file
properties:
  remove_tmp: false
Command line
classification_predict --config config_classification_predict.yml --input_model_path model_classification_predict.pkl --input_dataset_path input_classification_predict.csv --output_results_path ref_output_classification_predict.csv
Common config file
{
    "properties": {
        "remove_tmp": false
    }
}
Command line
classification_predict --config config_classification_predict.json --input_model_path model_classification_predict.pkl --input_dataset_path input_classification_predict.csv --output_results_path ref_output_classification_predict.csv
Wrapper of the scikit-learn PCA method.
Get help
principal_component -h
usage: principal_component [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn PCA method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the Principal Component plot, only if number of components is 2 or 3. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the analysed dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the analysed dataset. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the Principal Component plot, only if number of components is 2 or 3. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
n_components (object): ({}) Dictionary containing either the number of components to keep (int) or the fraction of the variance, between 0 and 1, that the kept components must retain (float); in the latter case the minimum number of principal components reaching that fraction is kept. If not set ({}), all components are kept. Formats for integer values: { "value": 2 } or for float values: { "value": 0.3 }.
random_state_method (integer): (5) Controls the randomness of the estimator.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
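The meaning of `n_components` depends on the type of its "value" entry. The sketch below spells out the three cases described above; `describe_n_components` is a hypothetical helper written for illustration and is not how the building block is implemented.

```python
def describe_n_components(n_components):
    """Explain how an n_components dictionary would be interpreted."""
    if not n_components:
        return "keep all components"
    value = n_components["value"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool):
        raise ValueError("value must be an integer or a float")
    if isinstance(value, int):
        return f"keep {value} components"
    if isinstance(value, float) and 0.0 < value < 1.0:
        return f"keep enough components to retain {value:.0%} of the variance"
    raise ValueError("value must be an integer or a float between 0 and 1")


print(describe_n_components({}))
print(describe_n_components({"value": 2}))
print(describe_n_components({"value": 0.3}))
```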
Common config file
properties:
  features:
    columns:
    - sepal_length
    - sepal_width
    - petal_length
    - petal_width
  n_components:
    value: 2
  scale: true
  target:
    column: target
Command line
principal_component --config config_principal_component.yml --input_dataset_path dataset_principal_component.csv --output_results_path ref_output_results_principal_component.csv --output_plot_path ref_output_plot_principal_component.png
Common config file
{
    "properties": {
        "features": {
            "columns": [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ]
        },
        "target": {
            "column": "target"
        },
        "n_components": {
            "value": 2
        },
        "scale": true
    }
}
Command line
principal_component --config config_principal_component.json --input_dataset_path dataset_principal_component.csv --output_results_path ref_output_results_principal_component.csv --output_plot_path ref_output_plot_principal_component.png
Wrapper of the scikit-learn SpectralClustering method.
Get help
spectral_clustering -h
usage: spectral_clustering [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn SpectralClustering method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the clustering plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the clustered dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the clustered dataset. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the clustering plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
clusters (integer): (3) The number of clusters to form as well as the number of centroids to generate.
affinity (string): (rbf) How to construct the affinity matrix.
plots (array): (None) List of dictionaries with all plots you want to generate. Only 2D or 3D plots accepted. Format: [ { 'title': 'Plot 1', 'features': ['feat1', 'feat2'] } ].
random_state_method (integer): (5) A pseudo random number generator used for the initialization of the lobpcg eigenvector decomposition when eigen_solver='amg' and by the K-Means initialization.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  affinity: nearest_neighbors
  clusters: 3
  plots:
  - features:
    - sepal_length
    - sepal_width
    title: Plot 1
  - features:
    - petal_length
    - petal_width
    title: Plot 2
  - features:
    - sepal_length
    - sepal_width
    - petal_length
    title: Plot 3
  - features:
    - petal_length
    - petal_width
    - sepal_width
    title: Plot 4
  - features:
    - sepal_length
    - petal_width
    title: Plot 5
  predictors:
    columns:
    - sepal_length
    - sepal_width
    - petal_length
    - petal_width
  scale: true
Command line
spectral_clustering --config config_spectral_clustering.yml --input_dataset_path dataset_spectral_clustering.csv --output_results_path ref_output_results_spectral_clustering.csv --output_plot_path ref_output_plot_spectral_clustering.png
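Common config file
{
    "properties": {
        "predictors": {
            "columns": [
                "sepal_length",
                "sepal_width",
                "petal_length",
                "petal_width"
            ]
        },
        "clusters": 3,
        "affinity": "nearest_neighbors",
        "plots": [
            {
                "title": "Plot 1",
                "features": [
                    "sepal_length",
                    "sepal_width"
                ]
            },
            {
                "title": "Plot 2",
                "features": [
                    "petal_length",
                    "petal_width"
                ]
            },
            {
                "title": "Plot 3",
                "features": [
                    "sepal_length",
                    "sepal_width",
                    "petal_length"
                ]
            },
            {
                "title": "Plot 4",
                "features": [
                    "petal_length",
                    "petal_width",
                    "sepal_width"
                ]
            },
            {
                "title": "Plot 5",
                "features": [
                    "sepal_length",
                    "petal_width"
                ]
            }
        ],
        "scale": true
    }
}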
Command line
spectral_clustering --config config_spectral_clustering.json --input_dataset_path dataset_spectral_clustering.csv --output_results_path ref_output_results_spectral_clustering.csv --output_plot_path ref_output_plot_spectral_clustering.png
Wrapper of the scikit-learn RandomForestRegressor method.
Get help
random_forest_regressor -h
usage: random_forest_regressor [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn RandomForestRegressor method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Residual plot checks the error between actual values and predicted values. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Residual plot checks the error between actual values and predicted values. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
n_estimators (integer): (10) The number of trees in the forest.
max_depth (integer): (None) The maximum depth of the tree.
random_state_method (integer): (5) Controls the randomness of the estimator.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
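The three selector formats accepted by independent_vars can be read as follows. This is a minimal sketch, not the library's own code: the function name resolve_columns is hypothetical, and two points are assumptions the help text does not confirm, namely that formats are checked in the order columns, indexes, range, and that range bounds are inclusive.

```python
def resolve_columns(selector, header):
    """Translate a selector object into a list of column positions.

    Assumptions: formats are checked in the order columns, indexes,
    range, and both bounds of each range are inclusive.
    """
    if "columns" in selector:
        return [header.index(name) for name in selector["columns"]]
    if "indexes" in selector:
        return list(selector["indexes"])
    if "range" in selector:
        positions = []
        for start, end in selector["range"]:
            positions.extend(range(start, end + 1))
        return positions
    raise ValueError("selector needs one of: columns, indexes, range")


header = ["a", "b", "c", "d", "e", "f"]
print(resolve_columns({"range": [[0, 1], [3, 4]]}, header))  # [0, 1, 3, 4]
# Two formats given: under the assumed order, "columns" wins.
print(resolve_columns({"columns": ["c", "a"], "indexes": [5]}, header))  # [2, 0]
```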
Common config file
properties:
  independent_vars:
    range:
    - - 0
      - 5
    - - 7
      - 12
  max_depth: 5
  n_estimators: 10
  scale: true
  target:
    index: 13
  test_size: 0.2
Command line
random_forest_regressor --config config_random_forest_regressor.yml --input_dataset_path dataset_random_forest_regressor.csv --output_model_path ref_output_model_random_forest_regressor.pkl --output_test_table_path ref_output_test_random_forest_regressor.csv --output_plot_path ref_output_plot_random_forest_regressor.png
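When the tools are launched from a script, the command line can be assembled as an argument list instead of a single string. The sketch below uses only flags shown in the help output above; the helper name build_command is hypothetical, and actually running the command requires the tool to be installed.

```python
import shlex


def build_command(tool, config, **paths):
    """Assemble an argv list: tool --config CONFIG --name value ..."""
    argv = [tool, "--config", config]
    for name, value in paths.items():
        argv += [f"--{name}", value]
    return argv


argv = build_command(
    "random_forest_regressor",
    "config_random_forest_regressor.yml",
    input_dataset_path="dataset_random_forest_regressor.csv",
    output_model_path="ref_output_model_random_forest_regressor.pkl",
)
print(shlex.join(argv))
# To execute: subprocess.run(argv, check=True)
```

Passing a list avoids shell quoting problems when file names contain spaces.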
Common config file
{
    "properties": {
        "independent_vars": {
            "range": [
                [0, 5],
                [7, 12]
            ]
        },
        "target": {
            "index": 13
        },
        "n_estimators": 10,
        "max_depth": 5,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
random_forest_regressor --config config_random_forest_regressor.json --input_dataset_path dataset_random_forest_regressor.csv --output_model_path ref_output_model_random_forest_regressor.pkl --output_test_table_path ref_output_test_random_forest_regressor.csv --output_plot_path ref_output_plot_random_forest_regressor.png
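A JSON configuration like the one above can also be produced from a Python dictionary, which guarantees valid syntax. This is a sketch using the sample values from this section; writing the file is left commented out so the snippet has no side effects.

```python
import json

config = {
    "properties": {
        "independent_vars": {"range": [[0, 5], [7, 12]]},
        "target": {"index": 13},
        "n_estimators": 10,
        "max_depth": 5,
        "test_size": 0.2,
        "scale": True,  # serialised as JSON true
    }
}

text = json.dumps(config, indent=4)
print(text)

# with open("config_random_forest_regressor.json", "w") as handle:
#     handle.write(text)
```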
Wrapper of the scikit-learn LogisticRegression method.
Get help
logistic_regression -h
usage: logistic_regression [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn LogisticRegression method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
solver (string): (liblinear) Numerical optimizer to find parameters.
c_parameter (number): (0.01) Inverse of regularization strength; must be a positive float. Smaller values specify stronger regularization.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
random_state_method (integer): (5) Controls the randomness of the estimator.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
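The normalize_cm option switches the confusion matrix from raw counts to proportions. The sketch below illustrates the idea with row-wise normalization (each row, one true class, sums to 1); that choice is an assumption, since the help text does not say which normalization is applied, and the labels are made up.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Count predictions: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its total (assumed normalization scheme)."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([count / total if total else 0.0 for count in row])
    return normalized


y_true = [0, 0, 0, 1, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0, 1]
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)  # [[2, 1], [1, 4]]
print(normalize_rows(cm))
```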
Common config file
properties:
  c_parameter: 0.01
  independent_vars:
    columns:
    - mean area
    - mean compactness
  normalize_cm: false
  scale: true
  solver: liblinear
  target:
    column: benign
  test_size: 0.2
Command line
logistic_regression --config config_logistic_regression.yml --input_dataset_path dataset_logistic_regression.csv --output_model_path ref_output_model_logistic_regression.pkl --output_test_table_path ref_output_test_logistic_regression.csv --output_plot_path ref_output_plot_logistic_regression.png
Common config file
{
    "properties": {
        "independent_vars": {
            "columns": [
                "mean area",
                "mean compactness"
            ]
        },
        "target": {
            "column": "benign"
        },
        "solver": "liblinear",
        "c_parameter": 0.01,
        "normalize_cm": false,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
logistic_regression --config config_logistic_regression.json --input_dataset_path dataset_logistic_regression.csv --output_model_path ref_output_model_logistic_regression.pkl --output_test_table_path ref_output_test_logistic_regression.csv --output_plot_path ref_output_plot_logistic_regression.png
Wrapper of the scikit-learn PLSRegression method.
Get help
pls_components -h
usage: pls_components [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn PLSRegression method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the Mean Square Error plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Table with R2 and MSE for calibration and cross-validation data for the best number of components. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Table with R2 and MSE for calibration and cross-validation data for the best number of components. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the Mean Square Error plot. File type: output. Sample file. Accepted formats: PNG
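The results table reports R2 and MSE. As a reminder of what those two numbers measure, here is a small sketch using the standard definitions on made-up values; it does not reproduce the tool's calibration or cross-validation procedure.

```python
def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(mse(actual, predicted))  # 0.125
print(r2(actual, predicted))
```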
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
optimise (boolean): (False) Whether or not to optimise the process of MSE calculation. Beware: if set to True, the process can take a long time depending on the max_components value.
max_components (integer): (10) Maximum number of components to use by default for PLS queries.
cv (integer): (10) Specify the number of folds in the cross-validation splitting strategy. Value must be between 2 and the number of samples in the dataset.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
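The constraint on cv (between 2 and the number of samples) follows from how a dataset is cut into folds. The sketch below shows a plain, unshuffled k-fold split; it is an illustration of the general technique, and the function name k_fold_indices is hypothetical rather than part of the tool.

```python
def k_fold_indices(n_samples, cv):
    """Yield (train, test) index lists for an unshuffled k-fold split."""
    if not 2 <= cv <= n_samples:
        raise ValueError("cv must be between 2 and the number of samples")
    # The first n_samples % cv folds receive one extra sample.
    base, extra = divmod(n_samples, cv)
    folds, start = [], 0
    for i in range(cv):
        size = base + (1 if i < extra else 0)
        test = list(range(start, start + size))
        train = [j for j in range(n_samples) if j < start or j >= start + size]
        folds.append((train, test))
        start += size
    return folds


for train, test in k_fold_indices(7, 3):
    print("train:", train, "test:", test)
```

With fewer than 2 folds there is nothing to validate against, and with more folds than samples some folds would be empty.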
Common config file
properties:
  cv: 10
  features:
    range:
    - - 0
      - 29
  max_components: 30
  optimise: false
  scale: true
  target:
    index: 30
Command line
pls_components --config config_pls_components.yml --input_dataset_path dataset_pls_components.csv --output_results_path ref_output_results_pls_components.csv --output_plot_path ref_output_plot_pls_components.png
Common config file
{
    "properties": {
        "features": {
            "range": [
                [0, 29]
            ]
        },
        "target": {
            "index": 30
        },
        "optimise": false,
        "max_components": 30,
        "cv": 10,
        "scale": true
    }
}
Command line
pls_components --config config_pls_components.json --input_dataset_path dataset_pls_components.csv --output_results_path ref_output_results_pls_components.csv --output_plot_path ref_output_plot_pls_components.png
Generates a pairwise comparison from a given dataset.
Get help
pairwise_comparison -h
usage: pairwise_comparison [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_plot_path OUTPUT_PLOT_PATH
Generates a pairwise comparison from a given dataset
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the pairwise comparison plot. Accepted formats: png.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the pairwise comparison plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Independent variables or columns from your dataset you want to compare. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  features:
    indexes:
    - 0
    - 1
    - 2
    - 3
Command line
pairwise_comparison --config config_pairwise_comparison.yml --input_dataset_path dataset_pairwise_comparison.csv --output_plot_path ref_output_plot_pairwise_comparison.png
Common config file
{
    "properties": {
        "features": {
            "indexes": [0, 1, 2, 3]
        }
    }
}
Command line
pairwise_comparison --config config_pairwise_comparison.json --input_dataset_path dataset_pairwise_comparison.csv --output_plot_path ref_output_plot_pairwise_comparison.png
Wrapper of the scikit-learn KNeighborsClassifier method.
Get help
k_neighbors -h
usage: k_neighbors [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn KNeighborsClassifier method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
metric (string): (minkowski) The distance metric to use for the tree.
n_neighbors (integer): (6) Number of neighbors to use by default for kneighbors queries.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
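To make the roles of metric and n_neighbors concrete, here is a toy k-nearest-neighbours classifier in plain Python. It is a sketch of the general technique, not the scikit-learn implementation: it uses brute-force search, a Minkowski exponent p=2 (Euclidean distance) chosen as an assumption, and made-up training points.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


def knn_predict(train_x, train_y, query, n_neighbors=5, p=2):
    """Return the majority label among the n_neighbors closest points."""
    order = sorted(range(len(train_x)),
                   key=lambda i: minkowski(train_x[i], query, p))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


# Illustrative data only.
train_x = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = ["no", "no", "no", "yes", "yes", "yes"]
print(knn_predict(train_x, train_y, [0.5, 0.5], n_neighbors=3))  # no
print(knn_predict(train_x, train_y, [5.0, 5.5], n_neighbors=3))  # yes
```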
Common config file
properties:
  independent_vars:
    columns:
    - interest_rate
    - credit
    - march
    - previous
    - duration
  metric: minkowski
  n_neighbors: 5
  normalize_cm: false
  scale: true
  target:
    column: y
  test_size: 0.2
Command line
k_neighbors --config config_k_neighbors.yml --input_dataset_path dataset_k_neighbors.csv --output_model_path ref_output_model_k_neighbors.pkl --output_test_table_path ref_output_test_k_neighbors.csv --output_plot_path ref_output_plot_k_neighbors.png
Common config file
{
    "properties": {
        "independent_vars": {
            "columns": [
                "interest_rate",
                "credit",
                "march",
                "previous",
                "duration"
            ]
        },
        "target": {
            "column": "y"
        },
        "metric": "minkowski",
        "n_neighbors": 5,
        "normalize_cm": false,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
k_neighbors --config config_k_neighbors.json --input_dataset_path dataset_k_neighbors.csv --output_model_path ref_output_model_k_neighbors.pkl --output_test_table_path ref_output_test_k_neighbors.csv --output_plot_path ref_output_plot_k_neighbors.png
Wrapper of the scikit-learn SupportVectorMachine method.
Get help
support_vector_machine -h
usage: support_vector_machine [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn SupportVectorMachine method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
kernel (string): (rbf) Specifies the kernel type to be used in the algorithm.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
random_state_method (integer): (5) Controls the randomness of the estimator.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
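The default kernel, rbf, measures similarity between two samples as a decaying function of their squared distance. The sketch below evaluates the standard formula K(x, y) = exp(-gamma * ||x - y||^2); gamma is not one of the parameters listed above, so the value used here is purely illustrative.

```python
import math


def rbf_kernel(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * squared distance)."""
    squared = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared)


# gamma=0.5 is an arbitrary illustrative value.
print(rbf_kernel([1.0, 2.0], [1.0, 2.0], gamma=0.5))  # 1.0 for identical points
print(rbf_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))  # exp(-1)
```

Because the kernel depends on distances, features on very different scales can dominate it, which is one reason the scale option is worth considering.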
Common config file
properties:
  independent_vars:
    range:
    - - 0
      - 2
    - - 4
      - 5
  kernel: rbf
  normalize_cm: false
  scale: true
  target:
    index: 6
  test_size: 0.2
Command line
support_vector_machine --config config_support_vector_machine.yml --input_dataset_path dataset_support_vector_machine.csv --output_model_path ref_output_model_support_vector_machine.pkl --output_test_table_path ref_output_test_support_vector_machine.csv --output_plot_path ref_output_plot_support_vector_machine.png
Common config file
{
    "properties": {
        "independent_vars": {
            "range": [
                [0, 2],
                [4, 5]
            ]
        },
        "target": {
            "index": 6
        },
        "kernel": "rbf",
        "normalize_cm": false,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
support_vector_machine --config config_support_vector_machine.json --input_dataset_path dataset_support_vector_machine.csv --output_model_path ref_output_model_support_vector_machine.pkl --output_test_table_path ref_output_test_support_vector_machine.csv --output_plot_path ref_output_plot_support_vector_machine.png
Wrapper of the TensorFlow Keras Sequential method for regression.
Get help
regression_neural_network -h
usage: regression_neural_network [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the TensorFlow Keras Sequential method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Loss, MAE and MSE plots. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: h5.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: H5
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Loss, MAE and MSE plots. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Independent variables or columns from your dataset you want to train. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
validation_size (number): (0.2) Represents the proportion of the dataset to include in the validation split. It should be between 0.0 and 1.0.
test_size (number): (0.1) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
hidden_layers (array): (None) List of dictionaries with hidden layers values. Format: [ { 'size': 50, 'activation': 'relu' } ].
output_layer_activation (string): (softmax) Activation function to use in the output layer.
optimizer (string): (Adam) Name of optimizer instance.
learning_rate (number): (0.02) Determines the step size at each iteration while moving toward a minimum of a loss function.
batch_size (integer): (100) Number of samples per gradient update.
max_epochs (integer): (100) Number of epochs to train the model. Since early stopping is enabled, this is a maximum.
random_state (integer): (5) Controls the shuffling applied to the data before applying the split.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
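The hidden_layers list determines how large the network is. As a rough sketch, the function below counts trainable parameters for a stack of fully connected layers with biases and a single output unit; both of those architectural details are assumptions, since the help text only documents the size and activation of each hidden layer.

```python
def dense_parameter_count(n_features, hidden_layers, n_outputs=1):
    """Count weights and biases of a fully connected network (assumed layout)."""
    total, previous = 0, n_features
    for layer in hidden_layers:
        # weights (previous x size) plus one bias per unit
        total += previous * layer["size"] + layer["size"]
        previous = layer["size"]
    # output layer
    total += previous * n_outputs + n_outputs
    return total


hidden_layers = [
    {"size": 10, "activation": "relu"},
    {"size": 8, "activation": "relu"},
]
# Two input features, as in a configuration that selects two columns.
print(dense_parameter_count(2, hidden_layers))  # 127
```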
Common config file
properties:
  batch_size: 32
  features:
    columns:
    - ZN
    - RM
  hidden_layers:
  - activation: relu
    size: 10
  - activation: relu
    size: 8
  learning_rate: 0.01
  max_epochs: 150
  optimizer: Adam
  target:
    column: MEDV
  test_size: 0.2
  validation_size: 0.2
Command line
regression_neural_network --config config_regression_neural_network.yml --input_dataset_path dataset_regression.csv --output_model_path ref_output_model_regression.h5 --output_test_table_path ref_output_test_regression.csv --output_plot_path ref_output_plot_regression.png
Common config file
{
    "properties": {
        "features": {
            "columns": [
                "ZN",
                "RM"
            ]
        },
        "target": {
            "column": "MEDV"
        },
        "validation_size": 0.2,
        "test_size": 0.2,
        "hidden_layers": [
            {
                "size": 10,
                "activation": "relu"
            },
            {
                "size": 8,
                "activation": "relu"
            }
        ],
        "optimizer": "Adam",
        "learning_rate": 0.01,
        "batch_size": 32,
        "max_epochs": 150
    }
}
Command line
regression_neural_network --config config_regression_neural_network.json --input_dataset_path dataset_regression.csv --output_model_path ref_output_model_regression.h5 --output_test_table_path ref_output_test_regression.csv --output_plot_path ref_output_plot_regression.png
Wrapper of the scikit-learn LinearRegression method with PolynomialFeatures.
Get help
polynomial_regression -h
usage: polynomial_regression [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn LinearRegression method with PolynomialFeatures.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Residual plot checks the error between actual values and predicted values. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Residual plot checks the error between actual values and predicted values. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (object): ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. If multiple formats are given, the first one will be picked.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
degree (integer): (2) Polynomial degree.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
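The degree parameter controls which polynomial terms are generated before the linear fit. The sketch below reproduces the idea in plain Python: for two inputs a and b and degree 2 it yields 1, a, b, a², ab, b². It illustrates the expansion only and is not the scikit-learn PolynomialFeatures code.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2):
    """Return all monomials of the inputs up to the given degree."""
    features = []
    for d in range(degree + 1):
        for combo in combinations_with_replacement(row, d):
            # prod(()) is 1, which gives the constant (bias) term for d = 0
            features.append(prod(combo))
    return features


print(polynomial_features([2.0, 3.0]))  # [1, 2.0, 3.0, 4.0, 6.0, 9.0]
```

The number of generated terms grows quickly with both the degree and the number of input columns, so high degrees on wide datasets can become expensive.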
Common config file
properties:
  degree: 2
  independent_vars:
    columns:
    - RM
    - ZN
  scale: true
  target:
    column: MEDV
  test_size: 0.2
Command line
polynomial_regression --config config_polynomial_regression.yml --input_dataset_path dataset_polynomial_regression.csv --output_model_path ref_output_model_polynomial_regression.pkl --output_test_table_path ref_output_test_polynomial_regression.csv --output_plot_path ref_output_plot_polynomial_regression.png
Common config file
{
    "properties": {
        "independent_vars": {
            "columns": [
                "RM",
                "ZN"
            ]
        },
        "target": {
            "column": "MEDV"
        },
        "degree": 2,
        "test_size": 0.2,
        "scale": true
    }
}
Command line
polynomial_regression --config config_polynomial_regression.json --input_dataset_path dataset_polynomial_regression.csv --output_model_path ref_output_model_polynomial_regression.pkl --output_test_table_path ref_output_test_polynomial_regression.csv --output_plot_path ref_output_plot_polynomial_regression.png
Scales columns from a given dataset.
Get help
scale_columns -h
usage: scale_columns [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Scales columns from a given dataset
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
targets (object): ({}) Independent variables or columns from your dataset you want to scale. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
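The help text does not state which scaling method is applied to the selected columns. Purely as an illustration, the sketch below applies min-max scaling (mapping each selected column to the range 0 to 1) and leaves the other columns untouched; the method and both function names are assumptions.

```python
def min_max_scale(values):
    """Map values linearly onto [0, 1]; constant columns become 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def scale_selected(rows, targets):
    """Scale only the columns whose positions are listed in targets."""
    columns = [list(col) for col in zip(*rows)]
    for i in targets:
        columns[i] = min_max_scale(columns[i])
    return [list(row) for row in zip(*columns)]


# Illustrative rows: two numeric columns and one label column.
rows = [[1, 10, "a"], [2, 20, "b"], [3, 30, "c"]]
print(scale_selected(rows, targets=[0, 1]))
# [[0.0, 0.0, 'a'], [0.5, 0.5, 'b'], [1.0, 1.0, 'c']]
```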
Common config file
Command line
scale_columns --config config_scale_columns.yml --input_dataset_path dataset_scale.csv --output_dataset_path ref_output_scale.csv
Common config file
"properties": {
"targets": {
"columns": [
Command line
scale_columns --config config_scale_columns.json --input_dataset_path dataset_scale.csv --output_dataset_path ref_output_scale.csv
Wrapper of the TensorFlow Keras LSTM method for encoding.
Get help
autoencoder_neural_network -h
usage: autoencoder_neural_network [-h] [--config CONFIG] --input_decode_path INPUT_DECODE_PATH [--input_predict_path INPUT_PREDICT_PATH] --output_model_path OUTPUT_MODEL_PATH [--output_test_decode_path OUTPUT_TEST_DECODE_PATH] [--output_test_predict_path OUTPUT_TEST_PREDICT_PATH]
Wrapper of the TensorFlow Keras LSTM method for encoding.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--input_predict_path INPUT_PREDICT_PATH
Path to the input predict dataset. Accepted formats: csv.
--output_test_decode_path OUTPUT_TEST_DECODE_PATH
Path to the test decode table file. Accepted formats: csv.
--output_test_predict_path OUTPUT_TEST_PREDICT_PATH
Path to the test predict table file. Accepted formats: csv.
required arguments:
--input_decode_path INPUT_DECODE_PATH
Path to the input decode dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: h5.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_decode_path (string): Path to the input decode dataset. File type: input. Sample file. Accepted formats: CSV
input_predict_path (string): Path to the input predict dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: H5
output_test_decode_path (string): Path to the test decode table file. File type: output. Sample file. Accepted formats: CSV
output_test_predict_path (string): Path to the test predict table file. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
optimizer (string): (Adam) Name of the optimizer instance.
learning_rate (number): (0.02) Determines the step size at each iteration while moving toward a minimum of a loss function.
batch_size (integer): (100) Number of samples per gradient update.
max_epochs (integer): (100) Number of epochs to train the model. Because early stopping is enabled, this is a maximum.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
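The note that max_epochs is only a maximum can be made concrete with a minimal early-stopping sketch. It does not reproduce the block's training loop: the function name, the patience setting and the list of validation losses are all hypothetical stand-ins.

```python
def train_with_early_stopping(val_losses, max_epochs, patience=3):
    """Return how many epochs run before training stops.

    val_losses stands in for the validation loss measured after each epoch;
    patience is a hypothetical setting, not a documented parameter.
    """
    best = float("inf")
    stale = 0
    epochs_run = 0
    for loss in val_losses[:max_epochs]:
        epochs_run += 1
        if loss < best:
            best, stale = loss, 0  # improvement: reset the counter
        else:
            stale += 1
            if stale >= patience:  # no improvement for `patience` epochs
                break
    return epochs_run


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
stopped_at = train_with_early_stopping(losses, max_epochs=300)
capped_at = train_with_early_stopping(losses, max_epochs=2)
```

With a generous max_epochs the loop ends as soon as the loss stops improving; with a small max_epochs the cap is reached first.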
Common config file
properties:
  batch_size: 32
  learning_rate: 0.01
  max_epochs: 300
  optimizer: Adam
Command line
autoencoder_neural_network --config config_autoencoder_neural_network.yml --input_decode_path dataset_autoencoder_decode.csv --input_predict_path dataset_autoencoder_predict.csv --output_model_path ref_output_model_autoencoder.h5 --output_test_decode_path ref_output_test_decode_autoencoder.csv --output_test_predict_path ref_output_test_predict_autoencoder.csv
Common config file
{
  "properties": {
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "batch_size": 32,
    "max_epochs": 300
  }
}
Command line
autoencoder_neural_network --config config_autoencoder_neural_network.json --input_decode_path dataset_autoencoder_decode.csv --input_predict_path dataset_autoencoder_predict.csv --output_model_path ref_output_model_autoencoder.h5 --output_test_decode_path ref_output_test_decode_autoencoder.csv --output_test_predict_path ref_output_test_predict_autoencoder.csv
Wrapper of the scikit-learn KMeans method.
Get help
k_means -h
usage: k_means [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH --output_model_path OUTPUT_MODEL_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn KMeans method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the clustering plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the clustered dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: pkl.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the clustered dataset. File type: output. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: PKL
output_plot_path (string): Path to the clustering plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
clusters (integer): (3) The number of clusters to form as well as the number of centroids to generate.
plots (array): (None) List of dictionaries with all plots you want to generate. Only 2D or 3D plots are accepted. Format: [ { 'title': 'Plot 1', 'features': ['feat1', 'feat2'] } ].
random_state_method (integer): (5) Determines random number generation for centroid initialization.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
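To show what clusters and random_state_method control, here is a minimal pure-Python sketch of Lloyd's algorithm, the iteration behind k-means. It is not the scikit-learn implementation that the block wraps; the function names and the toy points are illustrative only.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points, clusters=3, random_state=5, iterations=100):
    """Plain Lloyd iteration: assign points, then recompute centroids."""
    rng = random.Random(random_state)       # seed fixes the initial centroids
    centroids = rng.sample(points, clusters)
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(clusters), key=lambda c: sq_dist(p, centroids[c]))
            for p in points
        ]
        new_centroids = []
        for c in range(clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(v) / len(members) for v in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:      # converged
            break
        centroids = new_centroids
    return labels, centroids


points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0), (20.0, 0.0), (21.0, 0.0)]
labels_a, centroids_a = k_means(points, clusters=3, random_state=5)
labels_b, centroids_b = k_means(points, clusters=3, random_state=5)
```

Running the sketch twice with the same seed yields the same labels, which is the reproducibility a fixed random state is meant to provide.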
Common config file
properties:
  clusters: 3
  plots:
  - features:
    - sepal_length
    - sepal_width
    title: Plot 1
  - features:
    - petal_length
    - petal_width
    title: Plot 2
  - features:
    - sepal_length
    - sepal_width
    - petal_length
    title: Plot 3
  - features:
    - petal_length
    - petal_width
    - sepal_width
    title: Plot 4
  - features:
    - sepal_length
    - petal_width
    title: Plot 5
  predictors:
    columns:
    - sepal_length
    - sepal_width
    - petal_length
    - petal_width
  scale: true
Command line
k_means --config config_k_means.yml --input_dataset_path dataset_k_means.csv --output_results_path ref_output_results_k_means.csv --output_model_path ref_output_model_k_means.pkl --output_plot_path ref_output_plot_k_means.png
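Common config file
{
  "properties": {
    "predictors": {
      "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    },
    "clusters": 3,
    "plots": [
      {
        "title": "Plot 1",
        "features": ["sepal_length", "sepal_width"]
      },
      {
        "title": "Plot 2",
        "features": ["petal_length", "petal_width"]
      },
      {
        "title": "Plot 3",
        "features": ["sepal_length", "sepal_width", "petal_length"]
      },
      {
        "title": "Plot 4",
        "features": ["petal_length", "petal_width", "sepal_width"]
      },
      {
        "title": "Plot 5",
        "features": ["sepal_length", "petal_width"]
      }
    ],
    "scale": true
  }
}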
Command line
k_means --config config_k_means.json --input_dataset_path dataset_k_means.csv --output_results_path ref_output_results_k_means.csv --output_model_path ref_output_model_k_means.pkl --output_plot_path ref_output_plot_k_means.png
Maps the values of a given dataset.
Get help
map_variables -h
usage: map_variables [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Maps the values of a given dataset.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
targets (object): ({}) Independent variables or columns from your dataset you want to map. If None given, all the columns will be taken. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  targets:
    columns:
    - target
Command line
map_variables --config config_map_variables.yml --input_dataset_path dataset_map_variables.csv --output_dataset_path ref_output_dataset_map_variables.csv
Common config file
{
  "properties": {
    "targets": {
      "columns": ["target"]
    }
  }
}
Command line
map_variables --config config_map_variables.json --input_dataset_path dataset_map_variables.csv --output_dataset_path ref_output_dataset_map_variables.csv
Makes predictions from an input dataset and a given regression model.
Get help
regression_predict -h
usage: regression_predict [-h] [--config CONFIG] --input_model_path INPUT_MODEL_PATH --output_results_path OUTPUT_RESULTS_PATH [--input_dataset_path INPUT_DATASET_PATH]
Makes predictions from an input dataset and a given regression model.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--input_dataset_path INPUT_DATASET_PATH
Path to the dataset to predict. Accepted formats: csv.
required arguments:
--input_model_path INPUT_MODEL_PATH
Path to the input model. Accepted formats: pkl.
--output_results_path OUTPUT_RESULTS_PATH
Path to the output results file. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_model_path (string): Path to the input model. File type: input. Sample file. Accepted formats: PKL
input_dataset_path (string): Path to the dataset to predict. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the output results file. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictions (array): (None) List of dictionaries with the values for which you want to predict targets. It is taken into account only if input_dataset_path is not provided. Format: [{ 'var1': 1.0, 'var2': 2.0 }, { 'var1': 4.0, 'var2': 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
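The two accepted shapes of predictions (dictionaries for datasets with headers, plain lists for datasets without) can both be reduced to rows of values in the model's column order. The sketch below illustrates that idea only; predictions_to_rows and the header order are hypothetical, not the block's internals.

```python
def predictions_to_rows(predictions, header=None):
    """Turn a predictions list into rows of values in a fixed column order.

    header is the column order the model was trained with (hypothetical
    argument); it is only needed for the dictionary form.
    """
    rows = []
    for item in predictions:
        if isinstance(item, dict):
            if header is None:
                raise ValueError("dictionary predictions need a column order")
            rows.append([item[name] for name in header])
        else:
            rows.append(list(item))  # header-less form: values are already ordered
    return rows


header = ["LSTAT", "ZN", "RM", "AGE"]  # assumed training order
with_headers = predictions_to_rows(
    [
        {"AGE": 65.2, "LSTAT": 4.98, "RM": 6.575, "ZN": 18.0},
        {"AGE": 78.9, "LSTAT": 9.14, "RM": 6.421, "ZN": 0.0},
    ],
    header,
)
without_headers = predictions_to_rows([[1.0, 2.0], [4.0, 2.7]])
```

The dictionary form is insensitive to key order, whereas the list form relies on the values already being in the order the model expects.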
Common config file
properties:
  predictions:
  - AGE: 65.2
    LSTAT: 4.98
    RM: 6.575
    ZN: 18.0
  - AGE: 78.9
    LSTAT: 9.14
    RM: 6.421
    ZN: 0.0
Command line
regression_predict --config config_regression_predict.yml --input_model_path model_regression_predict.pkl --input_dataset_path input_regression_predict.csv --output_results_path ref_output_regression_predict.csv
Common config file
{
  "properties": {
    "predictions": [
      {
        "LSTAT": 4.98,
        "ZN": 18.0,
        "RM": 6.575,
        "AGE": 65.2
      },
      {
        "LSTAT": 9.14,
        "ZN": 0.0,
        "RM": 6.421,
        "AGE": 78.9
      }
    ]
  }
}
Command line
regression_predict --config config_regression_predict.json --input_model_path model_regression_predict.pkl --input_dataset_path input_regression_predict.csv --output_results_path ref_output_regression_predict.csv
Generates a dendrogram from a given dataset.
Get help
dendrogram -h
usage: dendrogram [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_plot_path OUTPUT_PLOT_PATH
Generates a dendrogram from a given dataset
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the dendrogram plot. Accepted formats: png.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the dendrogram plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Independent variables or columns from your dataset you want to compare. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  features:
    columns:
    - Satisfaction
    - Loyalty
Command line
dendrogram --config config_dendrogram.yml --input_dataset_path dataset_dendrogram.csv --output_plot_path ref_output_plot_dendrogram.png
Common config file
{
  "properties": {
    "features": {
      "columns": ["Satisfaction", "Loyalty"]
    }
  }
}
Command line
dendrogram --config config_dendrogram.json --input_dataset_path dataset_dendrogram.csv --output_plot_path ref_output_plot_dendrogram.png
Wrapper of the scikit-learn AgglomerativeClustering method.
Get help
agglomerative_coefficient -h
usage: agglomerative_coefficient [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn AgglomerativeCoefficient method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the elbow and gap methods plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the gap values list. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the gap values list. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the elbow method and gap statistics plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
max_clusters (integer): (6) Maximum number of clusters to use by default for kmeans queries.
affinity (string): (euclidean) Metric used to compute the linkage. If linkage is "ward", only "euclidean" is accepted.
linkage (string): (ward) The linkage criterion determines which distance to use between sets of observations. The algorithm will merge the pairs of clusters that minimize this criterion.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
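How a linkage criterion turns point-to-point distances into a distance between two sets of observations can be sketched for the single, complete and average criteria. The sketch uses Euclidean distance and leaves out ward, which is variance-based; linkage_distance is an illustrative name, not a BioBB or scikit-learn function.

```python
import math
from itertools import product


def linkage_distance(set_a, set_b, linkage):
    """Distance between two sets of points under a given linkage criterion."""
    dists = [math.dist(p, q) for p, q in product(set_a, set_b)]
    if linkage == "single":      # closest pair of points
        return min(dists)
    if linkage == "complete":    # farthest pair of points
        return max(dists)
    if linkage == "average":     # mean over all pairs
        return sum(dists) / len(dists)
    raise ValueError(f"unsupported linkage in this sketch: {linkage}")


left = [(0.0, 0.0), (0.0, 1.0)]
right = [(3.0, 0.0), (3.0, 1.0)]
single = linkage_distance(left, right, "single")
complete = linkage_distance(left, right, "complete")
average = linkage_distance(left, right, "average")
```

At each step, agglomerative clustering merges the two clusters with the smallest such distance, so the choice of criterion changes which clusters are merged first.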
Common config file
properties:
  max_clusters: 10
  predictors:
    columns:
    - sepal_length
    - sepal_width
  scale: true
Command line
agglomerative_coefficient --config config_agglomerative_coefficient.yml --input_dataset_path dataset_agglomerative_coefficient.csv --output_results_path ref_output_results_agglomerative_coefficient.csv --output_plot_path ref_output_plot_agglomerative_coefficient.png
Common config file
{
  "properties": {
    "predictors": {
      "columns": ["sepal_length", "sepal_width"]
    },
    "max_clusters": 10,
    "scale": true
  }
}
Command line
agglomerative_coefficient --config config_agglomerative_coefficient.json --input_dataset_path dataset_agglomerative_coefficient.csv --output_results_path ref_output_results_agglomerative_coefficient.csv --output_plot_path ref_output_plot_agglomerative_coefficient.png
Wrapper of the TensorFlow Keras LSTM method using Recurrent Neural Networks.
Get help
recurrent_neural_network -h
usage: recurrent_neural_network [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the TensorFlow Keras LSTM method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Loss, accuracy and MSE plots. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: h5.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: H5
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Loss, accuracy and MSE plots. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
validation_size (number): (0.2) Represents the proportion of the dataset to include in the validation split. It should be between 0.0 and 1.0.
window_size (integer): (5) Number of steps in each window used to train the model.
test_size (integer): (5) Represents the number of samples of the dataset to include in the test split.
hidden_layers (array): (None) List of dictionaries with hidden layers values. Format: [ { 'size': 50, 'activation': 'relu' } ].
optimizer (string): (Adam) Name of the optimizer instance.
learning_rate (number): (0.02) Determines the step size at each iteration while moving toward a minimum of a loss function.
batch_size (integer): (100) Number of samples per gradient update.
max_epochs (integer): (100) Number of epochs to train the model. Because early stopping is enabled, this is a maximum.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
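The roles of window_size and test_size can be sketched with a sliding-window split of a series. This is one common way to prepare data for a recurrent network and is only assumed to resemble what the block does; make_windows and the toy series are illustrative.

```python
def make_windows(series, window_size=5):
    """Pair each run of window_size consecutive values with the value after it."""
    inputs, targets = [], []
    for start in range(len(series) - window_size):
        inputs.append(series[start:start + window_size])
        targets.append(series[start + window_size])
    return inputs, targets


series = list(range(20))
test_size = 3                                  # number of samples held out
train, test = series[:-test_size], series[-test_size:]
X, y = make_windows(train, window_size=5)
```

A training series of n values yields n - window_size windows, so very short datasets leave little to train on once the test samples are removed.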
Common config file
properties:
  batch_size: 32
  hidden_layers:
  - activation: relu
    size: 100
  - activation: relu
    size: 50
  - activation: relu
    size: 50
  learning_rate: 0.01
  max_epochs: 50
  optimizer: Adam
  target:
    index: 1
  test_size: 12
  validation_size: 0.2
  window_size: 5
Command line
recurrent_neural_network --config config_recurrent_neural_network.yml --input_dataset_path dataset_recurrent.csv --output_model_path ref_output_model_recurrent.h5 --output_test_table_path ref_output_test_recurrent.csv --output_plot_path ref_output_plot_recurrent.png
Common config file
{
  "properties": {
    "target": {
      "index": 1
    },
    "window_size": 5,
    "validation_size": 0.2,
    "test_size": 12,
    "hidden_layers": [
      {
        "size": 100,
        "activation": "relu"
      },
      {
        "size": 50,
        "activation": "relu"
      },
      {
        "size": 50,
        "activation": "relu"
      }
    ],
    "optimizer": "Adam",
    "learning_rate": 0.01,
    "batch_size": 32,
    "max_epochs": 50
  }
}
Command line
recurrent_neural_network --config config_recurrent_neural_network.json --input_dataset_path dataset_recurrent.csv --output_model_path ref_output_model_recurrent.h5 --output_test_table_path ref_output_test_recurrent.csv --output_plot_path ref_output_plot_recurrent.png
Converts categorical variables into dummy/indicator variables (binaries).
Get help
dummy_variables -h
usage: dummy_variables [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Maps dummy variables from a given dataset.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
targets (object): ({}) Independent variables or columns from your dataset you want to convert into dummy variables. If None given, all the columns will be taken. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
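What converting a categorical column into dummy/indicator variables means can be shown with a short sketch. The helper below and the column_value naming of the new columns are assumptions made for illustration; the block's actual output column names may differ.

```python
def dummy_variables(rows, column):
    """Replace one categorical column with one 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            # Hypothetical naming scheme: <column>_<category>
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"price": 10, "view": "sea"},
    {"price": 7, "view": "city"},
    {"price": 12, "view": "sea"},
]
encoded = dummy_variables(rows, "view")
```

Each row ends up with exactly one indicator set to 1, which lets models that expect numeric inputs use the categorical information.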
Common config file
properties:
  targets:
    columns:
    - view
Command line
dummy_variables --config config_dummy_variables.yml --input_dataset_path dataset_dummy_variables.csv --output_dataset_path ref_output_dataset_dummy_variables.csv
Common config file
{
  "properties": {
    "targets": {
      "columns": ["view"]
    }
  }
}
Command line
dummy_variables --config config_dummy_variables.json --input_dataset_path dataset_dummy_variables.csv --output_dataset_path ref_output_dataset_dummy_variables.csv
Makes predictions from an input dataset and a given clustering model.
Get help
clustering_predict -h
usage: clustering_predict [-h] [--config CONFIG] --input_model_path INPUT_MODEL_PATH --output_results_path OUTPUT_RESULTS_PATH [--input_dataset_path INPUT_DATASET_PATH]
Makes predictions from an input dataset and a given clustering model.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--input_dataset_path INPUT_DATASET_PATH
Path to the dataset to predict. Accepted formats: csv.
required arguments:
--input_model_path INPUT_MODEL_PATH
Path to the input model. Accepted formats: pkl.
--output_results_path OUTPUT_RESULTS_PATH
Path to the output results file. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_model_path (string): Path to the input model. File type: input. Sample file. Accepted formats: PKL
input_dataset_path (string): Path to the dataset to predict. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the output results file. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictions (array): (None) List of dictionaries with the values for which you want to predict targets. It is taken into account only if input_dataset_path is not provided. Format: [{ 'var1': 1.0, 'var2': 2.0 }, { 'var1': 4.0, 'var2': 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
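For a k-means style model, predicting a cluster amounts to finding the nearest centroid. The sketch below shows that step for dictionary-style predictions; the centroid values and the feature order are invented for illustration and do not come from any real model file.

```python
import math


def predict_cluster(sample, centroids, feature_order):
    """Return the index of the centroid closest to the sample."""
    point = [sample[name] for name in feature_order]
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


# Hypothetical centroids in the order sepal_length, sepal_width, petal_length, petal_width
feature_order = ["sepal_length", "sepal_width", "petal_length", "petal_width"]
centroids = [(5.0, 3.4, 1.5, 0.2), (5.9, 2.8, 4.4, 1.4), (6.6, 3.0, 5.6, 2.0)]

predictions = [
    {"petal_length": 1.4, "petal_width": 0.2, "sepal_length": 5.1, "sepal_width": 3.5},
    {"petal_length": 5.2, "petal_width": 2.3, "sepal_length": 6.7, "sepal_width": 3.0},
    {"petal_length": 5.0, "petal_width": 1.9, "sepal_length": 6.3, "sepal_width": 2.5},
]
labels = [predict_cluster(p, centroids, feature_order) for p in predictions]
```

Other clustering models saved as pkl files may assign clusters differently, so this applies only to centroid-based models.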
Common config file
properties:
  predictions:
  - petal_length: 1.4
    petal_width: 0.2
    sepal_length: 5.1
    sepal_width: 3.5
  - petal_length: 5.2
    petal_width: 2.3
    sepal_length: 6.7
    sepal_width: 3.0
  - petal_length: 5.0
    petal_width: 1.9
    sepal_length: 6.3
    sepal_width: 2.5
Command line
clustering_predict --config config_clustering_predict.yml --input_model_path model_clustering_predict.pkl --input_dataset_path input_clustering_predict.csv --output_results_path ref_output_results_clustering_predict.csv
Common config file
{
  "properties": {
    "predictions": [
      {
        "sepal_length": 5.1,
        "sepal_width": 3.5,
        "petal_length": 1.4,
        "petal_width": 0.2
      },
      {
        "sepal_length": 6.7,
        "sepal_width": 3.0,
        "petal_length": 5.2,
        "petal_width": 2.3
      },
      {
        "sepal_length": 6.3,
        "sepal_width": 2.5,
        "petal_length": 5.0,
        "petal_width": 1.9
      }
    ]
  }
}
Command line
clustering_predict --config config_clustering_predict.json --input_model_path model_clustering_predict.pkl --input_dataset_path input_clustering_predict.csv --output_results_path ref_output_results_clustering_predict.csv
Wrapper of most of the imblearn.under_sampling methods.
Get help
undersampling -h
usage: undersampling [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Wrapper of most of the imblearn.under_sampling methods.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
method (string): (None) Undersampling method. It's a mandatory property.
type (string): (None) Type of undersampling. It's a mandatory property.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
evaluate (boolean): (False) Whether or not to evaluate the dataset before and after applying the resampling.
evaluate_splits (integer): (3) Number of folds to be applied by the Repeated Stratified K-Fold evaluation method. Must be at least 2.
evaluate_repeats (integer): (3) Number of times the Repeated Stratified K-Fold cross validator needs to be repeated.
n_bins (integer): (5) Only for regression undersampling. The number of classes that the user wants to generate with the target data.
balanced_binning (boolean): (False) Only for regression undersampling. Decides whether samples are to be distributed roughly equally across all classes.
sampling_strategy (object): ({'target': 'auto'}) Sampling information to sample the data set. Formats: { "target": "auto" }, { "ratio": 0.3 }, { "dict": { 0: 300, 1: 200, 2: 100 } } or { "list": [0, 2, 3] }. When "target", specify the class targeted by the resampling; the number of samples in the different classes will be equalized; possible choices are: majority (resample only the majority class), not minority (resample all classes but the minority class), not majority (resample all classes but the majority class), all (resample all classes), auto (equivalent to 'not minority'). When "ratio", it corresponds to the desired ratio of the number of samples in the minority class over the number of samples in the majority class after resampling (ONLY IN CASE OF BINARY CLASSIFICATION). When "dict", the keys correspond to the targeted classes and the values correspond to the desired number of samples for each targeted class. When "list", the list contains the classes targeted by the resampling.
version (integer): (1) Only for NearMiss method. Version of the NearMiss to use.
n_neighbors (integer): (1) Only for NearMiss, CondensedNearestNeighbour, EditedNearestNeighbours and NeighbourhoodCleaningRule methods. Size of the neighbourhood to consider to compute the average distance to the minority point samples.
threshold_cleaning (number): (0.5) Only for NeighbourhoodCleaningRule method. Threshold used to decide whether or not to consider a class during the cleaning after applying ENN.
random_state_method (integer): (5) Only for RandomUnderSampler and ClusterCentroids methods. Controls the randomization of the algorithm.
random_state_evaluate (integer): (5) Controls the shuffling applied to the Repeated Stratified K-Fold evaluation method.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
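The "ratio" form of sampling_strategy can be illustrated with a minimal random undersampling sketch for a binary target. It is a simplified stand-in, not the imblearn RandomUnderSampler; random_undersample and its arguments are hypothetical names.

```python
import random
from collections import Counter


def random_undersample(rows, labels, ratio, random_state=5):
    """Drop majority samples until minority / majority is about `ratio`.

    Binary targets only, mirroring the restriction stated for "ratio".
    """
    counts = Counter(labels)
    (majority, n_major), (minority, n_minor) = counts.most_common(2)
    keep_major = min(n_major, int(n_minor / ratio))
    rng = random.Random(random_state)          # fixed seed for reproducibility
    major_idx = [i for i, lab in enumerate(labels) if lab == majority]
    kept = set(rng.sample(major_idx, keep_major))
    selected = [i for i, lab in enumerate(labels) if lab == minority or i in kept]
    return [rows[i] for i in selected], [labels[i] for i in selected]


labels = [0] * 10 + [1] * 3
rows = list(range(len(labels)))
new_rows, new_labels = random_undersample(rows, labels, ratio=0.5)
new_counts = Counter(new_labels)
```

With 3 minority samples and a ratio of 0.5, the majority class is cut from 10 to 6 samples while the minority class is left untouched.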
Common config file
properties:
  evaluate: true
  method: enn
  n_bins: 10
  n_neighbors: 3
  target:
    column: VALUE
  type: regression
Command line
undersampling --config config_undersampling.yml --input_dataset_path dataset_resampling.csv --output_dataset_path ref_output_undersampling.csv
Common config file
{
  "properties": {
    "method": "enn",
    "type": "regression",
    "target": {
      "column": "VALUE"
    },
    "evaluate": true,
    "n_bins": 10,
    "n_neighbors": 3
  }
}
Command line
undersampling --config config_undersampling.json --input_dataset_path dataset_resampling.csv --output_dataset_path ref_output_undersampling.csv
Generates a correlation matrix from a given dataset.
Get help
correlation_matrix -h
usage: correlation_matrix [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_plot_path OUTPUT_PLOT_PATH
Generates a correlation matrix from a given dataset
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Path to the correlation matrix plot. Accepted formats: png.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the correlation matrix plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Independent variables or columns from your dataset you want to compare. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
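As an illustration of what a correlation matrix over the selected features contains, the sketch below computes pairwise Pearson coefficients in plain Python. The help text does not say which coefficient the block uses, so Pearson is an assumption, and the function names and toy columns are illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict holding the coefficient for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


columns = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "double": [2.0, 4.0, 6.0, 8.0],
    "reversed": [4.0, 3.0, 2.0, 1.0],
}
matrix = correlation_matrix(columns)
```

The matrix is symmetric with ones on the diagonal; values near 1 or -1 indicate strongly related features.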
Common config file
properties:
  features:
    columns:
    - sepal_length
    - sepal_width
    - petal_length
    - petal_width
Command line
correlation_matrix --config config_correlation_matrix.yml --input_dataset_path dataset_correlation_matrix.csv --output_plot_path ref_output_plot_correlation_matrix.png
Common config file
{
  "properties": {
    "features": {
      "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    }
  }
}
Command line
correlation_matrix --config config_correlation_matrix.json --input_dataset_path dataset_correlation_matrix.csv --output_plot_path ref_output_plot_correlation_matrix.png
Wrapper of most of the imblearn.over_sampling methods.
Get help
oversampling -h
usage: oversampling [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Wrapper of most of the imblearn.over_sampling methods.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
method (string): (None) Oversampling method. It's a mandatory property.
type (string): (None) Type of oversampling. It's a mandatory property.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
evaluate (boolean): (False) Whether or not to evaluate the dataset before and after applying the resampling.
evaluate_splits (integer): (3) Number of folds to be applied by the Repeated Stratified K-Fold evaluation method. Must be at least 2.
evaluate_repeats (integer): (3) Number of times the Repeated Stratified K-Fold cross validator needs to be repeated.
n_bins (integer): (5) Only for regression oversampling. The number of classes that the user wants to generate with the target data.
balanced_binning (boolean): (False) Only for regression oversampling. Decides whether samples are to be distributed roughly equally across all classes.
sampling_strategy (object): ({'target': 'auto'}) Sampling information to sample the data set. Formats: { "target": "auto" }, { "ratio": 0.3 }, { "dict": { 0: 300, 1: 200, 2: 100 } } or { "list": [0, 2, 3] }. With "target", specify the class targeted by the resampling; the number of samples in the different classes will be equalized. Possible choices are: minority (resample only the minority class), not minority (resample all classes but the minority class), not majority (resample all classes but the majority class), all (resample all classes), auto (equivalent to "not majority"). With "ratio", the value is the desired ratio of the number of samples in the minority class to the number of samples in the majority class after resampling (only for binary classification). With "dict", the keys are the targeted classes and the values are the desired number of samples for each targeted class. With "list", the list contains the classes targeted by the resampling.
k_neighbors (integer): (5) Only for SMOTE, BorderlineSMOTE, SVMSMOTE, ADASYN. The number of nearest neighbours used to construct synthetic samples.
random_state_method (integer): (5) Controls the randomization of the algorithm.
random_state_evaluate (integer): (5) Controls the shuffling applied to the Repeated Stratified K-Fold evaluation method.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
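To make the sampling_strategy formats more concrete, the following sketch computes the number of samples each targeted class would have after oversampling. The function oversampling_targets is a hypothetical illustration, not the imblearn or BioBB code. It assumes that targeted classes are raised to the size of the majority class and that the "ratio" result is truncated to an integer. The "list" format is left out.

```python
from collections import Counter


def oversampling_targets(labels, strategy):
    """Desired sample count per targeted class (illustrative sketch only)."""
    counts = Counter(labels)
    majority = max(counts, key=counts.get)
    minority = min(counts, key=counts.get)
    n_majority = counts[majority]

    if "target" in strategy:
        choice = strategy["target"]
        if choice == "minority":
            targeted = [minority]
        elif choice == "not minority":
            targeted = [c for c in counts if c != minority]
        elif choice in ("not majority", "auto"):
            targeted = [c for c in counts if c != majority]
        elif choice == "all":
            targeted = list(counts)
        else:
            raise ValueError(f"unknown target choice: {choice!r}")
        # Assumption: every targeted class is equalized to the majority size.
        return {c: n_majority for c in targeted}

    if "ratio" in strategy:
        if len(counts) != 2:
            raise ValueError("ratio applies to binary classification only")
        # Assumption: minority / majority after resampling, truncated.
        return {minority: int(strategy["ratio"] * n_majority)}

    if "dict" in strategy:
        return dict(strategy["dict"])

    raise ValueError("unsupported sampling_strategy format")


labels = [0] * 6 + [1] * 3 + [2] * 1
print(oversampling_targets(labels, {"target": "auto"}))
print(oversampling_targets([0] * 10 + [1] * 2, {"ratio": 0.5}))
```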
Common config file
properties:
  evaluate: true
  method: random
  n_bins: 10
  sampling_strategy:
    target: minority
  target:
    column: VALUE
  type: regression
Command line
oversampling --config config_oversampling.yml --input_dataset_path dataset_resampling.csv --output_dataset_path ref_output_oversampling.csv
Common config file
{
  "properties": {
    "method": "random",
    "type": "regression",
    "target": {
      "column": "VALUE"
    },
    "evaluate": true,
    "n_bins": 10,
    "sampling_strategy": {
      "target": "minority"
    }
  }
}
Command line
oversampling --config config_oversampling.json --input_dataset_path dataset_resampling.csv --output_dataset_path ref_output_oversampling.csv
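The JSON configuration and command line for oversampling can also be produced from a script. The sketch below writes the configuration with the standard json module and assembles the argument list. The file names are the ones used in this section, and the helper names write_config and build_command are hypothetical. The command is only printed, not executed, so the sketch does not require BioBB to be installed.

```python
import json
import shlex
import tempfile
from pathlib import Path

# Properties copied from the oversampling example in this section.
config = {
    "properties": {
        "method": "random",
        "type": "regression",
        "target": {"column": "VALUE"},
        "evaluate": True,
        "n_bins": 10,
        "sampling_strategy": {"target": "minority"},
    }
}


def write_config(directory):
    """Write the configuration as JSON and return its path."""
    path = Path(directory) / "config_oversampling.json"
    path.write_text(json.dumps(config, indent=2))
    return path


def build_command(config_path):
    """Assemble the command line as an argument list."""
    return [
        "oversampling",
        "--config", str(config_path),
        "--input_dataset_path", "dataset_resampling.csv",
        "--output_dataset_path", "ref_output_oversampling.csv",
    ]


with tempfile.TemporaryDirectory() as tmp:
    config_path = write_config(tmp)
    loaded = json.loads(config_path.read_text())
    command = build_command(config_path)
    print(shlex.join(command))
```

The argument list could be passed to subprocess.run on a machine where the oversampling command is available.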
Wrapper of the TensorFlow Keras LSTM method for decoding.
Get help
neural_network_decode -h
usage: neural_network_decode [-h] [--config CONFIG] --input_decode_path INPUT_DECODE_PATH --input_model_path INPUT_MODEL_PATH --output_decode_path OUTPUT_DECODE_PATH [--output_predict_path OUTPUT_PREDICT_PATH]
Wrapper of the TensorFlow Keras LSTM method for decoding.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_predict_path OUTPUT_PREDICT_PATH
Path to the output predict file. Accepted formats: csv.
required arguments:
--input_decode_path INPUT_DECODE_PATH
Path to the input decode dataset. Accepted formats: csv.
--input_model_path INPUT_MODEL_PATH
Path to the input model. Accepted formats: h5.
--output_decode_path OUTPUT_DECODE_PATH
Path to the output decode file. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_decode_path (string): Path to the input decode dataset. File type: input. Sample file. Accepted formats: CSV
input_model_path (string): Path to the input model. File type: input. Sample file. Accepted formats: H5
output_decode_path (string): Path to the output decode file. File type: output. Sample file. Accepted formats: CSV
output_predict_path (string): Path to the output predict file. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
Common config file
properties:
  remove_tmp: false
Command line
neural_network_decode --config config_neural_network_decode.yml --input_decode_path dataset_decoder.csv --input_model_path input_model_decoder.h5 --output_decode_path ref_output_decode_decoder.csv --output_predict_path ref_output_predict_decoder.csv
Common config file
{
  "properties": {
    "remove_tmp": false
  }
}
Command line
neural_network_decode --config config_neural_network_decode.json --input_decode_path dataset_decoder.csv --input_model_path input_model_decoder.h5 --output_decode_path ref_output_decode_decoder.csv --output_predict_path ref_output_predict_decoder.csv
Wrapper of the scikit-learn PLSRegression method.
Get help
pls_regression -h
usage: pls_regression [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn PLSRegression method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the R2 cross-validation plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Table with R2 and MSE for calibration and cross-validation data. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Table with R2 and MSE for calibration and cross-validation data. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the R2 cross-validation plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
n_components (integer): (5) Maximum number of components to use by default for PLS queries.
cv (integer): (10) Specify the number of folds in the cross-validation splitting strategy. Value must be between 2 and the number of samples in the dataset.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
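The results table reports R2 and MSE. As a reminder of what these two figures measure, here is a minimal sketch using their standard definitions. How the building block computes them internally is not stated in this help text, so the functions below are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]
print(r2(observed, observed), mse(observed, observed))
# Predicting the mean everywhere explains none of the variance.
print(r2(observed, [2.5] * 4), mse(observed, [2.5] * 4))
```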
Common config file
properties:
  cv: 10
  features:
    range:
    - - 0
      - 29
  n_components: 12
  scale: true
  target:
    index: 30
Command line
pls_regression --config config_pls_regression.yml --input_dataset_path dataset_pls_regression.csv --output_results_path ref_output_results_pls_regression.csv --output_plot_path ref_output_plot_pls_regression.png
Common config file
{
  "properties": {
    "features": {
      "range": [
        [0, 29]
      ]
    },
    "target": {
      "index": 30
    },
    "n_components": 12,
    "cv": 10,
    "scale": true
  }
}
Command line
pls_regression --config config_pls_regression.json --input_dataset_path dataset_pls_regression.csv --output_results_path ref_output_results_pls_regression.csv --output_plot_path ref_output_plot_pls_regression.png
Wrapper of the scikit-learn KMeans method.
Get help
k_means_coefficient -h
usage: k_means_coefficient [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn KMeans method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the elbow and gap methods plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Table with WCSS (elbow method), Gap and Silhouette coefficients for each cluster. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Table with WCSS (elbow method), Gap and Silhouette coefficients for each cluster. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the elbow method and gap statistics plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
max_clusters (integer): (6) Maximum number of clusters to use by default for kmeans queries.
random_state_method (integer): (5) Determines random number generation for centroid initialization.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
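The WCSS value used by the elbow method is the within-cluster sum of squared distances to each cluster centroid. The sketch below computes it for a given labelling in plain Python. The function wcss is a hypothetical helper meant only to show the quantity, not the code used by the building block.

```python
def wcss(points, labels):
    """Within-cluster sum of squares for one clustering of the points."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dims = range(len(members[0]))
        centroid = [sum(p[d] for p in members) / len(members) for d in dims]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in dims) for p in members
        )
    return total


points = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(wcss(points, [0, 0, 1, 1]))  # two tight clusters
print(wcss(points, [0, 0, 0, 0]))  # a single cluster
```

With the elbow method, this value is computed for each number of clusters up to max_clusters, and the point where it stops decreasing sharply suggests a suitable cluster count.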
Common config file
properties:
  max_clusters: 10
  predictors:
    columns:
    - sepal_length
    - sepal_width
  scale: true
Command line
k_means_coefficient --config config_k_means_coefficient.yml --input_dataset_path dataset_k_means_coefficient.csv --output_results_path ref_output_results_k_means_coefficient.csv --output_plot_path ref_output_plot_k_means_coefficient.png
Common config file
{
  "properties": {
    "predictors": {
      "columns": [
        "sepal_length",
        "sepal_width"
      ]
    },
    "max_clusters": 10,
    "scale": true
  }
}
Command line
k_means_coefficient --config config_k_means_coefficient.json --input_dataset_path dataset_k_means_coefficient.csv --output_results_path ref_output_results_k_means_coefficient.csv --output_plot_path ref_output_plot_k_means_coefficient.png
Wrapper of the scikit-learn SpectralClustering method.
Get help
spectral_coefficient -h
usage: spectral_coefficient [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn SpectralClustering method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the elbow and gap methods plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Table with WCSS (elbow method), Gap and Silhouette coefficients for each cluster. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Table with WCSS (elbow method), Gap and Silhouette coefficients for each cluster. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the elbow method and gap statistics plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
max_clusters (integer): (6) Maximum number of clusters to use by default for kmeans queries.
random_state_method (integer): (5) A pseudo random number generator used for the initialization of the lobpcg eigenvectors decomposition when eigen_solver='amg' and by the K-Means initialization.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
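The Silhouette coefficient in the results table compares, for each sample, the mean distance to its own cluster (a) with the mean distance to the nearest other cluster (b), as (b - a) / max(a, b). The following sketch implements that standard definition with Euclidean distance. It is an illustration under those assumptions, needs at least two clusters, and is not the code used by the building block.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def silhouette(points, labels):
    """Mean silhouette coefficient over all samples (Euclidean distance)."""
    scores = []
    for i, (p, own) in enumerate(zip(points, labels)):
        # Distances from sample i to every other sample, grouped by cluster.
        by_cluster = {}
        for j, (q, label) in enumerate(zip(points, labels)):
            if i != j:
                by_cluster.setdefault(label, []).append(math.dist(p, q))
        if own not in by_cluster:
            scores.append(0.0)  # convention for single-sample clusters
            continue
        a = _mean(by_cluster[own])
        b = min(_mean(d) for label, d in by_cluster.items() if label != own)
        scores.append((b - a) / max(a, b))
    return _mean(scores)


points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = silhouette(points, [0, 0, 1, 1])
bad = silhouette(points, [0, 1, 0, 1])
print(good, bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values suggest samples assigned to the wrong cluster.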
Common config file
properties:
  max_clusters: 10
  predictors:
    columns:
    - sepal_length
    - sepal_width
  scale: true
Command line
spectral_coefficient --config config_spectral_coefficient.yml --input_dataset_path dataset_spectral_coefficient.csv --output_results_path ref_output_results_spectral_coefficient.csv --output_plot_path ref_output_plot_spectral_coefficient.png
Common config file
{
  "properties": {
    "predictors": {
      "columns": [
        "sepal_length",
        "sepal_width"
      ]
    },
    "max_clusters": 10,
    "scale": true
  }
}
Command line
spectral_coefficient --config config_spectral_coefficient.json --input_dataset_path dataset_spectral_coefficient.csv --output_results_path ref_output_results_spectral_coefficient.csv --output_plot_path ref_output_plot_spectral_coefficient.png
Makes predictions from an input dataset and a given model.
Get help
neural_network_predict -h
usage: neural_network_predict [-h] [--config CONFIG] --input_model_path INPUT_MODEL_PATH --output_results_path OUTPUT_RESULTS_PATH [--input_dataset_path INPUT_DATASET_PATH]
Makes predictions from an input dataset and a given classification model.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--input_dataset_path INPUT_DATASET_PATH
Path to the dataset to predict. Accepted formats: csv.
required arguments:
--input_model_path INPUT_MODEL_PATH
Path to the input model. Accepted formats: h5.
--output_results_path OUTPUT_RESULTS_PATH
Path to the output results file. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_model_path (string): Path to the input model. File type: input. Sample file. Accepted formats: H5
input_dataset_path (string): Path to the dataset to predict. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the output results file. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictions (array): (None) List of dictionaries with the values for which you want to predict targets. It is taken into account only if input_dataset_path is not provided. Format: [{ 'var1': 1.0, 'var2': 2.0 }, { 'var1': 4.0, 'var2': 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
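The two accepted shapes of predictions (dictionaries for datasets with headers, plain lists otherwise) describe the same rows of input values. The sketch below normalises both into lists of numbers. The helper predictions_to_rows and its feature_order argument are hypothetical and only illustrate the relationship between the formats.

```python
def predictions_to_rows(predictions, feature_order=None):
    """Convert a predictions list into plain rows of values (sketch)."""
    rows = []
    for item in predictions:
        if isinstance(item, dict):
            # Header-style entry: the column order must be known.
            if feature_order is None:
                raise ValueError("feature_order is required for dict entries")
            rows.append([item[name] for name in feature_order])
        else:
            # Headerless entry: values are already in column order.
            rows.append(list(item))
    return rows


with_headers = [
    {"ZN": 18.0, "RM": 6.575, "AGE": 65.2, "LSTAT": 4.98},
    {"ZN": 0.0, "RM": 6.421, "AGE": 78.9, "LSTAT": 9.14},
]
print(predictions_to_rows(with_headers, ["ZN", "RM", "AGE", "LSTAT"]))
print(predictions_to_rows([[1.0, 2.0], [4.0, 2.7]]))
```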
Common config file
properties:
  predictions:
  - AGE: 65.2
    LSTAT: 4.98
    RM: 6.575
    ZN: 18.0
  - AGE: 78.9
    LSTAT: 9.14
    RM: 6.421
    ZN: 0.0
  - AGE: 61.1
    LSTAT: 4.03
    RM: 7.185
    ZN: 0.0
Command line
neural_network_predict --config config_neural_network_predict.yml --input_model_path input_model_predict.h5 --input_dataset_path dataset_predict.csv --output_results_path ref_output_predict.csv
Common config file
{
  "properties": {
    "predictions": [
      {
        "ZN": 18.0,
        "RM": 6.575,
        "AGE": 65.2,
        "LSTAT": 4.98
      },
      {
        "ZN": 0.0,
        "RM": 6.421,
        "AGE": 78.9,
        "LSTAT": 9.14
      },
      {
        "ZN": 0.0,
        "RM": 7.185,
        "AGE": 61.1,
        "LSTAT": 4.03
      }
    ]
  }
}
Command line
neural_network_predict --config config_neural_network_predict.json --input_model_path input_model_predict.h5 --input_dataset_path dataset_predict.csv --output_results_path ref_output_predict.csv
Drops columns from a given dataset.
Get help
drop_columns -h
usage: drop_columns [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_dataset_path OUTPUT_DATASET_PATH
Drops columns from a given dataset.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_dataset_path OUTPUT_DATASET_PATH
Path to the output dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_dataset_path (string): Path to the output dataset. File type: output. Sample file. Accepted formats: CSV
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
targets (object): ({}) Independent variables or columns from your dataset you want to drop. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
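As an illustration of what dropping columns by name amounts to, here is a minimal sketch using the standard csv module on in-memory text. The function drop_columns_from_csv is hypothetical and covers only the "columns" format. It is not the implementation behind the drop_columns command.

```python
import csv
import io


def drop_columns_from_csv(csv_text, columns):
    """Return the CSV text without the named columns (illustrative sketch)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [name for name in reader.fieldnames if name not in columns]

    out = io.StringIO()
    # extrasaction="ignore" silently skips the dropped fields of each row.
    writer = csv.DictWriter(
        out, fieldnames=kept, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(row)
    return out.getvalue()


result = drop_columns_from_csv("a,b,c\n1,2,3\n4,5,6\n", ["b"])
print(result)
```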
Common config file
Command line
drop_columns --config config_drop_columns.yml --input_dataset_path dataset_drop.csv --output_dataset_path ref_output_drop.csv
Common config file
"properties": {
"targets": {
"columns": [
Command line
drop_columns --config config_drop_columns.json --input_dataset_path dataset_drop.csv --output_dataset_path ref_output_drop.csv
Wrapper of the scikit-learn DBSCAN method.
Get help
dbscan -h
usage: dbscan [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn DBSCAN method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the clustering plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the clustered dataset. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the clustered dataset. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the clustering plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
predictors (object): ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
eps (number): (0.5) The maximum distance between two samples for one to be considered as in the neighborhood of the other.
min_samples (integer): (5) The number of samples (or total weight) in a neighborhood for a point to be considered as a core point. This includes the point itself.
metric (string): (euclidean) The metric to use when calculating distance between instances in a feature array.
plots (array): (None) List of dictionaries with all plots you want to generate. Only 2D or 3D plots accepted. Format: [ { 'title': 'Plot 1', 'features': ['feat1', 'feat2'] } ].
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
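The interaction between eps and min_samples can be seen in a small sketch that flags core points: a point is a core point when at least min_samples points, itself included, lie within distance eps of it. The function core_points is a hypothetical helper using Euclidean distance. It covers only this one step of DBSCAN, not the full clustering.

```python
import math


def core_points(points, eps=0.5, min_samples=5):
    """Indexes of the points that qualify as DBSCAN core points (sketch)."""
    core = []
    for i, p in enumerate(points):
        # The point itself is at distance 0, so it is counted as well.
        neighbours = sum(1 for q in points if math.dist(p, q) <= eps)
        if neighbours >= min_samples:
            core.append(i)
    return core


points = [(0, 0), (0, 1), (1, 0), (5, 5)]
# eps and min_samples taken from the example configuration in this section.
print(core_points(points, eps=1.4, min_samples=3))
```

In this toy dataset only the first point has three points within 1.4 units, so lowering min_samples or raising eps would turn more points into core points.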
Common config file
properties:
  eps: 1.4
  min_samples: 3
  plots:
  - features:
    - sepal_length
    - sepal_width
    title: Plot 1
  - features:
    - petal_length
    - petal_width
    title: Plot 2
  - features:
    - sepal_length
    - sepal_width
    - petal_length
    title: Plot 3
  - features:
    - petal_length
    - petal_width
    - sepal_width
    title: Plot 4
  - features:
    - sepal_length
    - petal_width
    title: Plot 5
  predictors:
    columns:
    - sepal_length
    - sepal_width
    - petal_length
    - petal_width
  scale: true
Command line
dbscan --config config_dbscan.yml --input_dataset_path dataset_dbscan.csv --output_results_path ref_output_results_dbscan.csv --output_plot_path ref_output_plot_dbscan.png
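Common config file
{
  "properties": {
    "predictors": {
      "columns": ["sepal_length", "sepal_width", "petal_length", "petal_width"]
    },
    "eps": 1.4,
    "min_samples": 3,
    "plots": [
      {
        "title": "Plot 1",
        "features": ["sepal_length", "sepal_width"]
      },
      {
        "title": "Plot 2",
        "features": ["petal_length", "petal_width"]
      },
      {
        "title": "Plot 3",
        "features": ["sepal_length", "sepal_width", "petal_length"]
      },
      {
        "title": "Plot 4",
        "features": ["petal_length", "petal_width", "sepal_width"]
      },
      {
        "title": "Plot 5",
        "features": ["sepal_length", "petal_width"]
      }
    ],
    "scale": true
  }
}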
Command line
dbscan --config config_dbscan.json --input_dataset_path dataset_dbscan.csv --output_results_path ref_output_results_dbscan.csv --output_plot_path ref_output_plot_dbscan.png
Wrapper of the scikit-learn KNeighborsClassifier method.
Get help
k_neighbors_coefficient -h
usage: k_neighbors_coefficient [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_results_path OUTPUT_RESULTS_PATH [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the scikit-learn KNeighborsClassifier method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_plot_path OUTPUT_PLOT_PATH
Path to the accuracy plot. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_results_path OUTPUT_RESULTS_PATH
Path to the accuracy values list. Accepted formats: csv.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_results_path (string): Path to the accuracy values list. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Path to the accuracy plot. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
independent_vars (array): (None) Independent variables or columns from your dataset you want to train.
target (string): (None) Dependent variable or column from your dataset you want to predict.
metric (string): (minkowski) The distance metric to use for the tree.
max_neighbors (integer): (6) Maximum number of neighbors to use by default for kneighbors queries.
random_state_train_test (integer): (5) Controls the shuffling applied to the data before applying the split.
test_size (number): (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
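The accuracy list produced by this block relates the number of neighbors to classification accuracy on the test split. The sketch below shows the idea in plain Python: a Minkowski distance (which equals the Euclidean distance for p=2), a majority vote among the k nearest training samples, and a sweep over k. All function names are hypothetical, and the assumption that the sweep runs from 1 to max_neighbors is not confirmed by this help text.

```python
from collections import Counter


def minkowski(a, b, p=2):
    """Minkowski distance; p=2 gives the Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def knn_predict(train_x, train_y, query, k, p=2):
    """Majority vote among the k training samples closest to query."""
    ranked = sorted(
        range(len(train_x)), key=lambda i: minkowski(train_x[i], query, p)
    )
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy_by_k(train_x, train_y, test_x, test_y, max_neighbors):
    """Test accuracy for every k from 1 to max_neighbors (assumed range)."""
    results = {}
    for k in range(1, max_neighbors + 1):
        hits = sum(
            knn_predict(train_x, train_y, x, k) == y
            for x, y in zip(test_x, test_y)
        )
        results[k] = hits / len(test_y)
    return results


train_x = [(0,), (1,), (2,), (10,), (11,), (12,)]
train_y = ["a", "a", "a", "b", "b", "b"]
scores = accuracy_by_k(train_x, train_y, [(1.5,), (10.5,)], ["a", "b"], 3)
print(scores)
```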
Common config file
properties:
  independent_vars:
    columns:
    - region
    - tenure
    - age
    - marital
    - address
    - income
    - ed
    - employ
    - retire
    - gender
    - reside
  max_neighbors: 15
  metric: minkowski
  scale: true
  target:
    column: custcat
  test_size: 0.2
Command line
k_neighbors_coefficient --config config_k_neighbors_coefficient.yml --input_dataset_path dataset_k_neighbors_coefficient.csv --output_results_path ref_output_test_k_neighbors_coefficient.csv --output_plot_path ref_output_plot_k_neighbors_coefficient.png
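Common config file
{
  "properties": {
    "independent_vars": {
      "columns": [
        "region",
        "tenure",
        "age",
        "marital",
        "address",
        "income",
        "ed",
        "employ",
        "retire",
        "gender",
        "reside"
      ]
    },
    "target": {
      "column": "custcat"
    },
    "metric": "minkowski",
    "max_neighbors": 15,
    "test_size": 0.2,
    "scale": true
  }
}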
Command line
k_neighbors_coefficient --config config_k_neighbors_coefficient.json --input_dataset_path dataset_k_neighbors_coefficient.csv --output_results_path ref_output_test_k_neighbors_coefficient.csv --output_plot_path ref_output_plot_k_neighbors_coefficient.png
Wrapper of the TensorFlow Keras Sequential method for classification.
Get help
classification_neural_network -h
usage: classification_neural_network [-h] [--config CONFIG] --input_dataset_path INPUT_DATASET_PATH --output_model_path OUTPUT_MODEL_PATH [--output_test_table_path OUTPUT_TEST_TABLE_PATH] [--output_plot_path OUTPUT_PLOT_PATH]
Wrapper of the TensorFlow Keras Sequential method.
optional arguments:
-h, --help show this help message and exit
--config CONFIG Configuration file
--output_test_table_path OUTPUT_TEST_TABLE_PATH
Path to the test table file. Accepted formats: csv.
--output_plot_path OUTPUT_PLOT_PATH
Loss, accuracy and MSE plots. Accepted formats: png.
required arguments:
--input_dataset_path INPUT_DATASET_PATH
Path to the input dataset. Accepted formats: csv.
--output_model_path OUTPUT_MODEL_PATH
Path to the output model file. Accepted formats: h5.
I / O Arguments
Syntax: input_argument (datatype) : Definition
Config input / output arguments for this building block:
input_dataset_path (string): Path to the input dataset. File type: input. Sample file. Accepted formats: CSV
output_model_path (string): Path to the output model file. File type: output. Sample file. Accepted formats: H5
output_test_table_path (string): Path to the test table file. File type: output. Sample file. Accepted formats: CSV
output_plot_path (string): Loss, accuracy and MSE plots. File type: output. Sample file. Accepted formats: PNG
Syntax: input_parameter (datatype) - (default_value) Definition
Config parameters for this building block:
features (object): ({}) Independent variables or columns from your dataset you want to train. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (object): ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (object): ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
validation_size (number): (0.2) Represents the proportion of the dataset to include in the validation split. It should be between 0.0 and 1.0.
test_size (number): (0.1) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
hidden_layers (array): (None) List of dictionaries with hidden layers values. Format: [ { 'size': 50, 'activation': 'relu' } ].
output_layer_activation (string): (softmax) Activation function to use in the output layer.
optimizer (string): (Adam) Name of optimizer instance.
learning_rate (number): (0.02) Determines the step size at each iteration while moving toward a minimum of a loss function.
batch_size (integer): (100) Number of samples per gradient update.
max_epochs (integer): (100) Number of epochs to train the model. As early stopping is enabled, this is a maximum.
normalize_cm (boolean): (False) Whether or not to normalize the confusion matrix.
random_state (integer): (5) Controls the shuffling applied to the data before applying the split.
scale (boolean): (False) Whether or not to scale the input dataset.
remove_tmp (boolean): (True) Remove temporary files.
restart (boolean): (False) Do not execute if output files exist.
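To give a feel for what hidden_layers and output_layer_activation describe, the sketch below counts the trainable parameters implied by a hidden_layers list and evaluates a softmax output. It assumes fully connected layers with a bias term and an output layer with one unit per class. Neither assumption is stated in this help text, and the function names are hypothetical.

```python
import math


def dense_parameter_counts(n_inputs, hidden_layers, n_outputs):
    """Weights plus biases per layer, assuming fully connected layers."""
    sizes = [n_inputs] + [layer["size"] for layer in hidden_layers] + [n_outputs]
    return [
        fan_in * fan_out + fan_out
        for fan_in, fan_out in zip(sizes, sizes[1:])
    ]


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1."""
    shift = max(logits)  # subtracting the maximum avoids overflow
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


# 30 input features and two hidden layers of 50 units, as in the example
# configuration of this section; 2 output classes is an assumption.
hidden_layers = [
    {"size": 50, "activation": "relu"},
    {"size": 50, "activation": "relu"},
]
counts = dense_parameter_counts(30, hidden_layers, 2)
probabilities = softmax([1.0, 2.0, 3.0])
print(counts, sum(counts))
print(probabilities)
```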
Common config file
properties:
  batch_size: 100
  features:
    columns:
    - mean radius
    - mean texture
    - mean perimeter
    - mean area
    - mean smoothness
    - mean compactness
    - mean concavity
    - mean concave points
    - mean symmetry
    - mean fractal dimension
    - radius error
    - texture error
    - perimeter error
    - area error
    - smoothness error
    - compactness error
    - concavity error
    - concave points error
    - symmetry error
    - fractal dimension error
    - worst radius
    - worst texture
    - worst perimeter
    - worst area
    - worst smoothness
    - worst compactness
    - worst concavity
    - worst concave points
    - worst symmetry
    - worst fractal dimension
  hidden_layers:
  - activation: relu
    size: 50
  - activation: relu
    size: 50
  learning_rate: 0.02
  max_epochs: 100
  optimizer: Adam
  output_layer_activation: softmax
  scale: true
  target:
    column: benign
  test_size: 0.1
  validation_size: 0.2
Command line
classification_neural_network --config config_classification_neural_network.yml --input_dataset_path dataset_classification.csv --output_model_path ref_output_model_classification.h5 --output_test_table_path ref_output_test_classification.csv --output_plot_path ref_output_plot_classification.png
Common config file
{
  "properties": {
    "features": {
      "columns": [
        "mean radius",
        "mean texture",
        "mean perimeter",
        "mean area",
        "mean smoothness",
        "mean compactness",
        "mean concavity",
        "mean concave points",
        "mean symmetry",
        "mean fractal dimension",
        "radius error",
        "texture error",
        "perimeter error",
        "area error",
        "smoothness error",
        "compactness error",
        "concavity error",
        "concave points error",
        "symmetry error",
        "fractal dimension error",
        "worst radius",
        "worst texture",
        "worst perimeter",
        "worst area",
        "worst smoothness",
        "worst compactness",
        "worst concavity",
        "worst concave points",
        "worst symmetry",
        "worst fractal dimension"
      ]
    },
    "target": {
      "column": "benign"
    },
    "validation_size": 0.2,
    "test_size": 0.1,
    "hidden_layers": [
      {
        "size": 50,
        "activation": "relu"
      },
      {
        "size": 50,
        "activation": "relu"
      }
    ],
    "output_layer_activation": "softmax",
    "optimizer": "Adam",
    "learning_rate": 0.02,
    "batch_size": 100,
    "max_epochs": 100,
    "scale": true
  }
}
Command line
classification_neural_network --config config_classification_neural_network.json --input_dataset_path dataset_classification.csv --output_model_path ref_output_model_classification.h5 --output_test_table_path ref_output_test_classification.csv --output_plot_path ref_output_plot_classification.png