regression package
Submodules
regression.linear_regression module
Module containing the LinearRegression class and the command line interface.
- class regression.linear_regression.LinearRegression(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml LinearRegression
Wrapper of the scikit-learn LinearRegression method.
Trains and tests a given dataset and saves the model and scaler. Visit the LinearRegression documentation page in the sklearn official website for further information.
- Parameters:
input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) – Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) – Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) – Path to the residual plot, which shows the error between actual and predicted values. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
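The three accepted shapes of independent_vars can be illustrated with a small resolver. This is only a sketch in plain Python, not biobb_ml's own code: the function name resolve_independent_vars is hypothetical, and both the precedence order (columns, then indexes, then range) and the inclusive treatment of range bounds are assumptions, since the documentation only states that the first format is picked.

```python
def resolve_independent_vars(header, independent_vars):
    """Return the column names selected by an independent_vars dict.

    Hypothetical helper: precedence order and inclusive ranges are
    assumptions made for illustration only.
    """
    if "columns" in independent_vars:
        # Column names are used as given.
        return list(independent_vars["columns"])
    if "indexes" in independent_vars:
        # Zero-based positions into the dataset header.
        return [header[i] for i in independent_vars["indexes"]]
    if "range" in independent_vars:
        selected = []
        for start, end in independent_vars["range"]:
            # Assumed inclusive on both ends.
            selected.extend(header[start:end + 1])
        return selected
    raise ValueError("independent_vars needs 'columns', 'indexes' or 'range'")


header = ["a", "b", "c", "d", "e"]
by_name = resolve_independent_vars(header, {"columns": ["a", "c"]})
by_index = resolve_independent_vars(header, {"indexes": [0, 3]})
by_range = resolve_independent_vars(header, {"range": [[0, 1], [3, 4]]})
```

The same pattern would apply to target and weight, which accept a single column name or index instead of a list.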
Examples
An example of how to use the building block from Python:

    from biobb_ml.regression.linear_regression import linear_regression

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'test_size': 0.2
    }
    linear_regression(input_dataset_path='/path/to/myDataset.csv',
                      output_model_path='/path/to/newModel.pkl',
                      output_test_table_path='/path/to/newTable.csv',
                      output_plot_path='/path/to/newPlot.png',
                      properties=prop)
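To make the effect of test_size and random_state_train_test more concrete, the following standard-library sketch shuffles rows with a fixed seed and holds out a fraction for testing. It is an illustration only: the wrapper presumably delegates the split to scikit-learn, and the helper name train_test_split_rows and its rounding rule are assumptions.

```python
import random


def train_test_split_rows(rows, test_size=0.2, random_state=5):
    """Shuffle rows with a fixed seed and split off a test fraction.

    Hypothetical helper; the rounding of the test count is an assumption.
    """
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible.
    random.Random(random_state).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_size))
    return shuffled[n_test:], shuffled[:n_test]


rows = list(range(10))
train, test = train_test_split_rows(rows, test_size=0.2, random_state=5)
```

With 10 rows and test_size=0.2, two rows go to the test set and eight to the training set; repeating the call with the same seed yields the same split.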
- Info:
- wrapped_software:
name: scikit-learn LinearRegression
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the LinearRegression (regression.linear_regression.LinearRegression) object.
- regression.linear_regression.linear_regression(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Create a LinearRegression instance and execute its launch() method.
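The residual plot written to output_plot_path is described as showing the error between actual and predicted values. A minimal sketch of that quantity, assuming the usual convention residual = actual - predicted (the sign convention and the helper names are assumptions, not taken from biobb_ml):

```python
import math


def residuals(actual, predicted):
    """Return actual - predicted for each pair of values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def rmse(actual, predicted):
    """Root mean squared error, a single-number summary of the residuals."""
    res = residuals(actual, predicted)
    return math.sqrt(sum(r * r for r in res) / len(res))


actual = [3.0, 5.0, 7.0]
predicted = [2.0, 5.0, 9.0]
res = residuals(actual, predicted)
error = rmse(actual, predicted)
```

Residuals scattered evenly around zero suggest that a linear model fits the data reasonably well; a visible trend suggests it does not.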
regression.polynomial_regression module
Module containing the PolynomialRegression class and the command line interface.
- class regression.polynomial_regression.PolynomialRegression(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml PolynomialRegression
Wrapper of the scikit-learn LinearRegression method with PolynomialFeatures.
Trains and tests a given dataset and saves the model and scaler. Visit the LinearRegression documentation page in the sklearn official website for further information.
- Parameters:
input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) – Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) – Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) – Path to the residual plot, which shows the error between actual and predicted values. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
degree (int) - (2) [1~100|1] Polynomial degree.
test_size (float) - (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
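The degree property controls how far the input columns are expanded before the linear model is fitted. The sketch below mimics that kind of expansion in plain Python: a constant term followed by all monomials of increasing degree. The helper polynomial_features is hypothetical and is not the library's implementation.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(row, degree=2):
    """Expand one row of features into all monomials up to `degree`.

    Hypothetical helper written for illustration only.
    """
    if degree < 1:
        raise ValueError("degree must be >= 1")
    features = []
    for d in range(degree + 1):
        # Each combination of d column positions yields one monomial;
        # the empty combination (d == 0) yields the constant term 1.
        for combo in combinations_with_replacement(range(len(row)), d):
            features.append(prod(row[i] for i in combo))
    return features


# For the row [a, b] = [2, 3]: 1, a, b, a*a, a*b, b*b
expanded = polynomial_features([2, 3], degree=2)
```

The number of generated features grows quickly with degree and with the number of columns, so large values of degree can make training slow and prone to overfitting.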
Examples
An example of how to use the building block from Python:

    from biobb_ml.regression.polynomial_regression import polynomial_regression

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'degree': 2,
        'test_size': 0.2
    }
    polynomial_regression(input_dataset_path='/path/to/myDataset.csv',
                          output_model_path='/path/to/newModel.pkl',
                          output_test_table_path='/path/to/newTable.csv',
                          output_plot_path='/path/to/newPlot.png',
                          properties=prop)
- Info:
- wrapped_software:
name: scikit-learn LinearRegression
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the PolynomialRegression (regression.polynomial_regression.PolynomialRegression) object.
- regression.polynomial_regression.main()
Command line execution of this building block. Please check the command line documentation.
- regression.polynomial_regression.polynomial_regression(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Create a PolynomialRegression instance and execute its launch() method.
regression.random_forest_regressor module
Module containing the RandomForestRegressor class and the command line interface.
- class regression.random_forest_regressor.RandomForestRegressor(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml RandomForestRegressor
Wrapper of the scikit-learn RandomForestRegressor method.
Trains and tests a given dataset and saves the model and scaler. Visit the RandomForestRegressor documentation page.
- Parameters:
input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) – Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) – Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) – Path to the residual plot, which shows the error between actual and predicted values. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
n_estimators (int) - (10) The number of trees in the forest.
max_depth (int) - (None) The maximum depth of each tree.
random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
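As a rough intuition for n_estimators and max_depth: a random forest regressor predicts by averaging the predictions of its individual trees. The sketch below shows only that averaging step, using three hand-written depth-1 rules as stand-ins for trained trees. The names forest_predict and trees are hypothetical, and no training is performed.

```python
from statistics import mean


def forest_predict(trees, sample):
    """Average the predictions of every tree for one sample."""
    if not trees:
        raise ValueError("the forest needs at least one tree")
    return mean(tree(sample) for tree in trees)


# Three hypothetical "trees" (n_estimators = 3), each a depth-1 rule
# on the first feature (max_depth = 1).
trees = [
    lambda x: 10.0 if x[0] < 1.0 else 20.0,
    lambda x: 12.0 if x[0] < 2.0 else 22.0,
    lambda x: 14.0 if x[0] < 3.0 else 24.0,
]

# Individual predictions for x[0] = 2.5 are 20.0, 22.0 and 14.0.
prediction = forest_predict(trees, [2.5])
```

More trees generally smooth out the variance of individual trees at the cost of training time, while max_depth limits how finely each tree can partition the data.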
Examples
An example of how to use the building block from Python:

    from biobb_ml.regression.random_forest_regressor import random_forest_regressor

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'n_estimators': 10,
        'max_depth': 5,
        'test_size': 0.2
    }
    random_forest_regressor(input_dataset_path='/path/to/myDataset.csv',
                            output_model_path='/path/to/newModel.pkl',
                            output_test_table_path='/path/to/newTable.csv',
                            output_plot_path='/path/to/newPlot.png',
                            properties=prop)
- Info:
- wrapped_software:
name: scikit-learn RandomForestRegressor
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the RandomForestRegressor (regression.random_forest_regressor.RandomForestRegressor) object.
- regression.random_forest_regressor.main()
Command line execution of this building block. Please check the command line documentation.
- regression.random_forest_regressor.random_forest_regressor(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Create a RandomForestRegressor instance and execute its launch() method.
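Each of the training blocks above accepts a scale property and saves a scaler together with the model. The documentation does not say which scaling is applied; the sketch below assumes standardization (zero mean, unit variance per column), and the helper standardize is hypothetical.

```python
from statistics import mean, pstdev


def standardize(column):
    """Rescale a column to zero mean and unit variance.

    Assumed scaling method; the wrapper's actual scaler may differ.
    """
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


scaled = standardize([1.0, 2.0, 3.0])
```

Whatever scaling is used at training time has to be applied identically to new data at prediction time, which is presumably why the scaler is stored with the model.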
regression.regression_predict module
Module containing the RegressionPredict class and the command line interface.
- class regression.regression_predict.RegressionPredict(input_model_path, output_results_path, input_dataset_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml RegressionPredict
Makes predictions from an input dataset and a given regression model.
Makes predictions from an input dataset (provided either as a file or as a dictionary property) and a given regression model trained with the LinearRegression or RandomForestRegressor methods.
- Parameters:
input_model_path (str) – Path to the input model. File type: input. Sample file. Accepted formats: pkl (edam:format_3653).
input_dataset_path (str) (Optional) – Path to the dataset to predict. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) – Path to the output results file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
predictions (list) - (None) List of dictionaries with the values for which you want to predict targets. It is taken into account only if input_dataset_path is not provided. Format: [{ 'var1': 1.0, 'var2': 2.0 }, { 'var1': 4.0, 'var2': 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
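The two accepted shapes of the predictions property (dictionaries for datasets with headers, plain lists for datasets without) can be reduced to the same list of value rows. The following is a sketch under assumptions: predictions_to_rows and its feature_names argument are hypothetical and only illustrate how the two formats relate.

```python
def predictions_to_rows(predictions, feature_names=None):
    """Convert the predictions property into a list of value rows.

    Hypothetical helper: dict entries are ordered by feature_names,
    list entries are passed through unchanged.
    """
    rows = []
    for item in predictions:
        if isinstance(item, dict):
            if feature_names is None:
                raise ValueError("feature_names is required for dict entries")
            rows.append([item[name] for name in feature_names])
        else:
            rows.append(list(item))
    return rows


with_headers = predictions_to_rows(
    [{"var1": 1.0, "var2": 2.0}, {"var1": 4.0, "var2": 2.7}],
    feature_names=["var1", "var2"],
)
without_headers = predictions_to_rows([[1.0, 2.0], [4.0, 2.7]])
```

In both cases the values must follow the order of the independent variables used to train the model.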
Examples
An example of how to use the building block from Python:

    from biobb_ml.regression.regression_predict import regression_predict

    # 'predictions' is only taken into account when input_dataset_path is not provided
    prop = {
        'predictions': [
            {'var1': 1.0, 'var2': 2.0},
            {'var1': 4.0, 'var2': 2.7}
        ]
    }
    regression_predict(input_model_path='/path/to/myModel.pkl',
                       output_results_path='/path/to/newPredictedResults.csv',
                       input_dataset_path='/path/to/myDataset.csv',
                       properties=prop)
- Info:
- wrapped_software:
name: scikit-learn
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the RegressionPredict (regression.regression_predict.RegressionPredict) object.
- regression.regression_predict.main()
Command line execution of this building block. Please check the command line documentation.
- regression.regression_predict.regression_predict(input_model_path: str, output_results_path: str, input_dataset_path: str | None = None, properties: dict | None = None, **kwargs) → int
Create a RegressionPredict instance and execute its launch() method.