regression package

Submodules

regression.linear_regression module

Module containing the LinearRegression class and the command line interface.

class regression.linear_regression.LinearRegression(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml LinearRegression
Wrapper of the scikit-learn LinearRegression method.
Trains and tests a given dataset and saves the model and scaler. Visit the LinearRegression documentation page on the official scikit-learn website for further information.
Parameters:
  • input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the residual plot, which shows the error between actual and predicted values. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables from your dataset that you want to train on. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
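The three independent_vars formats described above can be illustrated with a small standard-library sketch. It is not the biobb_ml implementation: the function name resolve_column_positions is hypothetical, the precedence between formats is assumed to follow the order in which they are listed, and the end of each range pair is assumed to be inclusive.

```python
from typing import Dict, List, Sequence

# Assumed precedence when several formats are given: the order in which the
# documentation lists them. The real library may use a different order.
PRECEDENCE = ("columns", "indexes", "range")


def resolve_column_positions(selector: Dict, header: Sequence[str]) -> List[int]:
    """Translate an independent_vars-style dict into zero-based column positions."""
    for key in PRECEDENCE:
        if key not in selector:
            continue
        value = selector[key]
        if key == "columns":
            # Column names are looked up in the dataset header.
            return [list(header).index(name) for name in value]
        if key == "indexes":
            # Zero-based positions are used as given.
            return list(value)
        # "range": a list of [start, end] pairs. Treating `end` as inclusive
        # is an assumption of this sketch.
        positions: List[int] = []
        for start, end in value:
            positions.extend(range(start, end + 1))
        return positions
    raise ValueError("selector must contain 'columns', 'indexes' or 'range'")


header = ["column1", "column2", "column3", "target"]
by_name = resolve_column_positions({"columns": ["column1", "column3"]}, header)
by_index = resolve_column_positions({"indexes": [0, 2]}, header)
by_range = resolve_column_positions({"range": [[0, 1], [3, 3]]}, header)
```

Under these assumptions, the first two selectors pick the same two columns, and the range selector expands to positions 0, 1 and 3.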

Examples

The following example shows how to use the building block from Python:

from biobb_ml.regression.linear_regression import linear_regression
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'test_size': 0.2
}
linear_regression(input_dataset_path='/path/to/myDataset.csv',
                output_model_path='/path/to/newModel.pkl',
                output_test_table_path='/path/to/newTable.csv',
                output_plot_path='/path/to/newPlot.png',
                properties=prop)
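The residual plot written to output_plot_path is described as showing the error between actual and predicted values. As a rough illustration of that quantity, the sketch below computes residuals as actual minus predicted. The helper name residuals and the sample numbers are hypothetical, and the sign convention used by the tool's plot is an assumption.

```python
from typing import List, Sequence


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Return actual - predicted for each test sample."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


# Made-up target values and model predictions for three test rows.
actual = [3.0, 5.0, 7.5]
predicted = [2.5, 5.0, 8.0]
errors = residuals(actual, predicted)
```

Residuals scattered evenly around zero suggest that the model has no systematic bias over the test set.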
check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters

launch() → int[source]

Execute the LinearRegression regression.linear_regression.LinearRegression object.

regression.linear_regression.linear_regression(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a LinearRegression object and execute its launch() method.

regression.linear_regression.main()[source]

Command line execution of this building block. Please check the command line documentation.

regression.polynomial_regression module

Module containing the PolynomialRegression class and the command line interface.

class regression.polynomial_regression.PolynomialRegression(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml PolynomialRegression
Wrapper of the scikit-learn LinearRegression method with PolynomialFeatures.
Trains and tests a given dataset and saves the model and scaler. Visit the LinearRegression documentation page on the official scikit-learn website for further information.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the residual plot, which shows the error between actual and predicted values. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables from your dataset that you want to train on. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • degree (int) - (2) [1~100|1] Polynomial degree.

    • test_size (float) - (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
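To make the degree property more concrete, the following standard-library sketch expands one sample into polynomial features in the same spirit as scikit-learn's PolynomialFeatures: every monomial of the inputs up to the given total degree, including a constant term. The function name polynomial_features is hypothetical, and whether the wrapper keeps the constant column is an assumption here.

```python
from itertools import combinations_with_replacement
from math import prod
from typing import List, Sequence


def polynomial_features(sample: Sequence[float], degree: int) -> List[float]:
    """Expand one sample into all monomials of total degree 0..degree."""
    features: List[float] = []
    for d in range(degree + 1):
        # Each combination picks d input values (with repetition) to multiply;
        # d == 0 yields the empty product, i.e. the constant term 1.0.
        for combo in combinations_with_replacement(sample, d):
            features.append(prod(combo, start=1.0))
    return features


# For a sample (a, b) = (2, 3) and degree 2 the features are
# 1, a, b, a*a, a*b, b*b.
expanded = polynomial_features([2.0, 3.0], degree=2)
```

A linear model fitted on the expanded features is then a polynomial of the chosen degree in the original variables, which is why the number of features grows quickly as degree increases.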

Examples

The following example shows how to use the building block from Python:

from biobb_ml.regression.polynomial_regression import polynomial_regression
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'degree': 2,
    'test_size': 0.2
}
polynomial_regression(input_dataset_path='/path/to/myDataset.csv',
                    output_model_path='/path/to/newModel.pkl',
                    output_test_table_path='/path/to/newTable.csv',
                    output_plot_path='/path/to/newPlot.png',
                    properties=prop)
check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters

launch() → int[source]

Execute the PolynomialRegression regression.polynomial_regression.PolynomialRegression object.

regression.polynomial_regression.main()[source]

Command line execution of this building block. Please check the command line documentation.

regression.polynomial_regression.polynomial_regression(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a PolynomialRegression object and execute its launch() method.

regression.random_forest_regressor module

Module containing the RandomForestRegressor class and the command line interface.

class regression.random_forest_regressor.RandomForestRegressor(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml RandomForestRegressor
Wrapper of the scikit-learn RandomForestRegressor method.
Trains and tests a given dataset and saves the model and scaler. Visit the RandomForestRegressor documentation page.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the residual plot, which shows the error between actual and predicted values. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables from your dataset that you want to train on. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.

    • n_estimators (int) - (10) The number of trees in the forest.

    • max_depth (int) - (None) The maximum depth of the tree.

    • random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
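The interplay of test_size and random_state_train_test can be sketched with the standard library alone. This is an illustration of a seeded shuffle followed by a proportional cut, not the code the building block runs. The helper name split_rows is hypothetical, and rounding the test count up and rejecting the boundary values 0.0 and 1.0 are assumptions of this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple


def split_rows(rows: Sequence, test_size: float, seed: int) -> Tuple[List, List]:
    """Shuffle rows with a fixed seed, then cut off a test_size fraction."""
    # Excluding the boundary values is a choice made for this sketch.
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")
    shuffled = list(rows)
    # A dedicated Random instance keeps the shuffle reproducible per seed.
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


# Ten row identifiers, the default test_size of 0.2 and the default seed of 5.
train, test = split_rows(range(10), test_size=0.2, seed=5)
```

Calling the function again with the same seed returns the same partition, which is the reason for fixing random_state_train_test when results need to be comparable between runs.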

Examples

The following example shows how to use the building block from Python:

from biobb_ml.regression.random_forest_regressor import random_forest_regressor
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'n_estimators': 10,
    'max_depth': 5,
    'test_size': 0.2
}
random_forest_regressor(input_dataset_path='/path/to/myDataset.csv',
                        output_model_path='/path/to/newModel.pkl',
                        output_test_table_path='/path/to/newTable.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
Info:
  • wrapped_software:
    • name: scikit-learn RandomForestRegressor

    • version: >0.24.2

    • license: BSD 3-Clause

  • ontology:
check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters

launch() → int[source]

Execute the RandomForestRegressor regression.random_forest_regressor.RandomForestRegressor object.

regression.random_forest_regressor.main()[source]

Command line execution of this building block. Please check the command line documentation.

regression.random_forest_regressor.random_forest_regressor(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a RandomForestRegressor object and execute its launch() method.

regression.regression_predict module

Module containing the RegressionPredict class and the command line interface.

class regression.regression_predict.RegressionPredict(input_model_path, output_results_path, input_dataset_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml RegressionPredict
Makes predictions from an input dataset and a given regression model.
Makes predictions from an input dataset (provided either as a file or as a dictionary property) and a given regression model trained with the LinearRegression or RandomForestRegressor methods.
Parameters:
  • input_model_path (str) –

    Path to the input model. File type: input. Sample file. Accepted formats: pkl (edam:format_3653).

  • input_dataset_path (str) (Optional) –

    Path to the dataset to predict. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_results_path (str) –

    Path to the output results file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • predictions (list) - (None) List of dictionaries with the values for which you want to predict targets. It is taken into account only if input_dataset_path is not provided. Format: [{ 'var1': 1.0, 'var2': 2.0 }, { 'var1': 4.0, 'var2': 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.
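The two predictions formats carry the same information, which the sketch below makes explicit by normalizing both into plain rows of values. The helper name predictions_to_rows is hypothetical, and the sketch assumes that the values appear in the same column order that was used for training.

```python
from typing import Dict, List, Sequence, Union

Prediction = Union[Dict[str, float], Sequence[float]]


def predictions_to_rows(predictions: List[Prediction]) -> List[List[float]]:
    """Turn either predictions format into plain rows of values."""
    rows: List[List[float]] = []
    for item in predictions:
        if isinstance(item, dict):
            # With headers: keep the values in the dict's insertion order.
            rows.append(list(item.values()))
        else:
            # Without headers: the values are already positional.
            rows.append(list(item))
    return rows


with_headers = [{"var1": 1.0, "var2": 2.0}, {"var1": 4.0, "var2": 2.7}]
without_headers = [[1.0, 2.0], [4.0, 2.7]]
rows_a = predictions_to_rows(with_headers)
rows_b = predictions_to_rows(without_headers)
```

The dictionary form is easier to read and less prone to ordering mistakes, while the list form is the only option for datasets that have no header row.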

Examples

The following example shows how to use the building block from Python:

from biobb_ml.regression.regression_predict import regression_predict
prop = {
    'predictions': [
        {
            'var1': 1.0,
            'var2': 2.0
        },
        {
            'var1': 4.0,
            'var2': 2.7
        }
    ]
}
regression_predict(input_model_path='/path/to/myModel.pkl',
                    output_results_path='/path/to/newPredictedResults.csv',
                    input_dataset_path='/path/to/myDataset.csv',
                    properties=prop)
check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters

launch() → int[source]

Execute the RegressionPredict regression.regression_predict.RegressionPredict object.

regression.regression_predict.main()[source]

Command line execution of this building block. Please check the command line documentation.

regression.regression_predict.regression_predict(input_model_path: str, output_results_path: str, input_dataset_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a RegressionPredict object and execute its launch() method.