classification package

Submodules

classification.decision_tree module

Module containing the DecisionTree class and the command line interface.

class classification.decision_tree.DecisionTree(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml DecisionTree
Wrapper of the scikit-learn DecisionTreeClassifier method.
Trains and tests a given dataset and saves the model and scaler. Visit the DecisionTreeClassifier documentation page on the sklearn official website for further information.
Parameters:
  • input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are provided, the first one is picked (an index-based sketch follows the example below).

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • criterion (string) - (“gini”) The function to measure the quality of a split. Values: gini (for the Gini impurity), entropy (for the information gain).

    • max_depth (int) - (4) [1~100|1] The maximum depth of the model. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.

    • normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.

    • random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.decision_tree import decision_tree
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'criterion': 'entropy',
    'test_size': 0.2
}
decision_tree(input_dataset_path='/path/to/myDataset.csv',
                output_model_path='/path/to/newModel.pkl',
                output_test_table_path='/path/to/newTable.csv',
                output_plot_path='/path/to/newPlot.png',
                properties=prop)
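
As described above, independent_vars and target also accept column indexes instead of names. A minimal sketch of the same call using the index-based formats (the column positions are illustrative, not taken from a real dataset):

from biobb_ml.classification.decision_tree import decision_tree
# Select the features by index ranges and the target by its index;
# the positions below are illustrative placeholders.
prop = {
    'independent_vars': {
        'range': [[0, 2], [5, 7]]
    },
    'target': {
        'index': 10
    },
    'criterion': 'gini',
    'max_depth': 4
}
decision_tree(input_dataset_path='/path/to/myDataset.csv',
              output_model_path='/path/to/newModel.pkl',
              properties=prop)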
Info:
  • wrapped_software:
    • name: scikit-learn DecisionTreeClassifier

    • version: >=0.24.2

    • license: BSD 3-Clause

check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the DecisionTree object.

classification.decision_tree.decision_tree(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a DecisionTree instance and execute its launch() method.

classification.decision_tree.main()[source]

Command line execution of this building block. Please check the command line documentation.

classification.k_neighbors_coefficient module

Module containing the KNeighborsCoefficient class and the command line interface.

class classification.k_neighbors_coefficient.KNeighborsCoefficient(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml KNeighborsCoefficient
Wrapper of the scikit-learn KNeighborsClassifier method.
Trains and tests a given dataset and calculates the best K coefficient. Visit the KNeighborsClassifier documentation page on the sklearn official website for further information.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_results_path (str) –

    Path to the accuracy values list. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the accuracy plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are provided, the first one is picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • metric (string) - (“minkowski”) The distance metric to use for the tree. Values: euclidean (Euclidean distance between two 1-D arrays), manhattan (Manhattan distance), chebyshev (Chebyshev distance), minkowski (Minkowski distance between two 1-D arrays), wminkowski (weighted Minkowski distance between two 1-D arrays), seuclidean (standardized Euclidean distance between two 1-D arrays), mahalanobis (Mahalanobis distance between two 1-D arrays).

    • max_neighbors (int) - (6) [1~100|1] Maximum number of neighbors to use by default for kneighbors queries.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.k_neighbors_coefficient import k_neighbors_coefficient
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'max_neighbors': 6,
    'test_size': 0.2
}
k_neighbors_coefficient(input_dataset_path='/path/to/myDataset.csv',
                        output_results_path='/path/to/newTable.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
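
A common pattern is to run this coefficient search first and then train the final model with the k_neighbors building block (documented below) using the best K reported in the accuracy table or plot. A sketch of that two-step workflow, where the final n_neighbors value is a placeholder for whatever the search suggests:

from biobb_ml.classification.k_neighbors_coefficient import k_neighbors_coefficient
from biobb_ml.classification.k_neighbors import k_neighbors

# Step 1: score K = 1..max_neighbors and save the accuracy table and plot.
k_neighbors_coefficient(input_dataset_path='/path/to/myDataset.csv',
                        output_results_path='/path/to/accuracies.csv',
                        output_plot_path='/path/to/accuracies.png',
                        properties={
                            'independent_vars': {'columns': ['column1', 'column2']},
                            'target': {'column': 'target'},
                            'max_neighbors': 10
                        })

# Step 2: train and save the final model with the K chosen in step 1
# (4 is a placeholder for the best value found).
k_neighbors(input_dataset_path='/path/to/myDataset.csv',
            output_model_path='/path/to/newModel.pkl',
            properties={
                'independent_vars': {'columns': ['column1', 'column2']},
                'target': {'column': 'target'},
                'n_neighbors': 4
            })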
Info:
  • wrapped_software:
    • name: scikit-learn KNeighborsClassifier

    • version: >=0.24.2

    • license: BSD 3-Clause

check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the KNeighborsCoefficient object.

classification.k_neighbors_coefficient.k_neighbors_coefficient(input_dataset_path: str, output_results_path: str, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a KNeighborsCoefficient instance and execute its launch() method.

classification.k_neighbors_coefficient.main()[source]

Command line execution of this building block. Please check the command line documentation.

classification.k_neighbors module

Module containing the KNeighborsTrain class and the command line interface.

class classification.k_neighbors.KNeighborsTrain(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml KNeighborsTrain
Wrapper of the scikit-learn KNeighborsClassifier method.
Trains and tests a given dataset and saves the model and scaler. Visit the KNeighborsClassifier documentation page on the sklearn official website for further information.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are provided, the first one is picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked (see the sketch after the example below).

    • metric (string) - (“minkowski”) The distance metric to use for the tree. Values: euclidean (Euclidean distance between two 1-D arrays), manhattan (Manhattan distance), chebyshev (Chebyshev distance), minkowski (Minkowski distance between two 1-D arrays), wminkowski (weighted Minkowski distance between two 1-D arrays), seuclidean (standardized Euclidean distance between two 1-D arrays), mahalanobis (Mahalanobis distance between two 1-D arrays).

    • n_neighbors (int) - (6) [1~100|1] Number of neighbors to use by default for kneighbors queries.

    • normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.k_neighbors import k_neighbors
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'n_neighbors': 6,
    'test_size': 0.2
}
k_neighbors(input_dataset_path='/path/to/myDataset.csv',
            output_model_path='/path/to/newModel.pkl',
            output_test_table_path='/path/to/newTable.csv',
            output_plot_path='/path/to/newPlot.png',
            properties=prop)
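
The optional weight and scale properties are not used above: weight selects a per-sample weight column, and scale scales the input dataset before training (the fitted scaler is saved together with the model). A minimal sketch, where 'sample_weight' is an illustrative column name:

prop = {
    'independent_vars': {'columns': ['column1', 'column2', 'column3']},
    'target': {'column': 'target'},
    'weight': {'column': 'sample_weight'},  # illustrative weight column
    'n_neighbors': 6,
    'scale': True                           # the scaler is saved with the model
}
k_neighbors(input_dataset_path='/path/to/myDataset.csv',
            output_model_path='/path/to/newModel.pkl',
            properties=prop)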
Info:
  • wrapped_software:
    • name: scikit-learn KNeighborsClassifier

    • version: >=0.24.2

    • license: BSD 3-Clause

check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the KNeighborsTrain object.

classification.k_neighbors.k_neighbors(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a KNeighborsTrain instance and execute its launch() method.

classification.k_neighbors.main()[source]

Command line execution of this building block. Please check the command line documentation.

classification.logistic_regression module

Module containing the LogisticRegression class and the command line interface.

class classification.logistic_regression.LogisticRegression(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml LogisticRegression
Wrapper of the scikit-learn LogisticRegression method.
Trains and tests a given dataset and saves the model and scaler. Visit the LogisticRegression documentation page on the sklearn official website for further information.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are provided, the first one is picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • solver (string) - (“liblinear”) Numerical optimizer to find parameters. Values: newton-cg (a Newton method that uses a conjugate-gradient solver for the Hessian system), lbfgs (an analogue of Newton’s method in which the Hessian matrix is approximated using updates specified by gradient evaluations), liblinear (a linear classification library that supports logistic regression and linear support vector machines), sag (Stochastic Average Gradient; optimizes the sum of a finite number of smooth convex functions), saga (a variant of sag that also supports the non-smooth penalty=l1 option).

    • c_parameter (float) - (0.01) [0~100|0.01] Inverse of regularization strength; must be a positive float. Smaller values specify stronger regularization (a sweep over several values is sketched after the example below).

    • normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.

    • random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.logistic_regression import logistic_regression
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'solver': 'liblinear',
    'c_parameter': 0.01,
    'test_size': 0.2
}
logistic_regression(input_dataset_path='/path/to/myDataset.csv',
                    output_model_path='/path/to/newModel.pkl',
                    output_test_table_path='/path/to/newTable.csv',
                    output_plot_path='/path/to/newPlot.png',
                    properties=prop)
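
Since c_parameter is the inverse of the regularization strength, sweeping it is a simple way to probe stronger and weaker regularization. The sketch below just retrains the block for several values (the value list and output file naming are illustrative):

from biobb_ml.classification.logistic_regression import logistic_regression

# Retrain with progressively weaker regularization (larger C).
for c in [0.01, 0.1, 1.0]:
    logistic_regression(input_dataset_path='/path/to/myDataset.csv',
                        output_model_path=f'/path/to/model_C{c}.pkl',
                        output_test_table_path=f'/path/to/table_C{c}.csv',
                        properties={
                            'independent_vars': {'columns': ['column1', 'column2']},
                            'target': {'column': 'target'},
                            'solver': 'liblinear',
                            'c_parameter': c
                        })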
Info:
  • wrapped_software:
    • name: scikit-learn LogisticRegression

    • version: >=0.24.2

    • license: BSD 3-Clause
check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the LogisticRegression object.

classification.logistic_regression.logistic_regression(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a LogisticRegression instance and execute its launch() method.

classification.logistic_regression.main()[source]

Command line execution of this building block. Please check the command line documentation.

classification.random_forest_classifier module

Module containing the RandomForestClassifier class and the command line interface.

class classification.random_forest_classifier.RandomForestClassifier(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml RandomForestClassifier
Wrapper of the scikit-learn RandomForestClassifier method.
Trains and tests a given dataset and saves the model and scaler. Visit the RandomForestClassifier documentation page on the sklearn official website for further information.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are provided, the first one is picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • n_estimators (int) - (100) The number of trees in the forest.

    • bootstrap (bool) - (True) Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.

    • normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.

    • random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.random_forest_classifier import random_forest_classifier
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'n_estimators': 100,
    'test_size': 0.2
}
random_forest_classifier(input_dataset_path='/path/to/myDataset.csv',
                        output_model_path='/path/to/newModel.pkl',
                        output_test_table_path='/path/to/newTable.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
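
To build every tree on the whole dataset instead of on bootstrap samples, disable bootstrap; the sketch below also enables normalize_cm so the confusion matrix in the output plot is normalized:

prop = {
    'independent_vars': {'columns': ['column1', 'column2', 'column3']},
    'target': {'column': 'target'},
    'n_estimators': 100,
    'bootstrap': False,     # each tree sees the whole dataset
    'normalize_cm': True    # normalize the plotted confusion matrix
}
random_forest_classifier(input_dataset_path='/path/to/myDataset.csv',
                         output_model_path='/path/to/newModel.pkl',
                         output_plot_path='/path/to/newPlot.png',
                         properties=prop)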
Info:
  • wrapped_software:
    • name: scikit-learn RandomForestClassifier

    • version: >=0.24.2

    • license: BSD 3-Clause

check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the RandomForestClassifier object.

classification.random_forest_classifier.main()[source]

Command line execution of this building block. Please check the command line documentation.

classification.random_forest_classifier.random_forest_classifier(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a RandomForestClassifier instance and execute its launch() method.

classification.support_vector_machine module

Module containing the SupportVectorMachine class and the command line interface.

class classification.support_vector_machine.SupportVectorMachine(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml SupportVectorMachine
Wrapper of the scikit-learn SupportVectorMachine method.
Trains and tests a given dataset and saves the model and scaler. Visit the SupportVectorMachine documentation page on the sklearn official website for further information.
Parameters:
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_model_path (str) –

    Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).

  • output_test_table_path (str) (Optional) –

    Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) (Optional) –

    Path to the statistics plot. If target is binary it shows confusion matrix, distributions of the predicted probabilities of both classes and ROC curve. If target is non-binary it shows confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • independent_vars (dict) - ({}) Independent variables you want to train from your dataset. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are provided, the first one is picked.

    • target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are provided, the first one is picked.

    • kernel (string) - (“rbf”) Specifies the kernel type to be used in the algorithm. Values: linear (used when the data is linearly separable, i.e. it can be separated by a single line), poly (represents the similarity of vectors, i.e. training samples, in a feature space over polynomials of the original variables, allowing the learning of non-linear models), rbf (a function whose value depends on the distance from the origin or from some point), sigmoid (in the neural networks field, the bipolar sigmoid function is often used as an activation function for artificial neurons), precomputed (precomputed kernel).

    • normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.

    • random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.

    • random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.

    • test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.

    • scale (bool) - (False) Whether or not to scale the input dataset.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.support_vector_machine import support_vector_machine
prop = {
    'independent_vars': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'kernel': 'rbf',
    'test_size': 0.2
}
support_vector_machine(input_dataset_path='/path/to/myDataset.csv',
                        output_model_path='/path/to/newModel.pkl',
                        output_test_table_path='/path/to/newTable.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
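
A model saved by this block (or by any classifier in this package) can be fed to the classification_predict building block documented below. A sketch chaining training and prediction, where '/path/to/newSamples.csv' is an illustrative dataset of unlabelled samples:

from biobb_ml.classification.support_vector_machine import support_vector_machine
from biobb_ml.classification.classification_predict import classification_predict

# Step 1: train and save the model.
support_vector_machine(input_dataset_path='/path/to/myDataset.csv',
                       output_model_path='/path/to/newModel.pkl',
                       properties={
                           'independent_vars': {'columns': ['column1', 'column2']},
                           'target': {'column': 'target'},
                           'kernel': 'rbf'
                       })

# Step 2: predict targets for new samples with the saved model.
classification_predict(input_model_path='/path/to/newModel.pkl',
                       input_dataset_path='/path/to/newSamples.csv',
                       output_results_path='/path/to/predictions.csv')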
Info:
  • wrapped_software:
    • name: scikit-learn SupportVectorMachine

    • version: >=0.24.2

    • license: BSD 3-Clause

check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the SupportVectorMachine object.

classification.support_vector_machine.main()[source]

Command line execution of this building block. Please check the command line documentation.

classification.support_vector_machine.support_vector_machine(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a SupportVectorMachine instance and execute its launch() method.

classification.classification_predict module

Module containing the ClassificationPredict class and the command line interface.

class classification.classification_predict.ClassificationPredict(input_model_path, output_results_path, input_dataset_path=None, properties=None, **kwargs)[source]

Bases: BiobbObject

biobb_ml ClassificationPredict
Makes predictions from an input dataset and a given classification model.
Makes predictions from an input dataset (provided either as a file or as a dictionary property) and a given classification model trained with the DecisionTreeClassifier, KNeighborsClassifier, LogisticRegression, RandomForestClassifier or Support Vector Machine methods.
Parameters:
  • input_model_path (str) –

    Path to the input model. File type: input. Sample file. Accepted formats: pkl (edam:format_3653).

  • input_dataset_path (str) (Optional) –

    Path to the dataset to predict. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_results_path (str) –

    Path to the output results file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • properties (dict - Python dictionary object containing the tool parameters, not input/output files) –

    • predictions (list) - (None) List of dictionaries with the values needed to predict targets. It is taken into account only if input_dataset_path is not provided. Format: [{ ‘var1’: 1.0, ‘var2’: 2.0 }, { ‘var1’: 4.0, ‘var2’: 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers (the headerless form is sketched after the example below).

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.classification.classification_predict import classification_predict
prop = {
    'predictions': [
        {
            'var1': 1.0,
            'var2': 2.0
        },
        {
            'var1': 4.0,
            'var2': 2.7
        }
    ]
}
classification_predict(input_model_path='/path/to/myModel.pkl',
                        output_results_path='/path/to/newPredictedResults.csv',
                        input_dataset_path='/path/to/myDataset.csv',
                        properties=prop)
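
Note that when input_dataset_path is supplied, as above, the predictions property is ignored; omit the dataset path to predict from inline values instead. For a model trained on a headerless dataset, the predictions entries are plain lists rather than dictionaries, as sketched here:

prop = {
    'predictions': [
        [1.0, 2.0],
        [4.0, 2.7]
    ]
}
classification_predict(input_model_path='/path/to/myModel.pkl',
                       output_results_path='/path/to/newPredictedResults.csv',
                       properties=prop)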
check_data_params(out_log, err_log)[source]

Checks all the input/output paths and parameters.

launch() → int[source]

Execute the ClassificationPredict object.

classification.classification_predict.classification_predict(input_model_path: str, output_results_path: str, input_dataset_path: str | None = None, properties: dict | None = None, **kwargs) → int[source]

Create a ClassificationPredict instance and execute its launch() method.

classification.classification_predict.main()[source]

Command line execution of this building block. Please check the command line documentation.