classification package
Submodules
classification.decision_tree module
Module containing the DecisionTree class and the command line interface.
- class classification.decision_tree.DecisionTree(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml DecisionTree
Wrapper of the scikit-learn DecisionTreeClassifier method. Trains and tests a given dataset and saves the model and scaler. Visit the DecisionTreeClassifier documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) –
Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) –
Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes, and the ROC curve. If the target is non-binary, it shows only the confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train on from your dataset. You can specify a list of column names from your input dataset, a list of column indexes, or one or more ranges of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
criterion (string) - (“gini”) The function to measure the quality of a split. Values: gini (for the Gini impurity), entropy (for the information gain).
max_depth (int) - (4) [1~100|1] The maximum depth of the model. If None, then nodes are expanded until all leaves are pure or until all leaves contain less than min_samples_split samples.
normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.
random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
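The three independent_vars formats can be illustrated with a small, hypothetical resolver that maps a selector onto a CSV header. It is a sketch, not biobb_ml's own code: checking the keys in the order columns, then indexes, then range is an assumed reading of "the first one is used", and treating range bounds as inclusive is also an assumption.

```python
from typing import Dict, List, Sequence


def resolve_columns(selector: Dict, header: Sequence[str]) -> List[str]:
    """Return the column names selected by an independent_vars-style dict.

    Hypothetical helper, not part of biobb_ml. Assumptions: keys are checked
    in the order columns > indexes > range, and range bounds are inclusive.
    """
    if "columns" in selector:
        return list(selector["columns"])
    if "indexes" in selector:
        return [header[i] for i in selector["indexes"]]
    if "range" in selector:
        picked: List[str] = []
        for start, end in selector["range"]:
            # Assumed inclusive on both ends: [1, 2] selects indexes 1 and 2.
            picked.extend(header[start:end + 1])
        return picked
    raise ValueError("selector needs a 'columns', 'indexes' or 'range' key")


header = ["id", "age", "weight", "height", "label"]
print(resolve_columns({"indexes": [1, 3]}, header))  # ['age', 'height']
print(resolve_columns({"range": [[1, 2]]}, header))  # ['age', 'weight']
```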
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.decision_tree import decision_tree

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'criterion': 'entropy',
        'test_size': 0.2
    }
    decision_tree(input_dataset_path='/path/to/myDataset.csv',
                  output_model_path='/path/to/newModel.pkl',
                  output_test_table_path='/path/to/newTable.csv',
                  output_plot_path='/path/to/newPlot.png',
                  properties=prop)
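The example above selects criterion 'entropy'. As a rough sketch of what the two criterion values measure (plain Python, not scikit-learn's implementation): Gini impurity is 1 - sum_k p_k^2 and entropy is -sum_k p_k log2(p_k), where p_k are the class proportions in a node. A candidate split is scored by how much it lowers the size-weighted impurity of the two children; the helper names here are illustrative only.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def class_proportions(labels: Sequence[str]) -> List[float]:
    counts = Counter(labels)
    return [c / len(labels) for c in counts.values()]


def gini(labels: Sequence[str]) -> float:
    # Gini impurity: 1 - sum_k p_k^2
    return 1.0 - sum(p * p for p in class_proportions(labels))


def entropy(labels: Sequence[str]) -> float:
    # Shannon entropy in bits: -sum_k p_k * log2(p_k)
    return -sum(p * math.log2(p) for p in class_proportions(labels))


def impurity_decrease(parent, left, right, impurity: Callable = entropy) -> float:
    # Parent impurity minus the size-weighted impurity of the two children
    # (with entropy, this is the information gain).
    n = len(parent)
    children = len(left) / n * impurity(left) + len(right) / n * impurity(right)
    return impurity(parent) - children


parent = ["a", "a", "b", "b"]
print(gini(parent), entropy(parent))                      # 0.5 1.0
print(impurity_decrease(parent, ["a", "a"], ["b", "b"]))  # 1.0
```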
- Info:
- wrapped_software:
name: scikit-learn DecisionTreeClassifier
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.decision_tree.DecisionTree object.
- classification.decision_tree.decision_tree(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the DecisionTree class and execute its launch() method.
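For a binary target, the statistics plot written to output_plot_path includes an ROC curve. The sketch below, in plain Python rather than the library's own plotting code, shows how ROC points and the area under the curve (AUC) can be derived from the true labels and the predicted probabilities of the positive class; encoding the positive class as 1 is an assumption of the sketch.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """(false positive rate, true positive rate) pairs, lowering the threshold step by step."""
    pairs = sorted(zip(scores, y_true), key=lambda t: -t[0])
    positives = sum(y_true)
    negatives = len(y_true) - positives
    tp = fp = 0
    points = [(0.0, 0.0)]
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Samples with the same score cross the threshold together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    # Trapezoidal rule over consecutive ROC points.
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # predicted probability of class 1
print(auc(roc_points(y_true, scores)))  # 0.75
```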
classification.k_neighbors_coefficient module
Module containing the KNeighborsCoefficient class and the command line interface.
- class classification.k_neighbors_coefficient.KNeighborsCoefficient(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml KNeighborsCoefficient
Wrapper of the scikit-learn KNeighborsClassifier method. Trains and tests a given dataset and calculates the best K coefficient. Visit the KNeighborsClassifier documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) –
Path to the accuracy values list. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the accuracy plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (list) - (None) Independent variables or columns from your dataset you want to train.
target (string) - (None) Dependent variable or column from your dataset you want to predict.
metric (string) - (“minkowski”) The distance metric used to find the nearest neighbors. Values: euclidean (Compute the Euclidean distance between two 1-D arrays), manhattan (Compute the Manhattan distance), chebyshev (Compute the Chebyshev distance), minkowski (Compute the Minkowski distance between two 1-D arrays), wminkowski (Compute the weighted Minkowski distance between two 1-D arrays), seuclidean (Compute the standardized Euclidean distance between two 1-D arrays), mahalanobis (Compute the Mahalanobis distance between two 1-D arrays).
max_neighbors (int) - (6) [1~100|1] Maximum number of neighbors to use by default for kneighbors queries.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
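The main metric values differ only in how coordinate differences are combined. A plain-Python sketch of four of them follows. This page does not list a p property, so the sketch assumes scikit-learn's default p=2, under which minkowski coincides with the Euclidean distance.

```python
from typing import Sequence


def minkowski(u: Sequence[float], v: Sequence[float], p: float = 2.0) -> float:
    # (sum |u_i - v_i|^p)^(1/p); p=2 is Euclidean, p=1 is Manhattan.
    return sum(abs(a - b) ** p for a, b in zip(u, v)) ** (1.0 / p)


def euclidean(u, v) -> float:
    return minkowski(u, v, 2.0)


def manhattan(u, v) -> float:
    return minkowski(u, v, 1.0)


def chebyshev(u, v) -> float:
    # Largest coordinate-wise difference (the limit of Minkowski as p grows).
    return max(abs(a - b) for a, b in zip(u, v))


u, v = (0.0, 0.0), (3.0, 4.0)
print(euclidean(u, v), manhattan(u, v), chebyshev(u, v))  # 5.0 7.0 4.0
```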
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.k_neighbors_coefficient import k_neighbors_coefficient

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'max_neighbors': 6,
        'test_size': 0.2
    }
    k_neighbors_coefficient(input_dataset_path='/path/to/myDataset.csv',
                            output_results_path='/path/to/newTable.csv',
                            output_plot_path='/path/to/newPlot.png',
                            properties=prop)
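The module description only says that the best K coefficient is calculated. One plausible reading, sketched here in plain Python under that assumption, is to score the test split once for every k from 1 to max_neighbors and keep the most accurate value. The Euclidean distance, the majority vote, and the tie-breaking rule (smallest k wins) are illustrative choices, not documented behavior, and the helper names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Sequence, Tuple

Point = Tuple[float, ...]


def knn_predict(train_X: Sequence[Point], train_y: Sequence[str], x: Point, k: int) -> str:
    # Majority vote among the k training points closest to x (Euclidean distance).
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def accuracy_per_k(train_X, train_y, test_X, test_y, max_neighbors: int) -> Dict[int, float]:
    results = {}
    for k in range(1, max_neighbors + 1):
        hits = sum(knn_predict(train_X, train_y, x, k) == t for x, t in zip(test_X, test_y))
        results[k] = hits / len(test_y)
    return results


def best_k(results: Dict[int, float]) -> int:
    # Highest accuracy wins; ties go to the smallest k.
    return max(sorted(results), key=lambda k: results[k])


train_X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
train_y = ["a", "a", "a", "b", "b", "b"]
test_X, test_y = [(0.5, 0.5), (5.5, 5.5)], ["a", "b"]
scores = accuracy_per_k(train_X, train_y, test_X, test_y, max_neighbors=3)
print(scores, best_k(scores))  # {1: 1.0, 2: 1.0, 3: 1.0} 1
```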
- Info:
- wrapped_software:
name: scikit-learn KNeighborsClassifier
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.k_neighbors_coefficient.KNeighborsCoefficient object.
- classification.k_neighbors_coefficient.k_neighbors_coefficient(input_dataset_path: str, output_results_path: str, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the KNeighborsCoefficient class and execute its launch() method.
classification.k_neighbors module
Module containing the KNeighborsTrain class and the command line interface.
- class classification.k_neighbors.KNeighborsTrain(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml KNeighborsTrain
Wrapper of the scikit-learn KNeighborsClassifier method. Trains and tests a given dataset and saves the model and scaler. Visit the KNeighborsClassifier documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) –
Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) –
Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes, and the ROC curve. If the target is non-binary, it shows only the confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train on from your dataset. You can specify a list of column names from your input dataset, a list of column indexes, or one or more ranges of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
metric (string) - (“minkowski”) The distance metric used to find the nearest neighbors. Values: euclidean (Compute the Euclidean distance between two 1-D arrays), manhattan (Compute the Manhattan distance), chebyshev (Compute the Chebyshev distance), minkowski (Compute the Minkowski distance between two 1-D arrays), wminkowski (Compute the weighted Minkowski distance between two 1-D arrays), seuclidean (Compute the standardized Euclidean distance between two 1-D arrays), mahalanobis (Compute the Mahalanobis distance between two 1-D arrays).
n_neighbors (int) - (6) [1~100|1] Number of neighbors to use by default for kneighbors queries.
normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
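normalize_cm toggles normalization of the confusion matrix, but this page does not say along which axis. The plain-Python sketch below assumes the common convention of dividing each row (true class) by its total, which is what scikit-learn's confusion_matrix does with normalize='true'; it is an illustration, not the building block's code.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]) -> List[List[int]]:
    # Rows are true classes, columns are predicted classes.
    index = {label: i for i, label in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def normalize_rows(cm: List[List[int]]) -> List[List[float]]:
    # Divide each row by its total so every true class sums to 1 (empty rows stay 0).
    return [[c / sum(row) if sum(row) else 0.0 for c in row] for row in cm]


y_true = ["cat", "cat", "dog", "dog", "dog"]
y_pred = ["cat", "dog", "dog", "dog", "cat"]
cm = confusion_matrix(y_true, y_pred, ["cat", "dog"])
print(cm)                  # [[1, 1], [1, 2]]
print(normalize_rows(cm))  # [[0.5, 0.5], [0.3333333333333333, 0.6666666666666666]]
```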
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.k_neighbors import k_neighbors

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'n_neighbors': 6,
        'test_size': 0.2
    }
    k_neighbors(input_dataset_path='/path/to/myDataset.csv',
                output_model_path='/path/to/newModel.pkl',
                output_test_table_path='/path/to/newTable.csv',
                output_plot_path='/path/to/newPlot.png',
                properties=prop)
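test_size and random_state_train_test together control how the rows are divided before training. The sketch below illustrates the idea with the standard library only. It is not scikit-learn's train_test_split, so the same seed will not reproduce the library's exact split; the hypothetical split_rows helper only mirrors scikit-learn's rounding of the test share up to a whole row.

```python
import math
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_rows(rows: Sequence[T], test_size: float = 0.2, seed: int = 5) -> Tuple[List[T], List[T]]:
    """Hypothetical helper: shuffle with a fixed seed, then hold out a test fraction."""
    if not 0.0 < test_size < 1.0:
        raise ValueError("test_size must be between 0.0 and 1.0")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)  # same seed -> same split
    n_test = math.ceil(len(rows) * test_size)  # round the test share up
    test = [rows[i] for i in indices[:n_test]]
    train = [rows[i] for i in indices[n_test:]]
    return train, test


rows = list(range(10))
train, test = split_rows(rows, test_size=0.2, seed=5)
print(len(train), len(test))  # 8 2
```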
- Info:
- wrapped_software:
name: scikit-learn KNeighborsClassifier
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.k_neighbors.KNeighborsTrain object.
- classification.k_neighbors.k_neighbors(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the KNeighborsTrain class and execute its launch() method.
classification.logistic_regression module
Module containing the LogisticRegression class and the command line interface.
- class classification.logistic_regression.LogisticRegression(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml LogisticRegression
Wrapper of the scikit-learn LogisticRegression method. Trains and tests a given dataset and saves the model and scaler. Visit the LogisticRegression documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) –
Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) –
Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes, and the ROC curve. If the target is non-binary, it shows only the confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train on from your dataset. You can specify a list of column names from your input dataset, a list of column indexes, or one or more ranges of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
solver (string) - (“liblinear”) Numerical optimizer used to find the parameters. Values: newton-cg (Newton conjugate-gradient method, which at each step minimizes a quadratic approximation of the loss around the current point), lbfgs (a quasi-Newton method analogous to Newton's method, but with the Hessian matrix approximated using updates specified by gradient evaluations), liblinear (a linear classification library that supports logistic regression and linear support vector machines), sag (Stochastic Average Gradient, which optimizes the sum of a finite number of smooth convex functions), saga (a variant of SAG that also supports the non-smooth penalty=l1 option).
c_parameter (float) - (0.01) [0~100|0.01] Inverse of regularization strength; must be a positive float. Smaller values specify stronger regularization.
normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.
random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
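c_parameter is described as the inverse of regularization strength. Assuming it is passed to scikit-learn as C with the default L2 penalty, the quantity being minimized has the form 0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i (w . x_i + b))), with labels y_i in {-1, +1}. The plain-Python function below only evaluates that objective (it does not fit anything, and intercept handling is simplified) to make the role of C concrete: with a small C such as the default 0.01, the penalty term dominates and pushes the weights toward zero.

```python
import math
from typing import Sequence


def l2_logistic_objective(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
                          y: Sequence[int], C: float) -> float:
    """0.5 * ||w||^2 + C * sum_i log(1 + exp(-y_i * (w . x_i + b))), labels y_i in {-1, +1}."""
    penalty = 0.5 * sum(wj * wj for wj in w)
    data_term = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        data_term += math.log1p(math.exp(-margin))  # naive: may overflow for very negative margins
    return penalty + C * data_term


X = [[1.0, 2.0], [2.0, 0.5], [-1.0, -1.5]]
y = [1, 1, -1]
# With w = 0 every sample contributes log(2), so the objective is C * 3 * log(2).
print(l2_logistic_objective([0.0, 0.0], 0.0, X, y, C=0.01))
```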
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.logistic_regression import logistic_regression

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'solver': 'liblinear',
        'c_parameter': 0.01,
        'test_size': 0.2
    }
    logistic_regression(input_dataset_path='/path/to/myDataset.csv',
                        output_model_path='/path/to/newModel.pkl',
                        output_test_table_path='/path/to/newTable.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
- Info:
- wrapped_software:
name: scikit-learn LogisticRegression
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.logistic_regression.LogisticRegression object.
- classification.logistic_regression.logistic_regression(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the LogisticRegression class and execute its launch() method.
classification.random_forest_classifier module
Module containing the RandomForestClassifier class and the command line interface.
- class classification.random_forest_classifier.RandomForestClassifier(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml RandomForestClassifier
Wrapper of the scikit-learn RandomForestClassifier method. Trains and tests a given dataset and saves the model and scaler. Visit the RandomForestClassifier documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) –
Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) –
Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes, and the ROC curve. If the target is non-binary, it shows only the confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train on from your dataset. You can specify a list of column names from your input dataset, a list of column indexes, or one or more ranges of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
n_estimators (int) - (100) The number of trees in the forest.
bootstrap (bool) - (True) Whether bootstrap samples are used when building trees. If False, the whole dataset is used to build each tree.
normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.
random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
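With bootstrap enabled, each tree in the forest is trained on a sample of the rows drawn with replacement; with it disabled, every tree sees the whole dataset. The plain-Python sketch below (hypothetical helper names, not scikit-learn code) shows both cases and the out-of-bag rows left out of a bootstrap sample. The seeded generator is only loosely analogous to random_state_method.

```python
import random
from typing import List


def tree_training_indices(n_rows: int, bootstrap: bool, rng: random.Random) -> List[int]:
    """Row indices one tree would be trained on (hypothetical helper)."""
    if not bootstrap:
        return list(range(n_rows))  # bootstrap=False: every tree sees the whole dataset
    # bootstrap=True: draw n_rows indices with replacement, so some rows repeat
    return [rng.randrange(n_rows) for _ in range(n_rows)]


def out_of_bag(n_rows: int, sample: List[int]) -> List[int]:
    # Rows never drawn for this tree.
    drawn = set(sample)
    return [i for i in range(n_rows) if i not in drawn]


rng = random.Random(5)
sample = tree_training_indices(10, bootstrap=True, rng=rng)
print(sorted(sample), out_of_bag(10, sample))
```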
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.random_forest_classifier import random_forest_classifier

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'n_estimators': 100,
        'test_size': 0.2
    }
    random_forest_classifier(input_dataset_path='/path/to/myDataset.csv',
                             output_model_path='/path/to/newModel.pkl',
                             output_test_table_path='/path/to/newTable.csv',
                             output_plot_path='/path/to/newPlot.png',
                             properties=prop)
- Info:
- wrapped_software:
name: scikit-learn RandomForestClassifier
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.random_forest_classifier.RandomForestClassifier object.
- classification.random_forest_classifier.main()
Command line execution of this building block. Please check the command line documentation.
- classification.random_forest_classifier.random_forest_classifier(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the RandomForestClassifier class and execute its launch() method.
classification.support_vector_machine module
Module containing the SupportVectorMachine class and the command line interface.
- class classification.support_vector_machine.SupportVectorMachine(input_dataset_path, output_model_path, output_test_table_path=None, output_plot_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml SupportVectorMachine
Wrapper of the scikit-learn SVC (C-Support Vector Classification) method. Trains and tests a given dataset and saves the model and scaler. Visit the SVC documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_model_path (str) –
Path to the output model file. File type: output. Sample file. Accepted formats: pkl (edam:format_3653).
output_test_table_path (str) (Optional) –
Path to the test table file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the statistics plot. If the target is binary, it shows the confusion matrix, the distributions of the predicted probabilities of both classes, and the ROC curve. If the target is non-binary, it shows only the confusion matrix. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
independent_vars (dict) - ({}) Independent variables you want to train on from your dataset. You can specify a list of column names from your input dataset, a list of column indexes, or one or more ranges of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
weight (dict) - ({}) Weight variable from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. If multiple formats are given, the first one is used.
kernel (string) - (“rbf”) Specifies the kernel type to be used in the algorithm. Values: linear (used when the data is linearly separable, that is, when it can be separated by a single line or, in higher dimensions, a hyperplane), poly (represents the similarity of vectors, i.e. training samples, in a feature space over polynomials of the original variables, allowing non-linear models to be learned), rbf (a function whose value depends on the distance from the origin or from some point), sigmoid (in the neural networks field, the bipolar sigmoid function is often used as an activation function for artificial neurons), precomputed (precomputed kernel).
normalize_cm (bool) - (False) Whether or not to normalize the confusion matrix.
random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
random_state_train_test (int) - (5) [1~1000|1] Controls the shuffling applied to the data before applying the split.
test_size (float) - (0.2) [0~1|0.05] Represents the proportion of the dataset to include in the test split. It should be between 0.0 and 1.0.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
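For reference, scikit-learn documents the non-precomputed kernels as linear <x, x'>, poly (gamma <x, x'> + r)^d, rbf exp(-gamma ||x - x'||^2) and sigmoid tanh(gamma <x, x'> + r). The plain-Python sketch below evaluates them for a single pair of vectors. gamma, coef0 (r) and degree (d) are not properties of this building block, so the values here are fixed illustrative defaults (scikit-learn's own default gamma is 'scale', which is derived from the data). With precomputed, you supply the kernel (Gram) matrix yourself.

```python
import math
from typing import Sequence


def linear(x: Sequence[float], z: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(x, z))


def poly(x, z, gamma: float = 1.0, coef0: float = 0.0, degree: int = 3) -> float:
    return (gamma * linear(x, z) + coef0) ** degree


def rbf(x, z, gamma: float = 1.0) -> float:
    # exp(-gamma * ||x - z||^2): 1.0 for identical points, decaying with distance
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def sigmoid(x, z, gamma: float = 1.0, coef0: float = 0.0) -> float:
    return math.tanh(gamma * linear(x, z) + coef0)


x, z = [1.0, 2.0], [3.0, 4.0]
print(linear(x, z), poly(x, z, degree=2), rbf(x, x), sigmoid(x, [0.0, 0.0]))  # 11.0 121.0 1.0 0.0
```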
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.support_vector_machine import support_vector_machine

    prop = {
        'independent_vars': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'kernel': 'rbf',
        'test_size': 0.2
    }
    support_vector_machine(input_dataset_path='/path/to/myDataset.csv',
                           output_model_path='/path/to/newModel.pkl',
                           output_test_table_path='/path/to/newTable.csv',
                           output_plot_path='/path/to/newPlot.png',
                           properties=prop)
- Info:
- wrapped_software:
name: scikit-learn SupportVectorMachine
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.support_vector_machine.SupportVectorMachine object.
- classification.support_vector_machine.main()
Command line execution of this building block. Please check the command line documentation.
- classification.support_vector_machine.support_vector_machine(input_dataset_path: str, output_model_path: str, output_test_table_path: str | None = None, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the SupportVectorMachine class and execute its launch() method.
classification.classification_predict module
Module containing the ClassificationPredict class and the command line interface.
- class classification.classification_predict.ClassificationPredict(input_model_path, output_results_path, input_dataset_path=None, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml ClassificationPredict
Makes predictions from an input dataset and a given classification model. Makes predictions from an input dataset (provided either as a file or as a dictionary property) and a given classification model trained with the DecisionTreeClassifier, KNeighborsClassifier, LogisticRegression, RandomForestClassifier or Support Vector Machine methods.
- Parameters:
input_model_path (str) –
Path to the input model. File type: input. Sample file. Accepted formats: pkl (edam:format_3653).
input_dataset_path (str) (Optional) –
Path to the dataset to predict. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) –
Path to the output results file. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
predictions (list) - (None) List of dictionaries (or of lists, for datasets without headers) with the values for which you want to predict targets. It is only taken into account if input_dataset_path is not provided. Format: [{ ‘var1’: 1.0, ‘var2’: 2.0 }, { ‘var1’: 4.0, ‘var2’: 2.7 }] for datasets with headers and [[ 1.0, 2.0 ], [ 4.0, 2.7 ]] for datasets without headers.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
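The two predictions formats differ only in how a value is matched to a model input: by column name or by position. The hypothetical helper below sketches that conversion in plain Python. Ordering named values by the columns the model was trained on is an assumption about how such input has to be aligned, not documented biobb_ml behavior.

```python
from typing import Dict, List, Optional, Sequence, Union

Prediction = Union[Dict[str, float], Sequence[float]]


def predictions_to_rows(predictions: Sequence[Prediction],
                        feature_names: Optional[Sequence[str]] = None) -> List[List[float]]:
    """Hypothetical helper: turn the 'predictions' property into numeric rows."""
    rows = []
    for item in predictions:
        if isinstance(item, dict):
            # Dataset with headers: pick values by name, in training-column order.
            if feature_names is None:
                raise ValueError("feature_names are needed for dict-style predictions")
            rows.append([float(item[name]) for name in feature_names])
        else:
            # Dataset without headers: values are already positional.
            rows.append([float(v) for v in item])
    return rows


with_headers = [{"var1": 1.0, "var2": 2.0}, {"var2": 2.7, "var1": 4.0}]
print(predictions_to_rows(with_headers, ["var1", "var2"]))  # [[1.0, 2.0], [4.0, 2.7]]
print(predictions_to_rows([[1.0, 2.0], [4.0, 2.7]]))         # [[1.0, 2.0], [4.0, 2.7]]
```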
Examples
Example of how to use the building block from Python:

    from biobb_ml.classification.classification_predict import classification_predict

    prop = {
        'predictions': [
            {'var1': 1.0, 'var2': 2.0},
            {'var1': 4.0, 'var2': 2.7}
        ]
    }
    # 'predictions' is only used when input_dataset_path is not provided.
    classification_predict(input_model_path='/path/to/myModel.pkl',
                           output_results_path='/path/to/newPredictedResults.csv',
                           input_dataset_path='/path/to/myDataset.csv',
                           properties=prop)
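The training building blocks in this package save the model together with the scaler. A model trained with scale enabled expects new rows to be transformed with the statistics learned during training, not recomputed on the prediction data. Assuming standard (z-score) scaling in the style of scikit-learn's StandardScaler, the idea can be sketched in plain Python with hypothetical helper names:

```python
import statistics
from typing import List, Sequence, Tuple


def fit_standard_scaler(rows: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    # Per-column mean and population standard deviation, computed on training rows only.
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds


def transform(rows, means, stds) -> List[List[float]]:
    # z = (x - mean) / std; zero-variance columns are only centered (divided by 1).
    return [[(v - m) / (s if s else 1.0) for v, m, s in zip(row, means, stds)] for row in rows]


train_rows = [[1.0, 10.0], [3.0, 10.0]]
means, stds = fit_standard_scaler(train_rows)
print(means, stds)                            # [2.0, 10.0] [1.0, 0.0]
print(transform([[5.0, 12.0]], means, stds))  # [[3.0, 2.0]]
```

Fitting on the training rows and reusing the stored statistics at prediction time keeps new inputs on the same scale the model saw during training.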
- Info:
- wrapped_software:
name: scikit-learn
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int
Execute the classification.classification_predict.ClassificationPredict object.
- classification.classification_predict.classification_predict(input_model_path: str, output_results_path: str, input_dataset_path: str | None = None, properties: dict | None = None, **kwargs) → int
Instantiate the ClassificationPredict class and execute its launch() method.