utils package

Submodules

utils.correlation_matrix module

Module containing the CorrelationMatrix class and the command line interface.

class utils.correlation_matrix.CorrelationMatrix(input_dataset_path, output_plot_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml CorrelationMatrix

Generates a correlation matrix from a given dataset.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_plot_path (str) –
Path to the correlation matrix plot. File type: output. Accepted formats: png (edam:format_3603).
properties (dict) –
- features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.correlation_matrix import correlation_matrix
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
correlation_matrix(input_dataset_path='/path/to/myDataset.csv',
                   output_plot_path='/path/to/newPlot.png',
                   properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the CorrelationMatrix utils.correlation_matrix.CorrelationMatrix object.

utils.correlation_matrix.correlation_matrix(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the CorrelationMatrix class and execute the launch() method.

utils.correlation_matrix.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.dendrogram module

Module containing the Dendrogram class and the command line interface.

class utils.dendrogram.Dendrogram(input_dataset_path, output_plot_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml Dendrogram

Generates a dendrogram from a given dataset.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_plot_path (str) –
Path to the dendrogram plot. File type: output. Accepted formats: png (edam:format_3603).
properties (dict) –
- features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.dendrogram import dendrogram
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
dendrogram(input_dataset_path='/path/to/myDataset.csv',
           output_plot_path='/path/to/newPlot.png',
           properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
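A dendrogram draws the merge history of agglomerative hierarchical clustering. The sketch below computes that history with a naive single-linkage algorithm and Euclidean distance. Both choices, and the helper name `single_linkage`, are assumptions for illustration: the documentation does not state which linkage method or distance metric this block uses.

```python
from math import dist  # Euclidean distance, Python 3.8+


def single_linkage(points: list[tuple[float, ...]]) -> list[tuple[frozenset[int], frozenset[int], float]]:
    """Naive O(n^3) single-linkage clustering.

    Hypothetical helper; not part of biobb_ml. Returns the merges in order as
    (cluster_a, cluster_b, distance), each cluster being the set of row
    indices it contains. These merges are what a dendrogram draws, from the
    leaves up to the root.
    """
    clusters = [frozenset([i]) for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(dist(points[p], points[q]) for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] | clusters[j]
        # Delete index j first (j > i) so that index i is still valid.
        del clusters[j]
        del clusters[i]
        clusters.append(merged)
    return merges


rows = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0)]
for a, b, d in single_linkage(rows):
    print(sorted(a), sorted(b), round(d, 3))
```

The two nearby rows merge first at height 1.0, and the distant row joins them last; the height of each join is the vertical position of the corresponding link in the dendrogram.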

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the Dendrogram utils.dendrogram.Dendrogram object.

utils.dendrogram.dendrogram(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the Dendrogram class and execute the launch() method.

utils.dendrogram.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.drop_columns module

Module containing the DropColumns class and the command line interface.

class utils.drop_columns.DropColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml DropColumns

Drops columns from a given dataset.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Accepted formats: csv (edam:format_3752).
properties (dict) –
- targets (dict) - ({}) Independent variables or columns from your dataset you want to drop. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.drop_columns import drop_columns
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
drop_columns(input_dataset_path='/path/to/myDataset.csv',
             output_dataset_path='/path/to/newDataset.csv',
             properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
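The `targets` selection formats and the drop itself can be sketched with the standard `csv` module. This is a minimal illustration, not biobb_ml's implementation: `resolve_columns` and `drop_columns_csv` are hypothetical names, each `[start, end]` range is assumed to include both endpoints, and "the first one is used" is read as the first key in the selection dict. The documentation confirms none of these three details.

```python
import csv
import io


def resolve_columns(header: list[str], selection: dict) -> list[str]:
    """Turn a {"columns" | "indexes" | "range": ...} selection into column names.

    Hypothetical helper; not part of biobb_ml. Assumptions: when several
    formats are given, the first key in the dict wins, and each
    [start, end] range includes both endpoints.
    """
    for kind, value in selection.items():
        if kind == "columns":
            # Names missing from the header are ignored in this sketch.
            return [name for name in value if name in header]
        if kind == "indexes":
            return [header[i] for i in value]
        if kind == "range":
            return [header[i] for start, end in value for i in range(start, end + 1)]
    return []  # no recognised format: select nothing


def drop_columns_csv(text: str, targets: dict) -> str:
    """Return the CSV `text` without the columns selected by `targets`."""
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[0]
    dropped = set(resolve_columns(header, targets))
    keep = [i for i, name in enumerate(header) if name not in dropped]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


dataset = "a,b,c,d\n1,2,3,4\n5,6,7,8\n"
print(drop_columns_csv(dataset, {"columns": ["b", "d"]}))
print(drop_columns_csv(dataset, {"range": [[0, 1]]}))
```

The same resolver applies to the `features` and `targets` properties of the other blocks in this package, since they document the same three formats.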

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the DropColumns utils.drop_columns.DropColumns object.

utils.drop_columns.drop_columns(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the DropColumns class and execute the launch() method.

utils.drop_columns.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.dummy_variables module

Module containing the DummyVariables class and the command line interface.

class utils.dummy_variables.DummyVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml DummyVariables

Converts categorical variables into dummy/indicator (binary) variables.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Accepted formats: csv (edam:format_3752).
properties (dict) –
- targets (dict) - ({}) Independent variables or columns from your dataset you want to convert into dummy variables. If none are given, all columns are taken. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.dummy_variables import dummy_variables
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
dummy_variables(input_dataset_path='/path/to/myDataset.csv',
                output_dataset_path='/path/to/newDataset.csv',
                properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
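Dummy (one-hot) encoding replaces a categorical column with one binary column per distinct value. The sketch below shows the idea on a list of row dictionaries. `to_dummies` is a hypothetical helper, and the `<column>_<value>` naming with categories in sorted order is modelled on pandas `get_dummies`; it is an assumption about, not a description of, this block's output.

```python
def to_dummies(rows: list[dict[str, str]], column: str) -> list[dict[str, object]]:
    """One-hot encode `column` in a list of row dicts.

    Hypothetical helper; not part of biobb_ml. The categorical column is
    removed and replaced by one 0/1 column per distinct value, named
    "<column>_<value>", in sorted order (an assumed naming scheme).
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for category in categories:
            # Exactly one indicator column is 1 for each row.
            new_row[f"{column}_{category}"] = int(row[column] == category)
        encoded.append(new_row)
    return encoded


rows = [
    {"id": "1", "color": "red"},
    {"id": "2", "color": "blue"},
    {"id": "3", "color": "red"},
]
for row in to_dummies(rows, "color"):
    print(row)
```

Because each row has exactly one indicator set to 1, the encoded columns carry the same information as the original categorical column while being usable by models that expect numeric input.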

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the DummyVariables utils.dummy_variables.DummyVariables object.

utils.dummy_variables.dummy_variables(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the DummyVariables class and execute the launch() method.

utils.dummy_variables.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.map_variables module

Module containing the MapVariables class and the command line interface.

class utils.map_variables.MapVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml MapVariables

Maps the values of a given dataset according to an input correspondence, substituting each value in a series with another value, which may be derived from a function, a dictionary, or another series.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Accepted formats: csv (edam:format_3752).
properties (dict) –
- targets (dict) - ({}) Independent variables or columns from your dataset you want to map. If none are given, all columns are taken. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.map_variables import map_variables
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
map_variables(input_dataset_path='/path/to/myDataset.csv',
              output_dataset_path='/path/to/newDataset.csv',
              properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
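The substitution described for this block resembles pandas `Series.map`: each value is either looked up in a dictionary, where values without an entry become missing, or passed through a function. A stdlib sketch with a hypothetical `map_column` helper follows, using `None` to stand in for pandas' `NaN`. How this block builds its correspondence is not stated above, so here the caller supplies it.

```python
from __future__ import annotations

from collections.abc import Callable, Mapping
from typing import Any


def map_column(values: list[Any], mapping: Mapping[Any, Any] | Callable[[Any], Any]) -> list[Any]:
    """Substitute each value using a dict or a function.

    Hypothetical helper; not part of biobb_ml. With a mapping, values that
    have no entry become None, mirroring how pandas Series.map yields NaN
    for unmapped values.
    """
    if isinstance(mapping, Mapping):
        return [mapping.get(v) for v in values]
    return [mapping(v) for v in values]


sizes = ["small", "large", "medium", "huge"]
print(map_column(sizes, {"small": 0, "medium": 1, "large": 2}))
print(map_column(sizes, len))
```

With the dictionary, "huge" has no entry and comes back as `None`; with `len`, every value is transformed.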

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the MapVariables utils.map_variables.MapVariables object.

utils.map_variables.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.map_variables.map_variables(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the MapVariables class and execute the launch() method.

utils.pairwise_comparison module

Module containing the PairwiseComparison class and the command line interface.

class utils.pairwise_comparison.PairwiseComparison(input_dataset_path, output_plot_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml PairwiseComparison

Generates a pairwise comparison from a given dataset.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_plot_path (str) –
Path to the pairwise comparison plot. File type: output. Accepted formats: png (edam:format_3603).
properties (dict) –
- features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.pairwise_comparison import pairwise_comparison
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
pairwise_comparison(input_dataset_path='/path/to/myDataset.csv',
                    output_plot_path='/path/to/newPlot.png',
                    properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
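A pairwise comparison plot is commonly laid out as a grid with one panel per ordered pair of selected features: a scatter plot off the diagonal and the feature's own distribution on the diagonal, as in seaborn's `pairplot`. The documentation does not describe this block's exact layout, so treat the sketch below as an assumption. It computes only the panel layout and the data each panel would show, leaving the drawing out; `pairwise_panels` is a hypothetical name.

```python
def pairwise_panels(data: dict[str, list[float]]) -> list[tuple]:
    """Describe each panel of a pairwise-comparison grid.

    Hypothetical helper; not part of biobb_ml. The panel at row i, column j
    plots feature i (y axis) against feature j (x axis). Diagonal panels
    hold a single feature's values, e.g. for a histogram.
    """
    names = list(data)
    panels = []
    for i, y_name in enumerate(names):
        for j, x_name in enumerate(names):
            if i == j:
                panels.append((i, j, "hist", y_name, data[y_name]))
            else:
                # (x, y) points for a scatter plot of one feature against another.
                points = list(zip(data[x_name], data[y_name]))
                panels.append((i, j, "scatter", f"{y_name} vs {x_name}", points))
    return panels


data = {"column1": [1.0, 2.0, 3.0], "column2": [2.0, 1.0, 0.0], "column3": [5.0, 5.0, 6.0]}
for i, j, kind, label, _ in pairwise_panels(data):
    print(i, j, kind, label)
```

With n features the grid has n * n panels, so selecting a few columns through `features` keeps the plot readable on large datasets.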

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the PairwiseComparison utils.pairwise_comparison.PairwiseComparison object.

utils.pairwise_comparison.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.pairwise_comparison.pairwise_comparison(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the PairwiseComparison class and execute the launch() method.

utils.scale_columns module

Module containing the ScaleColumns class and the command line interface.

class utils.scale_columns.ScaleColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml ScaleColumns

Scales columns from a given dataset.

Parameters:

input_dataset_path (str) –
Path to the input dataset. File type: input. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Accepted formats: csv (edam:format_3752).
properties (dict) –
- targets (dict) - ({}) Independent variables or columns from your dataset you want to scale. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
- sandbox_path (str) - ("./") [WF property] Parent path to the sandbox directory.

Examples

This is an example of how to use the building block from Python:

from biobb_ml.utils.scale_columns import scale_columns
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
scale_columns(input_dataset_path='/path/to/myDataset.csv',
              output_dataset_path='/path/to/newDataset.csv',
              properties=prop)

Info:

wrapped_software:
- name: In house
- license: Apache-2.0
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
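The documentation does not say which scaling this block applies. As one common choice, min-max scaling maps each selected column linearly onto [0, 1] via (x - min) / (max - min). The sketch below assumes that choice and uses hypothetical helpers `min_max_scale` and `scale_selected`. Mapping constant columns (max equal to min) to 0.0, to avoid dividing by zero, is also an assumption.

```python
def min_max_scale(values: list[float]) -> list[float]:
    """Scale values linearly onto [0, 1]. Hypothetical helper; not part of biobb_ml."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # Constant column: every value maps to 0.0 in this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def scale_selected(data: dict[str, list[float]], targets: list[str]) -> dict[str, list[float]]:
    """Return a copy of `data` with only the `targets` columns scaled."""
    return {name: (min_max_scale(col) if name in targets else list(col))
            for name, col in data.items()}


data = {"column1": [10.0, 20.0, 30.0], "column2": [1.0, 1.0, 1.0], "label": [0.0, 1.0, 0.0]}
print(scale_selected(data, ["column1", "column2"]))
```

Columns not listed in `targets` are copied unchanged, which matches the idea of selecting only some columns to scale.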

check_data_params(out_log, err_log)[source]: Checks all the input/output paths and parameters

launch() → int[source]: Execute the ScaleColumns utils.scale_columns.ScaleColumns object.

utils.scale_columns.main()[source]: Command line execution of this building block. Please check the command line documentation.

utils.scale_columns.scale_columns(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int[source]: Execute the ScaleColumns class and execute the launch() method.