utils package


utils.correlation_matrix module

Module containing the CorrelationMatrix class and the command line interface.

class utils.correlation_matrix.CorrelationMatrix(input_dataset_path, output_plot_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml CorrelationMatrix
Generates a correlation matrix from a given dataset.
  • input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) –

    Path to the correlation matrix plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict) –

    • features (dict) - ({}) Independent variables or columns from your dataset that you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
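
The three selector formats accepted by features (and by the targets property of the other blocks in this package) can be illustrated with a small resolver. This is a sketch only: resolve_selector is a hypothetical helper, not part of biobb_ml, and it assumes that "the first one" means the first key in the dictionary and that range bounds are inclusive. The library may behave differently on both points.

```python
from typing import Dict, List, Sequence


def resolve_selector(selector: Dict[str, list], header: Sequence[str]) -> List[int]:
    """Translate a features/targets dict into 0-based column positions.

    Hypothetical helper, NOT the biobb_ml implementation.
    Assumptions: only the first key is honoured; range bounds are inclusive.
    """
    if not selector:
        raise ValueError("empty selector")
    kind, value = next(iter(selector.items()))  # first key wins (assumption)
    if kind == "columns":
        return [header.index(name) for name in value]
    if kind == "indexes":
        return list(value)
    if kind == "range":
        return [i for start, end in value for i in range(start, end + 1)]
    raise ValueError(f"unknown selector format: {kind!r}")


header = ["id", "age", "height", "weight", "label"]
by_name = resolve_selector({"columns": ["age", "weight"]}, header)
by_index = resolve_selector({"indexes": [0, 4]}, header)
by_range = resolve_selector({"range": [[1, 3]]}, header)
first_wins = resolve_selector({"indexes": [2], "columns": ["id"]}, header)
print(by_name, by_index, by_range, first_wins)
```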


Example of how to use the building block from Python:

from biobb_ml.utils.correlation_matrix import correlation_matrix
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
correlation_matrix(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
                   output_plot_path='/path/to/newPlot.png', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.correlation_matrix.CorrelationMatrix object.

utils.correlation_matrix.correlation_matrix(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) → int

Create a CorrelationMatrix instance and execute its launch() method.


Command line execution of this building block. Please check the command line documentation.
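
To make concrete what a correlation matrix contains, the following standard-library sketch computes one for three small columns. It assumes the Pearson coefficient; this page does not state which coefficient the block computes, and the names pearson and correlation_table are illustrative, not part of biobb_ml.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def correlation_table(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Correlate every column with every other column (and with itself)."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns} for a in columns}


data = {
    "column1": [1.0, 2.0, 3.0, 4.0],
    "column2": [2.0, 4.0, 6.0, 8.0],   # perfectly correlated with column1
    "column3": [4.0, 3.0, 2.0, 1.0],   # perfectly anti-correlated with column1
}
matrix = correlation_table(data)
for name, row in matrix.items():
    print(name, [round(v, 2) for v in row.values()])
```

The building block renders this kind of table as a PNG plot rather than returning the numbers.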

utils.dendrogram module

Module containing the Dendrogram class and the command line interface.

class utils.dendrogram.Dendrogram(input_dataset_path, output_plot_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml Dendrogram
Generates a dendrogram from a given dataset.
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) –

    Path to the dendrogram plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict) –

    • features (dict) - ({}) Independent variables or columns from your dataset that you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
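
The restart workflow property ("Do not execute if output files exist") can be pictured as a guard evaluated before the block runs. The sketch below is an assumption about the semantics, not the BiobbObject implementation: should_skip is a hypothetical name, and it skips only when every output file is already present.

```python
import os
import tempfile
from typing import Iterable


def should_skip(output_paths: Iterable[str], restart: bool) -> bool:
    """Return True when restart is on and every output file already exists."""
    return restart and all(os.path.exists(p) for p in output_paths)


with tempfile.TemporaryDirectory() as workdir:
    plot_path = os.path.join(workdir, "plot.png")
    skip_before = should_skip([plot_path], restart=True)    # no output yet
    open(plot_path, "wb").close()                            # simulate an earlier run
    skip_after = should_skip([plot_path], restart=True)      # output now present
    skip_disabled = should_skip([plot_path], restart=False)  # restart turned off

print(skip_before, skip_after, skip_disabled)
```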


Example of how to use the building block from Python:

from biobb_ml.utils.dendrogram import dendrogram
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
dendrogram(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
           output_plot_path='/path/to/newPlot.png', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.dendrogram.Dendrogram object.

utils.dendrogram.dendrogram(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) → int

Create a Dendrogram instance and execute its launch() method.


Command line execution of this building block. Please check the command line documentation.

utils.drop_columns module

Module containing the DropColumns class and the command line interface.

class utils.drop_columns.DropColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml DropColumns
Drops columns from a given dataset.
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_dataset_path (str) –

    Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • properties (dict) –

    • targets (dict) - ({}) Independent variables or columns from your dataset that you want to drop. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.


Example of how to use the building block from Python:

from biobb_ml.utils.drop_columns import drop_columns
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
drop_columns(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
             output_dataset_path='/path/to/newDataset.csv', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.drop_columns.DropColumns object.

utils.drop_columns.drop_columns(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int

Create a DropColumns instance and execute its launch() method.


Command line execution of this building block. Please check the command line documentation.
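
The effect of dropping columns selected by name can be reproduced on an in-memory CSV with the standard library. This is an illustrative sketch, not the code of the building block; drop_named_columns is a hypothetical function and only the "columns" selector format is covered.

```python
import csv
import io
from typing import List


def drop_named_columns(csv_text: str, names: List[str]) -> str:
    """Return the CSV text with the named columns removed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    kept = [field for field in reader.fieldnames if field not in names]
    out = io.StringIO()
    # extrasaction="ignore" silently discards the dropped columns on write
    writer = csv.DictWriter(out, fieldnames=kept, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(reader)
    return out.getvalue()


sample = "column1,column2,column3\n1,2,3\n4,5,6\n"
result = drop_named_columns(sample, ["column2"])
print(result)
```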

utils.dummy_variables module

Module containing the DummyVariables class and the command line interface.

class utils.dummy_variables.DummyVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml DummyVariables
Converts categorical variables into dummy/indicator variables (binaries).
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_dataset_path (str) –

    Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • properties (dict) –

    • targets (dict) - ({}) Independent variables or columns from your dataset that you want to convert into dummy variables. If none are given, all the columns will be taken. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.


Example of how to use the building block from Python:

from biobb_ml.utils.dummy_variables import dummy_variables
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
dummy_variables(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
                output_dataset_path='/path/to/newDataset.csv', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.dummy_variables.DummyVariables object.

utils.dummy_variables.dummy_variables(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int

Create a DummyVariables instance and execute its launch() method.


Command line execution of this building block. Please check the command line documentation.
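
Converting a categorical column into dummy/indicator variables means replacing it with one 0/1 column per category. The sketch below shows the idea in plain Python; dummy_encode and the column_value naming scheme are assumptions made for illustration, and the output layout of the actual building block may differ.

```python
from typing import Dict, List


def dummy_encode(rows: List[Dict[str, str]], column: str) -> List[Dict[str, object]]:
    """Replace one categorical column with a 0/1 column per category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row: Dict[str, object] = {k: v for k, v in row.items() if k != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [
    {"id": "1", "colour": "red"},
    {"id": "2", "colour": "blue"},
    {"id": "3", "colour": "red"},
]
encoded = dummy_encode(rows, "colour")
for row in encoded:
    print(row)
```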

utils.map_variables module

Module containing the MapVariables class and the command line interface.

class utils.map_variables.MapVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml MapVariables
Maps the values of a given dataset.
Maps the values of a given dataset according to input correspondence, substituting each value in a series with another value, which may be derived from a function, a dictionary, or another series.
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_dataset_path (str) –

    Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • properties (dict) –

    • targets (dict) - ({}) Independent variables or columns from your dataset that you want to map. If none are given, all the columns will be taken. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.


Example of how to use the building block from Python:

from biobb_ml.utils.map_variables import map_variables
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
map_variables(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
              output_dataset_path='/path/to/newDataset.csv', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.map_variables.MapVariables object.


Command line execution of this building block. Please check the command line documentation.

utils.map_variables.map_variables(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int

Create a MapVariables instance and execute its launch() method.
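
The class description mentions substituting each value in a series according to a correspondence such as a dictionary. A minimal sketch of that dictionary case follows; map_values is a hypothetical helper, and returning None for values without a correspondence is an assumption, not documented behaviour of the building block.

```python
from typing import Dict, Hashable, List, Optional


def map_values(values: List[Hashable],
               correspondence: Dict[Hashable, object]) -> List[Optional[object]]:
    """Substitute each value using the dictionary; unmapped values become None."""
    return [correspondence.get(value) for value in values]


species = ["setosa", "virginica", "setosa", "unknown"]
codes = map_values(species, {"setosa": 0, "virginica": 1})
print(codes)
```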

utils.pairwise_comparison module

Module containing the PairwiseComparison class and the command line interface.

class utils.pairwise_comparison.PairwiseComparison(input_dataset_path, output_plot_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml PairwiseComparison
Generates a pairwise comparison from a given dataset.
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_plot_path (str) –

    Path to the pairwise comparison plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).

  • properties (dict) –

    • features (dict) - ({}) Independent variables or columns from your dataset that you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.


Example of how to use the building block from Python:

from biobb_ml.utils.pairwise_comparison import pairwise_comparison
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
pairwise_comparison(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
                    output_plot_path='/path/to/newPlot.png', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.pairwise_comparison.PairwiseComparison object.


Command line execution of this building block. Please check the command line documentation.

utils.pairwise_comparison.pairwise_comparison(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) → int

Create a PairwiseComparison instance and execute its launch() method.

utils.scale_columns module

Module containing the ScaleColumns class and the command line interface.

class utils.scale_columns.ScaleColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)

Bases: BiobbObject

biobb_ml ScaleColumns
Scales columns from a given dataset.
  • input_dataset_path (str) –

    Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).

  • output_dataset_path (str) –

    Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).

  • properties (dict) –

    • targets (dict) - ({}) Independent variables or columns from your dataset that you want to scale. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.

    • remove_tmp (bool) - (True) [WF property] Remove temporary files.

    • restart (bool) - (False) [WF property] Do not execute if output files exist.

    • sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.


Example of how to use the building block from Python:

from biobb_ml.utils.scale_columns import scale_columns
prop = {
    'targets': {
        'columns': [ 'column1', 'column2', 'column3' ]
    }
}
scale_columns(input_dataset_path='/path/to/myDataset.csv',  # placeholder paths
              output_dataset_path='/path/to/newDataset.csv', properties=prop)

check_data_params(out_log, err_log)

Checks all the input/output paths and parameters.

launch() → int

Execute the utils.scale_columns.ScaleColumns object.


Command line execution of this building block. Please check the command line documentation.

utils.scale_columns.scale_columns(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) → int

Create a ScaleColumns instance and execute its launch() method.
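
This page does not say which scaling method the block applies. As one common possibility, the sketch below rescales a column to the [0, 1] interval (min-max scaling); both the choice of method and the min_max_scale name are assumptions made for illustration.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A constant column has no spread; map every entry to 0.0
        return [0.0 for _ in values]
    return [(value - lowest) / (highest - lowest) for value in values]


scaled = min_max_scale([10.0, 20.0, 40.0])
print(scaled)
```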