utils package
Submodules
utils.correlation_matrix module
Module containing the CorrelationMatrix class and the command line interface.
- class utils.correlation_matrix.CorrelationMatrix(input_dataset_path, output_plot_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml CorrelationMatrix
Generates a correlation matrix from a given dataset.
- Parameters:
input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) –
Path to the correlation matrix plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict) –
features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
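The three selection formats accepted by features can be illustrated with a small standard-library sketch. The helper name resolve_columns is hypothetical and not part of biobb_ml. It assumes that both ends of each range are included, that "the first one" means the first key in the dictionary, and that an empty dictionary selects every column; the building block itself may resolve these cases differently.

```python
def resolve_columns(header, spec):
    """Return the column names selected by a features/targets-style dict.

    header: list of column names taken from the CSV header row.
    spec:   dict holding one of the keys "columns", "indexes" or "range".
    """
    for key, value in spec.items():  # assumption: the first key wins
        if key == "columns":
            return list(value)
        if key == "indexes":
            return [header[i] for i in value]
        if key == "range":
            # assumption: both ends of each [start, end] pair are included
            return [header[i] for start, end in value
                    for i in range(start, end + 1)]
        raise ValueError(f"unknown selection key: {key!r}")
    return list(header)  # assumption: an empty dict selects every column


header = ["a", "b", "c", "d", "e"]
print(resolve_columns(header, {"columns": ["b", "c"]}))
print(resolve_columns(header, {"indexes": [0, 2]}))
print(resolve_columns(header, {"range": [[0, 1], [3, 4]]}))
```

The same dictionary shape is used by the targets property of the other building blocks in this package.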
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.correlation_matrix import correlation_matrix
    prop = {
        'features': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    correlation_matrix(input_dataset_path='/path/to/myDataset.csv',
                       output_plot_path='/path/to/newPlot.png',
                       properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the CorrelationMatrix utils.correlation_matrix.CorrelationMatrix object.
- utils.correlation_matrix.correlation_matrix(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) -> int
Execute the CorrelationMatrix class and execute the launch() method.
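For readers unfamiliar with the quantity being plotted, the following is a minimal sketch of a correlation matrix computed with the standard library only. It assumes the Pearson coefficient, which this page does not confirm, and the helper names pearson and correlation_table are hypothetical rather than part of biobb_ml.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_table(columns):
    """Map every pair of column names to its correlation coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


data = {"x": [1, 2, 3, 4], "y": [2, 4, 6, 8], "z": [4, 3, 2, 1]}
table = correlation_table(data)
print(round(table["x"]["y"], 6), round(table["x"]["z"], 6))
```

Each cell of the resulting table corresponds to one cell of the plotted matrix; the table is symmetric and its diagonal is 1.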
utils.dendrogram module
Module containing the Dendrogram class and the command line interface.
- class utils.dendrogram.Dendrogram(input_dataset_path, output_plot_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml Dendrogram
Generates a dendrogram from a given dataset.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) –
Path to the dendrogram plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict) –
features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.dendrogram import dendrogram
    prop = {
        'features': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    dendrogram(input_dataset_path='/path/to/myDataset.csv',
               output_plot_path='/path/to/newPlot.png',
               properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the Dendrogram utils.dendrogram.Dendrogram object.
- utils.dendrogram.dendrogram(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) -> int
Execute the Dendrogram class and execute the launch() method.
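A dendrogram draws the sequence of merges produced by hierarchical clustering. The sketch below shows that sequence for a tiny dataset using only the standard library. It assumes single linkage and Euclidean distance, neither of which is stated on this page, and the function name single_linkage_merges is hypothetical.

```python
import math


def single_linkage_merges(points):
    """Return the merges of an agglomerative clustering as (a, b, distance).

    points: list of equal-length numeric tuples, one per observation.
    Clusters are represented as tuples of point indexes.
    """
    clusters = [(i,) for i in range(len(points))]
    merges = []

    def distance(a, b):
        # single linkage: distance between the closest pair of members
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > 1:
        # pick the two clusters that are currently closest to each other
        d, a, b = min(
            ((distance(a, b), a, b)
             for n, a in enumerate(clusters)
             for b in clusters[n + 1:]),
            key=lambda item: item[0],
        )
        clusters = [c for c in clusters if c not in (a, b)]
        clusters.append(tuple(sorted(a + b)))
        merges.append((a, b, d))
    return merges


for a, b, d in single_linkage_merges([(0.0,), (1.0,), (10.0,)]):
    print(a, b, d)
```

The distance recorded for each merge is the height at which the corresponding branches join in the plot.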
utils.drop_columns module
Module containing the DropColumns class and the command line interface.
- class utils.drop_columns.DropColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml DropColumns
Drops columns from a given dataset.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
properties (dict) –
targets (dict) - ({}) Independent variables or columns from your dataset you want to drop. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.drop_columns import drop_columns
    prop = {
        'targets': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    drop_columns(input_dataset_path='/path/to/myDataset.csv',
                 output_dataset_path='/path/to/newDataset.csv',
                 properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the DropColumns utils.drop_columns.DropColumns object.
- utils.drop_columns.drop_columns(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) -> int
Execute the DropColumns class and execute the launch() method.
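The effect of dropping columns by name can be sketched with the standard csv module on in-memory text. This is an illustration of the operation only, not the building block's implementation; the helper name drop_columns_from_csv is hypothetical.

```python
import csv
import io


def drop_columns_from_csv(text, names):
    """Return CSV text with the named columns removed."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # positions of the columns that survive the drop
    keep = [i for i, name in enumerate(header) if name not in names]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([header[i] for i in keep])
    for row in reader:
        writer.writerow([row[i] for i in keep])
    return out.getvalue()


print(drop_columns_from_csv("a,b,c\n1,2,3\n4,5,6\n", {"b"}))
```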
utils.dummy_variables module
Module containing the DummyVariables class and the command line interface.
- class utils.dummy_variables.DummyVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml DummyVariables
Converts categorical variables into dummy/indicator variables (binaries).
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
properties (dict) –
targets (dict) - ({}) Independent variables or columns from your dataset you want to convert into dummy variables. If none are given, all the columns will be taken. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.dummy_variables import dummy_variables
    prop = {
        'targets': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    dummy_variables(input_dataset_path='/path/to/myDataset.csv',
                    output_dataset_path='/path/to/newDataset.csv',
                    properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the DummyVariables utils.dummy_variables.DummyVariables object.
- utils.dummy_variables.dummy_variables(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) -> int
Execute the DummyVariables class and execute the launch() method.
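As an illustration of what converting a categorical column into dummy/indicator variables means, here is a minimal standard-library sketch. The helper name dummy_encode and the column_value naming of the new columns are assumptions made for the example; the names produced by the building block may differ.

```python
def dummy_encode(rows, column):
    """Replace one categorical column with one 0/1 column per category.

    rows: list of dicts, one per record.
    """
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # copy every field except the categorical one
        new_row = {key: value for key, value in row.items() if key != column}
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


rows = [{"id": 1, "color": "red"}, {"id": 2, "color": "blue"}]
for row in dummy_encode(rows, "color"):
    print(row)
```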
utils.map_variables module
Module containing the MapVariables class and the command line interface.
- class utils.map_variables.MapVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml MapVariables
Maps the values of a given dataset.
Maps the values of a given dataset according to input correspondence, substituting each value in a series with another value, which may be derived from a function, a dictionary, or another series.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
properties (dict) –
targets (dict) - ({}) Independent variables or columns from your dataset you want to map. If none are given, all the columns will be taken. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.map_variables import map_variables
    prop = {
        'targets': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    map_variables(input_dataset_path='/path/to/myDataset.csv',
                  output_dataset_path='/path/to/newDataset.csv',
                  properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the MapVariables utils.map_variables.MapVariables object.
- utils.map_variables.main()
Command line execution of this building block. Please check the command line documentation.
- utils.map_variables.map_variables(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) -> int
Execute the MapVariables class and execute the launch() method.
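The substitution described for MapVariables, where each value is replaced through a dictionary or a function, can be sketched as follows. The properties listed on this page do not show how the correspondence is supplied, so the sketch takes it as an explicit argument; the helper name map_values is hypothetical, and returning None for unmapped values is an assumption.

```python
def map_values(values, mapping):
    """Substitute each value through a dict or a one-argument function."""
    if callable(mapping):
        return [mapping(value) for value in values]
    # assumption: values missing from the dict become None
    return [mapping.get(value) for value in values]


print(map_values(["yes", "no", "yes"], {"yes": 1, "no": 0}))
print(map_values([1, 2, 3], lambda value: value * 10))
```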
utils.pairwise_comparison module
Module containing the PairwiseComparison class and the command line interface.
- class utils.pairwise_comparison.PairwiseComparison(input_dataset_path, output_plot_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml PairwiseComparison
Generates a pairwise comparison from a given dataset.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) –
Path to the pairwise comparison plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict) –
features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.pairwise_comparison import pairwise_comparison
    prop = {
        'features': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    pairwise_comparison(input_dataset_path='/path/to/myDataset.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the PairwiseComparison utils.pairwise_comparison.PairwiseComparison object.
- utils.pairwise_comparison.main()
Command line execution of this building block. Please check the command line documentation.
- utils.pairwise_comparison.pairwise_comparison(input_dataset_path: str, output_plot_path: str, properties: dict | None = None, **kwargs) -> int
Execute the PairwiseComparison class and execute the launch() method.
utils.scale_columns module
Module containing the ScaleColumns class and the command line interface.
- class utils.scale_columns.ScaleColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: BiobbObject
biobb_ml ScaleColumns
Scales columns from a given dataset.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_dataset_path (str) –
Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
properties (dict) –
targets (dict) - ({}) Independent variables or columns from your dataset you want to scale. You can specify a list of column names from your input dataset, a list of column indexes, or a list of column index ranges. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one will be picked.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.utils.scale_columns import scale_columns
    prop = {
        'targets': {
            'columns': ['column1', 'column2', 'column3']
        }
    }
    scale_columns(input_dataset_path='/path/to/myDataset.csv',
                  output_dataset_path='/path/to/newDataset.csv',
                  properties=prop)
- Info:
- wrapped_software:
name: In house
license: Apache-2.0
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() -> int
Execute the ScaleColumns utils.scale_columns.ScaleColumns object.
- utils.scale_columns.main()
Command line execution of this building block. Please check the command line documentation.
- utils.scale_columns.scale_columns(input_dataset_path: str, output_dataset_path: str, properties: dict | None = None, **kwargs) -> int
Execute the ScaleColumns class and execute the launch() method.
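This page does not say which scaling method ScaleColumns applies. Purely as an illustration of column scaling, the sketch below assumes min-max scaling to the range [0, 1]; the helper name min_max_scale is hypothetical, and mapping a constant column to zeros is a choice made for the example.

```python
def min_max_scale(values):
    """Rescale a numeric column linearly so it spans 0.0 to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # a constant column has no spread; map it to zeros by convention
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


print(min_max_scale([0, 5, 10]))
```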