utils package
Submodules
utils.correlation_matrix module
Module containing the CorrelationMatrix class and the command line interface.
class utils.correlation_matrix.CorrelationMatrix(input_dataset_path, output_plot_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml CorrelationMatrix
Generates a correlation matrix from a given dataset.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_plot_path (str) – Path to the correlation matrix plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
- properties (dict) –
  - features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.correlation_matrix import correlation_matrix

prop = {
    'features': {
        'columns': ['column1', 'column2', 'column3']
    }
}
correlation_matrix(input_dataset_path='/path/to/myDataset.csv',
                   output_plot_path='/path/to/newPlot.png',
                   properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the CorrelationMatrix object (utils.correlation_matrix.CorrelationMatrix).
utils.correlation_matrix.correlation_matrix(input_dataset_path: str, output_plot_path: str, properties: dict = None, **kwargs) → int
Create a CorrelationMatrix instance and execute its launch() method.
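As a rough illustration of what a correlation matrix contains, the sketch below computes pairwise Pearson coefficients in plain Python. The choice of Pearson correlation is an assumption (the documentation above does not say which coefficient is used), and `pearson` and `correlation_table` are hypothetical names, not part of biobb_ml.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equally long samples.

    Raises ZeroDivisionError if either sample is constant.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_table(columns: Dict[str, List[float]]) -> Dict[str, Dict[str, float]]:
    """Build a symmetric table of coefficients for every pair of columns."""
    names = list(columns)
    return {r: {c: pearson(columns[r], columns[c]) for c in names} for r in names}


data = {
    "column1": [1.0, 2.0, 3.0, 4.0],
    "column2": [2.0, 4.0, 6.0, 8.0],   # rises with column1
    "column3": [4.0, 3.0, 2.0, 1.0],   # falls as column1 rises
}
table = correlation_table(data)
for name, row in table.items():
    print(name, [round(value, 2) for value in row.values()])
```

The building block renders such a table as a PNG plot; the plotting step is left out here.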
utils.dendrogram module
Module containing the Dendrogram class and the command line interface.
class utils.dendrogram.Dendrogram(input_dataset_path, output_plot_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml Dendrogram
Generates a dendrogram from a given dataset.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_plot_path (str) – Path to the dendrogram plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
- properties (dict) –
  - features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.dendrogram import dendrogram

prop = {
    'features': {
        'columns': ['column1', 'column2', 'column3']
    }
}
dendrogram(input_dataset_path='/path/to/myDataset.csv',
           output_plot_path='/path/to/newPlot.png',
           properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the Dendrogram object (utils.dendrogram.Dendrogram).
utils.dendrogram.dendrogram(input_dataset_path: str, output_plot_path: str, properties: dict = None, **kwargs) → int
Create a Dendrogram instance and execute its launch() method.
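The restart workflow property listed above ("Do not execute if output files exist") amounts to a guard that runs before any work is done. The following is a minimal sketch of such a guard, assuming that all declared outputs must exist for the step to be skipped; `should_skip` is a hypothetical helper and does not reproduce how biobb_common implements the check.

```python
import os
import tempfile
from typing import Iterable


def should_skip(output_paths: Iterable[str], restart: bool) -> bool:
    """Return True when restart is enabled and every output file already exists."""
    paths = list(output_paths)
    # With no declared outputs there is nothing to reuse, so never skip.
    return bool(restart) and bool(paths) and all(os.path.isfile(p) for p in paths)


with tempfile.TemporaryDirectory() as workdir:
    plot = os.path.join(workdir, "newPlot.png")
    print(should_skip([plot], restart=True))   # output missing: run the step
    open(plot, "w").close()                    # simulate a previous run
    print(should_skip([plot], restart=True))   # output present: skip the step
    print(should_skip([plot], restart=False))  # restart disabled: always run
```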
utils.drop_columns module
Module containing the DropColumns class and the command line interface.
class utils.drop_columns.DropColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml DropColumns
Drops columns from a given dataset.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_dataset_path (str) – Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
- properties (dict) –
  - targets (dict) - ({}) Independent variables or columns from your dataset you want to drop. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.drop_columns import drop_columns

prop = {
    'targets': {
        'columns': ['column1', 'column2', 'column3']
    }
}
drop_columns(input_dataset_path='/path/to/myDataset.csv',
             output_dataset_path='/path/to/newDataset.csv',
             properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the DropColumns object (utils.drop_columns.DropColumns).
utils.drop_columns.drop_columns(input_dataset_path: str, output_dataset_path: str, properties: dict = None, **kwargs) → int
Create a DropColumns instance and execute its launch() method.
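To make the effect of dropping columns concrete, the sketch below removes named columns from CSV rows using only the standard library. It illustrates the operation, not the building block's internals; `drop_csv_columns` is a hypothetical helper and the sample data is made up for the example.

```python
import csv
import io
from typing import Iterable, List


def drop_csv_columns(rows: Iterable[List[str]], to_drop: Iterable[str]) -> List[List[str]]:
    """Return the rows (header first) without the named columns."""
    rows = iter(rows)
    header = next(rows)
    unwanted = set(to_drop)
    # Positions of the columns that survive the drop.
    keep = [i for i, name in enumerate(header) if name not in unwanted]
    result = [[header[i] for i in keep]]
    for row in rows:
        result.append([row[i] for i in keep])
    return result


raw = "column1,column2,column3\n1,2,3\n4,5,6\n"
table = drop_csv_columns(csv.reader(io.StringIO(raw)), ["column2"])
print(table)  # [['column1', 'column3'], ['1', '3'], ['4', '6']]
```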
utils.dummy_variables module
Module containing the DummyVariables class and the command line interface.
class utils.dummy_variables.DummyVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml DummyVariables
Converts categorical variables into dummy/indicator (binary) variables.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_dataset_path (str) – Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
- properties (dict) –
  - targets (dict) - ({}) Independent variables or columns from your dataset you want to convert into dummy variables. If none are given, all columns are used. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.dummy_variables import dummy_variables

prop = {
    'targets': {
        'columns': ['column1', 'column2', 'column3']
    }
}
dummy_variables(input_dataset_path='/path/to/myDataset.csv',
                output_dataset_path='/path/to/newDataset.csv',
                properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the DummyVariables object (utils.dummy_variables.DummyVariables).
utils.dummy_variables.dummy_variables(input_dataset_path: str, output_dataset_path: str, properties: dict = None, **kwargs) → int
Create a DummyVariables instance and execute its launch() method.
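The conversion of a categorical column into dummy/indicator variables can be sketched as follows. This is an illustration of the technique in plain Python, not the building block's code: `dummy_columns` is a hypothetical helper, and the `name_category` naming of the generated columns is an assumption made for the sketch.

```python
from typing import Dict, List


def dummy_columns(name: str, values: List[str]) -> Dict[str, List[int]]:
    """One-hot encode a categorical column into 0/1 indicator columns."""
    categories = sorted(set(values))
    return {
        # One indicator column per distinct category.
        f"{name}_{category}": [int(v == category) for v in values]
        for category in categories
    }


print(dummy_columns("color", ["red", "blue", "red"]))
# {'color_blue': [0, 1, 0], 'color_red': [1, 0, 1]}
```

Each row has exactly one indicator set to 1, so the original category can always be recovered from the dummy columns.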
utils.map_variables module
Module containing the MapVariables class and the command line interface.
class utils.map_variables.MapVariables(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml MapVariables
Maps the values of a given dataset.
Maps the values of a given dataset according to input correspondence, substituting each value in a series with another value, which may be derived from a function, a dictionary, or another series.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_dataset_path (str) – Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
- properties (dict) –
  - targets (dict) - ({}) Independent variables or columns from your dataset you want to map. If none are given, all columns are used. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.map_variables import map_variables

prop = {
    'targets': {
        'columns': ['column1', 'column2', 'column3']
    }
}
map_variables(input_dataset_path='/path/to/myDataset.csv',
              output_dataset_path='/path/to/newDataset.csv',
              properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the MapVariables object (utils.map_variables.MapVariables).
utils.map_variables.main()
Command line execution of this building block. Please check the command line documentation.
utils.map_variables.map_variables(input_dataset_path: str, output_dataset_path: str, properties: dict = None, **kwargs) → int
Create a MapVariables instance and execute its launch() method.
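The class description above mentions substituting each value with one derived from a function or a dictionary. A minimal sketch of that idea in plain Python is shown below. The documented properties do not say how the correspondence is supplied to the building block, so `map_values`, its `mapper` argument, and the choice to leave unmapped values unchanged are all assumptions made for illustration.

```python
from typing import Callable, List, Mapping, Union

Mapper = Union[Mapping, Callable[[object], object]]


def map_values(values: List[object], mapper: Mapper) -> List[object]:
    """Substitute each value using a dictionary or a function.

    With a dictionary, values without an entry are kept unchanged
    (a choice made for this sketch).
    """
    if isinstance(mapper, Mapping):
        return [mapper.get(v, v) for v in values]
    return [mapper(v) for v in values]


print(map_values(["low", "high", "mid"], {"low": 0, "high": 1}))  # [0, 1, 'mid']
print(map_values([1, 2, 3], lambda v: v * 10))                    # [10, 20, 30]
```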
utils.pairwise_comparison module
Module containing the PairwiseComparison class and the command line interface.
class utils.pairwise_comparison.PairwiseComparison(input_dataset_path, output_plot_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml PairwiseComparison
Generates a pairwise comparison from a given dataset.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_plot_path (str) – Path to the pairwise comparison plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
- properties (dict) –
  - features (dict) - ({}) Independent variables or columns from your dataset you want to compare. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.pairwise_comparison import pairwise_comparison

prop = {
    'features': {
        'columns': ['column1', 'column2', 'column3']
    }
}
pairwise_comparison(input_dataset_path='/path/to/myDataset.csv',
                    output_plot_path='/path/to/newPlot.png',
                    properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the PairwiseComparison object (utils.pairwise_comparison.PairwiseComparison).
utils.pairwise_comparison.main()
Command line execution of this building block. Please check the command line documentation.
utils.pairwise_comparison.pairwise_comparison(input_dataset_path: str, output_plot_path: str, properties: dict = None, **kwargs) → int
Create a PairwiseComparison instance and execute its launch() method.
utils.scale_columns module
Module containing the ScaleColumns class and the command line interface.
class utils.scale_columns.ScaleColumns(input_dataset_path, output_dataset_path, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml ScaleColumns
Scales columns from a given dataset.
Parameters:
- input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_dataset_path (str) – Path to the output dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
- properties (dict) –
  - targets (dict) - ({}) Independent variables or columns from your dataset you want to scale. You can specify a list of column names from your input dataset, a list of column indexes, or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. If multiple formats are given, the first one is used.
  - remove_tmp (bool) - (True) [WF property] Remove temporary files.
  - restart (bool) - (False) [WF property] Do not execute if output files exist.
Examples
The following example shows how to use the building block from Python:
from biobb_ml.utils.scale_columns import scale_columns

prop = {
    'targets': {
        'columns': ['column1', 'column2', 'column3']
    }
}
scale_columns(input_dataset_path='/path/to/myDataset.csv',
              output_dataset_path='/path/to/newDataset.csv',
              properties=prop)
- Info:
  - wrapped_software:
    - name: In house
    - license: Apache-2.0
  - ontology:
    - name: EDAM
    - schema: http://edamontology.org/EDAM.owl
launch() → int
Execute the ScaleColumns object (utils.scale_columns.ScaleColumns).
utils.scale_columns.main()
Command line execution of this building block. Please check the command line documentation.
utils.scale_columns.scale_columns(input_dataset_path: str, output_dataset_path: str, properties: dict = None, **kwargs) → int
Create a ScaleColumns instance and execute its launch() method.
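The documentation above does not state which scaling method is applied. As one common possibility, the sketch below rescales a column to the range [0, 1] (min-max scaling). Both the choice of method and the helper name `min_max_scale` are assumptions made for illustration, not a description of what ScaleColumns does internally.

```python
from typing import List


def min_max_scale(values: List[float]) -> List[float]:
    """Rescale values linearly so the minimum maps to 0.0 and the maximum to 1.0."""
    low, high = min(values), max(values)
    if high == low:
        # A constant column has no spread; map every value to 0.0.
        return [0.0 for _ in values]
    span = high - low
    return [(v - low) / span for v in values]


print(min_max_scale([10.0, 15.0, 20.0]))  # [0.0, 0.5, 1.0]
```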