dimensionality_reduction package
Submodules
dimensionality_reduction.pls_components module
Module containing the PLSComponents class and the command line interface.
- class dimensionality_reduction.pls_components.PLSComponents(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)[source]
Bases:
BiobbObject
biobb_ml PLSComponents
Wrapper of the scikit-learn PLSRegression method.
Calculates the best number of components for a Partial Least Squares (PLS) Regression. Visit the PLSRegression documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) –
Table with R2 and MSE for calibration and cross-validation data for the best number of components. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the Mean Square Error plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
features (dict) - ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. In case of multiple formats, the first one will be picked.
optimise (boolean) - (False) Whether or not to optimise the MSE calculation. Beware: if set to True, the process can take a long time depending on the max_components value.
max_components (int) - (10) [1~1000|1] Maximum number of components to use by default for PLS queries.
cv (int) - (10) [1~10000|1] Specify the number of folds in the cross-validation splitting strategy. Value must be between 2 and number of samples in the dataset.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
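The features and target formats listed above can be pictured with a small helper. This is only a sketch: resolve_feature_indexes and resolve_target_index are hypothetical names, not part of biobb_ml, and two behaviours are assumptions rather than documented facts: that each [start, end] pair in “range” includes both ends, and that "the first one" follows the documented order (columns, then indexes, then range). The real precedence in the library may differ.

```python
def resolve_feature_indexes(features, header):
    """Translate a `features` property into a list of column positions.

    Hypothetical helper for illustration; not the biobb_ml implementation.
    """
    if "columns" in features:
        # Column names are looked up in the CSV header
        return [header.index(name) for name in features["columns"]]
    if "indexes" in features:
        return list(features["indexes"])
    if "range" in features:
        # Assumption: both ends of every [start, end] pair are inclusive
        return [i for start, end in features["range"]
                for i in range(start, end + 1)]
    raise ValueError("features needs one of: columns, indexes, range")


def resolve_target_index(target, header):
    """Translate a `target` property into a single column position."""
    if "column" in target:
        return header.index(target["column"])
    if "index" in target:
        return target["index"]
    raise ValueError("target needs one of: column, index")


header = ["column1", "column2", "column3", "target"]
by_name = resolve_feature_indexes({"columns": ["column1", "column3"]}, header)
by_range = resolve_feature_indexes({"range": [[0, 1], [3, 3]]}, header)
target_pos = resolve_target_index({"column": "target"}, header)
```

Under these assumptions, by_name selects positions 0 and 2, by_range selects 0, 1 and 3, and target_pos is 3.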
Examples
This is an example of how to use the building block from Python:
from biobb_ml.dimensionality_reduction.pls_components import pls_components

prop = {
    'features': {
        'columns': ['column1', 'column2', 'column3']
    },
    'target': {
        'column': 'target'
    },
    'max_components': 10,
    'cv': 10
}
pls_components(input_dataset_path='/path/to/myDataset.csv',
               output_results_path='/path/to/newTable.csv',
               output_plot_path='/path/to/newPlot.png',
               properties=prop)
- Info:
- wrapped_software:
name: scikit-learn PLSRegression
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int [source]
Execute the PLSComponents (dimensionality_reduction.pls_components.PLSComponents) object.
- dimensionality_reduction.pls_components.main()[source]
Command line execution of this building block. Please check the command line documentation.
- dimensionality_reduction.pls_components.pls_components(input_dataset_path: str, output_results_path: str, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int [source]
Create a PLSComponents object and execute its launch() method.
dimensionality_reduction.pls_regression module
Module containing the PLS_Regression class and the command line interface.
- class dimensionality_reduction.pls_regression.PLS_Regression(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)[source]
Bases:
BiobbObject
biobb_ml PLS_Regression
Wrapper of the scikit-learn PLSRegression method.
Gives results for a Partial Least Squares (PLS) Regression. Visit the PLSRegression documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) –
Table with R2 and MSE for calibration and cross-validation data. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the R2 cross-validation plot. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
features (dict) - ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. In case of multiple formats, the first one will be picked.
n_components (int) - (5) [1~1000|1] Maximum number of components to use by default for PLS queries.
cv (int) - (10) [1~10000|1] Specify the number of folds in the cross-validation splitting strategy. Value must be between 2 and number of samples in the dataset.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
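The constraint on cv (at least 2 folds, at most one fold per sample) follows from how a dataset is cut into folds. The sketch below reproduces the fold sizes used by scikit-learn's KFold, where the first n_samples % cv folds receive one extra sample. Whether the building block uses exactly this splitter is an assumption, and kfold_sizes is a hypothetical helper name.

```python
def kfold_sizes(n_samples, cv=10):
    """Return the number of samples in each of the `cv` folds.

    Hypothetical helper mirroring KFold-style splitting.
    """
    if not 2 <= cv <= n_samples:
        raise ValueError(
            f"cv must be between 2 and the number of samples ({n_samples}), got {cv}"
        )
    base, extra = divmod(n_samples, cv)
    # The first `extra` folds take one additional sample each
    return [base + 1 if i < extra else base for i in range(cv)]


sizes = kfold_sizes(23, cv=10)
```

With 23 samples and 10 folds, three folds hold 3 samples and seven folds hold 2; asking for more folds than samples would leave some folds empty, hence the upper bound.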
Examples
This is an example of how to use the building block from Python:
from biobb_ml.dimensionality_reduction.pls_regression import pls_regression

prop = {
    'features': {
        'columns': ['column1', 'column2', 'column3']
    },
    'target': {
        'column': 'target'
    },
    'n_components': 12,
    'cv': 10
}
pls_regression(input_dataset_path='/path/to/myDataset.csv',
               output_results_path='/path/to/newTable.csv',
               output_plot_path='/path/to/newPlot.png',
               properties=prop)
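The results table reports R2 and MSE for both calibration and cross-validation data. As a reminder of what the two metrics measure, here is a minimal sketch using their standard definitions. The function names and the sample values are made up for illustration, and this is not how biobb_ml computes the table internally.

```python
def mse(y_true, y_pred):
    """Mean squared error: average of the squared residuals."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up observed values and predictions, for illustration only
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]

error = mse(y_true, y_pred)
score = r2(y_true, y_pred)
```

Calibration metrics compare the targets with predictions from a model fitted on the same samples, whereas cross-validation metrics use predictions for samples held out during fitting, so the cross-validation values are usually the less optimistic ones.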
- Info:
- wrapped_software:
name: scikit-learn PLSRegression
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int [source]
Execute the PLS_Regression (dimensionality_reduction.pls_regression.PLS_Regression) object.
- dimensionality_reduction.pls_regression.main()[source]
Command line execution of this building block. Please check the command line documentation.
- dimensionality_reduction.pls_regression.pls_regression(input_dataset_path: str, output_results_path: str, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int [source]
Create a PLS_Regression object and execute its launch() method.
dimensionality_reduction.principal_component module
Module containing the PrincipalComponentAnalysis class and the command line interface.
- class dimensionality_reduction.principal_component.PrincipalComponentAnalysis(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)[source]
Bases:
BiobbObject
biobb_ml PrincipalComponentAnalysis
Wrapper of the scikit-learn PCA method.
Analyses a given dataset through Principal Component Analysis (PCA). Visit the PCA documentation page on the official scikit-learn website for further information.
- Parameters:
input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) –
Path to the analysed dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) –
Path to the Principal Component plot, only if number of components is 2 or 3. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
features (dict) - ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. In case of multiple formats, the first one will be picked.
n_components (dict) - ({}) Dictionary containing either the number of components to keep (int) or the fraction of the variance, between 0 and 1, that must be retained (float); in the latter case the minimum number of principal components reaching that fraction is kept. If not set ({}), all components are kept. Formats: { “value”: 2 } for integer values or { “value”: 0.3 } for float values.
random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
scale (bool) - (False) Whether or not to scale the input dataset.
remove_tmp (bool) - (True) [WF property] Remove temporary files.
restart (bool) - (False) [WF property] Do not execute if output files exist.
sandbox_path (str) - (“./”) [WF property] Parent path to the sandbox directory.
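The two meanings of n_components (an integer count versus a float fraction of variance) can be illustrated without running a PCA. The sketch below takes a list of explained variance ratios, sorted from largest to smallest as a PCA would report them, and applies the rule scikit-learn documents for float values: keep the smallest number of components whose cumulative explained variance exceeds the requested fraction. The helper name components_to_keep and the ratios are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def components_to_keep(n_components, explained_variance_ratio):
    """Return how many principal components an n_components property selects.

    Hypothetical helper; `explained_variance_ratio` is sorted in
    decreasing order, one entry per component.
    """
    total = len(explained_variance_ratio)
    value = n_components.get("value")
    if value is None:
        return total  # property not set: keep every component
    if isinstance(value, int):
        return value  # explicit number of components
    if not 0 < value < 1:
        raise ValueError("a float value must lie strictly between 0 and 1")
    cumulative = list(accumulate(explained_variance_ratio))
    # Smallest count whose cumulative ratio is strictly above `value`
    return min(bisect_right(cumulative, value) + 1, total)


ratios = [0.5, 0.3, 0.15, 0.05]  # made-up values for illustration
keep_all = components_to_keep({}, ratios)
keep_two = components_to_keep({"value": 2}, ratios)
keep_for_30_percent = components_to_keep({"value": 0.3}, ratios)
keep_for_90_percent = components_to_keep({"value": 0.9}, ratios)
```

With these ratios, a fraction of 0.3 is already exceeded by the first component alone, while 0.9 requires three components.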
Examples
This is an example of how to use the building block from Python:
from biobb_ml.dimensionality_reduction.principal_component import principal_component

prop = {
    'features': {
        'columns': ['column1', 'column2', 'column3']
    },
    'target': {
        'column': 'target'
    },
    'n_components': {
        'value': 2
    }
}
principal_component(input_dataset_path='/path/to/myDataset.csv',
                    output_results_path='/path/to/newTable.csv',
                    output_plot_path='/path/to/newPlot.png',
                    properties=prop)
- Info:
- wrapped_software:
name: scikit-learn PCA
version: >=0.24.2
license: BSD 3-Clause
- ontology:
name: EDAM
schema: http://edamontology.org/EDAM.owl
- launch() → int [source]
Execute the PrincipalComponentAnalysis (dimensionality_reduction.principal_component.PrincipalComponentAnalysis) object.
- dimensionality_reduction.principal_component.main()[source]
Command line execution of this building block. Please check the command line documentation.
- dimensionality_reduction.principal_component.principal_component(input_dataset_path: str, output_results_path: str, output_plot_path: str | None = None, properties: dict | None = None, **kwargs) → int [source]
Create a PrincipalComponentAnalysis object and execute its launch() method.