dimensionality_reduction package
Submodules
dimensionality_reduction.pls_components module
dimensionality_reduction.pls_regression module
dimensionality_reduction.principal_component module
Module containing the PrincipalComponentAnalysis class and the command line interface.
-
class dimensionality_reduction.principal_component.PrincipalComponentAnalysis(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)
Bases: biobb_common.generic.biobb_object.BiobbObject
biobb_ml PrincipalComponentAnalysis
Wrapper of the scikit-learn PCA method.
Analyses a given dataset through Principal Component Analysis (PCA). Visit the PCA documentation page on the official scikit-learn website for further information.
Parameters:
- input_dataset_path (str) –
Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
- output_results_path (str) –
Path to the analysed dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
- output_plot_path (str) (Optional) –
Path to the Principal Component plot, generated only if the number of components is 2 or 3. File type: output. Sample file. Accepted formats: png (edam:format_3603).
- properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
- features (dict) - ({}) Features or columns from your dataset you want to use for fitting. You can specify a list of column names from your input dataset, a list of column indexes or a list of column index ranges. Formats: { “columns”: [“column1”, “column2”] } or { “indexes”: [0, 2, 3, 10, 11, 17] } or { “range”: [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
- target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { “column”: “column3” } or { “index”: 21 }. In case of multiple formats, the first one will be picked.
- n_components (dict) - ({}) Number of components to keep, given either as an integer or as a float between 0 and 1. With a float, the minimum number of principal components that retains that fraction of the variance is kept. If not set ({}), all components are kept. Formats: { “value”: 2 } for integer values or { “value”: 0.3 } for float values.
- random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
- scale (bool) - (False) Whether or not to scale the input dataset.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
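The three formats accepted by the features property can be illustrated with a small helper. This is a minimal sketch and not the biobb_ml implementation: the name resolve_features is hypothetical, the lookup order (columns, then indexes, then range) is one reading of "the first one will be picked", and treating both ends of each range as inclusive is an assumption.

```python
def resolve_features(features, header):
    """Return the column names selected by a `features` dictionary.

    `header` is the list of column names of the input dataset.
    Hypothetical helper: only one format is honoured when several are given.
    """
    # Assumption: formats are checked in the order the documentation lists them.
    if "columns" in features:
        return list(features["columns"])
    if "indexes" in features:
        return [header[i] for i in features["indexes"]]
    if "range" in features:
        selected = []
        for start, stop in features["range"]:
            # Assumption: both ends of each [start, stop] pair are inclusive.
            selected.extend(header[start:stop + 1])
        return selected
    # The documentation does not say what an empty dictionary selects,
    # so this sketch refuses to guess.
    raise ValueError("features must contain 'columns', 'indexes' or 'range'")


header = ["a", "b", "c", "d", "e"]
by_name = resolve_features({"columns": ["a", "c"]}, header)
by_index = resolve_features({"indexes": [0, 4]}, header)
by_range = resolve_features({"range": [[0, 1], [3, 4]]}, header)
```

The same pattern would apply to the target property, with a single column name or index instead of a list.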
Examples
This is an example of how to use the building block from Python:
    from biobb_ml.dimensionality_reduction.principal_component import principal_component

    # Tool parameters; n_components uses the "value" key described in the parameter list
    prop = {
        'features': {
            'columns': ['column1', 'column2', 'column3']
        },
        'target': {
            'column': 'target'
        },
        'n_components': {
            'value': 2
        }
    }

    principal_component(input_dataset_path='/path/to/myDataset.csv',
                        output_results_path='/path/to/newTable.csv',
                        output_plot_path='/path/to/newPlot.png',
                        properties=prop)
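When n_components is given as a float, the number of components kept depends on how much variance each component explains. The following sketch shows that selection rule in plain Python. The function name components_for_variance and the sample ratios are hypothetical, and the handling of a cumulative ratio exactly equal to the threshold is an assumption; the wrapped scikit-learn PCA performs the actual selection.

```python
def components_for_variance(explained_variance_ratio, threshold):
    """Smallest number of leading components whose cumulative
    explained-variance ratio exceeds `threshold` (0 < threshold < 1)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    cumulative = 0.0
    for count, ratio in enumerate(explained_variance_ratio, start=1):
        cumulative += ratio
        # Assumption: the cumulative ratio must be strictly greater than the threshold.
        if cumulative > threshold:
            return count
    # Fallback for rounding error: keep every component.
    return len(explained_variance_ratio)


# Hypothetical ratios, sorted from the most to the least informative component.
ratios = [0.5, 0.25, 0.125, 0.125]
kept_for_30_percent = components_for_variance(ratios, 0.3)
kept_for_90_percent = components_for_variance(ratios, 0.9)
```

With these ratios, a threshold of 0.3 is already exceeded by the first component, while 0.9 requires all four.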
- Info:
- wrapped_software:
- name: scikit-learn PCA
- version: >=0.24.2
- license: BSD 3-Clause
- ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl
-
launch() → int
Execute the dimensionality_reduction.principal_component.PrincipalComponentAnalysis object.
-
dimensionality_reduction.principal_component.main()
Command line execution of this building block. Please check the command line documentation.
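For command line use, the tool parameters are typically kept in a configuration file rather than a Python dictionary. The sketch below serialises the properties from the parameter list to JSON. The helper name build_config and the top-level "properties" key are assumptions about the file layout, so check them against the command line documentation before relying on them.

```python
import json


def build_config(properties):
    """Serialise tool parameters to a JSON configuration string."""
    # Assumption: the parameters sit under a top-level "properties" key.
    return json.dumps({"properties": properties}, indent=2)


config_text = build_config({
    "features": {"columns": ["column1", "column2", "column3"]},
    "target": {"column": "target"},
    "n_components": {"value": 2},
    "scale": True,
})
print(config_text)
```

Input and output paths are not part of the properties, so they would be supplied separately when running the tool.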
-
dimensionality_reduction.principal_component.principal_component(input_dataset_path: str, output_results_path: str, output_plot_path: str = None, properties: dict = None, **kwargs) → int
Instantiate the PrincipalComponentAnalysis class and execute its launch() method.