dimensionality_reduction package

Submodules

dimensionality_reduction.pls_components module

dimensionality_reduction.pls_regression module

dimensionality_reduction.principal_component module

Module containing the PrincipalComponentAnalysis class and the command line interface.

class dimensionality_reduction.principal_component.PrincipalComponentAnalysis(input_dataset_path, output_results_path, output_plot_path=None, properties=None, **kwargs)

Bases: biobb_common.generic.biobb_object.BiobbObject

biobb_ml PrincipalComponentAnalysis
Wrapper of the scikit-learn PCA method.
Analyses a given dataset through Principal Component Analysis (PCA). Visit the PCA documentation page on the official scikit-learn website for further information.

Parameters:

input_dataset_path (str) – Path to the input dataset. File type: input. Sample file. Accepted formats: csv (edam:format_3752).
output_results_path (str) – Path to the analysed dataset. File type: output. Sample file. Accepted formats: csv (edam:format_3752).
output_plot_path (str) (Optional) – Path to the Principal Component plot, only if number of components is 2 or 3. File type: output. Sample file. Accepted formats: png (edam:format_3603).
properties (dict - Python dictionary object containing the tool parameters, not input/output files) –
- features (dict) - ({}) Features or columns from your dataset you want to use for fitting. You can specify either a list of column names from your input dataset, a list of column indexes or a range of column indexes. Formats: { "columns": ["column1", "column2"] } or { "indexes": [0, 2, 3, 10, 11, 17] } or { "range": [[0, 20], [50, 102]] }. In case of multiple formats, the first one will be picked.
- target (dict) - ({}) Dependent variable you want to predict from your dataset. You can specify either a column name or a column index. Formats: { "column": "column3" } or { "index": 21 }. In case of multiple formats, the first one will be picked.
- n_components (dict) - ({}) Dictionary containing either the number of components to keep (int) or the fraction of the variance to retain (float between 0 and 1), in which case the minimum number of principal components that retains that fraction is kept. If not set ({}), all components are kept. Formats: { "value": 2 } for integer values or { "value": 0.3 } for float values.
- random_state_method (int) - (5) [1~1000|1] Controls the randomness of the estimator.
- scale (bool) - (False) Whether or not to scale the input dataset.
- remove_tmp (bool) - (True) [WF property] Remove temporary files.
- restart (bool) - (False) [WF property] Do not execute if output files exist.
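
The two n_components formats can be illustrated without running the tool. The sketch below is not biobb_ml code. It assumes the float form follows the rule documented for scikit-learn's PCA, where the fewest components whose cumulative explained-variance ratio exceeds the requested fraction are kept. The helper name components_kept and the sample ratios are hypothetical.

```python
from bisect import bisect_right
from itertools import accumulate


def components_kept(explained_variance_ratio, n_components=None):
    """Return how many components a given n_components value would keep.

    Hypothetical helper for illustration only; it mirrors the rule
    documented for scikit-learn's PCA rather than calling the library.
    """
    total = len(explained_variance_ratio)
    if n_components is None:
        # Equivalent to leaving the property unset ({}): keep everything.
        return total
    if isinstance(n_components, int):
        # Integer form, e.g. {"value": 2}: keep that many components
        # (clamped in this sketch to the number available).
        return min(n_components, total)
    # Float form, e.g. {"value": 0.3}: keep the fewest components whose
    # cumulative variance ratio is strictly greater than the fraction.
    cumulative = list(accumulate(explained_variance_ratio))
    return min(bisect_right(cumulative, n_components) + 1, total)


# Made-up variance ratios, sorted in decreasing order as PCA reports them.
ratios = [0.6, 0.25, 0.1, 0.05]
kept_all = components_kept(ratios)
kept_int = components_kept(ratios, 2)
kept_low = components_kept(ratios, 0.3)
kept_high = components_kept(ratios, 0.9)
```

With these made-up ratios, a fraction of 0.3 is already exceeded by the first component alone, while 0.9 needs the first three.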

Examples

This is an example of how to use the building block from Python:

from biobb_ml.dimensionality_reduction.principal_component import principal_component
prop = {
    'features': {
        'columns': [ 'column1', 'column2', 'column3' ]
    },
    'target': {
        'column': 'target'
    },
    'n_components': {
        'value': 2
    }
}
principal_component(input_dataset_path='/path/to/myDataset.csv',
                    output_results_path='/path/to/newTable.csv',
                    output_plot_path='/path/to/newPlot.png',
                    properties=prop)
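
To try the call above, a small input file is needed. The following sketch writes a made-up CSV whose header matches the column names used in the example, then builds an equivalent properties dictionary using the documented "indexes" and "index" formats and the { "value": 2 } form of n_components. The file location, the numeric values and the helper write_sample_dataset are assumptions for illustration, not part of biobb_ml.

```python
import csv
import os
import tempfile


def write_sample_dataset(path):
    """Write a tiny, made-up dataset with three features and a target."""
    rows = [
        ["column1", "column2", "column3", "target"],
        [5.1, 3.5, 1.4, 0],
        [4.9, 3.0, 1.4, 0],
        [6.3, 3.3, 6.0, 1],
        [5.8, 2.7, 5.1, 1],
    ]
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    # Return the header so callers can look up column positions.
    return rows[0]


dataset_path = os.path.join(tempfile.mkdtemp(), "myDataset.csv")
header = write_sample_dataset(dataset_path)

# Same selection as the example above, expressed with column positions.
feature_names = ("column1", "column2", "column3")
prop = {
    "features": {"indexes": [header.index(name) for name in feature_names]},
    "target": {"index": header.index("target")},
    "n_components": {"value": 2},
}
```

The resulting dataset_path and prop can then be passed as input_dataset_path and properties in the call shown in the example.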

Info:

wrapped_software:
- name: scikit-learn PCA
- version: >=0.24.2
- license: BSD 3-Clause
ontology:
- name: EDAM
- schema: http://edamontology.org/EDAM.owl

check_data_params(out_log, err_log): Checks all the input/output paths and parameters.

launch() → int: Execute the dimensionality_reduction.principal_component.PrincipalComponentAnalysis object.

dimensionality_reduction.principal_component.main(): Command line execution of this building block. Please check the command line documentation.

dimensionality_reduction.principal_component.principal_component(input_dataset_path: str, output_results_path: str, output_plot_path: str = None, properties: dict = None, **kwargs) → int: Create a PrincipalComponentAnalysis object and execute its launch() method.
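
The relationship between this function and the class can be sketched with a stand-in. FakeBlock and fake_principal_component below are hypothetical placeholders, not the real biobb_ml objects. They only show the pattern the description implies: build the object from the same arguments, call launch(), and return its integer result. The value 0 returned here is arbitrary.

```python
class FakeBlock:
    """Hypothetical stand-in for PrincipalComponentAnalysis."""

    def __init__(self, input_dataset_path, output_results_path,
                 output_plot_path=None, properties=None, **kwargs):
        self.input_dataset_path = input_dataset_path
        self.output_results_path = output_results_path
        self.output_plot_path = output_plot_path
        self.properties = properties or {}

    def launch(self) -> int:
        # The real block would run the analysis here; the stand-in only
        # returns an arbitrary integer code.
        return 0


def fake_principal_component(input_dataset_path, output_results_path,
                             output_plot_path=None, properties=None,
                             **kwargs) -> int:
    """Construct the block and run it, mirroring the documented wrapper."""
    return FakeBlock(input_dataset_path=input_dataset_path,
                     output_results_path=output_results_path,
                     output_plot_path=output_plot_path,
                     properties=properties, **kwargs).launch()


code = fake_principal_component("/path/to/myDataset.csv",
                                "/path/to/newTable.csv")
```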