alphaviz.io
This module provides functions to read MaxQuant (MQ), DiaNN and AlphaPept output files, as well as other supplementary IO functions.
Functions:
- create_diann_peptides_table(diann_df) – Extract information about peptides from the loaded main DIANN output .tsv file.
- create_diann_proteins_table(diann_df, fasta) – Extract information about genes, proteins and protein groups from the loaded main DIANN output .tsv file.
- get_filenames_from_directory(directory, extensions_list) – Search a directory for files with the specified extensions and return a list of the matching file names.
- import_alphapept_output(path_ap_output_folder, experiment, fasta)
- import_diann_output(path_diann_output_folder, experiment, fasta) – Load two files from the DiaNN output folder and return data frames containing information about proteins, peptides, and summary information about the whole experiment.
- import_diann_stats(filepath, experiment) – Read the DIANN output .stats.tsv file.
- import_mq_all_peptides(filepath) – Read selected columns from the MaxQuant output file allPeptides.txt.
- import_mq_evidence(filepath, experiment) – Read selected columns from the MaxQuant output file evidence.txt.
- import_mq_msms(filepath, experiment) – Read selected columns from the MaxQuant output file msms.txt.
- import_mq_output(necessary_files, path_mq_output_folder, experiment) – Read all specified files from the MaxQuant output folder and return a data frame for each of them.
- import_mq_protein_groups(filepath, experiment) – Read the MaxQuant output file proteinGroups.txt.
- import_mq_summary(filepath) – Read the MaxQuant output file summary.txt.
- read_fasta(filepath) – Read a FASTA file using the pyteomics package.
- read_file(filepath, column_names) – Read a file and retrieve the values from the specified columns.
- alphaviz.io.create_diann_peptides_table(diann_df: DataFrame)
Extract information about peptides from the loaded main DIANN output .tsv file.
- Parameters
diann_df (pd.DataFrame) – The original data frame obtained by loading the main DIANN output .tsv file and filtering it by the experiment name.
- Returns
The output data frame contains information about peptides.
- Return type
pd.DataFrame
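The diann_df argument is the main DIA-NN report already filtered to a single experiment. A minimal standard-library sketch of that filtering step follows. It assumes the run name sits in a column called 'Run' (check the header of your own report), and `filter_report_by_run` is a hypothetical helper, not part of alphaviz.io.

```python
import csv
import io


def filter_report_by_run(report_text: str, experiment: str) -> list:
    """Return the rows of a tab-separated DIA-NN report that belong to one run.

    Assumption: the run name is stored in a column called 'Run'.
    """
    reader = csv.DictReader(io.StringIO(report_text), delimiter="\t")
    return [row for row in reader if row["Run"] == experiment]


# Toy report with three precursors from two runs.
report = (
    "Run\tStripped.Sequence\tPrecursor.Charge\n"
    "exp_A\tPEPTIDEK\t2\n"
    "exp_B\tSAMPLER\t3\n"
    "exp_A\tLESSK\t2\n"
)
rows = filter_report_by_run(report, "exp_A")
print([row["Stripped.Sequence"] for row in rows])  # ['PEPTIDEK', 'LESSK']
```

With pandas, the equivalent filter on a loaded report is `diann_df[diann_df['Run'] == experiment]`.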
- alphaviz.io.create_diann_proteins_table(diann_df: DataFrame, fasta: object)
Extract information about genes, proteins and protein groups from the loaded main DIANN output .tsv file.
- Parameters
diann_df (pd.DataFrame) – The original data frame obtained by loading the main DIANN output .tsv file and filtering it by the experiment name.
fasta (pyteomics.fasta.IndexedUniProt) – The object containing information about all proteins from the fasta file.
- Returns
The output data frame contains information about genes, proteins and protein groups.
- Return type
pd.DataFrame
- alphaviz.io.get_filenames_from_directory(directory: str, extensions_list: list) -> list
Search a directory for files with the specified extensions and return a list of the matching file names.
- Parameters
directory (str) – Path to the directory to search in.
extensions_list (list) – A list of extensions, e.g. ['d', 'hdf'].
- Returns
The list of filtered file names based on their extensions.
- Return type
list
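A minimal sketch of filtering directory entries by extension with the standard library, assuming extensions are given without the leading dot as in the example above. `list_names_with_extensions` is hypothetical, not the alphaviz implementation. It keeps directories as well as files because Bruker raw data ('.d') are folders, lower-cases extensions, and sorts the result; these are choices of the sketch.

```python
import os
import tempfile


def list_names_with_extensions(directory: str, extensions_list: list) -> list:
    """Return entry names in `directory` whose extension is in extensions_list."""
    wanted = {ext.lower().lstrip(".") for ext in extensions_list}
    names = []
    for name in sorted(os.listdir(directory)):
        stem, dot, ext = name.rpartition(".")
        # Skip names without an extension and hidden files such as '.hdf'.
        if dot and stem and ext.lower() in wanted:
            names.append(name)
    return names


with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "run1.d"))  # Bruker raw data is a folder
    for fname in ("run1.hdf", "notes.txt", "README"):
        open(os.path.join(tmp, fname), "w").close()
    found = list_names_with_extensions(tmp, ["d", "hdf"])

print(found)  # ['run1.d', 'run1.hdf']
```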
- alphaviz.io.import_alphapept_output(path_ap_output_folder: str, experiment: str, fasta: object)
- alphaviz.io.import_diann_output(path_diann_output_folder: str, experiment: str, fasta: object)
Load two files from the DiaNN output folder and return data frames containing information about proteins, peptides, and summary information about the whole experiment.
- Parameters
path_diann_output_folder (str) – Path to the DIANN output folder with all output files needed.
experiment (str) – The name of the experiment.
fasta (pyteomics.fasta.IndexedUniProt) – The object containing information about all proteins from the fasta file.
- Returns
The function returns three pandas data frames with the extracted information about proteins, peptides, and summary information about the whole experiment.
- Return type
list of pd.DataFrame
- alphaviz.io.import_diann_stats(filepath: str, experiment: str)
Read the DIANN output .stats.tsv file.
- Parameters
filepath (str) – Full path to the .stats.tsv file.
experiment (str) – The name of the experiment.
- Returns
The output data frame contains summary information about the whole experiment.
- Return type
pd.DataFrame
- alphaviz.io.import_mq_all_peptides(filepath: str) -> DataFrame
Read selected columns from the MaxQuant output file allPeptides.txt.
- Parameters
filepath (str) – Full path to the allPeptides.txt file.
- Returns
- The output data frame contains information about the following MQ columns:
'Pasef MS/MS IDs' ('list' type),
'MS/MS scan number' ('int' type).
Rows with a missing 'MS/MS scan number' value are dropped.
- Return type
pd.DataFrame
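A sketch of the two documented transformations on tab-separated text: splitting 'Pasef MS/MS IDs' into a list of integers and dropping rows without an 'MS/MS scan number'. It assumes the IDs in one cell are separated by ';' and that missing values are empty cells; `parse_all_peptides` is a hypothetical helper, not the alphaviz implementation.

```python
import csv
import io


def parse_all_peptides(text: str) -> list:
    """Keep 'Pasef MS/MS IDs' (as list of int) and 'MS/MS scan number' (as int)."""
    parsed = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        scan = row["MS/MS scan number"].strip()
        if not scan:
            continue  # drop rows with a missing scan number
        ids = row["Pasef MS/MS IDs"].strip()
        parsed.append({
            "Pasef MS/MS IDs": [int(i) for i in ids.split(";")] if ids else [],
            "MS/MS scan number": int(scan),
        })
    return parsed


text = (
    "Charge\tPasef MS/MS IDs\tMS/MS scan number\n"
    "2\t101;102;103\t5501\n"
    "3\t204\t\n"  # no scan number: dropped
    "2\t\t5502\n"
)
parsed = parse_all_peptides(text)
print(len(parsed))  # 2
```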
- alphaviz.io.import_mq_evidence(filepath: str, experiment: str) -> DataFrame
Read selected columns from the MaxQuant output file evidence.txt.
- Parameters
filepath (str) – Full path to the evidence.txt file.
experiment (str) – The name of the experiment.
- Returns
- The output data frame contains information about the following MQ columns:
'Sequence',
'Length' ('int' type),
'Acetyl (Protein N-term)' (renamed to 'Acetylation (N-term)') ('int' type),
'Oxidation (M)' ('int' type),
'Proteins',
'Retention time' ('float:.4d' type),
'Mass' ('float:.4d' type),
'm/z' ('float:.4d' type),
'Charge' ('category' type),
'Intensity' ('int' type),
'1/K0' ('float:.4d' type),
'MS/MS count' ('category' type),
'MS/MS scan number' ('int' type),
'Gene names' ('category' type),
'Score' (renamed to 'Andromeda score') ('int' type),
'Raw file' ('category' type),
'Uncalibrated mass error [ppm]' ('float:.4d' type),
'Mass error [ppm]' ('float:.4d' type),
'Modified sequence'.
Renamed columns are marked, as is the output data type of each column. Rows with a missing 'MS/MS scan number' value are dropped.
- Return type
pd.DataFrame
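A standard-library sketch of the column selection, the two documented renames and the row filter described above. Type conversion is omitted, so values stay strings. The function name `select_evidence_columns` is hypothetical, and the sketch does not model how alphaviz uses the experiment argument.

```python
import csv
import io

# Renames taken from the import_mq_evidence description.
RENAMES = {
    "Acetyl (Protein N-term)": "Acetylation (N-term)",
    "Score": "Andromeda score",
}


def select_evidence_columns(text: str, columns: list) -> list:
    """Keep `columns` from tab-separated evidence text, apply RENAMES and
    drop rows whose 'MS/MS scan number' is empty. Values stay strings."""
    selected = []
    for row in csv.DictReader(io.StringIO(text), delimiter="\t"):
        if not row["MS/MS scan number"].strip():
            continue
        selected.append({RENAMES.get(col, col): row[col] for col in columns})
    return selected


evidence = (
    "Sequence\tAcetyl (Protein N-term)\tScore\tMS/MS scan number\n"
    "PEPTIDEK\t0\t88.2\t1201\n"
    "LESSK\t1\t45.0\t\n"
)
rows = select_evidence_columns(
    evidence, ["Sequence", "Acetyl (Protein N-term)", "Score", "MS/MS scan number"]
)
print(rows)
# [{'Sequence': 'PEPTIDEK', 'Acetylation (N-term)': '0', 'Andromeda score': '88.2', 'MS/MS scan number': '1201'}]
```

On a loaded pandas data frame, `df.rename(columns=RENAMES)` and `df.dropna(subset=['MS/MS scan number'])` perform the same two steps.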
- alphaviz.io.import_mq_msms(filepath: str, experiment: str) -> DataFrame
Read selected columns from the MaxQuant output file msms.txt.
- Parameters
filepath (str) – Full path to the msms.txt file.
experiment (str) – The name of the experiment.
- Returns
- The output data frame contains information about the following MQ columns:
'Scan number' ('int' type),
'Matches',
'Masses',
'Mass deviations [Da]',
'Mass deviations [ppm]'.
- Return type
pd.DataFrame
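In msms.txt, the 'Matches', 'Masses' and mass-deviation columns hold one entry per annotated fragment inside a single cell. This sketch pairs those entries up for one row. It assumes they are ';'-separated and aligned by position, which is an assumption about the MaxQuant format that this page does not state. `split_msms_row` is hypothetical, and the row values are made up.

```python
def split_msms_row(row: dict) -> list:
    """Pair the ';'-separated fragment fields of one msms.txt row.

    Returns a list of (annotation, mass, deviation in Da) tuples.
    """
    matches = row["Matches"].split(";")
    masses = [float(m) for m in row["Masses"].split(";")]
    deviations = [float(d) for d in row["Mass deviations [Da]"].split(";")]
    if not len(matches) == len(masses) == len(deviations):
        raise ValueError("fragment columns have different lengths")
    return list(zip(matches, masses, deviations))


row = {
    "Scan number": "1201",
    "Matches": "y1;y2;b2",
    "Masses": "147.1128;276.1554;227.1026",
    "Mass deviations [Da]": "0.0004;-0.0011;0.0007",
}
fragments = split_msms_row(row)
print(fragments[0])  # ('y1', 147.1128, 0.0004)
```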
- alphaviz.io.import_mq_output(necessary_files: list, path_mq_output_folder: str, experiment: str)
Read all specified files from the MQ output folder and return a data frame for each of them.
- Parameters
necessary_files (list) – A list of strings containing the names of the MQ output files with extensions, e.g. ['allPeptides.txt', 'msms.txt'].
path_mq_output_folder (str) – Path to the MaxQuant output folder with all output files needed.
experiment (str) – The name of the experiment.
- Returns
For each of the specified MQ output files, the function returns a pandas data frame with the extracted information.
- Return type
generator
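Because import_mq_output returns a generator, its results are typically unpacked in the same order as necessary_files. The sketch below shows that pattern with placeholder readers. `import_outputs` and the lambdas in `readers` are hypothetical, and unlike the documented function, this sketch takes a reader mapping instead of the experiment name.

```python
import os
from typing import Callable, Dict, Iterator, List


def import_outputs(
    necessary_files: List[str],
    folder: str,
    readers: Dict[str, Callable[[str], object]],
) -> Iterator[object]:
    """Lazily yield one parsed result per requested file, in request order."""
    for file_name in necessary_files:
        # Nothing is read until the caller asks for the next item.
        yield readers[file_name](os.path.join(folder, file_name))


# Placeholder readers that just report which path they were given.
readers = {
    "allPeptides.txt": lambda path: ("allPeptides", path),
    "msms.txt": lambda path: ("msms", path),
}
all_peptides, msms = import_outputs(["allPeptides.txt", "msms.txt"], "mq_output", readers)
print(all_peptides[0], msms[0])  # allPeptides msms
```

Unpacking consumes the generator, so it cannot be iterated a second time. Call the function again if the data are needed again.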
- alphaviz.io.import_mq_protein_groups(filepath: str, experiment: str) -> DataFrame
Read the output file proteinGroups.txt of MaxQuant software.
- Parameters
filepath (str) – Full path to the proteinGroups.txt file.
experiment (str) – The name of the experiment.
- Returns
- The output data frame contains information about the following MQ columns:
'Protein IDs',
'Protein names',
'Gene names',
'Number of proteins' (renamed to '# proteins'),
'Mol. weight [kDa]' (renamed to 'Mol weight, kDa'),
f'Peptides Exp_{experiment}' (renamed to '(EXP) # peptides'),
f'Unique peptides Exp_{experiment}' (renamed to '(EXP) # unique peptides'),
f'Sequence coverage Exp_{experiment} [%]' (renamed to '(EXP) Seq coverage, %'),
'MS/MS count' (renamed to '# MS/MS'),
'Sequence lengths'.
Renamed columns are marked. The rows of the data frame with missing 'Gene names' values are dropped.
- Return type
pd.DataFrame
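Several of the source columns listed for proteinGroups.txt depend on the experiment name. This sketch builds the renaming map from that list. It keeps the '(EXP)' prefix literally because the documentation does not say whether it is substituted. `protein_groups_renames` is hypothetical.

```python
def protein_groups_renames(experiment: str) -> dict:
    """Map proteinGroups.txt column names to the documented display names."""
    return {
        "Number of proteins": "# proteins",
        "Mol. weight [kDa]": "Mol weight, kDa",
        f"Peptides Exp_{experiment}": "(EXP) # peptides",
        f"Unique peptides Exp_{experiment}": "(EXP) # unique peptides",
        f"Sequence coverage Exp_{experiment} [%]": "(EXP) Seq coverage, %",
        "MS/MS count": "# MS/MS",
    }


header = ["Protein IDs", "Gene names", "Peptides Exp_A1", "MS/MS count"]
renames = protein_groups_renames("A1")
print([renames.get(name, name) for name in header])
# ['Protein IDs', 'Gene names', '(EXP) # peptides', '# MS/MS']
```

With pandas, `df.rename(columns=renames)` applies the map and leaves unlisted columns unchanged.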
- alphaviz.io.import_mq_summary(filepath: str) -> DataFrame
Read the output file summary.txt of MaxQuant software.
- Parameters
filepath (str) – Full path to the summary.txt file.
- Returns
The output data frame contains summary information about all experiments.
- Return type
pd.DataFrame
- alphaviz.io.read_fasta(filepath: str) -> dict
Read the fasta file using the pyteomics package.
- Parameters
filepath (str) – Full path to the .fasta file.
- Returns
The output object allows access to all available information in the fasta file using the protein ID.
- Return type
pyteomics.fasta.IndexedUniProt object
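read_fasta returns a pyteomics.fasta.IndexedUniProt object that gives access to entries by protein ID. The standard-library sketch below only illustrates the idea of looking up entries by UniProt accession. `index_uniprot_fasta` is hypothetical and does not reproduce the pyteomics object or its parsed descriptions. It assumes headers of the form '>db|ACCESSION|ENTRY_NAME description'.

```python
import io


def index_uniprot_fasta(handle) -> dict:
    """Map each UniProt accession to its header line and full sequence."""
    headers, chunks = {}, {}
    accession = None
    for raw_line in handle:
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            accession = header.split("|")[1]  # assumes 'db|ACCESSION|...'
            headers[accession] = header
            chunks[accession] = []
        elif accession is not None:
            chunks[accession].append(line)  # sequences may span several lines
    return {
        acc: {"header": headers[acc], "sequence": "".join(chunks[acc])}
        for acc in headers
    }


# Two toy records with made-up accessions and sequences.
fasta_text = (
    ">sp|P0TOY1|TOY1_HUMAN Toy protein 1\n"
    "MKTAYIAK\n"
    "QRQISFVK\n"
    ">tr|Q0TOY2|Q0TOY2_HUMAN Toy protein 2\n"
    "MSEQ\n"
)
proteins = index_uniprot_fasta(io.StringIO(fasta_text))
print(proteins["P0TOY1"]["sequence"])  # MKTAYIAKQRQISFVK
```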
- alphaviz.io.read_file(filepath: str, column_names: list) -> DataFrame
Read a file and retrieve the values from the specified columns. Compared with pd.read_csv(), this is significantly faster for very large files and only a few milliseconds slower for small ones.
- Parameters
filepath (str) – Full path to the file.
column_names (list) – A list of column names to be read.
- Returns
The output data frame contains the data from the specified columns of the file.
- Return type
pd.DataFrame
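A minimal standard-library sketch of reading only the requested columns from a tab-separated file. It assumes a tab separator and a header row, and keeps values as strings. `read_columns` is hypothetical and is neither the alphaviz implementation nor tuned for speed.

```python
import csv
import io


def read_columns(handle, column_names: list, sep: str = "\t") -> dict:
    """Return {column: [values, ...]} for the requested columns only."""
    reader = csv.reader(handle, delimiter=sep)
    header = next(reader)
    missing = [name for name in column_names if name not in header]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    positions = [header.index(name) for name in column_names]
    columns = {name: [] for name in column_names}
    for fields in reader:
        for name, pos in zip(column_names, positions):
            columns[name].append(fields[pos])
    return columns


table = "Sequence\tCharge\tScore\nPEPTIDEK\t2\t88.2\nLESSK\t3\t45.0\n"
print(read_columns(io.StringIO(table), ["Sequence", "Score"]))
# {'Sequence': ['PEPTIDEK', 'LESSK'], 'Score': ['88.2', '45.0']}
```

With pandas, `pd.read_csv(filepath, sep='\t', usecols=column_names)` performs a comparable column selection.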