alphaviz.io

This module provides functions to read MQ/DiaNN/AlphaPept output files and other IO supplementary functions.

Functions:

create_ap_peptides_table(ap_df)

create_ap_proteins_table(ap_df, fasta)

create_diann_peptides_table(diann_df)

Extract information about peptides from the loaded main DIANN output .tsv file.

create_diann_proteins_table(diann_df, fasta)

Extract information about genes, proteins and protein groups from the loaded main DIANN output .tsv file.

get_filenames_from_directory(directory, ...)

Search the directory for files with the specified extensions and return a list of the matching file names.

import_alphapept_output(...)

import_diann_output(...)

Load two files from the DiaNN output folder and return the data frames containing information about proteins, peptides, and summary information about the whole experiment.

import_diann_stats(filepath, experiment)

Read the DIANN output .stats.tsv file.

import_mq_all_peptides(filepath)

Read some columns from the output file allPeptides.txt of MaxQuant software.

import_mq_evidence(filepath, experiment)

Read some columns from the output file evidence.txt of MaxQuant software.

import_mq_msms(filepath, experiment)

Read some columns from the output file msms.txt of MaxQuant software.

import_mq_output(necessary_files, ...)

Read all specified files from the MQ output folder and return a data frame for each of the files.

import_mq_protein_groups(filepath, experiment)

Read the output file proteinGroups.txt of MaxQuant software.

import_mq_summary(filepath)

Read the output file summary.txt of MaxQuant software.

read_fasta(filepath)

Read the fasta file using the pyteomics package.

read_file(filepath, column_names)

Read the file and retrieve the values from the specified columns.

alphaviz.io.create_ap_peptides_table(ap_df: DataFrame)[source]
alphaviz.io.create_ap_proteins_table(ap_df: DataFrame, fasta: object)[source]
alphaviz.io.create_diann_peptides_table(diann_df: DataFrame)[source]

Extract information about peptides from the loaded main DIANN output .tsv file.

Parameters

diann_df (pd.DataFrame) – The original data frame after loading the main .tsv DIANN output file and filtering by the experiment name.

Returns

The output data frame contains information about peptides.

Return type

pd.DataFrame

alphaviz.io.create_diann_proteins_table(diann_df: DataFrame, fasta: object)[source]

Extract information about genes, proteins and protein groups from the loaded main DIANN output .tsv file.

Parameters
  • diann_df (pd.DataFrame) – The original data frame after loading the main .tsv DIANN output file and filtering by the experiment name.

  • fasta (pyteomics.fasta.IndexedUniProt) – The object containing information about all proteins from the fasta file.

Returns

The output data frame contains information about genes, proteins and protein groups.

Return type

pd.DataFrame

alphaviz.io.get_filenames_from_directory(directory: str, extensions_list: list) list[source]

Search the directory for files with the specified extensions and return a list of the matching file names.

Parameters
  • directory (str) – Path to the directory to search in.

  • extensions_list (list) – A list of extensions, e.g. [‘d’, ‘hdf’].

Returns

The list of filtered file names based on their extensions.

Return type

list
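The extension filter described above can be sketched with the standard library alone. This is a minimal illustration, not the alphaviz implementation: the name filenames_with_extensions and the case-insensitive matching are assumptions made for the example.

```python
import os


def filenames_with_extensions(directory: str, extensions: list) -> list:
    """Return the sorted names of entries in `directory` whose extension is listed.

    Hypothetical helper for illustration; extensions are given without the
    leading dot (e.g. ['d', 'hdf']) and compared case-insensitively.
    """
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    matches = []
    for name in os.listdir(directory):
        # os.path.splitext('sample.hdf') returns ('sample', '.hdf')
        ext = os.path.splitext(name)[1].lstrip(".").lower()
        if ext in wanted:
            matches.append(name)
    return sorted(matches)
```

Note that os.listdir() returns directories as well as regular files, so this sketch does not distinguish between the two.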

alphaviz.io.import_alphapept_output(path_ap_output_folder: str, experiment: str, fasta: object)[source]
alphaviz.io.import_diann_output(path_diann_output_folder: str, experiment: str, fasta: object)[source]

Load two files from the DiaNN output folder and return the data frames containing information about proteins, peptides, and summary information about the whole experiment.

Parameters
  • path_diann_output_folder (str) – Path to the DIANN output folder with all output files needed.

  • experiment (str) – The name of the experiment.

  • fasta (pyteomics.fasta.IndexedUniProt) – The object containing information about all proteins from the fasta file.

Returns

The function returns three pandas data frames with the extracted information about proteins, peptides, and summary information about the whole experiment.

Return type

list of pd.DataFrames

alphaviz.io.import_diann_stats(filepath: str, experiment: str)[source]

Read the DIANN output .stats.tsv file.

Parameters
  • filepath (str) – Full path to the .stats.tsv file.

  • experiment (str) – The name of the experiment.

Returns

The output data frame contains summary information about the whole experiment.

Return type

pd.DataFrame

alphaviz.io.import_mq_all_peptides(filepath: str) DataFrame[source]

Read some columns from the output file allPeptides.txt of MaxQuant software.

Parameters

filepath (str) – Full path to the allPeptides.txt file.

Returns

The output data frame contains information about the following MQ columns:
  • ’Pasef MS/MS IDs’ (‘list’ type),

  • ’MS/MS scan number’ (‘int’ type).

The rows of the data frame with missing ‘MS/MS scan number’ values are dropped.

Return type

pd.DataFrame

alphaviz.io.import_mq_evidence(filepath: str, experiment: str) DataFrame[source]

Read some columns from the output file evidence.txt of MaxQuant software.

Parameters
  • filepath (str) – Full path to the evidence.txt file.

  • experiment (str) – The name of the experiment.

Returns

The output data frame contains information about the following MQ columns:
  • ’Sequence’,

  • ’Length’ (‘int’ type),

  • ’Acetyl (Protein N-term)’ (renamed to ‘Acetylation (N-term)’) (‘int’ type),

  • ’Oxidation (M)’ (‘int’ type),

  • ’Proteins’,

  • ’Retention time’ (‘float:.4d’ type),

  • ’Mass’ (‘float:.4d’ type),

  • ’m/z’ (‘float:.4d’ type),

  • ’Charge’ (‘category’ type),

  • ’Intensity’ (‘int’ type),

  • ’1/K0’ (‘float:.4d’ type),

  • ’MS/MS count’ (‘category’ type),

  • ’MS/MS scan number’ (‘int’ type),

  • ’Gene names’ (‘category’ type),

  • ’Score’ (renamed to ‘Andromeda score’) (‘int’ type),

  • ’Raw file’ (‘category’ type),

  • ’Uncalibrated mass error [ppm]’ (‘float:.4d’ type),

  • ’Mass error [ppm]’ (‘float:.4d’ type),

  • ’Modified sequence’.

Renamed columns are marked, and the output data type is given for each column where one is specified. The rows of the data frame with missing ‘MS/MS scan number’ values are dropped.

Return type

pd.DataFrame

alphaviz.io.import_mq_msms(filepath: str, experiment: str) DataFrame[source]

Read some columns from the output file msms.txt of MaxQuant software.

Parameters
  • filepath (str) – Full path to the msms.txt file.

  • experiment (str) – The name of the experiment.

Returns

The output data frame contains information about the following MQ columns:
  • ’Scan number’ (‘int’ type),

  • ’Matches’,

  • ’Masses’,

  • ’Mass deviations [Da]’,

  • ’Mass deviations [ppm]’.

Return type

pd.DataFrame

alphaviz.io.import_mq_output(necessary_files: list, path_mq_output_folder: str, experiment: str)[source]

Read all specified files from the MQ output folder and return a data frame for each of the files.

Parameters
  • necessary_files (list) – A list of strings containing the names of the MQ output files with extensions, e.g. [‘allPeptides.txt’, ‘msms.txt’].

  • path_mq_output_folder (str) – Path to the MaxQuant output folder with all output files needed.

  • experiment (str) – The name of the experiment.

Returns

For each of the specified MQ output files, the function returns a pandas data frame with the extracted information.

Return type

generator
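Because the return type is a generator, each file is read only when the caller asks for the next result. The sketch below shows one way such a dispatch could look. It is an assumption-laden illustration rather than the alphaviz code: import_outputs and the readers mapping are hypothetical names, and the readers here are stand-ins for the import_mq_* functions.

```python
import os
from typing import Callable, Dict, Iterator, List


def import_outputs(
    necessary_files: List[str],
    output_folder: str,
    readers: Dict[str, Callable[[str], object]],
) -> Iterator[object]:
    """Lazily yield one parsed result per requested file, in the order requested.

    `readers` maps a file name (e.g. 'msms.txt') to the function that parses it.
    """
    for file_name in necessary_files:
        if file_name not in readers:
            raise ValueError(f"No reader registered for {file_name!r}")
        # The file is only read when the consumer advances the generator.
        yield readers[file_name](os.path.join(output_folder, file_name))
```

A caller can unpack the generator directly, for example `all_peptides, msms = import_outputs(['allPeptides.txt', 'msms.txt'], folder, readers)`, which consumes the results in the order the file names were listed.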

alphaviz.io.import_mq_protein_groups(filepath: str, experiment: str) DataFrame[source]

Read the output file proteinGroups.txt of MaxQuant software.

Parameters
  • filepath (str) – Full path to the proteinGroups.txt file.

  • experiment (str) – The name of the experiment.

Returns

The output data frame contains information about the following MQ columns:
  • ‘Protein IDs’,

  • ‘Protein names’,

  • ‘Gene names’,

  • ‘Number of proteins’ (renamed to ‘# proteins’),

  • ‘Mol. weight [kDa]’ (renamed to ‘Mol weight, kDa’),

  • f’Peptides Exp_{experiment}’ (renamed to ‘(EXP) # peptides’),

  • f’Unique peptides Exp_{experiment}’ (renamed to ‘(EXP) # unique peptides’),

  • f’Sequence coverage Exp_{experiment} [%]’ (renamed to ‘(EXP) Seq coverage, %’),

  • ‘MS/MS count’ (renamed to ‘# MS/MS’),

  • ‘Sequence lengths’.

Renamed columns are marked. The rows of the data frame with missing ‘Gene names’ values are dropped.

Return type

pd.DataFrame

alphaviz.io.import_mq_summary(filepath: str) DataFrame[source]

Read the output file summary.txt of MaxQuant software.

Parameters

filepath (str) – Full path to the summary.txt file.

Returns

The output data frame contains summary information of all the experiments.

Return type

pd.DataFrame

alphaviz.io.read_fasta(filepath: str) dict[source]

Read the fasta file using the pyteomics package.

Parameters

filepath (str) – Full path to the .fasta file.

Returns

The output object allows access to all available information in the fasta file using the protein ID.

Return type

pyteomics.fasta.IndexedUniProt object

alphaviz.io.read_file(filepath: str, column_names: list) DataFrame[source]

Read the file and retrieve the values from the specified columns. Compared with pd.read_csv(), it saves significant time when the file is huge and is only a few ms slower for small files.

Parameters
  • filepath (str) – Full path to the file.

  • column_names (list) – A list of column names to be read.

Returns

The data frame contains the data from the specified columns of the file.

Return type

pd.DataFrame
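The idea of reading only the requested columns can be illustrated with the standard library. This is a sketch under stated assumptions, not the alphaviz implementation: the name read_columns is hypothetical, the file is assumed to be tab-delimited with a header row, and the result is a plain dict of string lists rather than a pd.DataFrame.

```python
import csv


def read_columns(filepath: str, column_names: list, delimiter: str = "\t") -> dict:
    """Return {column name: list of string values} for the requested columns only."""
    with open(filepath, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader)
        # Resolve each requested name to its position once, up front;
        # header.index raises ValueError if a column is missing.
        indices = [header.index(name) for name in column_names]
        columns = {name: [] for name in column_names}
        for row in reader:
            for name, idx in zip(column_names, indices):
                columns[name].append(row[idx])
    return columns
```

The values are left as strings here; converting them to the types listed for each column would be a separate step.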