The purpose of peptagram is to visually compare peptide hits in proteins across different proteomics experiments.
Peptagram makes visualizations of peptide hits from results generated by Morpheus, Mascot, MaxQuant, ProteinProphet/PeptideProphet, X!Tandem and ProteinPilot.
Some test data is provided in example_data.zip. If you unzip this in the peptagram directory, you should get a sub-directory example_data, which contains a list of sub-directories mascot, morpheus, etc.
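If you would rather unzip from a script than by hand, the step can be sketched in Python as below. This is only an illustration: the function name unzip_example_data is made up for this example and is not part of peptagram, and it assumes the archive holds a top-level example_data folder as described above.

```python
import zipfile
from pathlib import Path


def unzip_example_data(zip_path, dest_dir):
    """Extract the archive into dest_dir and return the sorted names of the
    sub-directories found in dest_dir/example_data.

    Hypothetical helper for illustration; not part of peptagram.
    """
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # Assumption: the archive contains a top-level example_data folder.
    example_dir = dest / "example_data"
    return sorted(p.name for p in example_dir.iterdir() if p.is_dir())
```

Calling unzip_example_data("example_data.zip", ".") from the peptagram directory should return the names of the per-search-engine sub-directories.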
For each search engine, there is a separate program to generate a peptagram. Let's look at one in detail in the next section.
Morpheus is a fast search engine designed for high-quality data. As Morpheus does not come with a bundled viewer, peptagram provides a unique tool to view Morpheus search results.
The scripts that process Morpheus results have morpheus_peptagram in their name:
mac_morpheus_peptagram.command - which can be clicked in Finder on Mac OS X
win_morpheus_peptagram.bat - which can be clicked in Windows File Explorer
do_morpheus_peptagram.py - which is run on the command line as python do_morpheus_peptagram.py -i
To run an automated test, unzip example_data.zip. It should create an example_data directory containing sub-directories such as morpheus, mascot, etc. Then run the test:
python do_morpheus_peptagram.py test
To use morpheus_peptagram, first start the program. A window should then open.
To load the Morpheus files, click + PSMs.tsv files and select all the .PSMs.tsv files that you want to compare. morpheus_peptagram will figure out the corresponding .protein_group.tsv files from these filenames.
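How morpheus_peptagram matches the files is not spelled out here. A minimal sketch of one plausible approach, assuming both files share the same prefix and differ only in the suffix named in the text, could look like this. The function name guess_protein_group_path is hypothetical.

```python
from pathlib import Path

PSM_SUFFIX = ".PSMs.tsv"
# Suffix as written in the text above; adjust if your Morpheus output differs.
PROTEIN_SUFFIX = ".protein_group.tsv"


def guess_protein_group_path(psm_path):
    """Derive the companion protein-group filename from a .PSMs.tsv filename.

    Assumption: the two files sit in the same directory and share a prefix.
    """
    psm_path = Path(psm_path)
    if not psm_path.name.endswith(PSM_SUFFIX):
        raise ValueError(f"not a {PSM_SUFFIX} file: {psm_path}")
    prefix = psm_path.name[: -len(PSM_SUFFIX)]
    return psm_path.with_name(prefix + PROTEIN_SUFFIX)
```

For example, a file named sample1.PSMs.tsv would be paired with sample1.protein_group.tsv in the same directory.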
Once selected, you'll see a list of files.
You can now reorder the .PSMs.tsv files into your preferred order by dragging the ☰ icon.
Then scroll down to the bottom and click the submit button.
If any errors are encountered, they'll appear below the submit button. The last line of the error message should help you troubleshoot the problem, after which you can click submit again.
If it worked, you'll get a link to a newly created directory containing your peptagram.
Later, you might want to tweak the options, such as loading the spectra from the original .mzML files, or filtering the displayed matches by Q-score.
morpheus_peptagram - processes Morpheus search engine results. Requires the modifications.tsv file for modified peptides, and optionally the .mzML file if you want to display spectra.
mascot_peptagram - processes Mascot .dat files. Shows the spectra for each PSM. Requires the original .fasta file to get the full-length protein sequences.
maxquant_peptagram - processes MaxQuant txt/summary directories. Shows only the matched ions in the spectra. Requires the original .fasta file to get the full-length protein sequences.
prophet_peptagram - processes the TPP's prot.xml and pep.xml files. Requires the original .fasta file to get the full-length protein sequences.
xtandem_peptagram - processes X!Tandem search results. Shows the spectra for each PSM.
pilot_peptagram - processes ProteinPilot .txt/.csv result files. Requires the original .fasta file to get the full-length protein sequences.
As peptagrams can easily become very large, there are a number of options to filter out low-quality matches, such as filtering on quality scores like pep and ionscore.
Some of the programs group similar proteins together (Morpheus, MaxQuant, ProteinProphet). This effectively reduces the number of proteins, and therefore the size of the resultant peptagram. For the other search engines, you can load a text file of seqids of contaminants to exclude. Alternatively, you can load a text file of seqids to restrict the display to just those proteins.
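To make the two kinds of seqid lists concrete, here is a rough sketch of exclude and include filtering. It assumes a text file with one seqid per line, which is a guess at the format rather than something stated here, and the names read_seqids and filter_proteins are invented for the example.

```python
def read_seqids(lines):
    """Collect seqids from an iterable of text lines, skipping blank lines.

    Assumption: one seqid per line.
    """
    return {line.strip() for line in lines if line.strip()}


def filter_proteins(proteins, exclude=None, include=None):
    """Return the entries of proteins (a dict keyed by seqid) that pass.

    exclude: seqids to drop, e.g. contaminants.
    include: if given, only these seqids are kept.
    """
    kept = {}
    for seqid, record in proteins.items():
        if exclude is not None and seqid in exclude:
            continue
        if include is not None and seqid not in include:
            continue
        kept[seqid] = record
    return kept
```

With a file on disk, the seqids could be read with read_seqids(open("contaminants.txt")), where the filename is again only a placeholder.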
You can also limit the display to tryptic, semi-tryptic or modified peptides.
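As a reminder of what tryptic and semi-tryptic mean, the sketch below classifies a peptide by how many of its ends are consistent with trypsin cleavage after K or R. It uses a simplified rule (no proline exception, protein termini count as valid ends), so peptagram's own definition may differ. The function names are hypothetical.

```python
def count_tryptic_ends(protein_seq, start, end):
    """Count the ends (0, 1 or 2) of protein_seq[start:end] that are
    consistent with cleavage after K or R.

    Simplified rule: the proline exception is ignored, and the protein's
    own N- and C-termini are treated as valid ends.
    """
    n_term_ok = start == 0 or protein_seq[start - 1] in "KR"
    c_term_ok = end == len(protein_seq) or protein_seq[end - 1] in "KR"
    return int(n_term_ok) + int(c_term_ok)


def classify_peptide(protein_seq, start, end):
    """Label a peptide as tryptic, semi-tryptic or non-tryptic."""
    labels = {2: "tryptic", 1: "semi-tryptic", 0: "non-tryptic"}
    return labels[count_tryptic_ends(protein_seq, start, end)]
```

Here start and end are zero-based indices into the protein sequence, with end exclusive, as in Python slicing.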
After you've created the peptagrams using the scripts described above, you can recombine and edit them using reorder_peptagram.
With reorder_peptagram, you can load existing peptagrams and stack all the rows of the different peptagrams against each other.
You can then reorder and relabel each row before saving the result to a new peptagram.
The one catch is that the sequence IDs of your proteins have to match. One way is to ensure that you use the same FASTA database for your search and that the sequence IDs are single words separated by a space from the description.
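The "single word separated by a space from the description" convention can be illustrated with a small sketch that takes the first whitespace-delimited word of a FASTA header as the sequence ID. This shows the convention only; it is not necessarily how peptagram parses headers, and seqid_from_header is a made-up name.

```python
def seqid_from_header(header):
    """Return the first whitespace-delimited word of a FASTA header line,
    with the leading '>' removed.
    """
    if header.startswith(">"):
        header = header[1:]
    words = header.split(None, 1)
    if not words:
        raise ValueError("empty FASTA header")
    return words[0]
```

Two searches run against the same FASTA database will then yield the same seqid for a given protein, whatever the rest of the description line says.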