
Results Format

At the completion of a search, a report is displayed that provides an overview of the results. Various links lead to more detailed "views" of the experimental and calculated data.

The best way to describe the contents of these pages is by reference to an example. Use one of the following links to open a results page in a new browser window, then swap between the two browser windows as required.

Types of Report

The Protein Summary report is intended for peptide mass fingerprint results. It is possible to display MS/MS results in this format, but this is not recommended. The default format for peptide mass fingerprint results is a Concise Protein Summary, where proteins that match the same set, or a sub-set, of mass values are grouped into a single hit. The intention is to provide a one-page summary of the search results. You can use the format controls to switch to a Full Protein Summary, where each protein hit is listed separately, together with details of individual mass value matches.
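The grouping idea behind the Concise Protein Summary can be pictured with a short Python sketch. This is only an illustration under assumptions: the function name, the greedy ordering (largest match set first) and the sample accessions are hypothetical and are not taken from Mascot itself.

```python
def group_concise(matches):
    """Group proteins whose matched mass values are the same set as,
    or a sub-set of, those of an earlier (larger) protein.

    matches: dict mapping accession -> set of matched query numbers.
    Returns a list of groups, each a list of accessions.
    """
    # Assumption: visit proteins from most to fewest matches, ties by accession.
    ordered = sorted(matches, key=lambda acc: (-len(matches[acc]), acc))
    groups = []  # list of (head_match_set, [accessions])
    for acc in ordered:
        for head_set, members in groups:
            if matches[acc] <= head_set:  # same set or sub-set
                members.append(acc)
                break
        else:
            groups.append((matches[acc], [acc]))
    return [members for _, members in groups]


example = {
    "P1": {1, 2, 3, 4},
    "P2": {1, 2, 3, 4},  # same set as P1
    "P3": {2, 3},        # sub-set of P1
    "P4": {5, 6},        # unrelated matches
}
groups = group_concise(example)
print(groups)
```

In this sketch, P1, P2 and P3 collapse into one hit, while P4 remains a separate hit.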

For MS/MS searches of fewer than 1000 spectra, the default report format is the Peptide Summary. This provides the clearest and most complete picture of the results, especially if the sample was a protein mixture. If there are no peptide matches, only molecular weight matches, then a Protein Summary report will be generated instead.

You can use the format controls to switch to a Select Summary, which is similar to a Peptide Summary, but provides a very compact view of the results. The Select Summary is the default when the number of MS/MS spectra exceeds 1000. For searches of fewer than 1000 MS/MS spectra, you can also choose a Protein Summary, but this format is not recommended for MS/MS data, because it can give a misleading view of the results.

In a Peptide Summary report, if one or more UniGene indexes have been configured for the database being searched, there will be the option to generate a report in which the protein matches are clustered into UniGene families.

If you are submitting MS/MS searches to an in-house Mascot server, you will also have the option to create an Archive Report. This is simply an edited Peptide Summary report that includes only the protein hits you select.

The final choice on the list of report formats is always Export Search Results. This enables the results to be exported in a number of "machine readable" formats, such as XML and CSV.

Common Features

Header

At the top of the report are a few lines to identify the search uniquely: search title, date, user name, etc. The database is identified with either a release number or an ISO datestamp. Depending on the report type, one or more of the top scoring protein hits may be listed.

Search Parameters

At the foot of the report, search parameters are summarised for the record. Descriptions of individual search parameters can be found here. In the case of a Select Summary, all of the search parameters are placed in the header.

Score Distribution

Following the header is a histogram of the protein score distribution. The 50 best matching proteins are divided into 16 groups according to their score, and the heights of the bars show the number of matches in each group.
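As a rough illustration of how such a histogram could be built, the following Python sketch bins the top 50 scores into 16 groups. The equal-width binning between the lowest and highest score is an assumption made for the example; the text does not say how Mascot chooses the group boundaries, and the function name is hypothetical.

```python
def score_histogram(scores, max_proteins=50, num_groups=16):
    """Count how many of the best-scoring proteins fall into each score group."""
    top = sorted(scores, reverse=True)[:max_proteins]
    counts = [0] * num_groups
    if not top:
        return counts
    lo, hi = min(top), max(top)
    # Assumption: equal-width groups; fall back to width 1.0 if all scores tie.
    width = (hi - lo) / num_groups or 1.0
    for s in top:
        # The highest score would land one past the end, so clamp it.
        idx = min(int((s - lo) / width), num_groups - 1)
        counts[idx] += 1
    return counts


counts = score_histogram(range(100))
print(counts, sum(counts))
```

Each entry of the returned list corresponds to the height of one bar.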

In the case of a peptide mass fingerprint, the protein score is a measure of the statistical significance of a match. The region in which random matches may be expected is shaded green. This region extends up to the significance threshold, which has a default setting of 5%. If a score falls in the green shaded area, there is greater than a 5% probability that the match was a random event, of no significance. Conversely, a match in the unshaded part of the histogram has less than a 5% probability of being a random event. It is quite common to see several proteins getting the same high score. Even if the protein sequences in the database are non-identical, the same group of matched peptides may occur in multiple proteins.

In the case of MS/MS data, it is the ions scores for individual peptide matches that are statistically meaningful. A peptide summary report groups peptide matches into protein hits, and derives a protein score from the combined ions scores, but it is not possible to assign a rigorous significance threshold to the protein score.

Format Controls

These controls enable the report format to be modified. After making changes, press the "Format As" button to reload the report using the new settings.

For a peptide mass fingerprint search, there are just three controls:

  • Report format: Choose from the list of available formats.
  • Significance threshold: The default significance threshold is p < 0.05. You can change this to any value in the range 0.1 to 1E-18.
  • Maximum number of hits: This value was initially chosen when the search was submitted. Enter a positive integer if you wish to re-specify the number of protein hits to report. Of course, the total number of hits actually found by the search may be less. Note that for a Protein Summary report, the maximum number of hits is 50, because this format is intended for peptide mass fingerprint results. There is no such limit for a Peptide Summary Report. Entering the word AUTO or a value of 0 will display all of the hits that have a protein score exceeding the significance threshold, plus one extra hit.
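The AUTO behaviour described for the maximum number of hits can be sketched in Python as follows. The function name is hypothetical, and `threshold_score` is assumed to be the protein score that corresponds to the chosen significance threshold; how that score is derived is outside the scope of this sketch.

```python
def hits_to_display(protein_scores, threshold_score, max_hits="AUTO"):
    """Return the protein scores that would be listed, best first."""
    ranked = sorted(protein_scores, reverse=True)
    if str(max_hits).upper() == "AUTO" or int(max_hits) == 0:
        # All hits above the threshold, plus one extra hit.
        significant = sum(1 for s in ranked if s > threshold_score)
        return ranked[: significant + 1]
    # An explicit limit; the search may have found fewer hits than this.
    return ranked[: int(max_hits)]


scores = [120, 30, 85, 64, 10]
print(hits_to_display(scores, threshold_score=70))
print(hits_to_display(scores, threshold_score=70, max_hits=2))
```

With a threshold score of 70, AUTO lists the two significant hits (120 and 85) and one extra (64).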

For a search of MS/MS data, there are several additional controls:

  • Standard or MudPIT scoring: MudPIT scoring is a more aggressive protein score that removes protein hits that have high protein scores purely because they have a large number of low-scoring peptide matches. It is the default protein score for searches that have more than 1000 queries. If you are searching a very small database, you may wish to select this scoring mode for searches with fewer than 1000 queries.
  • Ions score cut-off: By setting this to (say) 20, you cut out all of the very low scoring, random peptide matches. This means that homologous proteins are more likely to collapse into a single hit.
  • Show sub-sets: By default, each hit in a Peptide Summary shows the set of proteins that match a particular set of peptides. Proteins that match a sub-set of those peptides are not shown. You can choose to show these additional protein hits, but be aware that this may make the report very much longer.
  • Show or Suppress pop-ups: The JavaScript pop-up windows, which show the top 10 peptide matches for each query, are very useful, but they make the HTML report larger and slower to load in a web browser. If you have a report that never seems to load, or is very slow to scroll, try suppressing the pop-ups.
  • Sort unassigned: These are sorting options for the list of peptide matches that are not assigned to protein hits. Descending score makes it easy to see whether there are any good matches. If so, you will want to increase the number of protein hits or set it to AUTO so as to pull these matches into the main body of the report. Ascending query number is the same as ascending precursor Mr. Descending intensity allows you to find strong spectra that have failed to get a match. These could be candidates for de novo sequencing.
  • Require bold red: Requiring a protein hit to include at least one bold red peptide match is a good way to remove duplicate homologous proteins from a report. See the discussion in Results Interpretation for further information.
  • UniGene index: (This control is displayed only for a database where UniGene indexes are defined.) Choose the UniGene index to be used to cluster the protein hits into gene-based families.
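The three "Sort unassigned" options can be illustrated with a small Python sketch. The mode names mirror the values of the _sortunassigned URL switch (scoredown, queryup, intdown); the record layout with `query`, `score` and `intensity` keys is a hypothetical one chosen for the example.

```python
SORT_KEYS = {
    "scoredown": lambda m: -m["score"],      # descending score
    "queryup": lambda m: m["query"],         # ascending query number
    "intdown": lambda m: -m["intensity"],    # descending intensity
}


def sort_unassigned(matches, mode="scoredown"):
    """Return the unassigned peptide matches in the requested order."""
    return sorted(matches, key=SORT_KEYS[mode.lower()])


unassigned = [
    {"query": 3, "score": 12.0, "intensity": 5.0e4},
    {"query": 1, "score": 41.5, "intensity": 1.2e4},
    {"query": 2, "score": 7.3, "intensity": 9.8e5},
]
for mode in SORT_KEYS:
    print(mode, [m["query"] for m in sort_unassigned(unassigned, mode)])
```

Sorting by descending intensity puts query 2 first: a strong spectrum with a poor score, which is the kind of candidate for de novo sequencing mentioned above.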

Overview Table

The (optional) overview table provides an animated summary of the results.

Each row of the overview table represents a query, i.e. a peptide, while each column represents a protein "hit". Where a protein contains a peptide match, the table cell contains an LED style indicator. This indicator will light up when it is under the mouse cursor, along with all the other indicators in the row that correspond to the same peptide. Even when the sequence database is non-identical, there may still be extensive homology between entries, and the overview table indicators provide a rapid means of identifying which peptides are common to which proteins.
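The layout of the overview table can be modelled in a few lines of Python. This is a sketch of the data structure only, assuming a hypothetical mapping from protein accession to the set of matched query numbers; it says nothing about how the report itself is generated.

```python
def overview_table(hits):
    """Build rows (queries) by columns (protein hits) of match indicators.

    hits: dict mapping accession -> set of matched query numbers,
    in hit order. Returns {query: [True/False per protein hit]}.
    """
    queries = sorted(set().union(*hits.values()))
    return {q: [q in matched for matched in hits.values()] for q in queries}


def proteins_sharing(hits, query):
    """List the protein hits whose indicators light up for one query."""
    return [acc for acc, matched in hits.items() if query in matched]


hits = {"A": {1, 2}, "B": {2, 3}}
table = overview_table(hits)
print(table)
print(proteins_sharing(hits, 2))
```

Here query 2 is common to both protein hits, so both indicators in its row would light up together.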

In addition to lighting up the indicators, moving the mouse cursor over a cell displays the query title (if any), the protein accession number, and the peptide sequence in the three text fields above the table. This information is repeated in the status bar at the bottom of the browser window, in case the fields at the top of the table have scrolled out of view.

Clicking on one of the indicators will generate a more detailed view which depends on the type of search. In the case of a peptide mass fingerprint, clicking on any indicator in a column will lead to the Protein View of that column. In the case of a sequence query or an MS/MS ions search, clicking on an indicator will lead to the Peptide View for that match.

Clicking on a column header cell will jump down the page to the corresponding entry in the detailed results list.

The cells in the first column of the overview table identify each query by the experimental m/z value of the peptide. When the mouse cursor is moved over these cells, the query title (if any) is displayed in one of the text fields above the table. Each cell also contains a check box, which can be used to select a sub-set of the peptides for a repeat search.

Repeating a search

A search can easily be repeated, so as to investigate the effect of changes in search parameters. Queries can be selected in the result report, then loaded into a search form, where the search parameters can be modified.

In a Protein Summary report, individual queries can only be selected if an overview table is displayed. Otherwise, the choice is between repeating the search with all queries or with those queries that do not match into the top scoring protein hit. In a Peptide Summary report, checkboxes for selecting queries are always included in the body of the report.

If you wish to perform an error tolerant search of MS/MS data, this is implemented as a special type of repeat search.

On the Matrix Science public web site, you can search an EST database only by using a repeat search. There are two reasons for this. First, an unknown sample could easily be a contaminant, such as BSA or keratin, which can be identified more rapidly and reliably by searching against a protein database. Second, the six frame translation of an EST database contains many times as many entries as one of the non-redundant protein databases. This means that searches of EST databases consume a great deal of CPU time and should not be submitted casually.

Protein Summary

The body of the report contains a tabular summary of the best matching proteins. The number of proteins shown is specified in the search form, up to a maximum of 50. An example of an entry is shown here:
 
[Figure: Protein Summary hit]
 
For each protein, the first line contains the accession string (linked to the corresponding Protein View), the protein molecular weight, and the protein score. Expect is the number of times we would expect to obtain an equal or higher score, purely by chance. The lower this expectation value, the more significant the result. This is followed by a brief descriptive title, then a table summarising the matched peptides. The table columns contain:
  1. Experimental m/z value
  2. Experimental m/z transformed to a relative molecular mass
  3. Relative molecular mass calculated from the matched peptide sequence
  4. Difference (error) between the experimental and calculated masses
  5. Inclusive numbering of the residues, starting with 1 for the N-terminal residue of the intact protein
  6. Number of missed cleavage sites
  7. Ions score (not present in peptide mass fingerprint results)
  8. Sequence of the peptide in 1-letter code. The residues that bracket the peptide sequence in the protein are also shown, delimited by periods. If the peptide forms the protein terminus, then a dash is shown instead.

If any variable modifications are found in a peptide, these are listed after the sequence string.
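The sequence notation used in the last column (bracketing residues delimited by periods, with a dash at a protein terminus) is easy to take apart programmatically. The Python sketch below assumes the input is the bare sequence string, without any trailing modification text; the function name and the flanking residues in the examples are hypothetical.

```python
def parse_flanked_peptide(text):
    """Split a string such as 'K.PEPTIDE.R' into its three parts."""
    before, peptide, after = text.strip().split(".")
    return {
        "peptide": peptide,
        # A dash means the peptide forms the protein terminus.
        "residue_before": None if before == "-" else before,
        "residue_after": None if after == "-" else after,
    }


internal = parse_flanked_peptide("K.GLGTDEDTLIEILASR.D")
n_terminal = parse_flanked_peptide("-.SEDFGVNEDLGDSDAR.K")
print(internal)
print(n_terminal)
```

A `None` value for `residue_before` or `residue_after` indicates that the peptide sits at the N-terminus or C-terminus of the protein, respectively.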

The total score for a protein match contains contributions from the peptide mass fingerprint, ions scores from any queries containing MS/MS data, and scores for any sequence query qualifiers, such as seq, comp, tag, and etag. The Protein Summary is the correct report for a peptide mass fingerprint search, but it is usually the wrong report for an MS/MS search, especially a data set containing peptides from a complex mixture of proteins.

If you choose to use the Protein Summary for MS/MS data, please be aware that the protein score and expectation value may be misleading. For example, in the case of a single peptide MS/MS spectrum, scoring the peptide molecular mass as if it were a peptide mass fingerprint has no meaning. A further disadvantage of the Protein Summary for MS/MS data is that the hit list can be swamped by proteins which are well represented in the database, e.g. actins. Proteins with just one or two peptide matches may not make it into the top 50, so are missed. The Peptide Summary, which has no such limit, is the intended report for MS/MS data, especially from a complex mixture.

Peptide Summary

The body of the Peptide Summary report contains a tabular listing of the proteins, sorted by descending protein score. The number of protein hits reported can be specified when the search is submitted. If this is set to AUTO, all the significant matches will be listed, however many that is. An example of an entry is shown here:
 
[Figure: Peptide Summary hit]
 
For each protein, the first line contains the accession string (linked to the corresponding protein view), the protein molecular weight, a non-probabilistic protein score, derived from the ions scores, and the number of peptide matches. This is followed by a brief descriptive title, then a table summarising the matched peptides. The table columns contain:
  1. If the report does not contain an overview table, then checkboxes for selecting queries for a repeat search will appear in the first column of any row containing the first appearance of a top ranked match.
  2. Hyperlinked query number (see below).
  3. Experimental m/z value.
  4. Experimental m/z transformed to a relative molecular mass.
  5. Calculated relative molecular mass of the matched peptide.
  6. Difference (error) between the experimental and calculated masses.
  7. Number of missed enzyme cleavage sites.
  8. Ions score - If there are duplicate matches to the same peptide, then the lower scoring matches are shown in brackets.
  9. Expectation value for the peptide match. (The number of times we would expect to obtain an equal or higher score, purely by chance. The lower this value, the more significant the result).
  10. Rank of the ions match (1 to 10, where 1 is the best match).
  11. Sequence of the peptide in 1-letter code. The residues that bracket the peptide sequence in the protein are also shown, delimited by periods. If the peptide forms the protein terminus, then a dash is shown instead.

If any variable modifications are found in a peptide, these are listed after the sequence string.
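The relationship between the mass columns can be made concrete with a short Python sketch. It assumes positive-mode, protonated ions, so that the relative molecular mass is recovered by removing one proton mass per charge, and it assumes the error is reported as experimental minus calculated. The function names and the sample values are hypothetical.

```python
PROTON_MASS = 1.007276  # Da; assumes each charge is carried by a proton


def mz_to_mr(mz, charge):
    """Transform an experimental m/z value into a relative molecular mass."""
    return (mz - PROTON_MASS) * charge


def mass_error(mr_expt, mr_calc):
    """Return the error (experimental minus calculated) in Da and in ppm."""
    delta_da = mr_expt - mr_calc
    delta_ppm = delta_da / mr_calc * 1e6
    return delta_da, delta_ppm


mr_expt = mz_to_mr(851.947276, charge=2)
delta_da, delta_ppm = mass_error(1701.90, 1701.88)
print(round(mr_expt, 4), round(delta_da, 4), round(delta_ppm, 2))
```

The same error can therefore be quoted in absolute units (Da) or relative units (ppm), depending on how the tolerance was specified for the search.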

An abbreviated listing follows for any proteins that contain the same set of peptide matches. It is also possible to display proteins containing a sub-set of peptide matches, but this is disabled by default. It can be enabled globally in the configuration file, mascot.dat, or enabled for a single report by using the checkbox in the format controls.

Clicking on the query number link opens the Peptide View for the match in a new browser window. Resting the mouse cursor over the query number link causes a pop-up window (browser dependent) to appear, displaying the complete list of peptide matches for that query:

[Figure: Peptide Summary pop-up]

The pop-up window displays the query title (if any), followed by one or two significance thresholds, which are described in detail here. Then there is a table containing information on the highest scoring peptide matches for the query:

  1. Ions score
  2. Difference (error) between the experimental and calculated masses
  3. Hit number of the (first) protein containing the peptide match. A plus sign indicates that multiple proteins contain a match to this peptide
  4. Accession string of the (first) protein containing the peptide match. If there are matches to this peptide in multiple proteins, the complete list is displayed in the browser status bar (space permitting)
  5. Sequence of the peptide in 1-letter code. If a variable modification has been used to obtain a match, the modified residue is underlined. If the residues that bracket the peptide sequence are the same in all the proteins that contain it, then these residues are also shown, delimited by periods. If the peptide forms the protein terminus, then a dash is shown instead.

Select Summary

The Select Summary was inspired by David Tabb's DTASelect. It is very similar to the Peptide Summary, but more compact because multiple matches to the same peptide sequence are collapsed into a single line. Also, the list of peptide matches that are not assigned to a protein hit is split off into a separate report.

The Select Summary is especially useful for very large data sets, such as those from MudPIT experiments.

Protein View

The protein view of an entry on the hit list can be created by clicking on an indicator in the overview table (peptide mass fingerprint), or by clicking on an accession number in the results list (all searches).

Some information about the protein, the enzyme (if any), and any modifications are printed at the top of the page. This is followed by the formatted sequence of the protein in 1-letter code with matched peptides highlighted in bold, red type.

If the sequence database was nucleic acid, and the matches all came from a single frame, the report will be very similar to that for a protein database entry. If the matches come from multiple frames, because of a frame shift or splice, then only one frame at a time will be displayed. A drop down list can be used to switch between frames.

The sequence block is followed by a detailed table of the peptide matches. For an enzyme digest, you can also choose to display all the calculated peptides, whether matched or not, including all partials up to the limit specified by the Missed Cleavages parameter. The matched peptides are shown in bold, red type, together with a link to the corresponding peptide view. If no enzyme or a semi-specific enzyme was used, this option is not available, and the table contains only the matched peptides.

If the enzyme was a mixture of independent enzymes, and you choose to display calculated peptides, these will be shown for one enzyme component at a time. A drop down list can be used to switch between enzymes. The formatted protein sequence shows highlights for all matches at all times.

The default sort order is start residue order. Controls are provided to re-display the table sorted by increasing or decreasing peptide molecular weight.

A graph displays the mass differences between the calculated and experimental mass values for the protein match in the same units as were used to specify the peptide mass error tolerance. There is also a figure for the RMS error of the set of matched mass values in ppm.
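For reference, an RMS error in ppm can be computed from pairs of experimental and calculated masses as in the Python sketch below. This is the standard root-mean-square formula applied to relative errors; whether Mascot computes its figure in exactly this way is an assumption, and the function name is hypothetical.

```python
import math


def rms_error_ppm(pairs):
    """RMS of the relative mass errors, in ppm.

    pairs: iterable of (experimental_mass, calculated_mass) tuples.
    """
    errors = [(expt - calc) / calc * 1e6 for expt, calc in pairs]
    if not errors:
        raise ValueError("at least one matched mass value is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


matched = [(1000.01, 1000.0), (1999.98, 2000.0)]  # +10 ppm and -10 ppm
rms = rms_error_ppm(matched)
print(round(rms, 3))
```

Because the errors are squared, positive and negative deviations do not cancel, so the RMS figure reflects the overall spread of the mass errors.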

If available, at the bottom of the page, the full text of the sequence annotations is reproduced.

Genomic sequences

If the match is to a very long nucleic acid sequence (greater than 30,000 bases by default), the conventional Protein View is impractical. In this case, Mascot will automatically generate a DDBJ/EMBL/GenBank format feature table. For example:

BLASTCDS        422..469
                /label=Q103
                /colour=2
                /note="Mascot match, ... sequence=GLGTDEDTLIEILASR"
                /blastp_file="../data/20001016/FTGrCfc.dat"
                /mass=1701.88
                /score=82
                /rank=1
                /translation="GLGTDEDTLIEILASR"
BLASTCDS        603..650
                /label=Q105
                /colour=2
                /note="Mascot match, ... sequence=SEDFGVNEDLGDSDAR"
                /blastp_file="../data/20001016/FTGrCfc.dat"
                /mass=1738.73
                /score=82
                /rank=2
                /translation="SEDFGVNEDLGDSDAR"

By default, only matches with significant scores (p < 0.05) are output. A different score threshold can be specified by appending &_featuretableminscore=X to the protein view URL, where X is the score threshold.

The feature table can be saved to a text file and read into a genome browser such as Artemis from the Sanger Centre. This provides a very flexible and powerful way to view Mascot peptide matches in genomic sequence data.
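If you want to post-process a saved feature table yourself, a small parser is enough for entries of the kind shown in the example. The Python sketch below is an illustration only: it handles simple start..end locations and /key=value qualifiers, and it is not a general DDBJ/EMBL/GenBank parser. The function name and the returned record layout are hypothetical.

```python
import re

LOCATION = re.compile(r"^(\S+)\s+(\d+)\.\.(\d+)$")


def parse_feature_table(text):
    """Parse feature lines and their /key=value qualifiers into dicts."""
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        if line.startswith("/"):
            if features:
                key, _, value = line[1:].partition("=")
                features[-1]["qualifiers"][key] = value.strip('"')
            continue
        match = LOCATION.match(line)
        if match:
            features.append({
                "key": match.group(1),
                "start": int(match.group(2)),
                "end": int(match.group(3)),
                "qualifiers": {},
            })
    return features


sample = """
BLASTCDS        422..469
                /label=Q103
                /mass=1701.88
                /score=82
                /rank=1
                /translation="GLGTDEDTLIEILASR"
"""
features = parse_feature_table(sample)
print(features[0]["start"], features[0]["end"], features[0]["qualifiers"]["translation"])
```

Qualifier values are kept as strings; convert them (for example with `float` or `int`) as needed.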

Peptide View

The peptide view of a matched peptide can be created by clicking on an indicator in the overview table (sequence search or MS/MS ions search), or by following a link from the protein view.

The name of the protein and the 1-letter sequence of the peptide are printed at the top of the page. Below this is a mass spectrum in which all of the matched fragment ions are labelled. Note that a small interval around the peptide molecular ion (±2 Da) has been omitted from the graph, reflecting the suppression of these data points in the Mascot search.
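The effect of omitting the interval around the molecular ion can be pictured with a short Python sketch. It assumes the interval is centred on the precursor m/z and that peaks are held as (m/z, intensity) pairs; the function name and the sample peak list are hypothetical.

```python
def drop_precursor_window(peaks, precursor_mz, half_width=2.0):
    """Remove peaks within +/- half_width of the precursor m/z."""
    return [
        (mz, intensity)
        for mz, intensity in peaks
        if abs(mz - precursor_mz) > half_width
    ]


peaks = [(500.0, 10), (850.5, 99), (851.9, 500), (853.0, 20), (900.0, 5)]
kept = drop_precursor_window(peaks, precursor_mz=851.95)
print(kept)
```

Only the peaks at m/z 500.0 and 900.0 survive in this example, because the other three lie inside the omitted interval.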

Clicking the mouse within the spectrum can be used to zoom in by a factor of 2, so as to show greater detail in crowded regions. Alternatively, controls above the spectrum can be used to specify the plotted mass range directly.

The matched fragment ions are shown in tabular format below the spectrum. As in other views, the sequence is shown in 1-letter code and matched values are highlighted in bold, red type. The ion series are those specified by the INSTRUMENT search parameter.

A graph displays the mass differences between the calculated and experimental fragment ion mass values for this peptide match in the same units as were used to specify the error tolerance. There is also a figure for the RMS error of the set of matched mass values in ppm.

Finally, a link is provided to perform a BLAST search of the matched peptide sequence at NCBI. If NCBI is busy, then copy the sequence to the clipboard and follow the final link to a list of alternative BLAST engines.

UniGene

One of the drawbacks of searching an EST database is that there are very few long sequences, so that extended groupings of peptide matches into protein matches are rare. This can be rectified with UniGene, an index created by automatically partitioning GenBank sequences into a non-redundant set of gene-oriented clusters. Each UniGene cluster is a list of the GenBank sequences, including ESTs, which represent a unique gene. It is not an attempt to produce a consensus sequence.

If one or more UniGene indexes have been configured for the database being searched, there will be a format control to generate a species based UniGene report.

Following a Protein View link from a UniGene report will display a list of UniGene family members in place of the standard Protein View.

URL Switches

There are a number of switches to modify the format of the result reports. Many of these have a global default, set by a parameter in the Options section of mascot.dat. These defaults can be changed in an individual report using the format controls, or by appending the relevant switch to the report URL. Switches take the form label=value and the delimiter between switches is an ampersand (&). For example, if the report URL was:
http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat
The type of report could be changed by appending "REPTYPE=protein":
http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat&REPTYPE=protein

Labels and values are not case sensitive. Note that many labels begin with an underscore character. In the tables below, a value of N stands for a number that you supply; all other values are literal strings.
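If you build report URLs from a script, appending switches is a simple string operation. The Python sketch below is one way to do it; the helper name is hypothetical, and the base URL is the example report URL given above.

```python
from urllib.parse import quote


def append_switches(report_url, **switches):
    """Append label=value switches to a report URL, delimited by ampersands."""
    parts = [
        f"{label}={quote(str(value), safe='')}"
        for label, value in switches.items()
    ]
    if not parts:
        return report_url
    # The example report URL already carries a query string (?file=...),
    # so further switches are joined on with "&".
    separator = "&" if "?" in report_url else "?"
    return report_url + separator + "&".join(parts)


base = "http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat"
print(append_switches(base, REPTYPE="protein"))
print(append_switches(base, reptype="select", _sigthreshold=0.01, _requireboldred=1))
```

The existing part of the URL is left untouched; only the appended values are percent-encoded, which matters only if a value contains reserved characters.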

master_results.pl

URL                        mascot.dat                Value       Description
reptype                    -                         peptide     Peptide Summary
                                                     archive     Archive Report
                                                     concise     Concise Protein Summary
                                                     protein     Full Protein Summary
                                                     select      Select Summary (hits)
                                                     unassigned  Select Summary (unassigned)
report                     -                         auto        Report all significant hits
                                                     N           Report N hits
_showsubsets               ShowSubSets               1           Set value to 1 to report Peptide Summary hits that match a subset of peptides. Default is 0.
_requireboldred            RequireBoldRed            1           Set value to 1 to report Peptide Summary hits only if they contain at least one "bold red" peptide. Default is 0.
_showallfromerrortolerant  ShowAllFromErrorTolerant  1           Set value to 1 to report all hits from an error tolerant search, including the garbage. Default is 0.
_sigthreshold              SigThreshold              N           Probability to use for the significance threshold. Range is 0.1 to 1E-18. Default is 0.05.
_sortunassigned            SortUnassigned            scoredown   Sort unassigned matches by descending score (default)
                                                     queryup     Sort unassigned matches by ascending query number
                                                     intdown     Sort unassigned matches by descending intensity
_ignoreionsscorebelow      IgnoreIonsScoreBelow      N           Any ions scores below this value are set to 0. Floating point number, default 0.0.
_showpopups                -                         true        Show top 10 peptide matches for each query in JavaScript pop-ups (default)
                                                     false       Suppress JavaScript pop-ups
_alwaysgettitle            -                         1           Set to 1 to force reports to fetch Fasta titles from database when they are not included in the result file. Default is 0.
_mudpit                    Mudpit                    N           Number of queries at which protein score calculation switches to large search mode. Default is 1000.

protein_view.pl

URL                        mascot.dat                Value       Description
sort                       -                         startup     Sort table of peptides by ascending start residue number (default)
                                                     massup      Sort table of peptides by ascending mass
                                                     massdown    Sort table of peptides by descending mass
showall                    -                         true        Show all calculated peptides, not just matched peptides
                                                     false       Show just matched peptides (default)
_showallfromerrortolerant  ShowAllFromErrorTolerant  1           Set value to 1 to report all hits from an error tolerant search, including the garbage. Default is 0.
_sigthreshold              SigThreshold              N           Probability to use for the significance threshold. Range is 0.1 to 1E-18. Default is 0.05.
_mudpit                    Mudpit                    N           Number of queries at which protein score calculation switches to large search mode. Default is 1000.
_featuretablelength        FeatureTableLength        N           Length of database entry in bases at which protein view switches to GenBank output. Default is 30000.
_featuretableminscore      FeatureTableMinScore      N           Score threshold for inclusion in the GenBank feature table. If undefined, the report includes matches that exceed the lower of the homology or identity threshold.
 
 
Copyright © 2005 Matrix Science Ltd. All Rights Reserved.