Results Format
At the completion of a search, a report is displayed that
provides an overview of the results. Various links lead to more detailed
"views" of the experimental and calculated data.
The best way to describe the contents of
these pages is by reference to an example. Use one of the following links
to open a results page in a new browser window, then swap between the two
browser windows as required.
The Protein Summary report is intended for peptide mass fingerprint
results. It is possible to display MS/MS results in this format, but it is not recommended to do so.
The default format for peptide mass fingerprint results is a Concise Protein Summary, where proteins
that match the same set, or a sub-set, of mass values are grouped into a single hit. The
intention is to provide a one page summary of the search results. You can use the format controls to switch
to a Full Protein Summary, where each protein hit is listed separately, together with details
of individual mass value matches.
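The grouping rule behind the Concise Protein Summary can be illustrated with a short sketch. This is not Mascot's own code: the function name, the input layout, and the greedy strategy (visit proteins with the most matches first, then fold each one into the first group whose lead protein matches the same set or a super-set) are assumptions made for illustration.

```python
def group_hits(matches):
    """Group proteins whose matched mass values are the same set as,
    or a sub-set of, those matched by a group's lead protein.

    matches: dict mapping accession -> set of matched query numbers.
    Returns a list of groups; each group is a list of accessions, lead first.
    """
    # Visit proteins with the most matches first, so that leads are super-sets.
    ordered = sorted(matches, key=lambda acc: (-len(matches[acc]), acc))
    groups = []
    for acc in ordered:
        for group in groups:
            if matches[acc] <= matches[group[0]]:  # same set or sub-set
                group.append(acc)
                break
        else:
            groups.append([acc])  # no existing lead covers this protein
    return groups


# Hypothetical accessions and query numbers.
hits = {
    "P1": {1, 2, 3, 4},
    "P2": {1, 2, 3, 4},  # same set as P1
    "P3": {2, 3},        # sub-set of P1
    "P4": {5, 6},        # unrelated
}
grouped = group_hits(hits)
```

Here P1, P2 and P3 collapse into one hit and P4 forms a second, which is the behaviour that keeps the concise report to a single page.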
For MS/MS searches of fewer than 1000 spectra, the default
report format is the Peptide Summary. This provides the clearest and most
complete picture of the results, especially if the sample was a protein mixture. If there are no
peptide matches, only molecular weight matches, then a Protein Summary report will be generated
instead.
You can use the format controls to switch to a Select Summary, which is
similar to a Peptide Summary, but provides a very compact
view of the results. The Select Summary is the default when the number of MS/MS spectra exceeds 1000.
For searches of fewer than 1000 MS/MS spectra, you can also choose a Protein Summary, but
this format is not recommended for MS/MS data, because it can give a misleading view of the results.
In a Peptide Summary report, if one or more UniGene indexes have been configured for
the database being searched, there will be the option to
generate a report in which the protein matches are clustered into UniGene
families.
If you are submitting MS/MS searches to an in-house Mascot server, you will also have the option
to create an Archive Report. This is simply an edited Peptide Summary report that includes only the
protein hits you select.
The final choice on the list of report formats is always Export Search
Results. This enables the results to be exported in a number of "machine readable"
formats, such as XML and CSV.
Header
At the top of the report are a few lines to identify the search uniquely:
search title, date, user name, etc. The database is identified with either a release
number or an ISO datestamp. Depending on the report type, one or more of the top
scoring protein hits may be listed.
Search Parameters
At the foot of the report, search parameters are summarised for the record. Descriptions
of individual search parameters can be found here.
In the case of a Select Summary, all of the search parameters are
placed in the header.
Following the header is a histogram of the protein score distribution. The
50 best matching proteins are divided into 16 groups according to their score,
and the heights of the bars show the number of matches in each group.
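As a rough illustration of how such a histogram could be built, the sketch below takes the 50 best scores and counts them in 16 bins. The text does not say how the bin edges are chosen, so the equal-width bins between the lowest and highest of the top 50 scores are an assumption, as is the function name.

```python
def score_histogram(scores, top_n=50, n_bins=16):
    """Count the top_n best scores in n_bins equal-width bins."""
    top = sorted(scores, reverse=True)[:top_n]
    if not top:
        return [0] * n_bins
    lo, hi = min(top), max(top)
    # Fall back to a width of 1.0 when every score is identical.
    width = (hi - lo) / n_bins or 1.0
    counts = [0] * n_bins
    for s in top:
        # The highest score would land on the upper edge; clamp it into the last bin.
        idx = min(int((s - lo) / width), n_bins - 1)
        counts[idx] += 1
    return counts


# Hypothetical protein scores: 80 values, of which only the best 50 are binned.
counts = score_histogram([float(s) for s in range(10, 170, 2)])
```

Each entry of `counts` corresponds to the height of one bar.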
In the case of a peptide mass fingerprint, the protein score is a measure of
the statistical significance of a match. The region in which random matches may be expected
is shaded green. This region extends up to
the significance threshold, which has a default setting of 5%. If a score falls in the green
shaded area, there is greater than a 5% probability that the match was a random event, of no
significance. Conversely, a match in the unshaded part of the histogram has less
than a 5% probability of being a random event. It is quite common to see several proteins getting
the same high score. Even if the protein sequences in the database are non-identical,
the same group of matched peptides may occur in multiple proteins.
In the case of MS/MS data, it is the ions scores for individual peptide matches that
are statistically meaningful. A peptide summary report groups peptide matches into
protein hits, and derives a protein score from the combined ions scores, but it is not
possible to assign a rigorous significance threshold to the protein score.
These controls enable the report format to be modified. After making changes, press the
"Format As" button to reload the report using the new settings.
For a peptide mass fingerprint search, there are just three controls:
- Report format: Choose from the list of available formats.
- Significance threshold: The default significance threshold is p < 0.05. You can
change this to any value in the range 0.1 to 1E-18.
- Maximum number of hits: This value was initially chosen when the search was submitted.
Enter a positive integer if you wish to re-specify the number of protein hits to report.
Of course, the total number of hits actually found by the search may be less. Note that for a Protein
Summary report, the maximum number of hits is 50, because this format is intended for peptide
mass fingerprint results. There is no such limit for a Peptide Summary Report.
Entering the word AUTO or a value of 0 will display all of the hits
that have a protein score exceeding the significance threshold, plus one extra hit.
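The AUTO rule for the maximum number of hits can be written down in a few lines. This is only a sketch of the behaviour described above: the function name is hypothetical, and `threshold_score` stands for whatever protein score corresponds to the chosen significance threshold.

```python
def hits_to_report(protein_scores, threshold_score, max_hits="AUTO"):
    """Return the scores of the protein hits that would be listed.

    max_hits may be a positive integer, 0, or the word AUTO.
    """
    ranked = sorted(protein_scores, reverse=True)
    if str(max_hits).upper() == "AUTO" or int(max_hits) == 0:
        n_significant = sum(1 for s in ranked if s > threshold_score)
        return ranked[: n_significant + 1]  # significant hits plus one extra
    # A fixed limit never returns more hits than the search actually found.
    return ranked[: int(max_hits)]


# Hypothetical scores and threshold.
scores = [120, 85, 64, 40, 22]
auto_hits = hits_to_report(scores, threshold_score=60)
```

With these values, three hits exceed the threshold, so AUTO lists four: the three significant hits and the best of the rest.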
For a search of MS/MS data, there are several additional controls:
- Standard or MudPIT scoring: MudPIT scoring is a more aggressive protein score that removes
protein hits that have high protein scores purely because they have a large number of low-scoring
peptide matches. It is the default protein
score for searches that have more than 1000 queries. If you are searching a very small database, you may
wish to select this scoring mode for searches with fewer than 1000 queries.
- Ions score cut-off: By setting this to (say) 20, you cut out all of the very low scoring,
random peptide matches. This means that homologous proteins are more likely to collapse into a single hit.
- Show sub-sets: By default, each hit in a Peptide Summary shows the set of proteins that match a
particular set of peptides. Proteins that match a sub-set of those peptides are not shown. You can choose
to show these additional protein hits, but be aware that this may make the report very much longer.
- Show or Suppress pop-ups: The JavaScript pop-up windows, that show the top 10 peptide matches
for each query, are very useful, but they make the HTML report larger and slower to load in a web
browser. If you have a report that never seems to load, or is very slow to scroll, try suppressing the
pop-ups.
- Sort unassigned: These are sorting options for the list of peptide matches that are not
assigned to protein hits. Descending score makes it easy to see whether there are any good matches. If so,
you will want to increase the number of protein hits or set it to AUTO so as to pull these matches
into the main body of the report. Ascending query number is the same as ascending precursor Mr.
Descending intensity allows you to find strong spectra that have failed to get a match. These could
be candidates for de novo sequencing.
- Require bold red: Requiring a protein hit to include at least one bold red peptide match
is a good way to remove duplicate homologous proteins from a report. See the discussion
in Results Interpretation for further information.
- UniGene index: (This control is only displayed for a database where UniGene indexes are
defined). Choose the UniGene index to be used to cluster the protein hits
into gene based families.
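Two of these controls, the ions score cut-off and the sort order for unassigned matches, are simple enough to sketch. The option names `scoredown`, `queryup` and `intdown` are the values listed for the `_sortunassigned` switch later on this page; the `Match` record and the function names are hypothetical, and the sketch makes no claim about how Mascot implements either control internally.

```python
from dataclasses import dataclass


@dataclass
class Match:
    query: int        # query number
    score: float      # ions score
    intensity: float  # spectrum intensity


# One sort key per documented option.
SORT_KEYS = {
    "scoredown": lambda m: -m.score,     # descending score
    "queryup": lambda m: m.query,        # ascending query number
    "intdown": lambda m: -m.intensity,   # descending intensity
}


def apply_cutoff(matches, cutoff=0.0):
    """Return copies of the matches with ions scores below the cut-off set to 0."""
    return [
        Match(m.query, m.score if m.score >= cutoff else 0.0, m.intensity)
        for m in matches
    ]


def sort_unassigned(matches, order="scoredown"):
    """Sort unassigned matches using one of the three documented orders."""
    return sorted(matches, key=SORT_KEYS[order.lower()])


# Hypothetical unassigned matches.
unassigned = [Match(1, 15.0, 300.0), Match(2, 48.0, 100.0), Match(3, 31.0, 900.0)]
by_score = [m.query for m in sort_unassigned(unassigned, "scoredown")]
by_intensity = [m.query for m in sort_unassigned(unassigned, "intdown")]
```

Sorting by descending intensity puts query 3 first, which is how a strong spectrum without a good match would surface as a candidate for de novo sequencing.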
The (optional) overview table provides an animated summary of the results.
Each row of the overview table represents a query, i.e. a peptide, while each
column represents a protein "hit".
Where a protein contains a peptide match, the table cell contains an LED style
indicator. This indicator will light up when it is under the mouse cursor, along with
all the other indicators in the row that correspond to the same peptide. Even when the
sequence database is non-identical, there may still be extensive homology between
entries, and the overview table indicators provide a rapid means of identifying which peptides are common
to which proteins.
In addition to lighting up the indicators, moving the mouse cursor over a cell
displays the query title (if any), the protein accession number, and the peptide
sequence in the three text fields above the table. This information is repeated in the
status bar at the bottom of the browser window, in case the fields at the top of the
table have scrolled out of view.
Clicking on one of the indicators will generate a more detailed view which depends
on the type of search. In the case of a peptide mass fingerprint, clicking on any indicator
in a column will lead to the Protein View of that column. In the case
of a sequence query or an MS/MS ions search, clicking on an indicator will lead to the
Peptide View for that match.
Clicking on a column header cell will jump down the page to the corresponding
entry in the detailed results list.
The cells in the first column of the overview table identify each
query by the experimental m/z value of the peptide. When the mouse cursor is
moved over these cells, the query title (if any) is displayed in one of the text fields above
the table. Each cell also contains a check box, which can be used to select a sub-set
of the peptides for a repeat search.
A search can easily be repeated, so as to investigate the effect of changes in search
parameters. Queries can be selected in the result report, then loaded into a search form,
where the search parameters can be modified.
In a Protein Summary report, individual queries can only be selected if an overview table is
displayed. Otherwise, the choice is between repeating the search with all queries or
with those queries that do not match into the top scoring protein hit. In a Peptide
Summary report, checkboxes for selecting queries are always included in the body of the report.
If you wish to perform an error tolerant search
of MS/MS data, this is implemented as a special type of repeat search.
On the Matrix Science public web site, you can only search an EST database using a repeat search.
There are two reasons
for this. First, an unknown sample could easily be a contaminant, such as BSA or
keratin, which can be identified more rapidly and reliably by searching against a protein
database. Second, the six frame translation of an EST database contains many times as
many entries as one of the non-redundant protein databases. This means that searches of
EST databases consume a great deal of CPU time and should not be submitted casually.
The body of the report contains a tabular summary of the
best matching proteins. The number of proteins shown is specified in the
search form, up to a maximum of 50. An example of an entry is shown here:
For each protein, the first line contains the accession string,
(linked to the corresponding Protein View), the
protein molecular weight, and the protein
score. Expect is the number of times we would expect to obtain an equal or
higher score, purely by chance. The lower this expectation value, the more
significant the result.
This is followed by a brief descriptive title, then a table
summarising the matched peptides. The table columns contain:
- Experimental m/z value
- Experimental m/z transformed to a relative molecular mass
- Relative molecular mass calculated from the matched peptide sequence
- Difference (error) between the experimental and calculated masses
- Inclusive numbering of the residues, starting with 1 for the
N-terminal residue of the intact protein
- Number of missed cleavage sites
- Ions score (not present in peptide mass fingerprint results)
- Sequence of the peptide in 1-letter code. The residues that
bracket the peptide sequence in the protein are also shown, delimited
by periods. If the peptide forms the protein terminus, then a dash
is shown instead.
If any variable modifications are found in a peptide, these are listed
after the sequence string.
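The sequence column follows a compact notation that is easy to parse. The sketch below splits such a field into the peptide sequence and its bracketing residues. The function name and the sample strings are hypothetical, and the assumption that variable modifications are appended after a " + " separator is not stated in the text above.

```python
def parse_peptide(field):
    """Split a field such as 'K.PEPTIDER.S' into its parts.

    A dash in place of a bracketing residue means the peptide forms
    the protein terminus.
    """
    # Assumption: modifications, if any, follow the sequence after ' + '.
    text, _, mods = field.partition(" + ")
    before, sequence, after = text.strip().split(".")
    return {
        "sequence": sequence,
        "prev_residue": None if before == "-" else before,
        "next_residue": None if after == "-" else after,
        "at_n_terminus": before == "-",
        "at_c_terminus": after == "-",
        "modifications": mods.strip() or None,
    }


# Hypothetical fields; the bracketing residues are invented for illustration.
internal = parse_peptide("K.GLGTDEDTLIEILASR.L")
n_terminal = parse_peptide("-.MTEYK.L + Oxidation (M)")
```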
The total score for a protein match contains contributions from the peptide mass fingerprint,
ions scores from any queries containing MS/MS data, and scores for any sequence query qualifiers, such as
seq, comp, tag, and etag. The Protein Summary is the
correct report for a peptide mass fingerprint search, but it is usually
the wrong report for an MS/MS search, especially a data set
containing peptides from a complex mixture of proteins.
If you choose to use the Protein Summary for MS/MS data, please be aware that the
protein score and expectation value may be misleading. For example, in the case of a
single peptide MS/MS spectrum, scoring the peptide molecular mass as if it were
a peptide mass fingerprint has no meaning. A further disadvantage of the Protein Summary for MS/MS data
is that the hit list can be swamped by proteins which are well represented
in the database, e.g. actins. Proteins with just one or two peptide matches may not make it into the top
50, so are missed. The Peptide Summary, which has no such limit, is the
intended report for MS/MS data, especially from a complex mixture.
The body of the Peptide Summary report contains a tabular listing of the proteins, sorted by
descending protein score. The number of protein hits reported can be specified
when the search is submitted. If this is set to AUTO, all the significant matches
will be listed, however many that is. An example of an entry is shown here:
For each protein, the first line contains the accession string,
(linked to the corresponding protein view), the
protein molecular weight, a non-probabilistic protein score,
derived from the ions scores, and the number of peptide
matches. This is followed by a brief descriptive title, then a table
summarising the matched peptides. The table columns contain:
- If the report does not contain an overview table, then checkboxes
for selecting queries for a repeat search will appear in the first column of
any row containing the first appearance of a top ranked match.
- Hyperlinked query number (see below).
- Experimental m/z value.
- Experimental m/z transformed to a relative molecular mass.
- Calculated relative molecular mass of the matched peptide.
- Difference (error) between the experimental and calculated masses.
- Number of missed enzyme cleavage sites.
- Ions score - If there are duplicate matches to the same peptide, then
the lower scoring matches are shown in brackets.
- Expectation value for the peptide match. (The number of times we would expect to obtain an equal or
higher score, purely by chance. The lower this value, the more
significant the result).
- Rank of the ions match, (1 to 10, where 1 is the best match).
- Sequence of the peptide in 1-letter code. The residues that
bracket the peptide sequence in the protein are also shown, delimited
by periods. If the peptide forms the protein terminus, then a dash
is shown instead.
If any variable modifications are found in a peptide, these are listed
after the sequence string.
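The relationship between the mass columns can be made concrete with a small calculation. The sketch assumes positive-ion data in which each charge comes from an added proton, and it takes the error as experimental minus calculated; the report does not state the sign convention, so that choice is an assumption. The function names and the numbers are illustrative only.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da


def mz_to_mr(mz, charge):
    """Transform an experimental m/z value to a relative molecular mass,
    assuming a positive ion carrying `charge` protons."""
    return charge * (mz - PROTON_MASS)


def mass_error(mr_expt, mr_calc, unit="Da"):
    """Difference between experimental and calculated Mr, in Da or ppm."""
    delta = mr_expt - mr_calc
    if unit == "ppm":
        return delta / mr_calc * 1e6
    return delta


# Hypothetical doubly charged precursor and calculated peptide mass.
mr_expt = mz_to_mr(851.95, 2)
error_da = mass_error(mr_expt, 1701.88)
error_ppm = mass_error(mr_expt, 1701.88, unit="ppm")
```

For this example the experimental Mr is about 1701.885, giving an error of roughly 0.005 Da, or about 3 ppm.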
An abbreviated listing follows for any proteins that contain the same set of peptide matches.
It is also possible to display proteins containing a sub-set of peptide matches, but this
is disabled by default. It can be enabled globally in the configuration file, mascot.dat, or
enabled for a single report by using the checkbox in the format controls.
Clicking on the query number link opens the Peptide View for the match
in a new browser window. Resting the mouse cursor over the query number link causes a pop-up window
(browser dependent) to appear, displaying the complete list of peptide matches for that query:
The pop-up window displays the query title, (if any), followed by one or two
significance thresholds, which are described in detail
here. Then there is a table containing
information on the highest scoring peptide matches for the query:
- Ions score
- Difference (error) between the experimental and calculated masses
- Hit number of the (first) protein containing the peptide match. A plus sign
indicates that multiple proteins contain a match to this peptide
- Accession string of the (first) protein containing the peptide match. If there are
matches to this peptide in multiple proteins, the complete list is displayed in the
browser status bar (space permitting)
- Sequence of the peptide in 1-letter code. If a variable modification has been used
to obtain a match, the modified residue is underlined.
If the residues that bracket the peptide sequence are the same in all the proteins that
contain it, then these residues are also shown, delimited
by periods. If the peptide forms the protein terminus, then a dash
is shown instead.
The Select Summary was inspired by David Tabb's
DTASelect. It is very
similar to the Peptide Summary, but more compact because
multiple matches to the same peptide sequence are collapsed into a single line. Also,
the list of peptide matches that are not assigned to a protein hit is split off into a separate
report.
The Select Summary is especially useful for very large data sets, such as those from
MudPIT experiments.
The protein view of an entry on the hit list can be created by clicking on an indicator
in the overview table (peptide mass fingerprint), or by clicking on an accession number
in the results list (all searches).
Some information about the protein, the enzyme (if any), and any modifications are
printed at the top of the page. This is followed by the formatted sequence of the protein
in 1-letter code with matched peptides highlighted in bold, red type.
If the sequence database was nucleic acid, and the matches all came from a single frame,
the report will be very similar to that for a protein database entry. If the matches
come from multiple frames, because of a frame shift or splice, then only one frame at
a time will be displayed. A drop down list can be used to switch between frames.
The sequence block is followed by a detailed table of the
peptide matches. For an enzyme digest, you can also choose to display all the calculated peptides,
whether matched or not, including all partials up to the limit specified by the
Missed Cleavages parameter. The matched peptides
are shown in bold, red type, together with a link to the
corresponding peptide view. If no enzyme or a semi-specific enzyme was used, this option is
not available, and the table contains only the matched peptides.
If the enzyme was a mixture of independent enzymes, and you choose to display calculated peptides,
these will be shown for one enzyme component at a time. A drop down list can be used to switch between enzymes.
The formatted protein sequence shows highlights for all matches at all times.
The default sort order is start residue order. Controls are provided to re-display the table
sorted by increasing or decreasing peptide molecular weight.
A graph displays the mass differences between the calculated and
experimental mass values for the protein match in the same units as were used to specify the peptide
mass error tolerance. There is also a figure for the RMS error of the set of matched mass values in ppm.
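An RMS error in ppm can be computed from the matched mass values as sketched below. The text does not say whether the RMS is taken about zero or about the mean error; this sketch takes it about zero, and the function name and sample values are hypothetical.

```python
import math


def rms_error_ppm(pairs):
    """Root-mean-square mass error in ppm.

    pairs: iterable of (experimental, calculated) mass values.
    """
    errors = [(expt - calc) / calc * 1e6 for expt, calc in pairs]
    if not errors:
        raise ValueError("at least one matched mass value is required")
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Two hypothetical matches, one about +10 ppm and one about -10 ppm.
rms = rms_error_ppm([(1000.01, 1000.0), (2000.0, 2000.02)])
```

Because the errors are squared, positive and negative deviations do not cancel, so the result here is close to 10 ppm even though the mean error is close to zero.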
If available, at the bottom of the page, the full text of the sequence annotations is reproduced.
If the match is to a very long nucleic acid sequence, (greater than 30,000 bases by default),
the conventional Protein View is impractical. In this case, Mascot will automatically generate
a DDBJ/EMBL/GenBank format feature
table. For example:
|
BLASTCDS 422..469
/label=Q103
/colour=2
/note="Mascot match, ... sequence=GLGTDEDTLIEILASR"
/blastp_file="../data/20001016/FTGrCfc.dat"
/mass=1701.88
/score=82
/rank=1
/translation="GLGTDEDTLIEILASR"
BLASTCDS 603..650
/label=Q105
/colour=2
/note="Mascot match, ... sequence=SEDFGVNEDLGDSDAR"
/blastp_file="../data/20001016/FTGrCfc.dat"
/mass=1738.73
/score=82
/rank=2
/translation="SEDFGVNEDLGDSDAR"
|
By default, only matches with significant scores (p < 0.05) are output. A different score
threshold can be specified by appending &_featuretableminscore=X to the protein view URL, where
X is the score threshold.
The feature table can be saved to a text file and read into a genome browser such as
Artemis from the Sanger Centre.
This provides a very flexible and powerful way to view Mascot peptide matches in genomic sequence data.
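If the saved feature table needs to be processed by a script rather than a genome browser, a minimal reader might look like the sketch below. It handles only the simple layout shown in the example above (a feature key with a start..end location, followed by /qualifier=value lines) and is not a general parser for the DDBJ/EMBL/GenBank feature table format. The function name is hypothetical, and the sample omits the /note and /blastp_file qualifiers for brevity.

```python
import re


def parse_feature_table(text):
    """Parse feature lines and their qualifiers into a list of dicts.

    Qualifier values are kept as strings, with surrounding quotes removed.
    """
    features = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            if not features:
                continue  # qualifier with no preceding feature line: ignore
            key, _, value = line[1:].partition("=")
            features[-1]["qualifiers"][key] = value.strip('"')
        else:
            m = re.match(r"(\S+)\s+(\d+)\.\.(\d+)$", line)
            if m:
                features.append({
                    "key": m.group(1),
                    "start": int(m.group(2)),
                    "end": int(m.group(3)),
                    "qualifiers": {},
                })
    return features


sample = """\
BLASTCDS 422..469
/label=Q103
/mass=1701.88
/score=82
/rank=1
/translation="GLGTDEDTLIEILASR"
BLASTCDS 603..650
/label=Q105
/mass=1738.73
/score=82
/rank=2
/translation="SEDFGVNEDLGDSDAR"
"""
features = parse_feature_table(sample)
```

A useful consistency check on the parsed records is that each location spans three bases per residue: 422..469 covers 48 bases, matching the 16 residues of the translated peptide.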
The peptide view of a matched peptide can be created by clicking on an indicator
in the overview table (sequence search or MS/MS ions search), or by following a link from the
protein view.
The name of the protein and the 1-letter sequence of the peptide are
printed at the top of the page. Below this is a mass spectrum in which all of the matched fragment
ions are labelled. Note that a small interval around the peptide molecular ion (±2 Da)
has been omitted from the graph, reflecting the suppression of these data points in the
Mascot search.
Clicking the mouse within the spectrum can be used to zoom in by a factor of 2, so as to
show greater detail in crowded regions. Alternatively, controls above the spectrum can be
used to specify the plotted mass range directly.
The matched fragment ions are shown in tabular format below the spectrum. As in other
views, the sequence is shown in 1-letter code and matched values are highlighted in bold,
red type. The ion series are those specified by the
INSTRUMENT search parameter.
A graph displays the mass differences between the calculated and
experimental fragment ion mass values for this peptide match in the same units as were used to
specify the error tolerance. There is also a figure for the RMS error of the set of matched mass
values in ppm.
Finally, a link is provided to perform a BLAST search of the matched peptide sequence at
NCBI. If NCBI is busy, then copy the sequence to the clipboard and follow the final link to
a list of alternative BLAST engines.
One of the drawbacks of searching an EST database is that there are very few long sequences,
so that extended groupings of peptide matches into protein matches are rare. This can be rectified
with UniGene,
an index created by automatically partitioning GenBank sequences into a non-redundant
set of gene-oriented clusters. Each UniGene cluster is a list of the GenBank sequences, including ESTs,
which represent a unique gene. It is not an attempt to produce a consensus sequence.
If one or more UniGene indexes have been configured for the database
being searched, there will be a format control to generate a species based UniGene report.
Following a Protein View link from a UniGene report will display a list of UniGene family members
in place of the standard Protein View.
There are a number of switches to modify the format of the result reports. Many of these have a global default,
set by a parameter in the Options section of mascot.dat. These defaults can be changed in an
individual report using the format controls,
or by appending the relevant switch to the report URL. Switches take the form label=value and
the delimiter between switches is an ampersand (&). For example, if the report URL was:
http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat
The type of report could be changed by appending "REPTYPE=protein":
http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat&REPTYPE=protein
Labels and values are not case sensitive. Note that many labels begin with an underscore character. Values that are not
literal strings are shown in italics.
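When many reports need the same format settings, the switches can be appended programmatically. The sketch below uses only the Python standard library; the helper name is hypothetical. It writes values without percent-encoding, so that the file path stays in the form shown in the example URL, which means it is only suitable for simple values such as those listed in the tables that follow.

```python
from urllib.parse import parse_qsl, urlsplit, urlunsplit


def add_switches(url, **switches):
    """Append label=value switches to a report URL.

    An existing switch with the same label is replaced; labels are
    compared without regard to case, as the report scripts do.
    """
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    new_labels = {label.lower() for label in switches}
    pairs = [(k, v) for k, v in pairs if k.lower() not in new_labels]
    pairs += list(switches.items())
    # Join with ampersands, leaving values unencoded (see the note above).
    query = "&".join(f"{k}={v}" for k, v in pairs)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


base = "http://local-server/mascot/cgi/master_results.pl?file=../data/20040121/F001847.dat"
protein_report = add_switches(base, REPTYPE="protein")
strict_report = add_switches(protein_report, reptype="select", _sigthreshold=0.01)
```

The first call reproduces the example URL above; the second replaces the report type and adds a stricter significance threshold in one step.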
master_results.pl
| URL | mascot.dat | Value | Description |
| reptype | | peptide | Peptide Summary |
| | | archive | Archive Report |
| | | concise | Concise Protein Summary |
| | | protein | Full Protein Summary |
| | | select | Select Summary (hits) |
| | | unassigned | Select Summary (unassigned) |
| report | | auto | Report all significant hits |
| | | N | Report N hits |
| _showsubsets | ShowSubSets | 1 | Set value to 1 to report Peptide Summary hits that match a subset of peptides. Default is 0. |
| _requireboldred | RequireBoldRed | 1 | Set value to 1 to report Peptide Summary hits only if they contain at least one "bold red" peptide. Default is 0. |
| _showallfromerrortolerant | ShowAllFromErrorTolerant | 1 | Set value to 1 to report all hits from an error tolerant search, including the garbage. Default is 0. |
| _sigthreshold | SigThreshold | N | Probability to use for the significance threshold. Range is 0.1 to 1E-18. Default is 0.05. |
| _sortunassigned | SortUnassigned | scoredown | Sort unassigned matches by descending score (default) |
| | | queryup | Sort unassigned matches by ascending query number |
| | | intdown | Sort unassigned matches by descending intensity |
| _ignoreionsscorebelow | IgnoreIonsScoreBelow | N | Any ions scores below this value are set to 0. Floating point number, default 0.0. |
| _showpopups | | true | Show top 10 peptide matches for each query in JavaScript pop-up (default) |
| | | false | Suppress JavaScript pop-ups. |
| _alwaysgettitle | | 1 | Set to 1 to force reports to fetch Fasta titles from database when they are not included in the result file. Default is 0. |
| _mudpit | Mudpit | N | Number of queries at which protein score calculation switches to large search mode. Default 1000 |
protein_view.pl
| URL | mascot.dat | Value | Description |
| sort | | startup | Sort table of peptides by ascending start residue number (default) |
| | | massup | Sort table of peptides by ascending mass |
| | | massdown | Sort table of peptides by descending mass |
| showall | | true | Show all calculated peptides, not just matched peptides |
| | | false | Show just matched peptides (default) |
| _showallfromerrortolerant | ShowAllFromErrorTolerant | 1 | Set value to 1 to report all hits from an error tolerant search, including the garbage. Default is 0. |
| _sigthreshold | SigThreshold | N | Probability to use for the significance threshold. Range is 0.1 to 1E-18. Default is 0.05. |
| _mudpit | Mudpit | N | Number of queries at which protein score calculation switches to large search mode. Default 1000 |
| _featuretablelength | FeatureTableLength | N | Length of database entry in bases at which protein view switches to GenBank output. Default 30000 |
| _featuretableminscore | FeatureTableMinScore | N | Score threshold for inclusion in GenBank feature table format, if undefined then report includes matches that exceed lower of homology or identity threshold |