Proteomics

Database Searching

Dr. Armel Nicolas

Ideally, each MS2 spectrum would contain enough information to determine unambiguously the sequence of its parent precursor. Since this is rarely the case, a strategy called Database Searching is usually used instead.

MS2 fragmentation spectra are used to identify the sequence behind their MS1 parent peak. However, MS2 spectra rarely allow unambiguous precursor identification: fragmentation may be incomplete or too extensive, or several precursors may have been co-isolated. Thus, while de novo sequencing is routine in Genomics and Transcriptomics, it remains an elusive Holy Grail in Proteomics.

Instead, a strategy called Database Searching is commonly used: each observed spectrum is compared to theoretical spectra predicted from expected peptide sequences, then assigned the best-matching sequence together with a score. Only peptide sequences that exist in the database, carrying only Post-Translational Modifications (PTMs) included in the search parameters, can be identified.
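To make the comparison step concrete, here is a minimal sketch in Python. It is not how any particular search engine works: it assumes unmodified peptides, singly charged b- and y-ions only, and a toy score that simply counts theoretical fragments found in the observed peak list. The function names (`theoretical_ions`, `match_score`, `best_match`) and the 0.02 m/z tolerance are illustrative choices. Real engines also pre-filter candidates by precursor mass and use far more refined scoring.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565
PROTON = 1.007276


def theoretical_ions(peptide):
    """Singly charged b- and y-ion m/z values of an unmodified peptide."""
    masses = [MONO[aa] for aa in peptide]
    ions = []
    for i in range(1, len(peptide)):
        ions.append(sum(masses[:i]) + PROTON)          # b-ion: first i residues
        ions.append(sum(masses[i:]) + WATER + PROTON)  # y-ion: remaining residues
    return sorted(ions)


def match_score(observed_mz, peptide, tol=0.02):
    """Toy score: number of theoretical fragments seen within +/- tol m/z."""
    return sum(
        any(abs(obs - theo) <= tol for obs in observed_mz)
        for theo in theoretical_ions(peptide)
    )


def best_match(observed_mz, candidates, tol=0.02):
    """Return (peptide, score) for the best-scoring candidate sequence."""
    best = max(candidates, key=lambda p: match_score(observed_mz, p, tol))
    return best, match_score(observed_mz, best, tol)


# Usage: a perfect synthetic spectrum of PEPTIDE searched against a tiny "database".
spectrum = theoretical_ions("PEPTIDE")
peptide, score = best_match(spectrum, ["PETPIDE", "PEPTIDE", "GASGASG"])
```

Note that a sequence absent from the candidate list can never be returned, however good the spectrum is, which is exactly the limitation described above.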

The False Discovery Rate (FDR) is controlled by also searching spectra against a database of "fake" decoy peptides, typically derived by in silico digestion of the reversed or scrambled sequences of the expected proteins.
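As an illustration of how such a decoy database might be built, the sketch below reverses each protein sequence and digests it in silico. It assumes a simplified trypsin rule (cleave after K or R, except before P), no missed cleavages, and a minimum peptide length of 6; `digest_trypsin` and `make_decoys` are hypothetical helper names, not functions from any existing tool.

```python
def digest_trypsin(protein, min_len=6):
    """Simplified tryptic digest: cleave after K/R unless the next residue is P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        is_last = i == len(protein) - 1
        if is_last or (aa in "KR" and protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    # Very short peptides are rarely informative, so drop them.
    return [p for p in peptides if len(p) >= min_len]


def make_decoys(proteins, min_len=6):
    """Digest the reversed sequence of each protein to obtain decoy peptides."""
    targets = {p for seq in proteins for p in digest_trypsin(seq, min_len)}
    decoys = {p for seq in proteins for p in digest_trypsin(seq[::-1], min_len)}
    # A decoy identical to a real target peptide would not be "fake", so remove it.
    return sorted(decoys - targets)


# Usage with a made-up protein sequence.
protein = "AAAAAAKPGGGGGGRCCCCCCK"
target_peptides = digest_trypsin(protein)
decoy_peptides = make_decoys([protein])
```

Reversing keeps the amino acid composition and length of each protein, so decoy peptides have a mass distribution similar to that of the targets while not being expected in the sample.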

For a given Peptide-Spectrum Match (PSM), a Posterior Error Probability (PEP, a local form of FDR) is calculated as the proportion of decoy peptides expected at that PSM's score. In practice, because there will rarely be more than one PSM with a given exact score, the PEP is calculated from fitted models of the target and decoy score distributions. A PSM's q-value can equivalently be formulated as the lowest FDR at which that PSM would be reported; the FDR if the threshold were set at that PSM's score; or the proportion of decoy hits among PSMs with higher or equal scores.
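The q-value definition can be sketched in a few lines of Python. This is one common way to do it, under stated assumptions: the FDR at each score threshold is estimated as decoys divided by targets among PSMs scoring at or above it (other estimators exist), ties in score are not treated specially, and the q-value is taken as the lowest FDR over all thresholds that would still include the PSM. The function name `q_values` and the input layout are illustrative. PEP estimation is not shown, since it requires fitting score distributions.

```python
def q_values(psms):
    """psms: list of (score, is_decoy) tuples, higher score = better match.

    Returns a list of (score, is_decoy, q_value), sorted by descending score.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)

    # FDR estimate if the threshold were set at each PSM's score.
    fdrs, targets, decoys = [], 0, 0
    for _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # q-value: lowest FDR at which the PSM would still be reported,
    # i.e. the running minimum of the FDR from the worst score upwards.
    qs, running_min = [], float("inf")
    for fdr in reversed(fdrs):
        running_min = min(running_min, fdr)
        qs.append(running_min)
    qs.reverse()

    return [(score, is_decoy, q) for (score, is_decoy), q in zip(ranked, qs)]


# Usage with made-up scores: four target hits and two decoy hits.
psms = [(10, False), (9, False), (8, True), (7, False), (6, False), (5, True)]
results = q_values(psms)
accepted = [r for r in results if not r[1] and r[2] <= 0.25]
```

The running minimum is what makes q-values monotonic: a PSM never gets a worse q-value than a lower-scoring one, even if the raw FDR estimate fluctuates along the ranked list.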

 
