Detecting Amino Acid Variants Using Next-Generation Protein Sequencing (NGPS)
Authors: Mathivanan Chinnaraj, Jianan Lin, Kristin Blacklock, Eric Hermes, Michael Meyer,
Douglas Pike, John Vieceli, Ilya Chorny
doi: https://doi.org/10.1101/2024.12.17.629036
This article is a preprint and has not been certified by peer review.
Abstract:
Next-Generation Protein Sequencing™ (NGPS) is a single-molecule approach for characterizing protein variants, offering detailed insight into proteoforms and amino acid substitutions not easily discerned by mass spectrometry. The novel data type produced by NGPS, which is based on binding of N-terminal amino acids by fluorescently tagged recognizer proteins, requires the development of new data analysis methods and bioinformatic tools. Here, we present ProteoVue™, a comprehensive bioinformatics pipeline for Single Amino Acid Variant (SAAV) detection and quantification using the Quantum-Si Platinum® NGPS platform. ProteoVue integrates multiple analytical components, including robust pulse-calling, recognition segment detection, fluorescence dye classification, and a neural network-driven kinetic signature database for pulse duration prediction. These components feed into a scoring-based alignment and clustering framework that enables accurate variant calling within binary peptide mixtures. We demonstrate that ProteoVue recovers expected variant ratios across diverse substitution types, including residues that lack direct amino acid recognizers. While some extreme cases remain challenging, the pipeline consistently captures the key kinetic features required for variant discrimination, underscoring its potential as a versatile and powerful tool for proteomic studies. As NGPS technology matures and recognizer libraries expand, ProteoVue provides a foundation for increasingly refined variant analysis in basic research, biomarker discovery, and clinical applications.
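To make the idea of recovering a variant ratio from a binary peptide mixture concrete, the following is a minimal illustrative sketch and not ProteoVue's actual algorithm. It assumes, purely for illustration, that each read has already been reduced to one mean pulse duration per recognition segment, that each candidate peptide has a known expected duration per segment (a stand-in for the kinetic signature database), and that durations are exponentially distributed. The function names, the exponential model, and the hard-assignment rule are all hypothetical choices made here.

```python
import math


def exp_log_likelihood(durations, expected_means):
    """Log-likelihood of observed pulse durations, assuming (for illustration)
    that each duration is exponentially distributed with the given mean."""
    if len(durations) != len(expected_means):
        raise ValueError("read and signature must have the same number of segments")
    return sum(-math.log(m) - d / m for d, m in zip(durations, expected_means))


def estimate_variant_fraction(reads, ref_signature, var_signature):
    """Assign each read to the better-scoring signature and return the
    fraction of reads assigned to the variant peptide."""
    if not reads:
        raise ValueError("at least one read is required")
    variant_calls = 0
    for read in reads:
        ref_score = exp_log_likelihood(read, ref_signature)
        var_score = exp_log_likelihood(read, var_signature)
        if var_score > ref_score:
            variant_calls += 1
    return variant_calls / len(reads)


# Hypothetical signatures: the two peptides differ only at the second segment.
ref_signature = [1.0, 0.5, 2.0]
var_signature = [1.0, 3.0, 2.0]

# Hypothetical reads (mean pulse duration per recognition segment, in seconds).
reads = [
    [1.1, 0.4, 1.8],
    [0.9, 2.7, 2.2],
    [1.0, 0.6, 2.1],
    [1.2, 3.5, 1.9],
]

print(estimate_variant_fraction(reads, ref_signature, var_signature))
```

Hard assignment of each read is the simplest possible choice and would be expected to bias the estimated ratio when the two signatures overlap heavily; a mixture model over the per-read scores would be one alternative under the same assumptions.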