Microproteins and the Dark Proteome: Tiny Molecules with Big Potential

January 22, 2025 • Meredith Carpenter, PhD – Head of Scientific Affairs

A major goal in modern biology is fully understanding the protein-coding genome, which remains a challenge even 20+ years after the human genome was first sequenced. In recent years, researchers have turned their attention to non-canonical open reading frames (ncORFs)—regions of the genome that were once thought to be largely irrelevant. Emerging evidence shows that these regions are translated into proteins across various human cell types and disease states, though their full impact on biology has remained unclear due to a lack of large-scale data.

One intriguing class of proteins emerging from ncORFs is microproteins, also known as miniproteins or micropeptides. These are small proteins, typically fewer than 100 amino acids, that are translated from independent small open reading frames (sORFs or smORFs). Once dismissed because of their size, microproteins are now recognized as key players in regulating cellular functions, influencing everything from gene expression to cellular pathways and disease mechanisms. Recent studies are also beginning to reveal their therapeutic potential, especially in fields like cancer, neurodegenerative diseases, and genetic disorders, where their ability to interact with other proteins in unique ways opens up exciting possibilities for new treatments.
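To make the size definition above concrete, here is a minimal Python sketch that scans a DNA sequence for candidate small ORFs encoding fewer than 100 amino acids. It is an illustration only, not the method used in any study mentioned here. The function name `find_small_orfs`, the `min_aa` cutoff, the restriction to ATG start codons, and the forward-strand-only search are all simplifying assumptions. In practice, ncORFs are typically identified from experimental evidence such as Ribo-seq, and many use non-ATG start codons.

```python
# Illustrative sketch: find candidate small ORFs (sORFs) on the forward strand.
# Assumptions (not from the article): ATG-only starts, a hypothetical minimum
# length of 10 amino acids, and no reverse-strand search.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(dna, min_aa=10, max_aa=99):
    """Return (start, end, n_aa) tuples for ORFs encoding min_aa..max_aa residues.

    start is the 0-based index of the ATG; end is the index just past the
    stop codon; n_aa counts codons from the ATG up to, but not including,
    the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG in the currently open ORF
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                n_aa = (i - start) // 3
                if min_aa <= n_aa <= max_aa:
                    orfs.append((start, i + 3, n_aa))
                start = None  # close this ORF and look for the next ATG
    return orfs


if __name__ == "__main__":
    # Toy sequence: a 12-codon ORF (ATG + 11 alanine codons) flanked by filler.
    toy = "CC" + "ATG" + "GCT" * 11 + "TAA" + "GG"
    print(find_small_orfs(toy))
```

The default `max_aa=99` mirrors the "fewer than 100 amino acids" convention. Raising `min_aa` reduces the number of spurious short hits, at the cost of missing the very smallest candidates.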

A recent preprint reinforces the importance of this “dark proteome.” The study found that at least 25% of 7,264 ncORFs across human cells produce translated gene products. By combining techniques like proteomics, immunopeptidomics, and Ribo-seq, the researchers identified over 3,000 peptides from these previously overlooked regions. With data from more than 95,000 experiments, this study offers a comprehensive view of the ncORF landscape. The scale of this analysis underscores the untapped potential of ncORFs and sets the stage for similar research in other species, including plants and other animals.

Looking ahead, Next-Generation Protein Sequencing™ (NGPS™) is poised to transform our ability to study these small proteins. Traditional proteomics methods can struggle to detect tiny or low-abundance proteins. In contrast, NGPS can bridge the gap between genomic data and functional proteins, allowing researchers to detect smaller peptides and connect ncORFs to their translated products more effectively.

Dr. John Prensner, one of the lead authors on the study, envisions broad applications of NGPS in this area. “We are starting to learn that the human ‘dark proteome’ may be vast,” he says. “We are finding thousands of HLA-bound peptides from these unannotated protein entities. Yet, in the overwhelming majority of cases, we don’t see them by traditional mass spectrometry. I’m excited that Next-Generation Protein Sequencing might offer another avenue to look deeper into this dark proteome in a way that could complement traditional mass spectrometry.”

In the years ahead, the study of microproteins, miniproteins, and ncORFs could transform fields like drug design, gene therapy, and molecular biology. As we continue to unravel the hidden complexities of the proteome, these small but mighty molecules could play a crucial role in advancing precision medicine, offering novel therapeutic options for diseases that have long evaded effective treatments.


Meredith Carpenter, PhD, Head of Scientific Affairs, Quantum-Si

Meredith L. Carpenter, PhD, is head of scientific affairs at Quantum-Si, where she manages external collaborations and publication strategy. Dr. Carpenter has over 10 years of experience in developing and deploying novel genomics and multi-omics tools in the biotech industry. Prior to Quantum-Si, Dr. Carpenter held roles as director of assay development at Arc Bio and senior director of strategic alliances at Cantata Bio. She earned a BS in Biology from Emory University and a PhD in Molecular and Cell Biology from UC Berkeley, and she performed postdoctoral research in the Department of Genetics at Stanford University.