Blog post

Synthetic signal peptides and why they matter in developing protein-based therapies

May 5, 2026

Therapy developers are striving to make biologics development faster, more efficient, and more predictable, so that high-quality biologic drugs reach patients sooner. Computational models are emerging as a new way to design and optimize protein-based therapies. Here’s how this emerging field is already changing discovery and manufacturability.

A powerful lever for relieving production bottlenecks

Most biologics are secreted proteins, each carrying a specific signal peptide as a natural expression element that guides the protein into the endoplasmic reticulum. Signal peptides influence expression rate as well as protein folding and post-translational modifications (1).

These natural signal peptide-protein pairings did not evolve to maximize protein production. Cells need controlled amounts of many different proteins, not a maximal amount of a single protein. Studies indicate that a major bottleneck in the production of secretory proteins is their early biogenesis in the endoplasmic reticulum (ER) (2). Most bottlenecks arise because folding, glycosylation, and ER translocation must occur at controlled rates; overloading this machinery can cause misfolding or poor secretion.

Currently, signal peptides are screened and engineered for protein production only on a limited scale: usually just a handful of signal peptides are analyzed for a given protein of interest.

However, even small-scale signal peptide screening and engineering has been shown to significantly boost the yield and quality of medically relevant proteins (2).

Machine learning methods not only provide tools to screen more signal peptides simultaneously but also make it possible to generate new sequences: synthetic elements vastly expand the library of options. What’s more, predictive models that design synthetic signal peptides matching or even outperforming their traditional industrial counterparts are beginning to emerge (3).

As a growing proportion of biologics are designed with artificial intelligence and diverge from natural proteins, it is reasonable to expect that the best-matching signal peptide may well be synthetic rather than naturally occurring. Designing signal peptides for AI-designed proteins will be a challenge for biologics production in the near future.

Engineering performance beyond natural capabilities

Typical strategies for engineering synthetic signal peptides include the following.

  • Strategy A: Modify existing/native signal peptides: Starting from a known high-performing signal peptide, introduce targeted mutations to improve yield or fix issues with heterogeneous signal peptide cleavage.
  • Strategy B: Generate de novo synthetic (computationally designed) signal peptides: Train a generative ML model on a large signal peptide sequence dataset, then generate novel sequences for a target protein.
  • Strategy C: Hybrid workflow: Build a library of signal peptides by combining sequence elements of known signal peptides in new permutations, then apply predictive modelling to rank candidate signal peptides in the context of the target protein and host.
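To make the hybrid workflow (Strategy C) more concrete, here is a minimal Python sketch under stated assumptions. It relies on the well-known three-part layout of signal peptides (n-region, hydrophobic h-region, c-region), but the fragment strings are made up for illustration, and placeholder_score is a hypothetical stand-in for a real predictive model. A real model would also take the target protein and host into account, which this toy score does not.

```python
from itertools import product

# Hypothetical region fragments, for illustration only.
# They are NOT validated signal peptide elements.
N_REGIONS = ["MKW", "MRA"]
H_REGIONS = ["LLLLLAALLV", "VTFISLLFLFS"]
C_REGIONS = ["SAYS", "AQA"]


def build_library(n_regions, h_regions, c_regions):
    """Return every n/h/c combination as one candidate sequence."""
    return ["".join(parts) for parts in product(n_regions, h_regions, c_regions)]


def placeholder_score(signal_peptide):
    """Toy stand-in for a predictive model: fraction of hydrophobic residues.

    Assumption: a real scorer would be a trained model that also sees
    the target protein and the host cell line.
    """
    hydrophobic = set("AILMFVW")
    return sum(res in hydrophobic for res in signal_peptide) / len(signal_peptide)


def rank_candidates(candidates, score_fn):
    """Sort candidates from highest to lowest predicted score."""
    return sorted(candidates, key=score_fn, reverse=True)


library = build_library(N_REGIONS, H_REGIONS, C_REGIONS)
ranked = rank_candidates(library, placeholder_score)

for sp in ranked[:3]:
    print(sp, round(placeholder_score(sp), 2))
```

Even this tiny example shows how quickly the library grows: two options per region already give eight candidates, and each added fragment multiplies the total.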

The current limitation is engineering volume:

The space of functional signal peptides is enormous: in nature alone, there are thousands of experimentally validated and millions of computationally predicted signal peptides with highly varied sequences.

  • Proteins are very specific about their signal peptides. One signal peptide might maximize production for one protein but suppress it for another (4).
  • Large-scale synthetic signal peptide generation requires in silico modeling of signal peptide-protein compatibility, covering not only expression but also features like cleavage purity and post-translational modifications.
  • No reliable system for such comprehensive in silico evaluation exists.
  • Identifying the right signal peptide usually requires manual, low-throughput screening, which is impractical at biologics-development scale.

What’s next for synthetic signal peptide engineering?

Labeled training data is scarce. While there are vast libraries of signal peptides, data on how a specific signal peptide affects the production of a specific protein is much more limited. Engineering synthetic signal peptides for varied categories of proteins requires high-quality training data sets and real-life validation (often high-throughput chemistry that computational methods alone cannot replace) to enable prediction of signal peptide-protein compatibility.

Advanced engineering methods. To map the impact and diversity of thousands of signal peptides for a protein of interest, analysis methods must be high-throughput. Avenue Biosciences has built a high-throughput platform for analyzing thousands of signal peptides in parallel. The platform produces large, high-quality sequence–function datasets that directly feed machine-learning-model training and enable rapid, data-driven selection of top candidate signal peptides.
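As a rough illustration of what data-driven selection from a sequence-function dataset can look like, here is a small Python sketch. It is not a description of the platform itself: the table layout, the protein and signal peptide identifiers, and the yield values are all invented for the example.

```python
from collections import defaultdict

# Hypothetical measurements: (protein, signal_peptide_id, relative_yield).
# All values are made up for illustration; they are not experimental data.
MEASUREMENTS = [
    ("protein_A", "SP1", 1.0),
    ("protein_A", "SP2", 2.4),
    ("protein_A", "SP3", 0.6),
    ("protein_B", "SP1", 1.8),
    ("protein_B", "SP2", 0.5),
    ("protein_B", "SP3", 1.1),
]


def top_candidates(measurements, k=2):
    """Return the k highest-yielding signal peptide IDs for each protein."""
    by_protein = defaultdict(list)
    for protein, sp_id, rel_yield in measurements:
        by_protein[protein].append((sp_id, rel_yield))
    return {
        protein: [
            sp_id
            for sp_id, _ in sorted(rows, key=lambda row: row[1], reverse=True)[:k]
        ]
        for protein, rows in by_protein.items()
    }


best = top_candidates(MEASUREMENTS)
print(best)
```

In this toy table the best signal peptide for protein_A (SP2) is the worst one for protein_B, which is the kind of protein-specific behavior that makes per-protein screening necessary.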

Co-development and maturity of AI-designed proteins. Synthetic signal peptides may require co-designing with synthetic proteins – a field still in development but moving from theory to practice. However, the field remains focused on creating AI-designed functional proteins, and less on how they will actually become industrially manufacturable.

Sources:

1.    Hegde RS, et al. (2006). The surprising complexity of signal sequences. Trends in Biochemical Sciences 31(10): 563–571.

2.    Haryadi R, et al. (2015). Optimization of Heavy Chain and Light Chain Signal Peptides for High Level Expression of Therapeutic Antibodies in CHO Cells. PLoS ONE 10(2): e0116878.

3.    Wu Z, et al. (2020). Signal Peptides Generated by Attention-Based Neural Networks. ACS Synthetic Biology 9(8): 2154–2161.

4.    O’Neill P, et al. (2023). Protein-Specific Signal Peptides for Mammalian Vector Engineering. ACS Synthetic Biology 12(8): 2339–2352.

 

Photo credit: Yianni Mathioudakis
