AI for Science: Phenotype-Embedding Theorem and Genotype-Fitness Landscape

Abstract

The relationship between genotype and fitness is fundamental to evolution, but quantitatively mapping genotypes to fitness has remained challenging. We propose the Phenotypic-Embedding theorem (P-E theorem), which bridges genotype and phenotype through an encoder–decoder deep learning framework. Inspired by this, we also propose a more general first principle for correlating genotype and phenotype; the P-E theorem provides a computable basis for applying this first principle. As an application example of the P-E theorem, we developed the Co-attention based Transformer model to bridge Genotype and Fitness, a Transformer-based pre-trained foundation model with downstream supervised fine-tuning that can accurately simulate the neutral evolution of viruses and predict immune escape mutations. Following the calculation path of the P-E theorem, we accurately obtained the basic reproduction number (R0) of SARS-CoV-2 from first principles, quantitatively linked immune escape to viral fitness, and plotted the genotype–fitness landscape. The theoretical system we established provides a general and interpretable method for constructing genotype–phenotype landscapes and offers a new paradigm for theoretical and computational biology.


Dr. Yixue Li is currently a principal investigator at Guangzhou National Laboratory and the Director of the Biomedical Big Data Center at the Shanghai Institute of Nutrition and Health, Chinese Academy of Sciences. He received his Ph.D. in theoretical physics from Heidelberg University, Germany, in 1996. Dr. Li's research interests include bioinformatics, systems biology, and computational biology. He has published more than 300 peer-reviewed journal papers, including in Science, Nature, Nature Genetics, Nature Biotechnology, and Nature Communications. His work has been cited more than 25,000 times, with an h-index of 78 (according to Google Scholar).

Dr. Li has served as a reviewer and panelist for numerous national research foundations and agencies, including the Chinese National Science Foundation, the National High-Tech Program (863), and the National Key Basic Research Program (973). He has also organized several international conferences and workshops and served as a program committee member for major national and international conferences, including GIW, IBW, HUPO, and the National Bioinformatics Conference.