Achievements

Predicting functional divergence between duplicate genes:

In the past 15 years, my research group has developed a series of statistical models and algorithms to analyze the pattern of functional divergence between duplicate genes, based on phylogenetic analysis of multiple protein sequence alignments. The underlying premise is that functional divergence may lead to site-by-site differences in evolutionary rates between duplicates. A simple case is, for instance, an amino acid site that is highly conserved in one duplicate cluster but highly variable in the other. We integrated these molecular evolutionary analyses into user-friendly, widely used software that has been continually updated from DIVERGE to DIVERGE2 and then to DIVERGE3 (https://github.com/xungulab/diverge).
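The simple case described above (a site conserved in one duplicate cluster but variable in the other) can be illustrated with a minimal Python sketch. This is not the DIVERGE method, which relies on phylogeny-based statistical models. It only contrasts per-site Shannon entropy between two pre-aligned clusters, and the function names, the toy sequences, and the `min_gap` threshold are all hypothetical choices made for illustration.

```python
from collections import Counter
from math import log2


def site_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps ('-')."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return sum((c / total) * log2(total / c) for c in counts.values())


def rate_shift_candidates(cluster_a, cluster_b, min_gap=1.0):
    """List sites whose entropy differs by at least min_gap between clusters.

    cluster_a and cluster_b are lists of equal-length aligned sequences.
    Returns tuples (site_index, entropy_a, entropy_b), 0-based site index.
    The entropy contrast is an illustrative stand-in, not a DIVERGE statistic.
    """
    if not cluster_a or not cluster_b:
        raise ValueError("both clusters must contain at least one sequence")
    length = len(cluster_a[0])
    if any(len(seq) != length for seq in cluster_a + cluster_b):
        raise ValueError("all sequences must have the same aligned length")
    hits = []
    for i in range(length):
        h_a = site_entropy([seq[i] for seq in cluster_a])
        h_b = site_entropy([seq[i] for seq in cluster_b])
        if abs(h_a - h_b) >= min_gap:
            hits.append((i, h_a, h_b))
    return hits


# Toy alignment (invented data): cluster A is fully conserved,
# cluster B varies at the second and fourth sites.
cluster_a = ["MKLA", "MKLA", "MKLA", "MKLA"]
cluster_b = ["MKLA", "MRLC", "MHLD", "MELE"]
hits = rate_shift_candidates(cluster_a, cluster_b)
print(hits)
```

A raw entropy contrast ignores the phylogeny and the sampling variance within each cluster, which is exactly what a model-based approach is designed to account for, so this sketch should be read only as an intuition aid.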

We have published over fifteen DIVERGE-related papers, which together have received more than 2000 citations. Indeed, the DIVERGE software series has become the most-cited bioinformatics tool among those that perform similar analyses.

Transcriptome evolution analysis:

My achievements are as follows. (i) Statistical framework of transcriptome evolution: For both microarray and RNA-seq data, the framework includes phylogenetic analysis of transcriptome evolution, the expression clock (a constant evolutionary rate of expression divergence) and its testing, ancestral transcriptome inference along a phylogeny, and phylogenetic network analysis of multiple tissues/stages (e.g., see Gu 2004; Gu et al. 2005; Gu and Su 2007; Gu et al. 2013; Gu 2015a; Gu 2015b). (ii) Primate brain evolution: As one of the first few groups to study this question, we (Gu and Gu 2003) discovered that gene expression in the human brain has tended to be upregulated since the split from chimpanzee. (iii) Expression divergence between duplicates: Using yeast as an example, we (Gu et al. 2005) demonstrated rapid expression evolution after gene duplication, a finding that has had significant impact (over 180 citations). (iv) Tissue-driven hypothesis: Together with earlier microarray studies, Gu and Su (2007) proposed this hypothesis as an umbrella for all tissue-specific effects on molecular and genomic evolution. For instance, because brain tissue (or, more generally, the central nervous system, CNS) may impose stronger functional constraints, the genes expressed there may evolve slowly in both coding sequence and expression. Recent RNA-seq data have confirmed the major results.

As one of the pioneering investigators in the emergence of this scientific paradigm, I have established a model-based research program that helps to systematically analyze high-throughput transcriptome data, from microarray to RNA-seq.

Foundation of molecular evolution theory:

In molecular evolution, the first mode of thinking is population genetics and the second is phylogenetic analysis of sequences; the two are connected by the well-known Kimura’s formula underlying the neutral theory of molecular evolution. However, the explosive growth of functional genomics calls for a third mode, tentatively termed ‘genotype-phenotype’ thinking. Though numerous empirical studies have been reported, how to integrate this ‘genotype-phenotype’ thinking into the theory of molecular evolution remains a challenge. I have worked on this problem for ten years. (i) Genotype-phenotype model of molecular evolution: I first proposed a statistical framework of molecular evolution under the genotype-phenotype map (Gu 2007). Though the mapping model is mathematically abstract, it successfully introduced a fundamental mapping parameter into the theory of molecular evolution, namely the rank (K) of the genotype-phenotype map, which is the minimum of the rank of genotype (r) and the rank of phenotype (n) (Gu 2014). (ii) Effective estimation of gene pleiotropy: I have developed a statistical method to estimate K = min(n, r) from a multiple protein sequence alignment; this estimate can be considered an effective estimate of the gene pleiotropy (n), that is, the number of fitness components related to the gene. (iii) Gene pleiotropy hypothesis of molecular evolution: After conducting large-scale data analyses, we proposed that a gene with high pleiotropy tends to evolve slowly. (iv) Molecular evolution under the three modes of thinking: Recently, I (Gu 2015c) showed that, without positive selection, sequence conservation (the intensity of purifying selection) is determined by the effective population size, protein structure stability, gene pleiotropy and expression effect.

The foundation of molecular evolution stems from two modes of scientific thinking: one is population genetics and the other is the ‘molecular clock’. The paradigm shift in genome sciences demands a third mode of thinking to connect model-driven evolutionary theory with data-driven comparative genomic exploration, that is, genotype-phenotype thinking. I have studied this fundamental problem for a decade and believe that I am the only one attempting to extend the classical theory of molecular evolution to integrate this third mode of thinking. My work has gradually gained recognition in recent years.

Patterns of genome evolution:

While patterns of molecular evolution such as the molecular clock have been the central topic for half a century, the paradigm has been shifting to patterns of genome evolution. In the past two decades, I have made significant contributions on the following topics. (i) We first discovered (Gu et al. 2002) that in the mammalian genome, the emergence of duplicate genes follows a three-component pattern: Wave-I (recent duplications within mammals), Wave-II (genome duplications in early vertebrates), and the ancient component (during metazoan evolution or earlier). (ii) Expression evolution in primate tissues: We (Gu and Gu 2003) discovered that gene expression in the human brain has tended to be upregulated since the split from chimpanzee. (iii) We studied the role of gene duplication in genetic robustness, revealing that its contribution is significant but complicated. (iv) We (Su et al. 2006) proposed the hypothesis that alternatively spliced isoforms can be fixed separately, one in each of the duplicate genes. (v) We developed the method of gene-content phylogeny to study the evolution (gain or loss) of gene content at the genome level. (vi) Using the TATA-box regulatory pathway as an example, we (Zou et al. 2012) demonstrated that expression divergence after gene duplication may be environment-dependent. (vii) With the help of ENCODE data, we (Zhou et al. 2013) studied the evolution of the human transcription factor (TF) network. (viii) Evolution of phosphorylation sites in metazoans.

Over two decades of growth in genomics, we have actively participated in the adventure of genomic exploration. Although genomic data have changed rapidly, our research has always focused on comparative genomics and evolution.

Statistical theory and methods for evolutionary genomics: A synthesis:

In 2011, as sole author, I published the monograph ‘Statistical Theory and Methods for Evolutionary Genomics’ (Oxford University Press, 258 pages). This is the first book, and to the best of my knowledge probably the only one at present, that attempts to synthesize the extensive literature and my three decades of research in this field. My goal is to develop a systematic framework, from model presentation to illustrative applications, for both research and education. My lifetime mission is to pursue this goal and ultimately accomplish it.

Papers with over one hundred (100) citations
  • Gu, Xun (1998) Early metazoan divergence was about 830 million years ago. Journal of Molecular Evolution 47:369-371
  • Gaucher, Eric A; Gu, Xun; Miyamoto, Michael M; Benner, Steven A (2002) Predicting functional divergence in protein evolution by site-specific rate shifts. Trends in Biochemical Sciences 27:315-321
  • Gu, Xun (1999) Statistical methods for testing functional divergence after gene duplication. Molecular Biology and Evolution 16:1664-1674
  • Su, Zhixi; Wang, Jianmin; Yu, Jun; Huang, Xiaoqiu; Gu, Xun (2006) Evolution of alternative splicing after gene duplication. Genome Research 16:182-189
  • Gu, Xun (2001) Maximum-likelihood approach for gene family evolution under functional divergence. Molecular Biology and Evolution 18:453-464
  • Jiang, Cizhong; Gu, Xun; Peterson, Thomas (2004) Identification of conserved gene structures and carboxy-terminal motifs in the Myb gene family of Arabidopsis and Oryza sativa L. ssp. indica. Genome Biology 5:1-11
  • Lin, Haining; Zhu, Wei; Silva, Joana C; Gu, Xun; Buell, C Robin (2006) Intron gain and loss in segmentally duplicated genes in rice. Genome Biology 7:1-11
  • Gu, Zhenglong; Steinmetz, Lars M; Gu, Xun; Scharfe, Curt; Davis, Ronald W; Li, Wen-Hsiung (2003) Role of duplicate genes in genetic robustness against null mutations. Nature 421:63-66
  • Gu, Xun; Zhang, Zhongqi; Huang, Wei (2005) Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proceedings of the National Academy of Sciences 102:707-712
  • Gu, Xun (2003) Evolution of duplicate genes versus genetic robustness against null mutations. Trends in Genetics 19:354-356
  • Gu, Xun; Li, Wen-Hsiung (1992) Higher rates of amino acid substitution in rodents than in humans. Molecular Phylogenetics and Evolution 1:211-214
  • Gu, Xun; Zhang, Jianzhi (1997) A simple method for estimating the parameter of substitution rate variation among sites. Molecular Biology and Evolution 14:1106-1113
  • Wang, Yufeng; Gu, Xun (2001) Functional divergence in the caspase gene family and altered functional constraints: statistical analysis and prediction. Genetics 158:1311-1320
  • Gu, Jianying; Gu, Xun (2003) Induced gene expression in human brain after the split from chimpanzee. Trends in Genetics 19:63-65
  • Nei, Masatoshi; Gu, Xun; Sitnikova, Tatyana (1997) Evolution by the birth-and-death process in multigene families of the vertebrate immune system. Proceedings of the National Academy of Sciences 94:7799-7806
  • Gu, Xun (2006) A simple statistical method for estimating type-II (cluster-specific) functional divergence of protein sequences. Molecular Biology and Evolution 23:1937-1945
  • Gu, Xun; Wang, Yufeng; Gu, Jianying (2002) Age distribution of human gene families shows significant roles of both large- and small-scale duplications in vertebrate evolution. Nature Genetics 31:205-209
  • Jiang, Cizhong; Gu, Jianying; Chopra, Surinder; Gu, Xun; Peterson, Thomas (2004) Ordered origin of the typical two- and three-repeat Myb genes. Gene 326:13-22
  • Gu, Xun; Fu, Yun-Xin; Li, Wen-Hsiung (1995) Maximum likelihood estimation of the heterogeneity of substitution rate among nucleotide sites. Molecular Biology and Evolution 12:546-557
  • Gu, Xun; Zou, Yangyun; Su, Zhixi; Huang, Wei; Zhou, Zhan; Arendsee, Zebulun; Zeng, Yanwu (2013) An update of DIVERGE software for functional divergence analysis of protein family. Molecular Biology and Evolution 30:1713-1719
  • Gu, Xun; Vander Velden, Kent (2002) DIVERGE: phylogeny-based analysis for functional–structural divergence of a protein family. Bioinformatics 18:500-501
  • Li, Wen-Hsiung; Yang, Jing; Gu, Xun (2005) Expression divergence between duplicate genes. Trends in Genetics 21:602-607
Papers with citations between 50 and 99
  • Gu, Xun; Li, Wen-Hsiung (1996) A general additive distance with time-reversibility and rate variation among nucleotide sites. Proceedings of the National Academy of Sciences 93:4671-4676
  • Zhang, Zhao; Shen, Libing; Gu, Xun (2016) Evolutionary dynamics of MERS-CoV: potential recombination, positive selection and transmission. Scientific Reports 6:1-10
  • Wang, Yufeng; Gu, Xun (2000) Evolutionary patterns of gene families generated in the early stage of vertebrates. Journal of Molecular Evolution 51:88-96
  • Gu, Xun (2003) Functional divergence in protein (family) sequence evolution. Origin and Evolution of New Gene Functions, pp. 133-141
  • Su, Chen; Jakobsen, Ingrid; Gu, Xun; Nei, Masatoshi (1999) Diversity and evolution of T-cell receptor variable region genes in mammals and birds. Immunogenetics 50:301-308
  • Zhou, Huaijun; Gu, Jianying; Lamont, Susan J; Gu, Xun (2007) Evolutionary analysis for functional divergence of the toll-like receptor gene family and altered functional constraints. Journal of Molecular Evolution 65:119-123
  • Zhang, Wen‐Juan; Zhou, Jie; Li, Zuo‐Feng; Wang, Li; Gu, Xun; Zhong, Yang (2007) Comparative analysis of codon usage patterns among mitochondrion, chloroplast and nuclear genes in Triticum aestivum L. Journal of Integrative Plant Biology 49:246-254
  • Gu, Xun; Li, Wen-Hsiung (1998) Estimation of evolutionary distances under stationary and nonstationary models of nucleotide substitution. Proceedings of the National Academy of Sciences 95:5899-5905
  • Gu, Xun; Zhang, Hongmei (2004) Genome phylogenetic analysis based on extended gene contents. Molecular Biology and Evolution 21:1401-1408
  • Zhang, Jianzhi; Gu, Xun (1998) Correlation between the substitution rate and rate variation among sites in protein evolution. Genetics 149:1615-1625
  • Gu, Jianying; Gu, Xun (2003) Natural history and functional divergence of protein tyrosine kinases. Gene 317:49-57
  • Huang, Wei; Chang, Benny H-J; Gu, Xun; Hewett-Emmett, David; Li, W-H (1997) Sex differences in mutation rate in higher primates estimated from AMG intron sequences. Journal of Molecular Evolution 44:463-465
  • Wang, Erli; Sun, Shuna; Qiao, Bin; Duan, Wenyuan; Huang, Guoying; An, Yu; Xu, Shuhua; Zheng, Yufang; Su, Zhixi; Gu, Xun (2013) Identification of functional mutations in GATA4 in patients with congenital heart disease. PLoS ONE 8:e62138
  • Gu, Xun; Hewett-Emmett, David; Li, Wen-Hsiung (1998) Directional mutational pressure affects the amino acid composition and hydrophobicity of proteins in bacteria. Mutation and Evolution, pp. 383-391
  • Zhang, Zhongqi; Gu, Jianying; Gu, Xun (2004) How much expression divergence after yeast gene duplication could be explained by regulatory motif evolution? Trends in Genetics 20:403-407
  • Lin, Haining; Ouyang, Shu; Egan, Amy; Nobuta, Kan; Haas, Brian J; Zhu, Wei; Gu, Xun; Silva, Joana C; Meyers, Blake C; Buell, C Robin (2008) Characterization of paralogous protein families in rice. BMC Plant Biology 8:1-14
  • Gu, Xun; Su, Zhixi (2007) Tissue-driven hypothesis of genomic evolution and sequence-expression correlations. Proceedings of the National Academy of Sciences 104:2779-2784
  • Lin, Haining; Moghe, Gaurav; Ouyang, Shu; Iezzoni, Amy; Shiu, Shin-Han; Gu, Xun; Buell, C Robin (2010) Comparative analyses reveal distinct sets of lineage-specific genes within Arabidopsis thaliana. BMC Evolutionary Biology 10:1-14
  • Gu, Xun; Nei, Masatoshi (1999) Locus specificity of polymorphic alleles and evolution by a birth-and-death process in mammalian MHC genes. Molecular Biology and Evolution 16:147-156
  • Gu, Xun (2004) Statistical framework for phylogenomic analysis of gene family expression profiles. Genetics 167:531-542
  • Wu, Jingcheng; Zhao, Wenyi; Zhou, Binbin; Su, Zhixi; Gu, Xun; Zhou, Zhan; Chen, Shuqing (2018) TSNAdb: a database for tumor-specific neoantigens from immunogenomics data analysis. Genomics, Proteomics & Bioinformatics 16:276-282