Time: 13:00-17:00, May 9 (Wednesday)
Venue: Room 300, SIBS Main Building, Yueyang Road 320
Speakers: ZHANG Chao, WANG Yuchen, FENG Qidi
Title 1: Population Genomics Studies of Human Evolutionary History, Local Adaptation, and Database Construction (ZHANG Chao)
Population genomics is derived from population genetics. It explores the microevolutionary dynamics of genomes at the population level and studies the evolutionary laws of populations at the genomic level. There are many reasons to explore genetic diversity and investigate population genomics. First, large-scale polymorphism data offer an opportunity to improve inferences about the evolutionary histories of human populations. Second, they allow researchers to understand the evolutionary forces (such as mutation, drift, and selection) that shape genetic diversity and genomic architecture. Third, population genomic approaches allow the identification of possibly adaptive genetic variants and therefore help to pinpoint functionally important variants. Finally, population genomic data are a unique resource for gaining deeper insight into the relationship between genotypes and phenotypes, the genetic origins of human diseases, and the mechanisms underlying disease disparities among populations.
Advances in single nucleotide polymorphism (SNP) array genotyping and next-generation sequencing technology have accelerated the development of human population genomics. Several international projects have been conducted to comprehensively describe the genetic variation that occurs in diverse human ethnic groups. These efforts, including the International HapMap Project, the Human Genome Diversity Project, the 1000 Genomes Project, and the Simons Genome Diversity Project, have provided new insights into human genetic diversity. My PhD studies were carried out against this background and cover several aspects: dissecting local adaptation, inferring evolutionary history, and constructing databases for diverse population genomics data.
My first research work is the detection of natural selection signals in zinc transporter genes among worldwide human populations. Modern humans originated in Africa ~200,000 years ago. Over the past 100,000 years, humans spread across the globe into a variety of habitats and adapted to diverse environmental pressures, such as dietary habits, bacterial infections, cold, and hypoxia. Exploring the footprints of natural selection provides deep insight into the biological basis of human adaptation to the environment and helps to identify functionally important genes and variants. Over the past 10,000 years, the most profound change has been the expansion of agriculture, through which most human populations shifted from a hunter-gatherer lifestyle to farming and animal husbandry, thereby changing the dietary habits of present-day humans.
Zinc is an essential trace element in human nutrition, and its homeostasis is closely related to health and disease. Previous studies have shown that zinc content in soils and crops varies greatly across continents and that African populations have experienced severe zinc deficiency. We hypothesized that, owing to the uneven global distribution of absorbable zinc and differing dietary habits, some zinc transporter genes (ZTGs) carrying adaptive variants that alter their transport capability might have been under natural selection in different populations, thus maintaining the balance of cellular and serum zinc in the human body. We therefore systematically analyzed the patterns of genetic diversity and signals of natural selection for 24 ZTGs in 14 worldwide populations. We showed that ZTGs harbor many more highly population-differentiated variants than random genes, and we discussed the potential forces shaping the genetic diversity of ZTGs. Moreover, SLC30A9 has been under natural selection in both East Asians and Africans, but in different directions. Through correlation analysis, we found that the selective sweep of SLC30A9 may have been driven by the uneven worldwide distribution of zinc in crops. In addition, we predicted 17 potentially functional SNPs, which may guide studies of the molecular mechanisms of ZTGs. Our results improve the understanding of the evolutionary forces that affect ZTGs and suggest the need for precision nutrition tailored to different ethnic groups.
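The comparison of population differentiation between ZTGs and random genes can be illustrated with a per-SNP Fst statistic. The abstract does not say which estimator was used; the sketch below assumes Hudson's Fst for one pair of populations and a cutoff taken from the upper tail of a background set of SNPs. All frequencies, sample sizes, and the 99% cutoff are hypothetical.

```python
from typing import List, Sequence, Tuple


def hudson_fst(p1: float, p2: float, n1: int, n2: int) -> float:
    """Hudson's Fst for one biallelic SNP in two populations.

    p1, p2: alternative-allele frequencies; n1, n2: sampled chromosomes (> 1).
    Small negative values are expected when the populations do not differ.
    """
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0  # den == 0 only for monomorphic SNPs


def quantile(values: Sequence[float], q: float) -> float:
    """Nearest-rank empirical quantile of a non-empty sequence."""
    ordered = sorted(values)
    idx = int(round(q * (len(ordered) - 1)))
    return ordered[min(max(idx, 0), len(ordered) - 1)]


def fraction_above(values: Sequence[float], cutoff: float) -> float:
    """Share of SNPs whose Fst exceeds the cutoff."""
    return sum(v > cutoff for v in values) / len(values)


# Hypothetical input: (freq_pop1, freq_pop2) per SNP, 100 chromosomes per population.
N = 100
background: List[Tuple[float, float]] = [(0.50, 0.45), (0.30, 0.35), (0.20, 0.25),
                                         (0.60, 0.50), (0.40, 0.40), (0.10, 0.15)]
candidate: List[Tuple[float, float]] = [(0.90, 0.20), (0.50, 0.48), (0.80, 0.10)]

bg_fst = [hudson_fst(a, b, N, N) for a, b in background]
cand_fst = [hudson_fst(a, b, N, N) for a, b in candidate]
cutoff = quantile(bg_fst, 0.99)
print(f"cutoff={cutoff:.3f}  background={fraction_above(bg_fst, cutoff):.2f}  "
      f"candidate={fraction_above(cand_fst, cutoff):.2f}")
```

In a real analysis the background would contain many thousands of SNPs from randomly chosen genes, and with 14 populations the function would be applied pairwise or replaced by a multi-population estimator.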
My second research work dissects the evolutionary histories and high-altitude adaptations of Tibetan highlanders. Living on the Qinghai-Tibet Plateau, with an average elevation of over 4,500 m, the Sherpas and Tibetans were among the most mysterious populations until Tenzing Norgay, a Sherpa, reached the summit of Mount Everest in 1953 and attracted the attention of anthropologists, archaeologists, and geneticists. Knowledge of the origin and population history of Tibetan highlanders is still in its infancy and remains controversial. In particular, some key questions remain unresolved: (1) from whom Tibetans descended, (2) how long humans have lived on the Tibetan Plateau, and (3) what the genetic relationship between the highlander groups is. Moreover, both highlander groups appear to cope well with the severely hypoxic environment and possess a distinctive set of adaptive physiological traits, including hemoglobin concentrations that remain unelevated even at altitudes up to 4,000 m, a trait directly related to oxygen delivery. This raises the questions of which genetic mechanisms underlie high-altitude adaptation and whether these mechanisms are shared or differ between Tibetans and Sherpas.
We used whole-genome deep sequencing and genome-wide genotyping data from Sherpas, Tibetans, and Han Chinese to revisit and reconstruct the evolutionary histories and high-altitude adaptations of the highlanders. We addressed four major unresolved issues: (i) whether they are two genetically different ethnic groups; (ii) whether population substructure exists in either group; (iii) how long ago they diverged from their ancestral group and when the two separated groups re-established contact through population admixture; and (iv) whether the two groups share major high-altitude adaptation mechanisms. The average divergence time between Tibetan highlanders and Han Chinese was estimated at ~15,000–9,000 years before present, most likely resulting from recent migration to the Tibetan Plateau after the Last Glacial Maximum. This suggests that Tibetan highlanders and Han Chinese are tonggen tongyuan, that is, of the same roots and the same source. Moreover, Sherpas and Tibetans show sufficient genetic differences to be distinguished as two distinct groups; their divergence (~3,200–11,300 years ago) is much more recent than that between their common ancestors and Han Chinese. The two highlander groups harbor both shared and differentiated genetic variants associated with adaptation to high altitude or UV radiation. On the one hand, both the Sherpa and Tibetan populations harbor elevated proportions of introgressed sequences in the EPAS1 region, where Denisovan-like, Neanderthal-like, ancient-Siberian-like, and unknown ancestries are entangled, suggesting a “borrowed fitness” mechanism of adaptation. On the other hand, Sherpas carry specific adaptive variants (such as chr17:19645417 in ALDH3A1) that might protect against UV radiation on the Tibetan Plateau. In summary, our results indicate that a complex history of population divergence, a long period of isolation, local adaptation, and recent gene flow jointly shaped the landscape of human genetic diversity on the plateau.
My third research work is mapping the genes and variants underlying Mendelian diseases. Mendelian diseases impose a heavy burden on humans: more than 7,000 Mendelian disorders are estimated to exist, and for about 50% of them the causal genetic variants have not yet been identified. The advent of the whole-genome sequencing era offers great opportunities for studying the genetic basis of these disorders. Although challenges remain, well-designed variant-mapping strategies can greatly improve the success rate.
Metaphyseal chondrodysplasia, Schmid type (MCDS) [OMIM, 156500] is a typical form of dwarfism, characterized by short stature, a disproportionate body, bowed legs, coxa valga, and metaphyseal widening and sclerosis. Previous studies have identified nearly 50 variants related to MCDS, suggesting high heterogeneity of this disease. However, before our study, no one had used next-generation sequencing technology to detect functional variants underlying MCDS. We enrolled a seven-generation family (110 individuals) with dwarfism showing typical MCDS symptoms from an isolated consanguineous tribe in Pakistan, and performed whole-genome deep sequencing on 6 of them to pinpoint the causal variant. We identified a missense variant, c.2011T>C (p.Ser671Pro), in COL10A1, which encodes the alpha chain of type X collagen; this variant is the most likely cause of dwarfism in this family. We revisited the 50 variants reported by previous studies and found that some of them occur at high frequencies in worldwide populations, suggesting that these variants found by traditional methods are probably false positives and should be re-evaluated against the growing number of genomes from worldwide populations. Moreover, our study provides a reliable reference pipeline for identifying causal variants of Mendelian diseases.
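A variant-filtering step of the kind described above can be sketched as follows. This is not the authors' pipeline: it assumes an autosomal dominant, fully penetrant model, a single maximum population allele frequency per variant (for example, taken from worldwide reference panels), and hypothetical sample and variant identifiers. The same frequency filter is reused to flag previously reported variants that are too common to be plausible causes of a rare dominant disorder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence


@dataclass
class Variant:
    vid: str                      # hypothetical variant label
    pop_af: float                 # highest allele frequency across reference populations
    genotypes: Dict[str, int] = field(default_factory=dict)  # sample -> alt-allele count


def candidate_variants(variants: Sequence[Variant], affected: Sequence[str],
                       unaffected: Sequence[str], max_af: float = 0.001) -> List[str]:
    """Keep rare variants carried by every affected and no unaffected relative."""
    keep = []
    for v in variants:
        if v.pop_af > max_af:
            continue  # too common worldwide for a rare dominant disorder
        if not all(v.genotypes.get(s, 0) >= 1 for s in affected):
            continue  # must co-segregate with the phenotype (missing call counts as 0)
        if any(v.genotypes.get(s, 0) >= 1 for s in unaffected):
            continue  # full penetrance assumed, so unaffected carriers exclude it
        keep.append(v.vid)
    return keep


def flag_common_reported(reported_af: Dict[str, float], max_af: float = 0.001) -> List[str]:
    """Previously reported variants whose population frequency exceeds max_af."""
    return sorted(vid for vid, af in reported_af.items() if af > max_af)


# Hypothetical family: 4 affected and 2 unaffected sequenced relatives.
affected = ["A1", "A2", "A3", "A4"]
unaffected = ["U1", "U2"]
carriers = {s: 1 for s in affected}
variants = [
    Variant("var_rare_segregating", 0.0, dict(carriers)),
    Variant("var_common", 0.12, dict(carriers)),
    Variant("var_in_unaffected", 0.0, {**carriers, "U1": 1}),
    Variant("var_missing_in_A4", 0.0, {"A1": 1, "A2": 1, "A3": 1}),
]
print(candidate_variants(variants, affected, unaffected))
print(flag_common_reported({"reported_1": 0.05, "reported_2": 0.0, "reported_3": 0.002}))
```

The frequency threshold is the main tuning knob: too strict and a founder variant enriched in an isolated population could be discarded, too loose and many benign variants survive.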
My fourth research work is constructing a suite of databases and tools for population genomics and genetics. Owing to the great volume and diversity of its data, genomics is a big-data science, and it is set to grow much bigger. The current challenge, however, is that data processing (both upstream and downstream analysis) lags far behind data generation. The major reasons are (a) slow data transfer, (b) poor data-sharing practices (most data are unavailable), (c) inconsistent data-processing strategies among researchers, and (d) the need for methodological innovation. Big genomic data have brought the concept of precision medicine to public awareness, but any such application requires strong scientific support. As one of the major types of biomedical data, population genomics data are extremely important for ancestry tracing, disease mapping, variant function prediction, and so on. Web, cloud computing, and database technologies are among the most suitable means of storing, processing, exploring, and sharing genomic data, thereby realizing the scientific value of big data.
In this study, we aim to construct a suite of databases focused on human population genomics and genetics (PGG.Suite). The databases cover several aspects: genomic variant annotation (PGG.SNV and PGG.SV), genomic diversity and genetic history of human populations (PGG.Population), genetic variants that regulate gene expression (PGG.Expression), and genomic tools and APIs for handling genomics data (PGG.Tools). The database on genomic diversity and genetic history has been completed: we collected more than 7,000 genomes covering 356 ethnic groups and performed analyses such as genetic diversity, population structure, and local adaptation for each population. PGG.Population aims to be a comprehensive repository of geographic and ethnic variation in the human genome, as well as a platform that can inform future medical practitioners and clinical investigators. Users can access PGG.Population at www.pggpopulation.org. The other databases are still under construction. We expect PGG.Suite to become a widely used, professional database suite focusing on the downstream analysis of population genomics.
Title 2: Genetic Structure, Admixture History, and Ancestry of East Asian Populations (WANG Yuchen)
Population genetics is a discipline that studies the genetic structure of populations and the maintenance and change of gene frequencies within them. With the rapid development of whole-genome microarray and next-generation sequencing technologies in recent years, many new methods have been developed and many new results obtained in human population genetics. Such research has made greater progress in European and African populations, whereas systematic and comprehensive studies of Asian, and especially East Asian, populations are still lacking. In this talk, I will introduce results from the East Asian project, the Asian Diversity Project, the Tibetan project, and the 100K Han Chinese Microarray Project in four areas: population structure analysis, population history reconstruction, admixture and gene flow, and ancestry informative markers and ancestry analysis.
We first studied representative East Asian populations (North and South Han Chinese, Hondo Japanese, and Korean), with Dai, Mongolian, and Ryukyuan as reference populations. We found that the Han Chinese, Japanese, and Korean populations are closely related genetically but have different genomic makeups that can be well distinguished, and that these differences correlate with their geographic distribution. We also found that recent admixture is ubiquitous in East Asian populations, including Han Chinese, Japanese, and Korean, which could partly explain their similarity.
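The talk does not name the method used to distinguish these populations. Principal component analysis (PCA) of a standardized genotype matrix is one common approach, and the pure-Python sketch below computes only the first principal component by power iteration. The genotype matrix is a hypothetical toy example with two groups of three individuals; real analyses use many thousands of SNPs and dedicated software.

```python
import math
from typing import List

Matrix = List[List[float]]


def standardize(geno: List[List[int]]) -> Matrix:
    """Center and scale each SNP column; geno[i][j] = alt-allele count (0, 1, 2)."""
    n, m = len(geno), len(geno[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        p = sum(row[j] for row in geno) / (2 * n)   # sample allele frequency
        sd = math.sqrt(2 * p * (1 - p))             # binomial standard deviation
        for i in range(n):
            out[i][j] = (geno[i][j] - 2 * p) / sd if sd > 0 else 0.0
    return out


def first_pc(x: Matrix, iters: int = 200) -> List[float]:
    """Leading eigenvector of the n x n relationship matrix X X^T / m (power iteration)."""
    n, m = len(x), len(x[0])
    g = [[sum(x[a][k] * x[b][k] for k in range(m)) / m for b in range(n)] for a in range(n)]
    v = [1.0 + 0.01 * i for i in range(n)]  # deterministic, non-constant start vector
    for _ in range(iters):
        w = [sum(g[a][b] * v[b] for b in range(n)) for a in range(n)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0:
            return [0.0] * n
        v = [t / norm for t in w]
    return v


# Hypothetical genotypes: rows 0-2 from one population, rows 3-5 from another.
genotypes = [
    [2, 2, 1, 0, 0, 0],
    [2, 1, 2, 0, 0, 0],
    [1, 2, 2, 0, 0, 1],
    [0, 0, 0, 2, 2, 1],
    [0, 0, 1, 2, 1, 2],
    [0, 1, 0, 1, 2, 2],
]
pc1 = first_pc(standardize(genotypes))
print([round(v, 3) for v in pc1])  # the two groups fall on opposite sides of zero
```

The sign of a principal component is arbitrary; only the separation between groups along it carries information.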
In the Asian Diversity Project (ADP), we extended our research to the entire Asian region. Together with our collaborators, we collected whole-genome data from more than 70 Asian populations and integrated them with samples from more than 60 pan-Asian and Pacific populations in the public Human Origins dataset, resulting in 105 high-quality representative groups. Using these samples, we characterized the genetic structure and population history of Asian peoples and explored the main admixture patterns in Asian populations.
In the Tibetan project, we estimated the contributions of three ancient genomes (Altai Neanderthal, Denisovan, and Ust’-Ishim) to modern human genomes, focusing on comparisons between Tibetans and Han Chinese and on differences between East Asians and other worldwide populations.
We developed a pipeline for screening ancestry informative markers (AIMs) based on a phylogenetic tree and customized the Illumina GSA and Illumina ASA chips with these AIMs. We also developed an automated tool for analyzing personal ancestry from the whole-genome microarray data of a single sample. With this tool, a personal ancestry report can easily be generated, including population and subpopulation classification, genome component analysis, estimation of ancient human genetic contributions, and chromosome typing.
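The abstract states only that AIM screening is based on a phylogenetic tree. The sketch below assumes one simple criterion: at every internal node of a binary tree, rank SNPs by the absolute difference in mean allele frequency between the populations on the two sides of the split and keep the top k. The tree, population names, and frequencies are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Tree = Union[str, Tuple["Tree", "Tree"]]   # a leaf is a population name
Freqs = Dict[str, Dict[str, float]]        # freqs[snp][population] = allele frequency


def leaves(tree: Tree) -> List[str]:
    """Population names under a (sub)tree, left to right."""
    if isinstance(tree, str):
        return [tree]
    return leaves(tree[0]) + leaves(tree[1])


def mean_af(freqs: Freqs, snp: str, pops: List[str]) -> float:
    """Unweighted mean allele frequency of a SNP across a set of populations."""
    return sum(freqs[snp][p] for p in pops) / len(pops)


def screen_aims(tree: Tree, freqs: Freqs, k: int) -> Dict[str, List[str]]:
    """For each internal node, keep the k SNPs that best separate its two subtrees."""
    selected: Dict[str, List[str]] = {}

    def visit(node: Tree) -> None:
        if isinstance(node, str):
            return
        left, right = leaves(node[0]), leaves(node[1])
        ranked = sorted(
            freqs,
            key=lambda s: abs(mean_af(freqs, s, left) - mean_af(freqs, s, right)),
            reverse=True,
        )
        selected["|".join(left) + " vs " + "|".join(right)] = ranked[:k]
        visit(node[0])
        visit(node[1])

    visit(tree)
    return selected


tree = (("PopA", "PopB"), "PopC")
freqs = {
    "snp1": {"PopA": 0.9, "PopB": 0.8, "PopC": 0.1},
    "snp2": {"PopA": 0.9, "PopB": 0.1, "PopC": 0.5},
    "snp3": {"PopA": 0.5, "PopB": 0.5, "PopC": 0.5},
}
print(screen_aims(tree, freqs, k=1))
```

Selecting markers per split means deep splits (continental groups) and shallow splits (closely related subpopulations) each get their own dedicated markers, rather than letting the largest frequency differences dominate the whole panel.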
Title 3: Genetic Structure, Population Admixture, and Natural Selection of Uyghur and Tajik Populations in Xinjiang (FENG Qidi)
Xinjiang, historically known as “Xiyu”, is a vast territory in northwestern China, spanning over 1.6 million square kilometers. Located in the center of Eurasia, it is crossed by the historical Silk Road and borders Afghanistan, India, Kazakhstan, Kyrgyzstan, Tajikistan, Pakistan, Russia, and Mongolia. Studying the genetic history of the admixed populations of Xinjiang is key to understanding human migration and admixture processes across Eurasia.
With a population of more than 10 million, the Xinjiang Uyghurs are the most influential population in Central Asia. We analyzed 951 Uyghur samples from 14 regions of Xinjiang. The results show substructure between Southwestern and Northeastern Uyghurs, likely shaped jointly by the Tianshan Mountains, which run from east to west as a natural barrier, and by gene flow from both the east and the west. In Xinjiang Uyghurs we identified four major ancestral components: West Eurasian (25%-37%), South Asian (12%-20%), East Asian (29%-47%), and Siberian (15%-17%). Based on the correlations among these four ancestry proportions, we proposed an “admixture of admixture” model for Xinjiang Uyghurs, in which the four major ancestral components were derived from two earlier admixed groups: one from the west, carrying West Eurasian and South Asian ancestries, and the other from the east, carrying East Asian and Siberian ancestries. Using a newly developed method, MultiWaver, we modeled the complex admixture history of Xinjiang Uyghurs as two waves of admixture. The ancient wave was dated to ~3,750 years ago (ya), much earlier than previous estimates but within the date range (4,000-2,000 ya) of the mummies with European features discovered in the Tarim Basin in southern Xinjiang; the more recent wave occurred around 750 ya, in agreement with a recent estimate obtained using other methods. We thus unveiled a more complex scenario of ancestral origins and admixture history in Xinjiang Uyghurs than previously reported, which further suggests massive Bronze Age migrations in Eurasia and East-West contacts along the Silk Road.
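The reasoning behind the “admixture of admixture” model can be made concrete with a toy calculation. Assume, hypothetically, that each Uyghur subpopulation mixes a western source (itself a fraction x West Eurasian and 1 - x South Asian) in proportion a with an eastern source that is a fraction y East Asian and 1 - y Siberian. Because x and y are fixed while a varies between subpopulations, West Eurasian and South Asian proportions rise and fall together, as do East Asian and Siberian, while the western and eastern pairs move in opposite directions. The values of a, x, and y are invented for illustration, and real estimates would also carry noise.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def two_source_mixture(a: float, x: float, y: float) -> Dict[str, float]:
    """Four-way ancestry of a group formed from two previously admixed sources.

    a: share of the western source; x: West Eurasian share within it;
    y: East Asian share within the eastern source.
    """
    return {
        "West Eurasian": a * x,
        "South Asian": a * (1 - x),
        "East Asian": (1 - a) * y,
        "Siberian": (1 - a) * (1 - y),
    }


# Hypothetical western-source shares for four subpopulations.
groups: List[Dict[str, float]] = [two_source_mixture(a, x=0.65, y=0.70)
                                  for a in (0.40, 0.45, 0.50, 0.55)]
col = {k: [g[k] for g in groups] for k in groups[0]}
print(round(pearson(col["West Eurasian"], col["South Asian"]), 3))   # same source
print(round(pearson(col["West Eurasian"], col["East Asian"]), 3))    # opposite sources
```

Under this model the ratio of West Eurasian to South Asian ancestry stays at x / (1 - x) in every subpopulation; if the four ancestries had entered independently, no such constraint, and hence no such correlation pattern, would be expected.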
Among the Central Asian populations of Xinjiang, Tajiks are the only major group speaking an Iranian (Indo-European) language. We analyzed 46 Xinjiang Tajiks and resolved their ancestral components. Compared with Xinjiang Uyghurs, Tajiks carry more West Eurasian (44.9%) and South Asian (38.5%) ancestry and less East Asian (4.97%) and Siberian (8.33%) ancestry. Using Tajikistan Tajiks from the Human Origins dataset as a reference, we found that Xinjiang Tajiks derive from Tajikistan Tajiks and have received more gene flow from the east. Most Xinjiang Tajiks live in Tashkurgan County on the Pamir Plateau, at an average altitude of more than 4,000 m. We scanned the whole genome for high-altitude adaptation signals, and COL11A1 and ARNT2 emerged as the top signals. Using global populations as references, we noticed a unique haplotype covering the CAPN3 and GANC gene region that is enriched in Xinjiang Tajiks but completely absent in Africans. Comparison with archaic segments indicated that this haplotype seems to have been inherited from an unknown archaic population and may play an important role in the high-altitude adaptation of Tajiks.
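The abstract does not state which statistic the genome scan used. As an assumed illustration only, the sketch below computes the population branch statistic (PBS), a statistic often used in high-altitude selection scans. It converts pairwise Fst values into branch lengths T = -ln(1 - Fst) and estimates the length of the branch leading to the focal population as (T_AB + T_AC - T_BC) / 2. The focal, sister, and outgroup populations and all allele frequencies are hypothetical.

```python
import math


def hudson_fst(p1: float, p2: float, n1: int, n2: int) -> float:
    """Hudson's Fst for one biallelic SNP (n1, n2 = sampled chromosomes, > 1)."""
    num = (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
    den = p1 * (1 - p2) + p2 * (1 - p1)
    return num / den if den > 0 else 0.0


def branch_length(fst: float) -> float:
    """T = -ln(1 - Fst); Fst is capped below 1 to avoid log(0) at fixed differences."""
    return -math.log(1.0 - min(fst, 0.999999))


def pbs(pa: float, pb: float, pc: float, na: int, nb: int, nc: int) -> float:
    """Branch length specific to focal population A, given sister B and outgroup C."""
    t_ab = branch_length(hudson_fst(pa, pb, na, nb))
    t_ac = branch_length(hudson_fst(pa, pc, na, nc))
    t_bc = branch_length(hudson_fst(pb, pc, nb, nc))
    return (t_ab + t_ac - t_bc) / 2


# Hypothetical SNP: the focal highland group differs from both reference groups.
print(round(pbs(0.9, 0.2, 0.2, 100, 100, 100), 3))
# Hypothetical SNP with no differentiation: PBS stays close to zero.
print(round(pbs(0.3, 0.3, 0.3, 100, 100, 100), 3))
```

In a genome-wide scan, SNPs or windows in the extreme upper tail of the PBS distribution would be treated as candidate adaptation signals, to be followed up with haplotype-based evidence.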
In addition to the admixed populations of Xinjiang, we also studied the genetic relationship between Sherpas and Tibetans. Through analysis of 111 Sherpas and 177 Tibetans, we found that Sherpas and Tibetans show considerable genetic differences and can be distinguished as two distinct groups, even though their divergence (~3,200-11,300 ya) is much more recent than that between Han Chinese and either of the two groups (~6,200-16,000 ya). Compared with Tibetans, Sherpas carry more South Asian ancestry, whereas Tibetans show higher levels of East Asian and Siberian ancestry.