CompGen Initiative linking disease traits through ancestral trees
May 7, 2016
Traits are common aspects of our identities that we often use to differentiate ourselves from one another. Whether it is eye or hair color, each trait is passed down to an individual. The question of how the origin of a trait is tracked is under examination at the Institute for Genomic Biology (IGB) labs. Understanding how ancestral trees are generated can help with disease identification.
The project is being undertaken by different members of the CompGen Initiative, which enables multi-disciplinary work focused on computational genetics. People involved in sociology, biology, psychiatry, engineering, and the High Performance Computing (HPC) bio group have all come together in the pursuit of a common goal: to better understand the relation between human health and disease.
The CompGen Initiative’s online home page says that they “seek to combine the collective strengths of Illinois’ genomic research with its prowess in large-scale parallel systems and big data to develop new technology that enables future genomic breakthroughs. The new technology could enable a better understanding of the basic processes of life, illumination on how evolution works, and custom treatments for disease, among other discoveries.”
“We think that to make strides in understanding health and disease it’s time to break down disciplinary boundaries and start talking with folks who are not just interested in human medical approaches, but take novel ways of looking at problems,” said Derek Wildman, professor of molecular and integrative physiology. “That’s what this institute, the IGB, is all about: multi-interdisciplinary big science.”
Wildman specializes in comparative genomics and phylogenetics – the study of how species are related to one another. There are many different phylogenetic techniques that can be used to help detect human disease traits and that can help us gain a better understanding of the ancestral history of humans.
The vast majority of these kinds of studies are ‘case-control’ studies, in which single nucleotide polymorphisms (SNPs) – variations in a single base pair of a person’s DNA – are examined in the context of that person’s phenotype, or observable characteristics. However, the uniqueness of each human genome and the many differences among genomes can make this difficult.
Thanks to the 1000 Genomes Project, human genomes from central Europe, Africa and parts of Asia are now easily accessible for study. The team at Illinois can now sequence different human genomes and examine the entire set of SNPs, or variants, that people in a certain region may have.
Don Armstrong, an IGB research scientist specializing in computational genetics, is using bioinformatics to analyze the 2,500 individual human genomes at his disposal and determine how changes in someone’s genetic code can lead to particular phenotypes.
In his proposal for access to Blue Waters – the largest non-classified supercomputer in the U.S. – he wrote, “Many non-communicable human diseases – such as diabetes, cancer, cardiovascular disease, and mental illness – are caused in part by a complex network of interacting genetic variants. Multiple projects are currently underway to sequence the whole genomes of hundreds of thousands of cases and controls necessary to identify these variants and their effect upon cellular processes which lead to disease.”
Armstrong received a preliminary grant that gave him 50,000 node hours on Blue Waters to tackle the comparison of the 2,500 individual human genomes.
“Imagine the size of that data set, all the possible (ancestral) trees that you could draw. If you just had you and your parents, there’s just one possible branch that can be drawn. Now if you took four people, the number of trees and how they can be arranged increases. Now if you imagine having 10, it’s exploding factorially. It’s a non-polynomial – explodes in terms of its complexity – problem,” Armstrong said.
“And because of that the solutions are not exact, so we have to try a bunch of different solutions and hope we come up with the best model that models the tree. It’s only because of the existence of Blue Waters that we’re even able to contemplate working on this problem,” Armstrong said.
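The growth Armstrong describes can be made concrete with a short sketch. It assumes the standard textbook count of rooted, two-way-branching trees on n labeled individuals, which is the product of the odd numbers up to 2n − 3; that may not be the exact count Armstrong has in mind, and the function name is hypothetical.

```python
import math


def count_rooted_trees(n_tips: int) -> int:
    """Count rooted, bifurcating trees on n labeled tips: 1 * 3 * 5 * ... * (2n - 3).

    Assumption: this is the standard textbook count, used here only to
    illustrate how quickly the number of possible trees grows.
    """
    if n_tips < 1:
        raise ValueError("need at least one tip")
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n - 3 (empty for n = 1 or 2).
    for k in range(3, 2 * n_tips - 2, 2):
        count *= k
    return count


if __name__ == "__main__":
    for n in (2, 3, 4, 10):
        print(n, "tips:", count_rooted_trees(n), "possible trees")
    # For 2,500 tips the count is too long to print usefully,
    # so report roughly how many digits it has instead.
    digits = math.floor(math.log10(count_rooted_trees(2500))) + 1
    print("2500 tips: a number with about", digits, "digits")
```

Under this count, two individuals allow one tree, four allow 15, and 10 already allow 34,459,425, which is why checking every tree for 2,500 genomes is out of reach.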
Multiple sequence alignments line up all of the individuals’ genomes on top of each other so that single differences among them can be compared. That information can then be used to make phylogenetic trees – depictions of relationships amongst humans.
To find the best phylogenetic tree, candidate trees are put through a series of simulations that combine Bayesian statistical inference with Markov chain Monte Carlo (MCMC) sampling, a technique for estimating quantities that are too costly to compute directly.
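As a rough illustration of the MCMC idea, the sketch below runs a Metropolis sampler over three candidate trees. Everything in it is an assumption made for illustration: the tree labels, the log-scores and the function name are invented, and a real analysis would score trees under a model of sequence evolution and propose small rearrangements of the current tree rather than picking from a fixed list.

```python
import math
import random


def metropolis_tree_sampler(log_scores, n_steps, seed=0):
    """Toy Metropolis sampler over a fixed set of candidate trees.

    log_scores maps a tree label to a made-up log-score (higher is better).
    Returns how many steps the chain spent at each tree.
    """
    rng = random.Random(seed)
    trees = list(log_scores)
    current = rng.choice(trees)
    visits = {tree: 0 for tree in trees}
    for _ in range(n_steps):
        # Symmetric proposal: pick any candidate tree uniformly at random.
        proposal = rng.choice(trees)
        log_ratio = log_scores[proposal] - log_scores[current]
        # Always accept a better tree; accept a worse one with
        # probability exp(log_ratio), which lets the chain escape local peaks.
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current = proposal
        visits[current] += 1
    return visits


# Hypothetical log-scores for the three trees relating four individuals A-D.
toy_log_scores = {
    "((A,B),(C,D))": -4.0,
    "((A,C),(B,D))": -5.0,
    "((A,D),(B,C))": -6.0,
}

if __name__ == "__main__":
    visits = metropolis_tree_sampler(toy_log_scores, n_steps=20000)
    for tree, count in visits.items():
        print(tree, count / 20000)
```

The share of steps spent at each tree approximates that tree's relative support, so the chain spends most of its time at the best-scoring tree without ever having to enumerate every possibility.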
“We want to find the best (phylogenetic tree). The simulations are models of sequence evolution and you can simulate the data and you get a tree returned, and that tree will have a score,” Wildman said. “It crudely represents how many mutations occurred to describe that data. We do these simulations millions of times and the goal is to get it so we find the pinnacle of the best scores for those trees. At the end of the day we’ll have our best view of how all these genomes are all related to one another.”
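One simple way to see what a score based on the number of mutations can look like is a parsimony count, sketched below with Fitch's algorithm. This is a swapped-in, simpler technique, not the team's Bayesian scoring, and the four individuals A to D and their short sequences are invented for illustration.

```python
def fitch_site(tree, states):
    """Fitch's algorithm for one alignment column.

    tree is a nested tuple of individual names, e.g. (("A", "B"), ("C", "D")).
    Returns (candidate ancestral states, minimum number of mutations).
    """
    if isinstance(tree, str):  # a tip: one sampled individual
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    shared = left_set & right_set
    if shared:
        # Both branches can agree on a state: no extra mutation needed here.
        return shared, left_cost + right_cost
    # The branches disagree: at least one mutation occurred below this node.
    return left_set | right_set, left_cost + right_cost + 1


def parsimony_score(tree, alignment):
    """Sum the minimum mutation counts over every column of the alignment."""
    n_sites = len(next(iter(alignment.values())))
    total = 0
    for i in range(n_sites):
        column = {name: seq[i] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


# Hypothetical aligned sequences for four individuals.
toy_alignment = {"A": "ACGTA", "B": "ACGTT", "C": "GCGAA", "D": "GCGAT"}

candidate_trees = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    (("A", "D"), ("B", "C")),
]

if __name__ == "__main__":
    for tree in candidate_trees:
        print(tree, parsimony_score(tree, toy_alignment))
```

On this toy data the three trees need 4, 5 and 6 mutations respectively, so the first grouping explains the sequences with the fewest changes. Here a lower count is better, whereas the team's search looks for the peak of a model-based score.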
Once the best ancestral tree is determined, the team hopes to use it to help determine trait values for any disease that has a genetic component.
“A lot of the more pervasive diseases that affect a large number of people – heart disease, cancer, lupus – we’re only in the infancy of understanding the molecular mechanisms that predispose people to getting those diseases,” Armstrong said. “So understanding the genetic changes that had happened will hopefully give us a better understanding of how the disease progresses and give us tools so we can stop the disease in an earlier stage and also make more accurate diagnoses of it.”
With continuously evolving efforts to understand the human genome, the results hold the potential to have a positive effect on human disease treatment and overall health.
“At IGB our slogan is ‘where science meets society’ and our University is a land-grant institution and our mission is to improve people’s lives,” Wildman said. “So we think genomics can be used to improve people’s lives, specifically by looking at our health and disease.”