Algorithms might give genetic clues

By Kelly Fugo

Computer algorithms developed at the University may help researchers discover how genes are activated.

The genomic sequences of 14 different organisms are being determined at the University, according to the Genomes OnLine Database. Genomes of four additional organisms have been completely determined at the University in the last 10 years.

Information on more than 2,000 ongoing genome projects throughout the world is accessible through the database, and nearly 400 completed genomes are publicly available. Researchers are able to download and compare the sequences of different organisms, which range from yeast to humans.

Once the genomic sequences of many species have been determined, researchers can compare them to identify the molecules necessary for processes ranging from developmental stages of life to complex behavior.

“There has been a huge investment in measurement at the molecular level,” said Saurabh Sinha, assistant professor in computer science, “but those measurements are just lying there as files and databases and haven’t translated into discoveries as often as we would like them to.”

The genome of a human is 98.7 percent similar to that of a chimpanzee, one of its closest evolutionary relatives. Of the fruit fly’s 13,000 genes, about 44 percent are similar to human genes. But what sometimes differs between species is how and when the DNA is used to produce a functional protein.

“It’s as if we understand the parts list, which is the set of genes, but we don’t know the assembly manual,” Sinha said. “And the assembly manual is what differs between humans and flies.”

Sinha’s computer algorithms focus on the non-coding sequence of DNA, which contains regulatory elements that help determine when a gene turns on.

“These days the focus has shifted dramatically towards studying regulatory networks,” Sinha said.

Sinha’s programs will help Phil Newmark, assistant professor in cell and developmental biology, identify regulatory elements in the planarian, a type of regenerating flatworm.

“We have a reasonable amount of data showing cell types in which given genes are expressed in the animal, so we can identify genes that are expressed in the nervous system, for example, or in other specific cell types,” he said.

The planarian genome is being sequenced at the Washington University Genome Sequencing Center. Newmark’s research group has been working with the University’s Keck Center for Comparative and Functional Genomics to identify about 10,000 planarian genes and compare these sequences between species, Newmark said.

“The hope is, by giving a computational biologist like Saurabh a large group of genes known to be expressed in the same cell type, what we could then do is mine the genomic information and look for potential regulatory elements that will be involved in driving expression in those cell types,” Newmark said.

Once the researchers can determine how genes are expressed in adult cells, they can work backwards to determine how gene expression controls development and is reset during regeneration, Newmark said.

Joel Stary, a graduate student, uses several different programs to compare planarian sequence data with sequences from other organisms that are able to regenerate. The programs make it possible to compare many sequences simultaneously.

“You do a large amount of work rapidly but then have to go over it carefully and slowly,” he said.

These comparative analyses will help narrow down the candidate genes involved in tissue replacement, and more detailed techniques can then be used to determine gene characteristics, Stary said.