Computer science tackles big data in medicine

Jan 5, 2018

By Matthew Chin

Professor Eran Halperin analyzes complex datasets in genomics, microbiology, and healthcare. 

By Sarah C.P. Williams

A century ago, the results of a biological study were generally observations scrawled longhand in a lab notebook, perhaps accompanied by a sketch. Today, though, experimental results can look quite different: they’re often huge sets of data generated by automated machines, and they can take up more hard drive space than a typical personal computer offers. Grappling with these datasets, which require advanced computational methods and statistics to analyze, can be a headache for some biologists and medical researchers. But for Eran Halperin, a UCLA professor of computer science, anesthesiology, and human genetics, it’s his forte.

“I’m a computer scientist who decided about 15 years ago to focus on applications in biology and medicine,” Halperin said. “At the time, the way medicine was being performed was in a very non-quantitative way and I saw the potential for it to be better.”

Last year, Halperin left Tel Aviv University to join the UCLA faculty, where he’s already involved in collaborations with other UCLA researchers whose work spans computer science and medicine. “The physical proximity between high-quality computer science researchers and high-quality medical researchers like it is here at UCLA is very rare,” Halperin said. “My collaborators quite literally go to the same coffee shop as me, so it’s very easy to work together.”

Halperin’s goal is to design new approaches to finding patterns in the large datasets that are produced by biological experiments. If a researcher wants to compare one million spots in the genomes of 5,000 individuals, perhaps to find changes associated with a disease, they won’t be able to spot patterns with the naked eye. That’s where Halperin’s computer programs come in.

This summer, Halperin and two UCLA colleagues (Eleazar Eskin, a professor of computer science and of human genetics, and Jae-Hoon Sul, an assistant professor of psychiatry and biobehavioral sciences) won a $1.2 million grant from the National Science Foundation to design one such new approach.

The team is working to develop new ways to find a particular type of pattern—called a “low-dimensional structure”—in genomic data. Rather than, say, one single genetic mutation that causes a disease, a low-dimensional structure consists of a scattering of changes across the entire genome that might be associated with a disease, age, gender, or even a trait like height.

“Finding low-dimensional structures in datasets is something that people in machine learning and statistics have been trying to do for a hundred years,” Halperin said. “What’s unique about what we’re trying to do is that we’re solving this problem specifically as it applies to genomic data.”

Halperin and his colleagues are particularly interested in finding patterns not just in the sequences of DNA that make up the genome, but also in the methylation marks that stud the DNA and direct cells how and when to express genes. While your genome is relatively static throughout your life, these chemical marks along the DNA molecules can change how your genome is used by cells. Throughout the body, methylation can vary by cell type, which complicates the analysis of the so-called methylome: if one person has a different pattern of methylation than another, is that due to a difference in which cell types were in their blood samples, or a true difference in methylation? That’s exactly the sort of complex question Halperin, Eskin, and Sul hope to find a way to answer.

Since launching his UCLA lab, Halperin has also turned his attention to another hot area of biology: the microbiome, or the collection of microbes that live in the human body. Computer programs are needed, he says, to help find trends in how the microbiome changes in an individual over time, or to better compare the microbiomes of related people and find similarities.

“There’s a lot of data being generated in this field and the analysis tools they have are pretty naïve compared to what people do in genetics, say, where there’s been more time to develop quantitative analysis approaches,” he said.

His lab is also working with colleagues in the Department of Anesthesiology to identify patterns within the UCLA medical records system that can help pinpoint people most at risk of complications during surgery. The program they eventually develop, Halperin said, might be able to flag patients who need a closer look from physicians before surgery.

Each time Halperin and his colleagues design a new computational or statistical approach to handle biological data, they’re not content to simply publish the approach. They also design software that makes the approach more accessible to the average biologist. “We don’t just want to develop theoretical methods,” he said. “We want to generate software packages that geneticists around the world can use on their own studies of different diseases.”

And Halperin’s work is never done; as new methods to sequence the genome are developed, and new types of biological data emerge, the demand for computational approaches to handle the data continues, he said.

“A lot of medicine is going in this direction of being more quantitative now. There’s a need for not only people in computer science and medicine who are willing to collaborate, but people who can bridge the gap and understand both sides of the equation. I want to not just develop new approaches, but make an impact in medicine.”