Data Sciences for the Life Sciences

April 1, 2014

From ResearchNext, the Research Digest of UMass Amherst
Statistical, life science, and social science researchers gathered for a workshop on “Data Sciences for the Life Sciences in a High Performance Computing Environment” in February: the first formal opportunity for the researchers to learn how to effectively utilize the MGHPCC facility.

A group of more than 40 statistical, life science, and social science researchers braved the cold and dark of an early February morning to be part of two firsts in regional high performance computing efforts in the life sciences. Their participation in the workshop “Data Sciences for the Life Sciences in a High Performance Computing Environment” was the first formal opportunity for life sciences researchers to learn how to use the Massachusetts Green High Performance Computing Center (MGHPCC) facility in Holyoke effectively for their research. The workshop was also the first offering of the new Biostatistics in Practice series, jointly sponsored by UMass Amherst and the MGHPCC.
These firsts reflect the increasing role of statistical, mathematical, and computational methods in life science research, as well as the increasing need for researchers to work with large-scale data from diverse sources. Campus sponsors of the series are the Graduate Program in Biostatistics and the UMass Institute for Computational Biology, Biostatistics, and Bioinformatics (ICB3).
Workshop participants included faculty and graduate students from UMass Amherst as well as other colleges, universities, and healthcare organizations, all eager to access the MGHPCC and learn state-of-the-art tools from a teaching team drawn from the UMass Amherst Biostatistics faculty. Assistant professor and workshop director Nicholas Reich; ICB3 director and head of biostatistics Andrea Foulkes; and lecturer Gregory Matthews each delivered modules that together formed a foundational curriculum on statistical computing with R, an open-source and freely available statistical programming language, in a high performance computing environment, along with practical experience in using it on the MGHPCC platform. In addition to the instructors, teaching assistants provided a high level of individual support for workshop participants.
R is rapidly being adopted as the programming language of choice for researchers at the intersection of the life and statistical sciences. As an open-source language, it is affordable to researchers regardless of budget and facilitates sharing of the code for new techniques and tools. This flexibility seems to be especially valuable to researchers in biostatistics, bioinformatics, and computational biology, who are being challenged to invent new approaches and methods with the analytical power to draw insights from “big data,” which can be too voluminous or complex to be understood using conventional methods alone.
On a practical level, high-performance computing is another key to drawing insight from data that is measured in terabytes or petabytes and can far exceed both the computing and storage capacities of even the most powerful desktop computers. MGHPCC director John Goodhue explains, “Moving and processing big data with conventional methods is a bit like asking a single person to move and read every book in the Library of Congress with a hand truck and reading lamp. The MGHPCC is equipped for high bandwidth communications to allow data to be moved in or out expeditiously and to connect many computing ‘cores’ so they can work ‘in parallel’ to handle large and/or complex datasets, thus making it possible for researchers to run programs that analyze the data in a reasonable period of time: hours or days instead of weeks.” Participants were also able to tour the MGHPCC facility to gain a better appreciation of its thoughtful design as an environmentally responsible high-performance computing resource, as well as its research capacity.
Speaking about the changing role of statistical and computational methodologies in life science research, Andrea Foulkes notes, “Biomedical researchers are able to generate large quantities of data providing in-depth coverage within and across individuals. The MGHPCC offers state-of-the-art computational resources for data management and processing which, coupled with powerful R tools, enable researchers to turn data into knowledge.”
Looking forward to future offerings, Reich sees many opportunities. “We were pleased that the workshop was fully subscribed well in advance of the meeting. Given the level of interest we will consider holding it again soon, perhaps as early as this summer. We have other topics in mind as well, and with the start-up of the UMass Institute for Applied Life Sciences we expect the list to grow.”
All images courtesy of the School of Public Health and Health Sciences, UMass Amherst.
Story by Karen Lauter-Utgoff
