The Trickiness of Talking to Computers

July 13, 2017

by Helen Hill for MGHPCC
James Glass is a senior research scientist at the Massachusetts Institute of Technology. Glass leads the Spoken Language Systems Group in the Computer Science and Artificial Intelligence Laboratory (CSAIL). His research is focused on automatic speech recognition, unsupervised speech processing, and spoken language understanding. This past spring, assisted by graduate student David Harwath, Glass was the instructor for MIT's 6.345/HST.728 Automatic Speech Recognition class. This year, for the first time, students had the option of using high performance computing resources at the MGHPCC to facilitate their work.
6.345/HST.728 is a graduate-level course aimed at introducing students to the rapidly evolving field of speech recognition and spoken language processing. The first half of the class covers concepts and computational techniques through traditional lectures, labs, and problem sets, with the accompanying computation readily accommodated on MIT's public Athena computing network. The second half comprises a final term project that is typically much more computationally demanding.
As part of the course curriculum, the students chose a current research topic to explore. Some students chose to write programs capable of automatically recognizing the language that a person was speaking, while other students created systems that were able to infer the emotional state or personality traits of a speaker. Because most of the projects relied on data-hungry and computationally intensive statistical machine learning algorithms, the MGHPCC was key in enabling the students to complete their term projects.
"Having access to the MGHPCC allowed the students to use more sophisticated models involving more complicated elements. Many of the projects draw on machine learning techniques reliant on leveraging large quantities of data to train the models. In the past we let students use our group's facilities, but having recently redesigned the course to accommodate a curriculum shift more towards deep neural network models, we realized, going forward, students really needed something bigger," says Glass.
Fortunately, a timely comment from one of his students, mentioning her great experience using the MGHPCC for a different project, led Glass to contact Christopher Hill, the Director of MIT's Research Computing Project, whose team then worked with Harwath to provide class members access to the resources they needed.
"Some of the students in the class had access to their own computing resources, but for those who didn't, the availability of a facility internal to MIT, with lots of pre-installed libraries they could leverage, was terrific. Of course we had other options. For example, some other classes have used Amazon Cloud, but for our purposes this seemed like a much more natural set-up and one we are eager to repeat."
"Siri. Alexa. Voice recognition software has reached a tipping point," Glass tells me. "Nonetheless there is still plenty more room for improvement."
"The ability to speak and use language is a critical skill for machines to master," he says, "and it's a very hard problem because speech is a signal that gets contaminated by noise. The physics of everybody’s vocal tract is different, your linguistic background, your dialect. The sound of your voice changes with the situation you are in, your emotional state, whether you are inside or outside: The speech signal when you say the exact same thing, it's never ever the same. Interpreting context, deconstructing dialogue: Giving the students serious HPC tools to work with takes what we can teach them to a new level."
About the Researchers

James Glass is a senior research scientist at the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL) where he heads the Spoken Language Systems Group. He is also a lecturer in the Harvard-MIT Division of Health Sciences and Technology. His primary research interests are in the area of speech communication and human-computer interaction centered on automatic speech recognition and spoken language understanding.
David Harwath is a graduate student in Glass's group whose research combines speech and visual perception, based on data collected from people talking about pictures.


