Hui Guan’s research enhances the speed, scalability, and reliability of machine learning through innovations in algorithms and systems. Her work draws insights from applications, algorithms, and high-performance computing techniques to reduce the costs of model development and to enable deep learning in resource-constrained and distributed edge environments.
Edge intelligence pushes intelligent data processing using deep neural networks (DNNs) to the edge of the network, closer to data sources. It enables applications across many fields and has garnered significant attention from both industry and academia. However, the limited resources on edge platforms, such as edge servers and Internet of Things devices, hinder the ability to deliver fast and accurate responses to deep learning prediction queries. As a result, edge deployment is often restricted to a subset of deep learning tasks and to smaller DNN models that fit within these constraints.
To overcome this limitation, this project explores a new adaptive approach to building deep learning systems. These systems make real-time adjustments to the DNNs executed for prediction tasks based on varying resource demands along three critical dimensions: variable task complexity, fluctuating inference workloads, and resource contention in multi-tenant edge environments. The goal is to optimize both system efficiency and accuracy. Realizing the envisioned adaptiveness will facilitate the effective deployment of deep learning techniques across diverse applications and environments.
Over the past year, the team advanced adaptive inference with four systems: CACTUS (context-aware micro-classifiers), Proteus (dynamic model scaling), GMorph (multi-DNN fusion), and DiffServe (query-aware diffusion model serving). These systems significantly improved accuracy, reduced latency, and increased throughput across diverse platforms, outperforming static baselines and prior state-of-the-art approaches.