Track 2 Late Morning
Emerging Technology Applications on Personalized Edge LLMs
Abstract
Edge-based Large Language Models (edge LLMs) can preserve the promising abilities of LLMs while ensuring user data privacy. Additionally, edge LLMs can be used in various fields without internet connectivity constraints. However, edge LLMs face significant challenges in training, deployment, and inference. Limitations in memory storage, computational power, and data I/O operations can hinder the deployment of advanced LLMs on edge devices. These constraints often result in poor performance in customization, real-time user interaction, and adaptation to novel situations. Traditional acceleration methods, primarily designed for advanced computation platforms, may not be optimal for all types of edge devices. As a complementary solution, Compute-in-Memory (CiM) architectures based on emerging non-volatile memory (NVM) devices, referred to here as NVCiM, offer promising opportunities. These architectures, having demonstrated numerous advantages in traditional neural networks, can help overcome the computational memory bottleneck of edge devices and reduce competition for core computational resources. Through the introduction of software-hardware co-design and co-optimization methods, NVCiM can significantly enhance edge LLM performance in resource-limited environments. Moreover, NVCiM-based edge LLM systems are more cost-effective than LLMs running on high-performance computing devices. This makes them suitable for various personalized applications, particularly in healthcare and medical fields.
Speakers
Dr. Yiyu Shi
University of Notre Dame
Dr. Yiyu Shi is currently a professor in the Department of Computer Science and Engineering at the University of Notre Dame, the site director of the National Science Foundation I/UCRC Alternative and Sustainable Intelligent Computing, and the director of the Sustainable Computing Lab (SCL). He received his B.S. in Electronic Engineering from Tsinghua University, Beijing, China, in 2005, and his M.S. and Ph.D. degrees in Electrical Engineering from the University of California, Los Angeles, in 2007 and 2009, respectively. His current research interests focus on hardware intelligence and biomedical applications. In recognition of his research, more than a dozen of his papers have received or been nominated for best paper awards at top journals and conferences, including the 2023 ACM/IEEE William J. McCalla ICCAD Best Paper Award and the 2021 IEEE Transactions on Computer-Aided Design Donald O. Pederson Best Paper Award. He is also a recipient of the Facebook Research Award, the IBM Invention Achievement Award, the NSF CAREER Award, the IEEE Region 5 Outstanding Individual Achievement Award, and the IEEE Computer Society Mid-Career Research Achievement Award, among others. He has served on the technical program committees of many international conferences. He is the deputy editor-in-chief of the IEEE VLSI CAS Newsletter and an associate editor of various IEEE and ACM journals. He is an IEEE CEDA distinguished lecturer and an ACM distinguished speaker.
Dr. Jinjun Xiong
University at Buffalo, New York
Dr. Jinjun Xiong (F’23) received his Ph.D. degree from the University of California, Los Angeles (UCLA), USA, in 2006. He is currently an Empire Innovation Professor with the Department of Computer Science and Engineering at the University at Buffalo (UB). He is also the Director of UB’s Institute for Artificial Intelligence and Data Science (IAD), the Scientific Director for the National Artificial Intelligence Institute for Exceptional Education (AI4ExceptionalEd), and the AI Thrust Lead for the National Center for Early Literacy and Responsible AI (CELaRAI). Prior to UB, he was a Senior Researcher and Program Director for AI and Hybrid Clouds Systems at the IBM Thomas J. Watson Research Center, Yorktown Heights, New York, USA. His research interests lie in cross-stack AI systems research, including AI applications, algorithms, tooling, and computer architectures. Many of his research results have been adopted in industrial products and tools. His publications have won 9 Best Paper Awards and 10 Nominations for Best Paper Awards at various international conferences.
Dr. Ruiyang Qin
University of Notre Dame
Dr. Ruiyang Qin is currently an Assistant Professor in the Department of Electrical and Computer Engineering at Villanova University. He received his B.S. and M.S. in Computer Science from the Georgia Institute of Technology in 2020 and 2021, respectively. He received his Ph.D. in Computer Science and Engineering from the University of Notre Dame, advised by Professor Yiyu Shi. He also works closely with Professor Jinjun Xiong. He has published research papers at DAC, ICCAD, DATE, and ASP-DAC. His research focuses on personalized edge AI for healthcare backed by emerging technologies. Through cross-layer design spanning AI algorithms and emerging techniques such as FeFET-based CiM architectures, he builds intelligent, low-cost edge AI systems that improve healthcare for broad social groups, such as senior adults with dementia or children with specific language impairment. In 2024, he received the Edison Innovation Fellowship from the IDEA Center at Notre Dame, the Institute for Artificial Intelligence and Data Science (IAD) academic fellowship from the University at Buffalo, and a William J. McCalla Best Paper Nomination at ICCAD for his work using emerging techniques to optimize edge AI systems. He also serves as a reviewer and program committee member for conferences and journals including NeurIPS, AAAI, JMIR, ICLR, AISTATS, and RO-MAN. Ruiyang is the first author of the research works included in this tutorial.