김남승 (Nam Sung Kim)
Title: Professor
University of Illinois, Urbana-Champaign
Distributed training of Deep Neural Networks (DNN) is an emerging technique to reduce the training time of large DNNs for sophisticated applications. In existing distributed training approaches, however, the communication time to periodically exchange parameters (i.e., weights) and gradients among nodes over the network constitutes a large fraction of the training time. To reduce the communication time, we propose an algorithm/hardware co-design, named INCEPTIONN (In-Network Computing to Exchange and Process Training Information Of Neural Network). This approach builds on two observations: (1) gradients are much more tolerant of precision loss than parameters, and (2) a central parameter server aggregating gradients from all nodes and distributing updated parameters to them creates a bottleneck. Specifically, we first propose a gradient-centric distributed training algorithm that exchanges only gradients among nodes in a decentralized manner, without the need for a parameter server. As such, it can better overlap communication with computation across nodes than a parameter server approach and apply a more aggressive lossy compression algorithm to all of the information exchanged among nodes. Second, exploiting the unique value characteristics of gradients, we propose a lossy compression algorithm optimized for compressing gradients. This algorithm delivers high compression ratios with practically no loss in the accuracy of trained DNNs. Third, our experiments show that performing compression on the CPU increases total training time despite reduced communication time. To address this challenge, we propose an in-network computing approach that delegates the lossy compression task to an accelerator integrated within a Network Interface Card (NIC). Our experiments demonstrate that INCEPTIONN can reduce the communication time by 74.3∼79.7% and the total training time of DNNs by 59.9∼69.9% with little degradation in the accuracy of trained DNNs, compared to an existing distributed training approach.
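To make the first two ideas concrete, below is a minimal, single-machine sketch in Python/NumPy of a gradient-centric decentralized exchange combined with a simple truncation-based lossy gradient compressor. The helper names (truncate_fp32, ring_exchange) and the bit-truncation scheme are illustrative assumptions for this sketch only; they are not the actual INCEPTIONN compression algorithm or its NIC-integrated hardware implementation.

    # Sketch only: simulates n workers on one machine; the compressor is a
    # hypothetical stand-in, not the INCEPTIONN algorithm.
    import numpy as np

    def truncate_fp32(grad, drop_bits=16):
        # Lossy compression by zeroing the low-order mantissa bits of each
        # float32 gradient value; gradients tolerate this precision loss far
        # better than the parameters themselves.
        as_int = np.asarray(grad, dtype=np.float32).view(np.uint32)
        mask = np.uint32(0xFFFFFFFF ^ ((1 << drop_bits) - 1))
        return (as_int & mask).view(np.float32)

    def ring_exchange(local_grads):
        # Decentralized aggregation without a parameter server: at each step,
        # every worker receives one peer's (lossily compressed) gradient from
        # the logical ring and accumulates it locally.
        n = len(local_grads)
        aggregated = [np.zeros_like(g) for g in local_grads]
        for step in range(n):
            for rank in range(n):
                src = (rank - step) % n  # which peer's gradient arrives now
                aggregated[rank] += truncate_fp32(local_grads[src])
        return [a / n for a in aggregated]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        # Four workers, each holding a small gradient vector (values near zero).
        grads = [rng.normal(scale=1e-3, size=8).astype(np.float32) for _ in range(4)]
        avg = ring_exchange(grads)
        print("exact mean:", np.mean(grads, axis=0))
        print("lossy mean:", avg[0])

The sketch relies on the same property the abstract highlights: gradient values tolerate precision loss, so discarding low-order bits before exchanging them changes the aggregated update only marginally, and in the real system this compression work is offloaded from the CPU to the accelerator in the NIC.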
I am a faculty member at the University of Illinois, Urbana-Champaign and an IEEE Fellow. Prior to joining the University of Illinois in the fall of 2015, I was an associate professor at the University of Wisconsin, Madison, where I received early tenure in 2013. My interdisciplinary research spans devices, circuits, architecture, and software for power-efficient computing. My research has been supported by the National Science Foundation (NSF), Semiconductor Research Corporation (SRC), Defense Advanced Research Projects Agency (DARPA), BAE Systems, AMD, IBM, Samsung, and Microsoft. Prior to joining the University of Wisconsin, Madison, I was a senior research scientist at Intel from 2004 to 2008, where I conducted research on power-efficient digital circuits and processor architecture. I have published nearly 180 refereed articles in highly selective conferences and journals in the fields of digital circuits, processor architecture, and computer-aided design. My three most frequently cited papers have more than 3,500 citations, and the total number of citations of all my papers exceeds 8,000. I am a recipient of the IEEE Design Automation Conference (DAC) Student Design Contest Award in 2001, an Intel Fellowship in 2002, the IEEE International Symposium on Microarchitecture (MICRO) Best Paper Award in 2003, the NSF CAREER Award in 2010, IBM Faculty Awards in 2011 and 2012, the University of Wisconsin Vilas Associates Award in 2015, and the ACM/IEEE Most Influential International Symposium on Computer Architecture (ISCA) Paper Award in 2017. I am a member of the IEEE International Symposium on High-Performance Computer Architecture (HPCA) Hall of Fame and the IEEE International Symposium on Microarchitecture (MICRO) Hall of Fame. I earned a PhD in Computer Science and Engineering from the University of Michigan, Ann Arbor, and Master's and Bachelor's degrees in Electrical Engineering from the Korea Advanced Institute of Science and Technology.