The remarkable performance achieved by machine learning in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate these workloads. In parallel, production deployment and growing model complexity and diversity have pushed for higher-productivity systems: more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory- and compute-intensive, for both training and inference. Accelerating these operations has obvious advantages: first, reducing energy consumption (e.g. in data centers), and secondly, making these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the application of machine learning to the construction of such systems.
The workshop brings together researchers and practitioners working on computing systems for machine learning, and using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of the existing efforts, to foster collaboration and the free exchange of ideas.
This workshop builds on the success of our previous events: the 2nd AccML at ISCA 2020 and the 1st AccML at HiPEAC 2020.
Vice president, general manager & fellow, Machine Learning Group, Arm
Title: Enabling innovation for the AI future
AI is a once-in-a-generation change in computing that is expanding the capabilities of everything from cloud servers to the tiniest IoT devices. Today, most ML is still performed on Arm CPUs, and we continue to improve their efficiency, but dedicated Neural Network Processors (NPUs) are becoming more prevalent because they can dramatically improve the efficiency and performance of systems in a world where more and more ML must be run. For many years now, Arm has been on a mission to create the foundations needed to realize the opportunity of AI. The opportunities are massive, the market is open to everyone, and Arm is reducing risk by providing trusted blocks of technology and, through its ecosystem, delivering further tools and technologies to get to market faster and make it easier to unlock value.
Jem Davies is a fellow, vice president and general manager of Arm’s Machine Learning Group, focusing on machine learning and artificial intelligence solutions. He was previously general manager and vice president of technology for the Media Processing, and Imaging and Vision, Groups, where he set the future technology roadmaps and undertook technological investigations for several of Arm’s acquisitions. Prior to that he ran his own software consultancy for many years.
Associate Professor, MIT
Title: How to Evaluate Efficient Deep Neural Network Approaches
Enabling the efficient processing of deep neural networks (DNNs) has become increasingly important for deploying DNNs on a wide range of platforms, for a wide range of applications. To address this need, there has been a significant amount of work in recent years on designing DNN accelerators and developing approaches for efficient DNN processing, spanning the computer vision, machine learning, and hardware/systems architecture communities. Given the volume of work, it would not be feasible to cover it all in a single talk. Instead, this talk will focus on *how* to evaluate these different approaches, which include the design of DNN accelerators and DNN models. It will also highlight the key metrics that should be measured and compared, and present tools that can assist in the evaluation.
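The talk's specific metrics and tools are not listed in the abstract, but a recurring theme when comparing efficient DNN approaches is that no single metric suffices: approaches trade accuracy against latency, energy, and area. As a generic illustration (all data points hypothetical), the sketch below filters a set of design points down to their accuracy-latency Pareto frontier:

```python
def pareto_frontier(points):
    """points: list of (accuracy, latency_ms) design points.
    A point is kept unless some other point is at least as accurate AND
    at least as fast, and strictly better on one of the two metrics
    (i.e. the point is dominated)."""
    keep = []
    for i, (acc, lat) in enumerate(points):
        dominated = any(
            a2 >= acc and l2 <= lat and (a2 > acc or l2 < lat)
            for j, (a2, l2) in enumerate(points) if j != i
        )
        if not dominated:
            keep.append((acc, lat))
    return keep

# Hypothetical (accuracy, latency) points for four candidate designs:
# the slow, less accurate design is dominated and dropped.
frontier = pareto_frontier([(0.90, 5), (0.92, 12), (0.88, 15), (0.95, 20)])
```

Reporting the frontier rather than a single "best" design makes the accuracy/efficiency trade-off explicit, which is the spirit of evaluating approaches on multiple metrics at once.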
Vivienne Sze is an associate professor of electrical engineering and computer science at MIT. She is also the director of the Energy-Efficient Multimedia Systems research group at the Research Lab of Electronics (RLE). Sze works on computing systems that enable energy-efficient machine learning, computer vision, and video compression/processing for a wide range of applications, including autonomous navigation, digital health, and the Internet of Things. She is widely recognized for her leading work in these areas and has received many awards, including the AFOSR and DARPA Young Faculty Award, the Edgerton Faculty Award, several faculty awards from Google, Facebook, and Qualcomm, the 2018 Symposium on VLSI Circuits Best Student Paper Award, the 2017 CICC Outstanding Invited Paper Award, and the 2016 IEEE Micro Top Picks Award. As a member of the JCT-VC team, she received the Primetime Engineering Emmy Award for the development of the HEVC video compression standard. She co-authored the recent book “Efficient Processing of Deep Neural Networks” (Morgan & Claypool, 2020). For more information about research in the Energy-Efficient Multimedia Systems Group at MIT visit: http://www.rle.mit.edu/eems/.
Chief Software Architect, Graphcore
Title: Advanced software and compilation techniques in ML
One thing that is noticeable about modern deep learning software and hardware is the amount of compiler technology involved. The demands of the field have given rise to a variety of advanced techniques for manipulating and optimizing descriptions of computation. In this talk I'll discuss the factors behind this, go through a range of these techniques (including how Graphcore is applying them to our IPU processors), and speculate on how much more this kind of software can help progress in the field.
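One classic example of the kind of optimization ML compilers perform on descriptions of computation is operator fusion: merging consecutive elementwise operations so they execute in one pass instead of materializing intermediates. The toy pass below (all names invented for illustration, not any real framework's API) groups a linear sequence of op names into fusible clusters:

```python
# Elementwise ops can be fused with a preceding elementwise op, sharing
# one loop over the data and avoiding intermediate memory traffic.
ELEMENTWISE = {"add", "mul", "relu", "sigmoid"}

def fuse_elementwise(ops):
    """ops: ordered list of op names; returns groups of ops to fuse."""
    groups = []
    for op in ops:
        # Merge with the previous group only when both this op and the
        # group's last op are elementwise, so one fused kernel suffices.
        if groups and op in ELEMENTWISE and groups[-1][-1] in ELEMENTWISE:
            groups[-1].append(op)
        else:
            groups.append([op])
    return groups
```

Here `fuse_elementwise(["matmul", "add", "relu"])` would fuse the bias-add and activation into one group after the matmul. Production compilers work on full dataflow graphs rather than linear sequences, but the rewriting idea is the same.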
Dave Lacey is Chief Software Architect at Graphcore, overseeing the direction and design of software that helps engineers create the best machine learning solutions on Graphcore IPUs. He has a PhD in Computer Science from the University of Oxford and over 19 years of experience in research and development of programming tools and applications in many areas, including machine learning, HPC, and embedded systems. Prior to Graphcore, he worked at the University of Oxford, the University of Warwick, ClearSpeed Technology and XMOS.
Technical Director, IEEE & ST Fellow, System Research and Applications, STMicroelectronics
Title: INertial Sensor Neural Computing Acrobatics
Tiny Machine Learning (TinyML) is a community focusing on Deep Learning (DL) deployment on ultra-low-power devices such as microcontrollers (MCUs). Unfortunately, even MCU resources far exceed the storage and computing capabilities of sensing devices, which are built on low-cost silicon nodes such as 130 nm or 90 nm. To allow super-integration, artificial neural networks must be implemented in the same technology node as the sensor devices. This imposes parsimonious usage of memory and mathematical operators at low bit-depths, down to 1 bit for activations and weights. Achieving accuracy comparable to floating-point processing within a µW power envelope is therefore a hard challenge. Deeply Quantized Neural Networks (DQNNs) are the most promising technique to enable this. The extreme case is fully Binarized Neural Networks, where only 1 bit is used for representation. However, such models must be carefully designed to avoid significant accuracy degradation. In addition, current MCUs are not able to exploit the advantages of DQNNs, so custom energy-efficient HW accelerators are the promising solution, especially for in-sensor neural computing. In this talk, a novel quantized NN model, namely the Hybrid Neural Network (HNN), is presented. The model reaches up to 99% accuracy in classifying daily human activities from MEMS inertial sensors. Its custom ultra-low-power HW circuitry for real-time execution of the HNN is presented in CMOS technologies (90 nm, 65 nm) and implemented on FPGA, with an associated demo.
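To give a feel for the 1-bit representation the abstract mentions (this is the generic binarized-NN idea, not STMicroelectronics' actual HNN design, which is not detailed here), the sketch below binarizes real values to {-1, +1} with the sign function and computes the resulting dot product in XNOR-plus-popcount form, the trick that lets binarized layers run on cheap bitwise hardware:

```python
import numpy as np

def binarize(x):
    """Map real-valued weights/activations to {-1, +1} (1-bit each)."""
    return np.where(x >= 0, 1, -1)

def binary_dot(a_bits, w_bits):
    """Dot product of two {-1, +1} vectors of length n via XNOR + popcount:
    dot = (#agreements) - (#disagreements) = 2 * popcount(XNOR) - n."""
    n = len(a_bits)
    a = a_bits > 0          # encode +1 as True, -1 as False
    w = w_bits > 0
    matches = np.sum(~(a ^ w))   # XNOR: count positions where bits agree
    return 2 * int(matches) - n
```

In hardware, the boolean XNOR and popcount replace n multiply-accumulates, which is what makes the parsimonious 1-bit operators feasible at a µW power envelope; the accuracy cost of this extreme quantization is exactly what careful model design (and hybrid schemes) must recover.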
One year before graduating from the Polytechnic University of Milan in 1992, Danilo Pau joined STMicroelectronics, where he worked on HDMAC and MPEG2 video memory reduction, video coding, embedded graphics, and computer vision. Today, his work focuses on developing solutions for deep learning tools and applications. Danilo has been an IEEE Fellow since 2019; he serves as Industry Ambassador coordinator for IEEE Region 8 South Europe, vice-chairman of the “Intelligent Cyber-Physical Systems” Task Force within the IEEE Computational Intelligence Society, and member of the Machine Learning, Deep Learning and AI in CE (MDA) Technical Stream Committee of the IEEE Consumer Electronics Society (CESoc). With over 80 patents, 94 publications, 113 authored MPEG documents, and more than 31 invited talks/seminars at universities and conferences worldwide, Danilo's favorite activity remains mentoring undergraduate students, MSc engineers, and PhD students from various universities in Italy, the US, France, and India.
|Time (CET/Brussels)||Online Event - 18th January 2021|
|9:30 AM – 9:40 AM||Welcome|
|9:40 AM – 10:30 AM||Keynote talk 1: Enabling innovation for the AI future (Jem Davies, ARM)|
|10:30 AM – 10:50 AM||Paper talk: Understanding Cache Boundness of ML Operators on ARM Processors (Bernhard Klein, Christoph Gratl, Manfred Mücke and Holger Fröning)|
|11:00 AM – 11:30 AM||Comfort break|
|11:30 AM – 12:10 PM||Invited talk: Advanced software and compilation techniques in ML (David Lacey, Graphcore)|
|12:10 PM – 12:30 PM||Paper talk: Using the Graphcore IPU for traditional HPC applications (Thorben Louw and Simon McIntosh-Smith)|
|12:30 PM – 1:00 PM||EU project presentations: VEDLIoT (Pedro Trancoso, Chalmers), ALOHA (Paolo Meloni, University of Cagliari)|
|1:00 PM – 3:00 PM||Break (Lunch + HiPEAC Keynote)|
|3:00 PM – 3:50 PM||Keynote talk 2: How to Evaluate Efficient Deep Neural Network Approaches (Vivienne Sze, MIT)|
|3:50 PM – 4:10 PM||Paper talk: NPS: A Compiler-aware Framework of Unified Network Pruning for Beyond Real-Time Mobile Acceleration (Zhengang Li, Geng Yuan, Wei Niu, Yanyu Li, Pu Zhao, Yuxuan Cai, Xuan Shen, Zheng Zhan, Zhenglun Kong, Qing Jin, Bin Ren, Yanzhi Wang and Xue Lin)|
|4:10 PM – 4:30 PM||Paper talk: Neural Pruning Search for Real-Time Object Detection of Autonomous Vehicles (Pu Zhao, Geng Yuan, Yuxuan Cai, Wei Niu, Bin Ren, Yanzhi Wang and Xue Lin)|
|4:30 PM – 5:00 PM||Comfort break|
|5:00 PM – 5:40 PM||Invited talk: INertial Sensor Neural Computing Acrobatics (Danilo Pau, STMicroelectronics)|
|5:40 PM – 6:00 PM||Paper talk: BlinkNet: Software-Defined Deep Learning Analytics with Bounded Resources (Brian Koga, Theresa Vanderweide, Xinghui Zhao and Xuechen Zhang)|
|6:00 PM – 6:05 PM||Closing remarks|
Topics of interest include (but are not limited to):
Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
Software ML acceleration: languages, primitives, libraries, compilers and frameworks;
Novel ML hardware accelerators and associated software;
Emerging semiconductor technologies with applications to ML hardware acceleration;
ML for the construction and tuning of systems;
Cloud and edge ML computing: hardware and software to accelerate training and inference;
Computing systems research addressing the privacy and security of ML-dominated systems.
November 30, 2020 (extended from November 8)
Notification to authors:
December 15, 2020 (extended from December 4)
Papers should be in double-column IEEE format, between 4 and 8 pages including references. Papers should be uploaded as PDF and should not be anonymized.
Submissions can be made at https://easychair.org/my/conference?conf=3rdaccml.
Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.
In particular, we encourage authors to keep the following options in mind when preparing submissions:
Tentative Research Ideas: Present your research idea early on to get feedback and enable collaborations.
Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.