HiPEAC workshop on Accelerated Machine Learning (AccML)

HiPEAC 2020 workshop

20th January, 2020

Bologna, Italy

In the last five years, the remarkable performance achieved in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment, model complexity and model diversity have pushed for higher-productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory- and compute-intensive, for both training and inference. Accelerating these operations has obvious advantages: first, it reduces energy consumption (e.g., in data centers), and second, it makes these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the application of machine learning to the construction of such systems.

The workshop brings together researchers and practitioners working on computing systems for machine learning, and on using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of existing efforts and to foster collaboration and the free exchange of ideas.

This workshop builds on the success of our previous event, the Emerging Deep Learning Accelerators workshop at HiPEAC 2019.

Call For Contributions

Topics

Topics of interest include (but are not limited to):

Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;
Software ML acceleration: languages, primitives, libraries, compilers and frameworks;
Novel ML hardware accelerators and associated software;
Emerging semiconductor technologies with applications to ML hardware acceleration;
ML for the construction and tuning of systems;
Cloud and edge ML computing: hardware and software to accelerate training and inference;
Computing systems research addressing the privacy and security of ML-dominated systems.

Important Dates

Submission deadline: November 22nd, 2019 (11:59 PM PDT), extended from November 8th
Notification to authors: December 13th, 2019

Paper Format

Regular and short papers should use the Springer format; we recommend 9 and 5 pages, respectively (there are no firm page limits). Papers should be submitted in PDF format and should not be anonymized.

Submission Site

Submissions can be made at easychair.org/conferences/?conf=accml2020.

Submission Options

Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.

In particular, we encourage authors to keep the following options in mind when preparing submissions:

Tentative Research Ideas: Present your research idea early on to get feedback and enable collaborations.
Works-In-Progress: To facilitate sharing of thought-provoking ideas and high-potential though preliminary research, authors are welcome to make submissions describing early-stage, in-progress, and/or exploratory work in order to elicit feedback, discover collaboration opportunities, and generally spark discussion.

Keynote Speaker

Luca Benini

ETH Zürich

Title: Extreme Edge AI on Open Hardware

Abstract
Edge Artificial Intelligence (AI) is the new mega-trend, as privacy concerns and network bandwidth/latency bottlenecks prevent cloud offloading of AI functions in many application domains, from autonomous driving to advanced prosthetics. Hence we need to push AI toward sensors and actuators. I will give an overview of recent efforts in developing systems-on-chip based on open-source hardware and capable of significant analytics and AI functions "at the extreme edge", i.e., within the limited power budget of traditional microcontrollers that can be co-located and integrated with the sensors/actuators themselves. These open, extreme edge AI platforms create an exciting playground for research and innovation.

Bio
Luca Benini holds the Chair of Digital Circuits and Systems at ETHZ and is a Full Professor at the Università di Bologna. He received a PhD from Stanford University. In 2009-2012 he served as chief architect at STMicroelectronics France. Dr. Benini's research interests are in energy-efficient computing systems design, from embedded to high-performance. He is also active in the design of ultra-low-power VLSI circuits and smart sensing micro-systems. He has published more than 1000 peer-reviewed papers and five books. He is an ERC Advanced Grant winner, a Fellow of the IEEE and of the ACM, and a member of the Academia Europaea. He is the recipient of the 2016 IEEE CAS Mac Van Valkenburg Award and of the 2019 IEEE TCAD Donald O. Pederson Best Paper Award.

Invited Speakers

Carole-Jean Wu

Facebook AI, Arizona State University

Title: Machine Learning At Scale: Heterogeneity and Scalability Challenges for ML Systems

Abstract
Machine learning systems are being widely deployed in production datacenter infrastructure and on billions of edge devices. This talk addresses key system design challenges in scaling machine learning solutions to billions of people. What are the key similarities and differences between cloud and edge infrastructure? The talk will conclude with open system research directions for deploying machine learning at scale.

Bio
Carole-Jean Wu is a Research Scientist at Facebook’s AI Infrastructure Research. She is also a tenured Associate Professor of CSE at Arizona State University. Carole-Jean’s research focuses on computer and system architectures. More recently, her research has pivoted to designing systems for machine learning. She is the lead author of “Machine Learning at Facebook: Understanding Inference at the Edge”, which presents the unique design challenges faced when deploying ML solutions at scale to the edge, from billions of smartphones to Facebook’s virtual reality platforms. Carole-Jean received her Ph.D. and M.A. from Princeton and her B.Sc. from Cornell.

Rune Holm

Arm

Title: Big neural networks in small spaces: Towards end-to-end optimisation for ML at the edge

Abstract
Neural networks have taken over use case after use case, from image recognition, speech recognition and image enhancement to driving cars, and show no sign of letting up. Yet many of these use cases still work by acquiring data and sending it off to the cloud for inference. On-device ML brings unprecedented capabilities and opportunities to edge devices, with improved privacy, security, and reliability. This talk explores the many aspects of system optimisation for edge ML, from training-time optimisation, to the compilation of neural networks, to the design of machine learning hardware, and looks at ways to save execution time and memory footprint while preserving accuracy.

Bio
Rune Holm has been part of the semiconductor industry for more than a decade. He started out on Mali GPUs, doing GPU microarchitecture and designing shader compilers for VLIW cores. He then moved on to research into experimental GPGPU designs and architectures targeting HPC, machine learning and computer vision. He’s currently part of the Arm Machine Learning Group, focusing on neural network accelerator architecture and compilers optimising for these designs.

Albert Cohen

Google, Paris

Title: Abstractions, Algorithms and Infrastructure for Post-Moore Optimizing Compilers

Abstract
MLIR is a recently announced open source infrastructure to accelerate innovation in machine learning (ML) and high-performance computing (HPC). It addresses the growing software and hardware fragmentation across machine learning frameworks, enabling machine learning models to be consistently represented and executed on any type of hardware. It also unifies graph representations and operators for ML and HPC. It facilitates the design and implementation of code generators, translators and optimizations at different levels of abstraction and also across application domains, hardware targets and execution environments.
We will share our vision, progress and plans in the MLIR project, zooming in on graph-level and loop nest optimization as illustrative examples.

Bio
Albert Cohen is a research scientist at Google. He worked as a research scientist at Inria from 2000 to 2018. He graduated from École Normale Supérieure de Lyon and received his PhD from the University of Versailles in 1999, for which he was awarded two national prizes. He has also been a visiting scholar at the University of Illinois, an invited professor at Philips Research, and a visiting scientist at Facebook Artificial Intelligence Research. Albert Cohen works on parallelizing and optimizing compilers, parallel programming languages and systems, and synchronous programming for reactive control systems. He has served as general or program chair of some of the main conferences in the area and as a member of the editorial boards of two journals. He has co-authored more than 180 peer-reviewed papers and has advised 26 PhD theses. Several research projects led by Albert Cohen have resulted in effective transfer to production compilers and programming environments.

Program

Time	Event (in room Bianca A) 20th January 2020
10:00–10:05	Welcome
10:05–11:00	Keynote: Extreme Edge AI on Open Hardware (Luca Benini, ETH Zurich and U. di Bologna)
11:00–11:30	Coffee break
11:30–13:00	Invited talk: Machine Learning At Scale: Heterogeneity and Scalability Challenges for ML Systems (Carole-Jean Wu, Facebook and Arizona State University) (40 min)
	Paper presentations:
	Evaluating Achievable Latency and Cost: SSD Latency Predictors (MittOS Model Inference) (25 min) (Olivia Weng and Andrew Chien)
	An On-the-Fly Feature Map Compression Engine for Background Memory Access Cost Reduction in DNN Inference (25 min) (Georg Rutishauser, Lukas Cavigelli and Luca Benini)
13:00–14:00	Lunch break
14:00–15:30	Invited talk: Big neural networks in small spaces: Towards end-to-end optimisation for ML at the edge (Rune Holm, Arm) (40 min)
	Paper presentations:
	A Vertically Integrated Framework to Deploy Deep Neural Networks on Extreme Edge Devices (25 min) (Francesco Conti, Alessio Burrello, Angelo Garofalo, Davide Rossi and Luca Benini)
	Benchmarking Performance and Power of USB Accelerators for Inference with MLPerf (25 min) (Leandro Ariel Libutti, Francisco D. Igual, Luis Piñuel, Laura De Giusti and Marcelo Naiouf)
15:30–16:00	Coffee break
16:00–17:20	Invited talk: Abstractions, Algorithms and Infrastructure for Post-Moore Optimizing Compilers (Albert Cohen, Google France) (40 min)
	Paper presentations:
	A functional pattern-based language in MLIR (25 min) (Martin Lücke, Michel Steuwer and Aaron Smith)
	An in-depth Study of Neural Machine Translation Performance (25 min) (Simla Burcu Harma, Mario Drumond, Babak Falsafi and Oğuz Ergin)
17:20–17:30	Closing remarks

Organizers

José Cano (University of Glasgow)

Valentin Radu (University of Edinburgh)

Marco Cornero (DeepMind)

Albert Cohen (Google)

Dominik Grewe (DeepMind)

Alex Ramirez (Google)

Program Committee

José Cano (University of Glasgow)

Albert Cohen (Google)

Marco Cornero (DeepMind)

Dominik Grewe (DeepMind)

Valentin Radu (University of Edinburgh)

Alex Ramirez (Google)

Olivier Temam (DeepMind)

Nicolas Vasilache (Google)

Dimitrios Vytiniotis (Google)

Oleksandr Zinenko (Google)

Contact

If you have any questions, please feel free to send an email to accml-info@inf.ed.ac.uk.

1st Workshop on
Accelerated Machine Learning (AccML)

Co-located with the HiPEAC 2020 Conference

HiPEAC 2020 workshop

20th January, 2020

Bologna, Italy

Call For Contributions

Topics

Important Dates

Paper Format

Submission Site

Submission Options

Keynote Speaker

Luca Benini

Invited Speakers

Carole-Jean Wu

Rune Holm

Albert Cohen

Program

Organizers

José Cano (University of Glasgow)

Valentin Radu (University of Edinburgh)

Marco Cornero (DeepMind)

Albert Cohen (Google)

Dominik Grewe (DeepMind)

Alex Ramirez (Google)

Program Committee

José Cano (University of Glasgow)

Albert Cohen (Google)

Marco Cornero (DeepMind)

Dominik Grewe (DeepMind)

Valentin Radu (University of Edinburgh)

Alex Ramirez (Google)

Olivier Temam (DeepMind)

Nicolas Vasilache (Google)

Dimitrios Vytiniotis (Google)

Oleksandr Zinenko (Google)

Contact