2nd Workshop on
Accelerated Machine Learning (AccML)

Co-located with the ISCA 2020 Conference


In this second AccML workshop, we aim to bring together researchers working in Machine Learning and System Architecture to discuss requirements, opportunities, challenges and next steps in developing novel approaches for machine learning systems.

ISCA 2020 workshop

31st May, 2020

Worldwide Event


In the last 5 years, the remarkable performance achieved in a variety of application areas (natural language processing, computer vision, games, etc.) has led to the emergence of heterogeneous architectures to accelerate machine learning workloads. In parallel, production deployment, model complexity and diversity have pushed for higher-productivity systems, more powerful programming abstractions, software and system architectures, dedicated runtime systems and numerical libraries, and deployment and analysis tools. Deep learning models are generally memory- and compute-intensive, for both training and inference. Accelerating these operations has obvious advantages: first, it reduces energy consumption (e.g. in data centers), and second, it makes these models usable on smaller devices at the edge of the Internet. In addition, while convolutional neural networks have motivated much of this effort, numerous applications and models involve a wider variety of operations, network architectures, and data processing. These applications and models continually challenge computer architecture, the system stack, and programming abstractions. The high level of interest in these areas calls for a dedicated forum to discuss emerging acceleration techniques and computation paradigms for machine learning algorithms, as well as the applications of machine learning to the construction of such systems.

The workshop brings together researchers and practitioners working on computing systems for machine learning, and using machine learning to build better computing systems. It also reaches out to a wider community interested in this rapidly growing area, to raise awareness of existing efforts and to foster collaboration and the free exchange of ideas.

This builds on the success of the First AccML at HiPEAC 2020.

Call For Contributions


Topics

Topics of interest include (but are not limited to):

  • Novel ML systems: heterogeneous multi/many-core systems, GPUs and FPGAs;

  • Novel ML hardware accelerators and associated software;

  • Emerging semiconductor technologies with applications to ML hardware acceleration;

  • ML for the construction and tuning of systems;

  • Cloud and edge ML computing: hardware and software to accelerate training and inference;

  • Computing systems research addressing the privacy and security of ML-dominated systems.



Submissions

Important Dates

Submission deadline: May 8, 2020 (extended from May 1)
Notification to authors: May 20, 2020 (moved from May 15)

Paper Format

Papers should be in double-column IEEE format and between 4 and 8 pages long, including references. Papers should be uploaded as PDF and should not be anonymized.

Submission Site

Submissions can be made at easychair.org/conferences/?conf=2ndaccml.



Papers will be reviewed by the workshop's technical program committee according to criteria regarding a submission's quality, relevance to the workshop's topics, and, foremost, its potential to spark discussions about directions, insights, and solutions on the topics mentioned above. Research papers, case studies, and position papers are all welcome.

The workshop does not have formal proceedings, so acceptance of a paper does not preclude publishing it at future conferences and/or in journals.

Invited Speakers




Antonio Gonzalez

Universitat Politècnica de Catalunya

Title: Removing Ineffectual Computations in Neural Networks

Abstract

There is a growing interest in extending computing devices with the ability to analyze and understand signals and data coming from a large variety of activities in our daily lives, and to provide real-time responses in complex situations, with the goal of emulating human perception and problem solving. Examples include personal assistants, self-driving cars, domestic robots and health-care devices, just to name a few. Neural networks have proven to be an effective approach to support many of these functionalities.

Most of these systems have very limited energy budgets, so the effectiveness of this approach depends strongly on the energy efficiency of the adopted solution. In this talk we present several alternative directions for improving the energy efficiency of neural networks, based on identifying and removing ineffectual computations.

Bio
Antonio González (Ph.D. 1989) is a Full Professor at the Computer Architecture Department of the Universitat Politècnica de Catalunya, Barcelona (Spain), and the director of the Architecture and Compiler research group. He was the founding director of the Intel Barcelona Research Center from 2002 to 2014. His research has focused on computer architecture. In this area, Antonio holds 52 patents, has published over 370 research papers and has given over 120 invited talks. He has also made multiple contributions to the design of the architecture of several commercial microprocessors.

Antonio has been program chair for ICS, ISPASS, MICRO, HPCA and ISCA, and general chair for MICRO and HPCA among other symposia. He has served on the program committee for over 130 international symposia in the field of computer architecture, and has been Associate Editor of the IEEE Transactions on Computers, IEEE Transactions on Parallel and Distributed Systems, IEEE Computer Architecture Letters, ACM Transactions on Architecture and Code Optimization, ACM Transactions on Parallel Computing, and Journal of Embedded Computing.

Antonio’s awards include the award to the best student in computer engineering in Spain, the Rosina Ribalta award as the advisor of the best PhD project in Information Technology and Communications, the Duran Farrell award for research in technology, the Aritmel National Award of Informatics to the Computer Engineer of the Year, the King James I award for his contributions in research on new technologies, and the ICREA Academia Award. He is an IEEE Fellow.





David Kaeli

Northeastern University

Title: Scaling Machine Learning Workloads on Today’s GPUs

Abstract

Machine learning applications place large computational demands on hardware resources when performing classification, regression, clustering and training. What is common in many of these applications is that the quality of the outcome or model improves as we process more data. GPUs have been shown to be an effective platform for accelerating machine learning workloads, though there are limits to the amount of data a single GPU can process. This talk will look at ongoing work in hardware compaction and multi-GPU acceleration, enabling further scaling of machine learning workloads.

Bio
David Kaeli received his BS and PhD in Electrical Engineering from Rutgers University, and an MS in Computer Engineering from Syracuse University. He is presently a COE Distinguished Full Professor on the ECE faculty at Northeastern University, Boston, MA. Dr. Kaeli has published over 350 critically reviewed publications, 7 books, and 13 patents. He serves as the Editor in Chief of ACM Transactions on Architecture and Code Optimization, and as an Associate Editor of the IEEE Transactions on Parallel and Distributed Systems and the Journal of Parallel and Distributed Computing. Dr. Kaeli is an IEEE Fellow and an ACM Distinguished Scientist.





Tushar Krishna

Georgia Tech

Title: A Communication-Centric Approach for Designing Flexible DNN Accelerators

Abstract

Deep Neural Networks (DNNs) have demonstrated highly promising results across computer vision and speech recognition, and are becoming foundational for ubiquitous AI. The computational complexity of these algorithms and the need for high energy efficiency have led to a surge in research on hardware accelerators. To reduce the latency and energy costs of accessing DRAM, most DNN accelerators are spatial in nature, with hundreds of processing elements (PEs) operating in parallel and communicating with each other directly.

DNNs are evolving at a rapid rate, leading to myriad layer types (convolution, attention, LSTM, MLP) of varying shape (regular and irregular). Given a DNN, there can be myriad computationally efficient implementations (e.g., via pruning), leading to structured and unstructured sparsity. Finally, a given DNN can be tiled and partitioned in myriad ways to exploit data reuse. All of the above can lead to irregular dataflow patterns within the accelerator substrate. Achieving high mapping efficiency in all these cases is highly challenging for today's accelerators, which are often tightly coupled 2D grids with rigid near-neighbor connectivity.

First, given a target DNN, we will demonstrate a systematic methodology for understanding data reuse opportunities within the algorithm and determining the cost vs benefit of efficiently exploiting them in hardware, using our dataflow + microarchitectural model called MAESTRO (MICRO 2019 + IEEE Micro Top Picks). Next, we present a systematic communication-centric methodology for accelerator design that can provide ~100% efficiency for arbitrary DNN shapes, sparsity ratios and mappings. We demonstrate instances of this approach with two accelerators, MAERI (ASPLOS 2018 + IEEE Micro Top Picks Honorable Mention) and SIGMA (HPCA 2020 + Best Paper Award), that show orders of magnitude better utilization than state-of-the-art baselines like NVIDIA's NVDLA and Google’s TPU.

Bio
Tushar Krishna is an Assistant Professor in the School of Electrical and Computer Engineering at Georgia Tech. He also holds the ON Semiconductor Junior Professorship. He has a Ph.D. in Electrical Engineering and Computer Science from MIT (2014), an M.S.E. in Electrical Engineering from Princeton University (2009), and a B.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Delhi (2007). Before joining Georgia Tech in 2015, Dr. Krishna spent a year as a post-doctoral researcher at Intel, Massachusetts.

Dr. Krishna’s research spans computer architecture, interconnection networks, networks-on-chip (NoC) and deep learning accelerators with a focus on optimizing data movement in modern computing systems. Three of his papers have been selected for IEEE Micro’s Top Picks from Computer Architecture, one more received an honorable mention, and three have won best paper awards. He received the National Science Foundation (NSF) CRII award in 2018, and both a Google Faculty Award and a Facebook Faculty Award in 2019.





Cliff Young

Google

Title: Reflections on TPUs, Current Problems in Acceleration, and What's Next

Abstract

Google's first TPU has been a remarkably successful accelerator, spawning a sequence of successors and inspiring a wave of new chips from established companies and startups. I'll start with some retrospection about what we got right and the ways in which we were lucky in building that first TPU. Then I'll pivot to the problems I think are currently hard and possibly underserved by our NN accelerator systems (to spoil: programmability, memory, and multi-tenancy). Lastly I'll speculate about where ML might take us: how much might the algorithms and computations change, the implications of the Accelerator Wall, and the virtuous feedback between algorithms and architecture that might be the basis of a true Golden Age for our field.

Bio
Cliff Young is a software engineer in the Google Brain team, where he works on codesign for deep learning accelerators. He is one of the designers of Google’s Tensor Processing Unit (TPU), which is used in production applications including Search, Maps, Photos, and Translate. TPUs also powered AlphaGo’s historic 4-1 victory over Go champion Lee Sedol. Previously, Cliff built special-purpose supercomputers for molecular dynamics at D. E. Shaw Research and worked at Bell Labs. Cliff holds AB, MS, and PhD degrees in computer science from Harvard University.

Program


Virtual Event - 31st May 2020 (all times EDT/New York)

9:00 AM – 9:10 AM
Welcome

9:10 AM – 10:10 AM
Invited talk: Removing Ineffectual Computations in Neural Networks (9:10 AM – 9:50 AM)
Antonio Gonzalez, Universitat Politècnica de Catalunya

Paper talk: You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy (9:50 AM – 10:10 AM)
Srivatsa P, Kyle Timothy Ng Chu, Yaswanth Tavva, Jibin Wu, Malu Zhang, Haizhou Li and Trevor E. Carlson

10:10 AM – 11:10 AM
Invited talk: Scaling Machine Learning Workloads on Today’s GPUs (10:10 AM – 10:50 AM)
David Kaeli, Northeastern University

Paper talk: HCM: Hardware-Aware Complexity Metric for Neural Network Architectures (10:50 AM – 11:10 AM)
Alex Karbachevsky, Chaim Baskin, Evgenii Zheltonozhskii, Yevgeny Yermolin, Freddy Gabbay, Alexander Bronstein and Avi Mendelson

11:10 AM – 11:40 AM
Break

11:40 AM – 12:40 PM
Invited talk: A Communication-Centric Approach for Designing Flexible DNN Accelerators (11:40 AM – 12:20 PM)
Tushar Krishna, Georgia Tech

Paper talk: STONNE: A Detailed Architectural Simulator for Flexible Neural Network Accelerators (12:20 PM – 12:40 PM)
Francisco Muñoz-Martínez, José L. Abellán, Manuel E. Acacio and Tushar Krishna

12:40 PM – 2:00 PM
Invited talk: Reflections on TPUs, Current Problems in Acceleration, and What's Next (12:40 PM – 1:20 PM)
Cliff Young, Google

Paper talk: Statistical Robustness of MCMC Accelerators (1:20 PM – 1:40 PM)
Xiangyu Zhang, Ramin Bashizade, Yicheng Wang, Cheng Lyu, Sayan Mukherjee and Alvin R. Lebeck

Paper talk: Acceleration Techniques for Sampling-based Machine Learning (1:40 PM – 2:00 PM)
Yanqi Liu, Ruth Iris Bahar and Giuseppe Calderoni

2:00 PM – 2:05 PM
Closing remarks

Organizers


José Cano (University of Glasgow)

José L. Abellán (Catholic University of Murcia)

Albert Cohen (Google)

Alex Ramirez (Google)




Program Committee


José L. Abellán (Catholic University of Murcia)

Manuel E. Acacio (University of Murcia)

José Cano (University of Glasgow)

Albert Cohen (Google)

Marco Cornero (DeepMind)

David Gregg (Trinity College Dublin)

Dominik Grewe (DeepMind)

Valentin Radu (University of Edinburgh)

Alex Ramirez (Google)

Oleksandr Zinenko (Google)

Contact


If you have any questions, please feel free to send an email to accml-info@inf.ed.ac.uk.