Theme 1: Heterogeneous Computing Platforms

We envision a planet-scale distributed computing infrastructure with a myriad of heterogeneous accelerators. Accelerators will evolve rapidly with applications and, at any point in time, co-exist with earlier or later generations. Hence, we propose a new methodology to easily generate, deploy, and reconfigure Evolvable accelerators. Groups of accelerators will be organized into Ensembles distributed across one or multiple datacenters. Applications will dynamically pick (and reconfigure) the desired set of accelerators from the ensemble with minor overhead. Advanced runtime and compilation methods will reconfigure multi-tenant accelerator ensembles, and map and schedule applications to them. Finally, revamped general-purpose cores will differentiate to increase performance and energy efficiency.

Inside Theme 1-2 — Design-space exploration of reconfigurable ASIC accelerators (Courtesy of Zhiru Zhang).

The computing infrastructure will include highly heterogeneous distributed memory and storage resources. As workloads relentlessly increase their data needs, the memory reachable by processors as local memory will expand across an entire rack, creating a formidable memory wall that we will meet with novel processor structures and gracefully degrading coherence mechanisms. To utilize heterogeneous memory and storage assets efficiently, we will develop new abstractions that allow applications to select the type of asset needed. Moreover, we will develop theory-grounded scalable algorithms to apportion these assets efficiently among thousands of competing applications in the datacenter and billions of allocation requests. Ubiquitous intelligent memory and storage blocks distributed across the memory hierarchy will be harnessed to operate in a coordinated manner.

Heterogeneous Intelligent Memory and Storage (IMS) blocks present in multiple locations of the memory hierarchy of a distributed machine (Courtesy of Steven Swanson).

Papers and Presentations:

2023

Snapshot: Fast, Userspace Crash Consistency for CXL and PM Using msync

Suyash Mahar, Mingyao Shen, Terence Kelly, Steven Swanson

2023 IEEE 41st International Conference on Computer Design (ICCD)

10.1109/ICCD58817.2023.00082

Profiling gem5 Simulator

Johnson Umeike, Neel Patel, Alex Manley, Amin Mamandipoor, Heechul Yun, Mohammad Alian

ISPASS 2023

10.1109/ISPASS57527.2023.00019

2024

EdgeScaler: Smart (Auto-)Scaling for the 5G Edge
Lauren Trinks, Bilal Saleem, Muhammad Shahbaz
APSys 2024

Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems
C. Sullivan, A. Manley, M. Alian and H. Yun
2024 IEEE Real-Time Systems Symposium (RTSS)
10.1109/RTSS62706.2024.00036

FloatAP: Supporting High-Performance Floating-Point Arithmetic in Associative Processors
Kailin Yang, Jose Martinez
2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)
10.1109/MICRO61859.2024.00055

Telepathic Datacenters: Fast RPCs using Shared CXL Memory
Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, Steven Swanson
arXiv:2408.11325

Userspace Networking in gem5
J. Umeike, S. Agarwal, N. Lazarev and M. Alian
2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
10.1109/ISPASS61541.2024.00026

Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
PACMI '24: Proceedings of the 3rd Workshop on Practical Adoption Challenges of ML for Systems
10.1145/3704742.3704964

Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator

Courtney Golden, Dan Ilan, Caroline Huang, Niansong Zhang, Zhiru Zhang, and Christopher Batten

IEEE Computer Architecture Letters (Volume: 23, Issue: 1, Jan.-June 2024)

10.1109/LCA.2023.3341389

PrimeNet: Pre-Training for Irregular Multivariate Time Series

Ranak Roy Chowdhury, Jiacheng Li, Xiyuan Zhang, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang

Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI ’23) Feb 7, 2023

10.1609/aaai.v37i6.25876

Proteus: HLS-based NoC Generator and Simulator

Abhimanyu Rajeshkumar Bambhaniya; Yangyu Chen; Anshuman; Rohan Banerjee; Tushar Krishna

Design, Automation and Test in Europe Conference April 2023

10.23919/DATE56975.2023.10137173

SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM

Gerasimos Gerogiannis, Serif Yesil, Damitha Lenadora, Dingyuan Cao, Charith Mendis, Josep Torrellas

ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture June 2023

10.1145/3579371.3589054

SENSEi: Input-sensitive dense-sparse primitive compositions for GNN acceleration

Damitha Lenadora, Vimarsh Sathia, Gerasimos Gerogiannis, Serif Yesil, Josep Torrellas, Charith Mendis

arxiv.org/abs/2306.15155 June 2023

FluRKA: Fast fused Low-Rank & Kernel Attention

Ahan Gupta, Yueming Yuan, Yanqi Zhou, Charith Mendis

10.48550/arXiv.2306.15799 June 2023

MXFaaS: Resource Sharing in Serverless Environments for Parallelism and Efficiency

Jovan Stojkovic, Tianyin Xu, Hubertus Franke, Josep Torrellas

ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture June 2023

10.1145/3579371.3589069

µManycore: A Cloud-Native CPU for Tail at Scale

Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas

ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture June 2023

10.1145/3579371.3589068

Arvon: A Heterogeneous System-in-Package Integrating FPGA and DSP Chiplets for Versatile Workload Acceleration

Cheng-Hsun Lu, Junkang Zhu, Tianyu Wei, Wei Tang, Zhengya Zhang

2023 Symposium on VLSI Circuits June 2023

10.1109/JSSC.2023.3343457

Towards Diverse and Coherent Augmentation for Time-Series Forecasting

Xiyuan Zhang, Ranak Roy Chowdhury, Jingbo Shang, Rajesh Gupta, Dezhi Hong

ICASSP 2023 June 2023

10.48550/arXiv.2303.14254

Unleashing the Power of Shared Label Structures for Human Activity Recognition

Xiyuan Zhang, Ranak Roy Chowdhury, Jiayun Zhang, Rajesh K. Gupta, Jingbo Shang, Dezhi Hong

CIKM 2023 October 2023

10.48550/arXiv.2301.03462

Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making

Gerasimos Gerogiannis, Josep Torrellas

Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’23)

10.1145/3613424.3623780 Oct 2023

Machine Learning Hardware Design for Efficiency, Flexibility and Scalability

Jie-Fang Zhang, Zhengya Zhang

IEEE Circuits and Systems Magazine, October 2023

10.1109/mcas.2023.3302390

Large Graph Property Prediction via Graph Segment Training

Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi
arXiv:2305.12322 Nov 2023

TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs

Charith Mendis, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Bahare Fatemi, Bryan Perozzi, Kaidi Cao

Workshop on Graph Learning Benchmarks Dec 2023

10.48550/arXiv.2308.13490

Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training

Hongzheng Chen, Cody Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang
arXiv:2302.08005 December 2023

An Intermediate Language for General Sparse Format Customization

Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, Zhiru Zhang

IEEE Computer Architecture Letters (Volume: 22, Issue: 2, July-Dec. 2023)

10.1109/LCA.2023.3262610

2024

Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns
J. Zhang, X. Zhang, D. Hong, R. K. Gupta and J. Shang
IEEE Transactions on Knowledge and Data Engineering
10.1109/TKDE.2024.3452638

SmoothE: Differentiable E-Graph Extraction
Yaohui Cai, Kaixin Yang, Chenhui Deng, Cunxi Yu, Zhiru Zhang
ASPLOS '25: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
10.1145/3669940.3707262

ARIES: An Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
Jinming Zhuang, Shaojie Xiang, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, Peipei Zhou
International Symposium on Field-Programmable Gate Arrays (FPGA)

Design Approach for Die-to-Die Interfaces to Enable Energy-Efficient Chiplet Systems

Vikram Jain, Wei Tang, Zuoguo Wu, Viansa Schmulbach, Sophia Shao, Zhengya Zhang, Borivoje Nikolic
ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
10.1145/3665314.3680473

DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse
arXiv.org

arXiv:2408.00741

SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention
Ahan Gupta, Yueming Yuan, Devansh Jain, Yuhao Ge, David Aponte, Yanqi Zhou, Charith Mendis

10.48550/arXiv.2407.16847

Quick, Thorough and Scalable Pre-silicon Verification with G-QED
Saranyu Chattopadhyay, Subhasish Mitra
TechCon 2024

Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools
Ruijie Wang, Wanyu Zhao, Dachun Sun, Charith Mendis, Tarek Abdelzaher
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
10.1145/3627673.367910

Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM
Charles Block, Gerasimos Gerogiannis, Charith Mendis, Ariful Azad, Josep Torrellas
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
10.1145/3620665.3640427

Energy-efficient parallel interconnects for chiplet integration
W. Tang, C. Liu and Z. Zhang
IEEE Micro 2024
10.1109/MM.2024.3450841

Sparsity At Scale -- Towards Efficient Distributed Sparse Accelerators
Gerasimos Gerogiannis, Charles Block, Josep Torrellas
TechCon 2024

Mosaic: Harnessing Micro-architectural Resources of Servers in Serverless Environments
J. Stojkovic, E. Choukse, E. Saurez, Í. Goiri and J. Torrellas
2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)
10.1109/MICRO61859.2024.00103

Playground: A Safe Building Operating System
X. Fu, Y. Liu, J. Koh, D. Hong, R. Gupta and G. Fierro
ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS)
10.1109/ICCPS61052.2024.00017

Practical Online Reinforcement Learning for Microprocessors with Micro-Armed Bandit
Gerasimos Gerogiannis, Josep Torrellas
IEEE Micro 2024
10.1109/MM.2024.3408719

Defensive ML: Adversarial Machine Learning as a Practical Architectural Defense for Side Channels
Hyoungwook Nam, Raghavendra Pradyumna Pothukuchi, Bo Li, Nam Sung Kim, Josep Torrellas
International Conference on Parallel Architectures and Compilation Techniques
10.1145/3656019.3676952

Distributed Memory Parallel Algorithms for Sparse Matrix and Sparse Tall and Skinny Matrix Multiplication
Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad
SC24: International Conference for High Performance Computing, Networking, Storage and Analysis
10.1109/SC41406.2024.00052

A Smart Cache for a SmartNIC!
Annus Zulfiqar; Ali Imran; Venkat Kunaparaju; Ben Pfaff; Gianni Antichi; Muhammad Shahbaz
2024 IEEE Hot Chips 36 Symposium (HCS)
10.1109/HCS61935.2024.1066488

Large Language Models for Time Series: A Survey
Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang
arXiv:2402.01801

How Few Davids Improve One Goliath: Federated Learning in Resource-Skewed Edge Computing Environments
Jiayun Zhang, Shuheng Li, Haiyu Huang, Zihan Wang, Xiaohan Fu, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang
WWW '24: Proceedings of the ACM Web Conference 2024
10.1145/3589334.364554

SmartPAF: Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption
Jianming Tong, Jing Dang, Anupam Golder, Callie Hao, A. Raychowdhury, Tushar Krishna
Conference on Machine Learning and Systems
10.48550/arXiv.2404.03216

FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
J. Tong, A. Itagi, P. Chatarasi and T. Krishna
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00024

Hydride: Synthesis based compiler for modern architectures
Thirimadura Charith Mendis, Stefanos Baziotis
ASPLOS '24

Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
Proceedings of the ACM on Programming Languages, Volume 8, Issue PLDI
10.1145/3656401

Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
ACM Transactions on Reconfigurable Technology and Systems, Volume 18, Issue 1
10.1145/365617

Hades: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs
A. Kokolis, A. Psistakis, B. Reidys, J. Huang and J. Torrellas
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00062

EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency
Jovan Stojkovic, Nikoleta Iliakopoulou, Tianyin Xu, Hubertus Franke, Josep Torrellas
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00042

SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud
Jovan Stojkovic, Pulkit Misra, Íñigo Goiri, Sam Whitlock, Esha Choukse, Mayukh Das, Chetan Bansal, Jason Lee, Zoey Sun, Haoran Qiu, Reed Zimmermann, Savyasachi Samal, Brijesh Warrier, Ashish Raniwala, Ricardo Bianchini
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00040

TetriX: Flexible Architecture and Optimal Mapping for Tensorized Neural Network Processing
J. -F. Zhang, C. -H. Lu and Z. Zhang
IEEE Transactions on Computers
10.1109/TC.2024.3365936

Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Chenhui Deng, Zichao Yue, Zhiru Zhang
arXiv:2403.01232

Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits
Chenhui Deng, Zichao Yue, Cunxi Yu, Gokce Sarar, Ryan Carey, Rajeev Jain, Zhiru Zhang
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
10.1145/3649329.365738