Theme 1: Heterogeneous Computing Platforms
We envision a planet-scale distributed computing infrastructure with a myriad of heterogeneous accelerators. Accelerators will evolve rapidly with applications and, at any point in time, will co-exist with earlier or later generations. Hence, we propose a new methodology to easily generate, deploy, and reconfigure Evolvable accelerators. Groups of accelerators will be organized into Ensembles distributed across one or multiple datacenters. Applications will dynamically pick (and reconfigure) the desired set of accelerators from the ensemble with minor overhead. Advanced runtime and compilation methods will reconfigure multi-tenant accelerator ensembles, and map and schedule applications to them. Finally, revamped general-purpose cores will differentiate to increase performance and energy efficiency.
The computing infrastructure will include highly heterogeneous distributed memory and storage resources. As workloads relentlessly increase their data needs, the memory reachable by processors as local memory will expand across an entire rack, creating a formidable memory wall that we will meet with novel processor structures and gracefully degrading coherence mechanisms. To utilize heterogeneous memory and storage assets efficiently, we will develop new abstractions that allow applications to select the type of asset needed. Moreover, we will develop theory-grounded scalable algorithms to apportion these assets efficiently among thousands of competing applications in the datacenter and billions of allocation requests. Ubiquitous intelligent memory and storage blocks distributed across the memory hierarchy will be harnessed to operate in a coordinated manner.
Papers and Presentations:
Snapshot: Fast, Userspace Crash Consistency for CXL and PM Using msync
Suyash Mahar, Mingyao Shen, Terence Kelly, Steven Swanson
2023 IEEE 41st International Conference on Computer Design (ICCD)
10.1109/ICCD58817.2023.00082
Profiling gem5 Simulator
Johnson Umeike, Neel Patel, Alex Manley, Amin Mamandipoor, Heechul Yun, Mohammad Alian
ISPASS 2023
10.1109/ISPASS57527.2023.00019
2024
EdgeScaler: Smart (Auto-)Scaling for the 5G Edge
Lauren Trinks, Bilal Saleem, Muhammad Shahbaz
APSys 2024
Per-Bank Bandwidth Regulation of Shared Last-Level Cache for Real-Time Systems
C. Sullivan, A. Manley, M. Alian and H. Yun
2024 IEEE Real-Time Systems Symposium (RTSS)
10.1109/RTSS62706.2024.00036
FloatAP: Supporting High-Performance Floating-Point Arithmetic in Associative Processors
Kailin Yang, Jose Martinez
2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)
10.1109/MICRO61859.2024.00055
Telepathic Datacenters: Fast RPCs using Shared CXL Memory
Suyash Mahar, Ehsan Hajyjasini, Seungjin Lee, Zifeng Zhang, Mingyao Shen, Steven Swanson
arXiv:2408.11325
Userspace Networking in gem5
J. Umeike, S. Agarwal, N. Lazarev and M. Alian
2024 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS)
10.1109/ISPASS61541.2024.00026
Caravan: Practical Online Learning of In-Network ML Models with Labeling Agents
Qizheng Zhang, Ali Imran, Enkeleda Bardhi, Tushar Swamy, Nathan Zhang, Muhammad Shahbaz, Kunle Olukotun
PACMI '24: Proceedings of the 3rd Workshop on Practical Adoption Challenges of ML for Systems
10.1145/3704742.3704964
Supporting a Virtual Vector Instruction Set on a Commercial Compute-in-SRAM Accelerator
Courtney Golden, Dan Ilan, Caroline Huang, Niansong Zhang, Zhiru Zhang, and Christopher Batten
IEEE COMPUTER ARCHITECTURE LETTERS, VOL. 23, NO. 1, JANUARY-JUNE 2024
10.1109/LCA.2023.3341389
PrimeNet: Pre-Training for Irregular Multivariate Time Series
Ranak Roy Chowdhury, Jiacheng Li, Xiyuan Zhang, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang
Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI ’23) Feb 7, 2023
10.1609/aaai.v37i6.25876
Proteus: HLS-based NoC Generator and Simulator
Abhimanyu Rajeshkumar Bambhaniya, Yangyu Chen, Anshuman, Rohan Banerjee, Tushar Krishna
Design, Automation and Test in Europe Conference April 2023
10.23919/DATE56975.2023.10137173
SPADE: A Flexible and Scalable Accelerator for SpMM and SDDMM
Gerasimos Gerogiannis, Serif Yesil, Damitha Lenadora, Dingyuan Cao, Charith Mendis, Josep Torrellas
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture June 2023
10.1145/3579371.3589054
SENSEi: Input-sensitive dense-sparse primitive compositions for GNN acceleration
Damitha Lenadora, Vimarsh Sathia, Gerasimos Gerogiannis, Serif Yesil, Josep Torrellas, Charith Mendis
arxiv.org/abs/2306.15155 June 2023
FluRKA: Fast fused Low-Rank & Kernel Attention
Ahan Gupta, Yueming Yuan, Yanqi Zhou, Charith Mendis
10.48550/arXiv.2306.15799 June 2023
MXFaaS: Resource Sharing in Serverless Environments for Parallelism and Efficiency
Jovan Stojkovic, Tianyin Xu, Hubertus Franke, Josep Torrellas
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture June 2023
10.1145/3579371.3589069
µManycore: A Cloud-Native CPU for Tail at Scale
Jovan Stojkovic, Chunao Liu, Muhammad Shahbaz, Josep Torrellas
ISCA '23: Proceedings of the 50th Annual International Symposium on Computer Architecture June 2023
10.1145/3579371.3589068
Arvon: A Heterogeneous System-in-Package Integrating FPGA and DSP Chiplets for Versatile Workload Acceleration
Cheng-Hsun Lu, Junkang Zhu, Tianyu Wei, Wei Tang, Zhengya Zhang
2023 Symposium on VLSI Circuits June 2023
10.1109/JSSC.2023.3343457
Towards Diverse and Coherent Augmentation for Time-Series Forecasting
Xiyuan Zhang, Ranak Roy Chowdhury, Jingbo Shang, Rajesh Gupta, Dezhi Hong
ICASSP 2023 June 2023
10.48550/arXiv.2303.14254
Unleashing the Power of Shared Label Structures for Human Activity Recognition
Xiyuan Zhang, Ranak Roy Chowdhury, Jiayun Zhang, Rajesh K. Gupta, Jingbo Shang, Dezhi Hong
CIKM 2023 October 2023
10.48550/arXiv.2301.03462
Micro-Armed Bandit: Lightweight & Reusable Reinforcement Learning for Microarchitecture Decision-Making
Gerasimos Gerogiannis, Josep Torrellas
Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO ’23)
10.1145/3613424.3623780 Oct 2023
Machine Learning Hardware Design for Efficiency, Flexibility and Scalability
Jie-Fang Zhang, Zhengya Zhang
IEEE Circuits and Systems Magazine, October 2023
10.1109/mcas.2023.3302390
Large Graph Property Prediction via Graph Segment Training
Kaidi Cao, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Dustin Zelle, Yanqi Zhou, Charith Mendis, Jure Leskovec, Bryan Perozzi
arXiv:2305.12322 Nov 2023
TpuGraphs: A Performance Prediction Dataset on Large Tensor Computational Graphs
Charith Mendis, Phitchaya Mangpo Phothilimthana, Sami Abu-El-Haija, Bahare Fatemi, Bryan Perozzi, Kaidi Cao
Workshop on Graph Learning Benchmarks Dec 2023
10.48550/arXiv.2308.13490
Slapo: A Schedule Language for Progressive Optimization of Large Deep Learning Model Training
Hongzheng Chen, Codi Hao Yu, Shuai Zheng, Zhen Zhang, Zhiru Zhang, Yida Wang
arXiv:2302.08005 December 2023
An Intermediate Language for General Sparse Format Customization
Jie Liu, Zhongyuan Zhao, Zijian Ding, Benjamin Brock, Hongbo Rong, Zhiru Zhang
IEEE Computer Architecture Letters (Volume: 22, Issue: 2, July-Dec. 2023)
10.1109/LCA.2023.3262610
2024
Contextual Inference From Sparse Shopping Transactions Based on Motif Patterns
J. Zhang, X. Zhang, D. Hong, R. K. Gupta and J. Shang
IEEE Transactions on Knowledge and Data Engineering
10.1109/TKDE.2024.3452638
SmoothE: Differentiable E-Graph Extraction
Yaohui Cai, Kaixin Yang, Chenhui Deng, Cunxi Yu, Zhiru Zhang
ASPLOS '25: 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems
10.1145/3669940.3707262
ARIES: an Agile MLIR-Based Compilation Flow for Reconfigurable Devices with AI Engines
Jinming Zhuang, Shaojie Xiang, Hongzheng Chen, Niansong Zhang, Zhuoping Yang, Tony Mao, Zhiru Zhang, Peipei Zhou
International Symposium on Field-Programmable Gate Arrays (FPGA)
Design Approach for Die-to-Die Interfaces to Enable Energy-Efficient Chiplet Systems
Vikram Jain, Wei Tang, Zuoguo Wu, Viansa Schmulbach, Sophia Shao, Zhengya Zhang, Borivoje Nikolic
ISLPED '24: Proceedings of the 29th ACM/IEEE International Symposium on Low Power Electronics and Design
10.1145/3665314.3680473
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency
Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse
arXiv.org
arXiv:2408.00741
SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention
Ahan Gupta, Yueming Yuan, Devansh Jain, Yuhao Ge, David Aponte, Yanqi Zhou, Charith Mendis
10.48550/arXiv.2407.16847
Quick, Thorough and Scalable Pre-silicon Verification with G-QED
Saranyu Chattopadhyay, Subhasish Mitra
TechCon 2024
Towards Efficient Temporal Graph Learning: Algorithms, Frameworks, and Tools
Ruijie Wang, Wanyu Zhao, Dachun Sun, Charith Mendis, Tarek Abdelzaher
CIKM '24: Proceedings of the 33rd ACM International Conference on Information and Knowledge Management
10.1145/3627673.367910
Two-Face: Combining Collective and One-Sided Communication for Efficient Distributed SpMM
Charles Block, Gerasimos Gerogiannis, Charith Mendis, Ariful Azad, Josep Torrellas
ASPLOS '24: Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2
10.1145/3620665.3640427
Energy-efficient parallel interconnects for chiplet integration
W. Tang, C. Liu and Z. Zhang
IEEE Micro 2024
10.1109/MM.2024.3450841
Sparsity At Scale -- Towards Efficient Distributed Sparse Accelerators
Gerasimos Gerogiannis, Charles Block, Josep Torrellas
TechCon 2024
Mosaic: Harnessing Micro-architectural Resources of Servers in Serverless Environments
J. Stojkovic, E. Choukse, E. Saurez, Í. Goiri and J. Torrellas
2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO)
10.1109/MICRO61859.2024.00103
Playground: A Safe Building Operating System
X. Fu, Y. Liu, J. Koh, D. Hong, R. Gupta and G. Fierro
ACM/IEEE 15th International Conference on Cyber-Physical Systems (ICCPS)
10.1109/ICCPS61052.2024.00017
Practical Online Reinforcement Learning for Microprocessors with Micro-Armed Bandit
Gerasimos Gerogiannis, Josep Torrellas
IEEE Micro 2024
10.1109/MM.2024.3408719
Defensive ML: Adversarial Machine Learning as a Practical Architectural Defense for Side Channels
Hyoungwook Nam, Raghavendra Pradyumna Pothukuchi, Bo Li, Nam Sung Kim, Josep Torrellas
International Conference on Parallel Architectures and Compilation Techniques
10.1145/3656019.3676952
Distributed Memory Parallel Algorithms for Sparse Matrix and Sparse Tall and Skinny Matrix Multiplication
Isuru Ranawaka, Md Taufique Hussain, Charles Block, Gerasimos Gerogiannis, Josep Torrellas, Ariful Azad
SC24: International Conference for High Performance Computing, Networking, Storage and Analysis
10.1109/SC41406.2024.00052
A Smart Cache for a SmartNIC!
Annus Zulfiqar, Ali Imran, Venkat Kunaparaju, Ben Pfaff, Gianni Antichi, Muhammad Shahbaz
2024 IEEE Hot Chips 36 Symposium (HCS)
10.1109/HCS61935.2024.1066488
Large Language Models for Time Series: A Survey
Xiyuan Zhang, Ranak Roy Chowdhury, Rajesh K. Gupta, Jingbo Shang
arXiv:2402.01801
How Few Davids Improve One Goliath: Federated Learning in Resource-Skewed Edge Computing Environments
Jiayun Zhang, Shuheng Li, Haiyu Huang, Zihan Wang, Xiaohan Fu, Dezhi Hong, Rajesh K. Gupta, Jingbo Shang
WWW '24: Proceedings of the ACM Web Conference 2024
10.1145/3589334.364554
SmartPAF: Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption
Jianming Tong, Jing Dang, Anupam Golder, Callie Hao, A. Raychowdhury, Tushar Krishna
Conference on Machine Learning and Systems
10.48550/arXiv.2404.03216
FEATHER: A Reconfigurable Accelerator with Data Reordering Support for Low-Cost On-Chip Dataflow Switching
J. Tong, A. Itagi, P. Chatarasi and T. Krishna
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00024
Hydride: Synthesis based compiler for modern architectures
Thirimadura Charith Mendis, Stefanos Baziotis
ASPLOS '24
Allo: A Programming Model for Composable Accelerator Design
Hongzheng Chen, Niansong Zhang, Shaojie Xiang, Zhichen Zeng, Mengjia Dai, Zhiru Zhang
Proceedings of the ACM on Programming Languages, Volume 8, Issue PLDI
10.1145/3656401
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen, Jiahao Zhang, Yixiao Du, Shaojie Xiang, Zichao Yue, Niansong Zhang, Yaohui Cai, Zhiru Zhang
ACM Transactions on Reconfigurable Technology and Systems, Volume 18, Issue 1
10.1145/365617
Hades: Hardware-Assisted Distributed Transactions in the Age of Fast Networks and SmartNICs
A. Kokolis, A. Psistakis, B. Reidys, J. Huang and J. Torrellas
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00062
EcoFaaS: Rethinking the Design of Serverless Environments for Energy Efficiency
Jovan Stojkovic, Nikoleta Iliakopoulou, Tianyin Xu, Hubertus Franke, Josep Torrellas
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00042
SmartOClock: Workload- and Risk-Aware Overclocking in the Cloud
Jovan Stojkovic, Pulkit Misra, Íñigo Goiri, Sam Whitlock, Esha Choukse, Mayukh Das, Chetan Bansal, Jason Lee, Zoey Sun, Haoran Qiu, Reed Zimmermann, Savyasachi Samal, Brijesh Warrier, Ashish Raniwala, Ricardo Bianchini
2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA)
10.1109/ISCA59077.2024.00040
TetriX: Flexible Architecture and Optimal Mapping for Tensorized Neural Network Processing
J. -F. Zhang, C. -H. Lu and Z. Zhang
IEEE Transactions on Computers
10.1109/TC.2024.3365936
Polynormer: Polynomial-Expressive Graph Transformer in Linear Time
Chenhui Deng, Zichao Yue, Zhiru Zhang
arXiv:2403.01232
Less is More: Hop-Wise Graph Attention for Scalable and Generalizable Learning on Circuits
Chenhui Deng, Zichao Yue, Cunxi Yu, Gokce Sarar, Ryan Carey, Rajeev Jain, Zhiru Zhang
DAC '24: Proceedings of the 61st ACM/IEEE Design Automation Conference
10.1145/3649329.365738