ACM International Conference on Supercomputing 2026

6-9 July 2026 Belfast, Northern Ireland, United Kingdom

ICS 2026 Program

All program times are in British Summer Time (BST, UTC+1). Workshop session locations are currently denoted Room [1-6] and will be specified later. Paper session locations are currently denoted Room [ABC] and will be specified later.

Monday Workshop Overview

Day

Time

Room 1

Room 2

Room 3

Room 4

Room 5

Room 6

Monday

08:00-09:00

Registration

09:00-10:30

Workshop: MCCSys

Workshop: Benchmark

Workshop: AI4HPCC

Workshop: Arch4Health

Workshop: WOCC'26

Workshop: Ramulator and DRAM Bender

10:30-11:00

Coffee

11:00-12:30

Workshop: MCCSys

Workshop: Benchmark

Workshop: AI4HPCC

Workshop: Arch4Health

Workshop: WOCC'26

Workshop: Ramulator and DRAM Bender

12:30-13:30

Lunch

13:30-15:00

Workshop: MCCSys

Workshop: Benchmark

Workshop: AI4HPCC

Workshop: Arch4Health

Workshop: PhysQ

Workshop: NextAccel

15:00-15:30

Coffee

15:30-17:00

Workshop: MCCSys

Workshop: Benchmark

Workshop: AI4HPCC

Workshop: Arch4Health

Workshop: PhysQ
(runs until 18:00)

Workshop: NextAccel


Tuesday-Wednesday-Thursday Session Overview

Day

Time

Room A

Room B

Room C

Tuesday

07:00-08:00

Registration

08:10-09:20

Opening + Keynote (Room A)
Scaling AI Computing Sustainably: A Journey Towards Sustainable AI
Dr Carole-Jean Wu, Director of AI Research at Meta

09:20-10:20

Best Paper Candidates (Plenary, Room A)

10:20-10:50

Coffee

10:50-12:10

Runtime Scheduling and Adaptive Execution

Compiler, Code Generation and Autotuning

Energy & Sustainability

12:10-13:40

Lunch

13:40-15:00

Performance Modeling & Insight

Fortran Mini-Workshop

I/O & Storage

15:00-15:30

Coffee

Day

Time

Room A

Room B

Room C

Wednesday

07:30-08:10

Registration

08:10-09:20

Keynote (Room A)
An Open-Source-First Approach: Multi-Level Compiler Design for AI and HPC
Dr Tobias Grosser, Associate Professor in Compiler Design, University of Cambridge

09:20-10:20

Lightning Talks/Posters (Plenary, Room A)
These are lightning talks based on the accepted posters

10:20-10:50

Coffee

10:50-12:10

AI Training Systems

Graph Search & Paths

Communication & Collectives

12:10-13:40

Lunch + Poster Session

13:40-15:00

LLM Serving

Graph Analytics

Near-Memory Architectures

15:00-15:30

Coffee

15:30-17:10

Data Analytics & Compression

Graph Traversal & Connectivity

CXL & Memory Systems

Day

Time

Room A

Room B

Room C

Thursday

07:30-08:10

Registration

08:10-09:20

Keynote (Room A)
Destination Earth: Digital twins of the earth system on Europe’s most powerful supercomputers
Dr Ioan Hadade, Principal Computational Scientist and Team Leader of the HPC Applications team at the European Centre for Medium-Range Weather Forecasts (ECMWF)

09:20-10:20

Cross-Layer Performance Optimization

(No sessions)

10:20-10:50

Coffee

10:50-12:10

AI Inference

GPU-Accelerated Query and Geometry

Resilience and Error Detection

12:10-13:40

Lunch

13:40-15:00

AI Kernels & Parallelism

Sparse & Tensor Kernels

Energy-Aware Systems

15:00-15:30

Coffee

15:30-17:10

Efficient Privacy Computing

Numerical & Scientific Kernels

Quantum Computing


ICS 2026 Paper Session Details

Tuesday 7 July 2026

Time

Room A

Room B

Room C

09:20-10:20

Best Paper Candidates

Coordinating GPU Data Centers and Power Grid Regulation Service for Exogenous Carbon Benefits
A. Jahanshahi, S. Golrouye, O. Anderson, N. Yu, D. Wong

FLYING SERVING: On-the-Fly Parallelism Switching for Large Language Model Serving
S. Gao, J. Yin, F. Wang, W. Dong

OCTANE: Breaking the Neighbor-List Bottleneck in GPU Molecular Dynamics
H. Toutouni, S. Chakraborty, Y. Tu, J. Huang

(No sessions)

10:50-12:10

Runtime Scheduling and Adaptive Execution

Barrier-Aware Task Scheduling for Bulk-Synchronous Parallel Architectures
T. Noack, A. Koch

FaaSlim: Partial Caching of Snapshot-based VMs for Serverless Computing
S. Eom, C. Park, G. Lee, H. Moon, Y. Choi

Block-Aware Adaptive State Management for Optimistic Parallel Discrete Event Simulation
X. Peng, Q. Wang, G. Liu, C. Hong, R. Xia, Z. Sun, X. Chen, Q. Zhang, J. Liu

Lock Shielding: A General Technique for Misuse-Resilient Locks
V. Shahare, M. Chabbi, N. Hegde

Compiler, Code Generation and Autotuning

GRASP: Optimizing VLIW Instruction Scheduling via Graph Reinforcement Learning
Z. Wang, W. Tong, J. Fang, Y. Zhang, W. Wang, J. Ren, Z. Tang

Continuation-Preserving Tiling for Pointer-Chasing Optimization in Structured Mutual Recursion
A. Kumar, V. Singh, S. Biswas

S2VEC: Compiler-Driven Stream Specialization for Linearized Vectorization
L. Crespo, A. Fernandes, G. Falcao, P. Tomás, N. Roma, N. Neves

CKTI: A Domain-Specific Compiler for Lowering CUDA Kernels to Triton-IR
C. Shi, R. Chen, Y. Sun, Y. Sui, J. Zhang, Y. Xie, M. Wang, S. Ming, S. Zhang, Y. Zhang

Energy & Sustainability

Wattchmen: Watching the Wattchers – High Fidelity, Flexible GPU Energy Modeling
B. Tran, M. Sinclair, S. Venkataraman, M. Maiterth, W. Shin

Agile QoS-aware Dynamic Power Management with eBPF Governors
M. Rezvani, D. Wong

SmartCap: Coordinated CPU–GPU Power Capping for Performance-Assurance Energy Efficiency
Z. Zheng, Z. Lan, X. Wu, V. Taylor, M. Papka

CATS: Correlation-aware Task Scheduling for GPU Power Optimization in AI Data Centers
S. Subramaniyan, X. Wang

13:40-15:00

Performance Modeling & Insight

TenProf: A Tensor-Centric Profiler for Deep Learning Workload Analysis and Optimization
X. Ding, K. Zhou, Y. Hao, P. Su

GRASP: Fine-grained and Adaptive Sampled Simulation for GPU Performance Modeling
L. Chao, Z. Huang, P. Cai, J. Xue, T. Xiong, R. Xue

Mantis: Decoding HPC Telemetry Data for Robust System Prediction
Y. Lu, J. Ren, E. Smirni

ViSim: A Lightweight SpMV Performance Simulator via Statistical and Visual Residual Learning
S. Zhu, W. Huangfu, G. Chu

Fortran Mini-Workshop

An interactive discussion of a recent survey of the international Fortran ecosystem led by Austen Rainer and Andrew Brown

I/O & Storage

Harmonia: Enhancing Data Placement and Migration in Hybrid Storage Systems via Multi-Agent Reinforcement Learning
R. Nadig, V. Arulchelvan, R. Bera, T. Shahroodi, G. Singh, A. Kakolyris, I. Yuksel, M. Sadrosadati, J. Park, O. Mutlu

ColdMap: Compaction-Aware Cost-Benefit Zone Cleaning for ZNS-Based Key-Value Stores
S. Byeon, K. Min, J. Park, S. Lee, H. Kim, J. Han, J. Hwang, Z. Cao, Y. Kim

CoCache: Accelerating Reads in KV Stores via Cooperative Metadata and Data Cache Management
H. Tang, W. Zhu, Q. Zhang, J. Zhang, J. Jiang, Z. Zhang, H. Zhang, Y. Li, Y. Xu

TOTO: Transparent I/O Tuning for HPC Applications
F. Boito, L. Teylo, M. Popov, L. Aimi, A. Bandet, L. Pilla, G. Pallez

Wednesday 8 July 2026

Time

Room A

Room B

Room C

09:20-10:20

Lightning Talks/Posters

These lightning talks are based on the accepted posters

(No sessions)

10:50-12:10

AI Training Systems

Closing the Efficiency Gap: AI Datacenter Co-design Roadmap for Scalable Training of LLMs
J. Tithi, H. Wu, J. Park, A. Abuhatzera, F. Petrini, T. Krishna

COMETS: Cost-effective Multi-node Efficient Training System with Memory Pooling and Sharing
H. Chen, S. Yang, M. Soltaniyeh, S. Pei, A. Chang, B. Kim, C. Hao

SPPO: Making Million-Token LLM Training Practical on Modest GPU Clusters
Q. Chen, S. Li, W. Gao, P. Sun, Y. Wen, T. Zhang

Rudder: Steering Prefetching in Distributed GNN Training using LLM Agents
A. Sarkar, S. Ghosh, N. Tallent, A. Chadha, T. Roosta, A. Jannesari

Graph Search & Paths

G-PathGen: An Efficient GPU-Parallel k-Critical Path Generation Algorithm
C. Chang, Y. Chung, C. Chiu, W. Lee, B. Zhang, U. Schlichtmann, I. Lin, X. Yu, T. Huang

Parallel Bidirectional A* Search for GPU-Accelerated Pathfinding
H. Al Khansa, J. Luna, A. Mouawad, I. Hajj

MPMOS: Massively Parallel Multi-Objective Shortest Paths
L. Gold, D. Sidoti, K. Pattipati, O. Khan

DistroMatch: Distributed Disjoint Weighted Matchings in Demand-Aware Reconfigurable Optical Datacenters
S. Heck, K. Hanauer, S. Schmid

Communication & Collectives

SHIRO: Near-Optimal Communication Strategies for Distributed Sparse Matrix Multiplication
C. Zhuang, L. Zhang, B. Brock, D. Wu, P. Chen, T. Endo, S. Matsuoka, M. Wahib

PACER: A Userspace Network Rate Controller in MPI with Adaptive Compression for Parallel Applications
Y. Li, D. Ng, A. Kashyap, S. Di, G. Li, X. Lu

Skew-aware Adaptive All-to-allv Algorithms for Dynamic Deep Learning Workloads
C. Wei, A. Bhatele

StencilMD: Optimizing Communication in Molecular Dynamics Simulations
R. Deng, T. Schardl

13:40-15:00

LLM Serving

Taming Dynamic Diffusion LLM Inference through Virtual Static Execution
J. Zhu, H. Wu, Y. Li, H. Wang, R. Li, J. Zhai

SparseServe: Unlocking Parallelism for Dynamic Sparse Attention in Long-Context LLM Serving
Q. Zhou, P. Yin, P. Zuo, C. Wang, J. Cheng

InferFast: Bridging the Gap Between Unstructured LLM Sparsity and Practical GPU Throughput
Z. Shen, W. Bu, X. He, K. Sheng, H. Chen

dLLM-Serve: Bridging the Memory Gap in Diffusion Language Model Serving
J. Fan, Y. Zhang, X. Li, D. Nikolopoulos

Graph Analytics

DCSM: Enabling Inter-Batch Parallelism for Continuous Subgraph Matching on GPU
Y. Wei, P. Jiang

SVSIG: Incremental Streaming Graph Processing with Source Vertex Suppression
J. Huang, X. Yan, D. Fu, H. Bian, T. Cao, Z. Li

GAAF: Fast and Scalable Graph-based Vector Similarity Search with Any-Match Label Filtering
M. Ma, X. Yin, J. Qiu

HoloGraph: Bridging the Throughput Gap in Heterogeneous Graph Pattern Matching via Workload-Aware Steering
M. Haotian, W. Hsu, Y. Chung

Near-Memory Architectures

DANMP: Accelerating Multi-Scale Deformable Attention Using Near-Memory-Processing Architecture
H. Li, Q. Wang, B. Gao, D. Chen, Y. Huang, X. Xin

RPFC: A Router Partitioning and Forward Channel Routing Framework for 2.5D MCM System
S. Tao, Z. Guo, T. Liu, J. Wang

UpDown: Efficient Manycore based on Many Threading and Scalable Memory Parallelism
A. Rajasukumar, R. Xu, T. Zhang, Y. Wang, T. Su, M. Nourian, J. Ding, J. Su, R. Khandelwal, A. Fell, D. Gleich, Y. Li, H. Hoffmann, A. Chien

Clutch: High Performance Vector-Scalar Comparison using DRAM via Chunked Temporal Coding
D. Tokuda, T. Kubo, I. Yuksel, A. Olgun, H. Luo, T. Nagatani, G. De Oliveira Junior, A. Yağlıkçı, M. Sadrosadati, O. Mutlu, S. Takamaeda-Yamazaki

15:30-17:10

Data Analytics & Compression

GPZ: GPU-Accelerated Lossy Compressor for Particle Data
R. Li, Y. Huang, L. Zhang, Z. Yang, S. Di, B. Zhang, J. Huang, J. Liu, J. Tian, G. Li, F. Song, H. Guo, F. Cappello, K. Zhao

GFAz: State-of-the-Art Graphical Fragment Assembly Compression
T. Yang, Y. Liu, B. Jiang, X. Shi, S. Jin

Optimizing Streaming Tensor Decomposition on GPU
W. Lin, J. Sheng, S. Feng, M. Dun, H. Cao, Q. Sun

DA-MLAD: Drift-Decomposed Meta-Learning for Continual Log Anomaly Detection in Supercomputing Systems
K. Tan, Y. Du, D. Zhan, Y. Xie, H. Yu, B. Zhao, H. Liu

TADS: Trend-Aware Dynamic Load Balancing for Large-Scale SNN Simulations with Delay-Sharded Graph Infrastructure
H. Huang, S. Pang, Y. Zeng, G. Feng, Z. Chen, Y. Lu

Graph Traversal & Connectivity

cuMIS: A Unified Scalable Framework for Computing Maximal Independent Sets on Trillion-Edge Graphs
J. Nke, S. Kang, B. Rees, C. Lee

BLEST: Blazingly Efficient BFS using Tensor Cores
D. Elbek, K. Kaya

CORE-BFS: Communication-Optimized REctangular-partitioned BFS Achieving 160.845 TeraTEPS on Frontier Supercomputer
H. Yang, H. Lu, M. Matheson, F. Wang, H. Liu

DynLP: Parallel Dynamic Batch Update for Label Propagation in Graph-based Semi-Supervised Learning
S. Shovan, A. Khanda, S. Ferdous, S. Das, M. Halappanavar

Parametric Mappings for Distributed-Memory Tensor Computations
B. Wu, M. Kong

CXL & Memory Systems

CXL-CCL: Inter-Node Collective GPU-Communication Using a CXL Shared Memory Pool
D. Xu, H. Meng, X. Chen, D. Zhu, W. Tang, F. Liu, L. Xie, W. Xiang, R. Shi, Y. Li, H. Hu, H. Zhang, D. Li, J. Jiang

IBEX: Internal Bandwidth-Efficient Compression Architecture for Scalable CXL Memory Expansion
Y. Ko, H. Park, H. Lee, H. Lee

Clone: A Collaborative Multi-device System for Retrieval-Augmented Generation over CXL
S. Ko, W. Doh, E. Na, H. Shim, S. Yun, J. So, Y. Kwon, S. Park, S. Roh, M. Yoon, T. Song, E. Lee, J. Ahn

Anchoring Whole-System Persistence and Resilience in CXL
Y. Zhou, J. Zeng, C. Jung

Griffin: Coherency-Aware Task Scheduling and Memory Allocation for CXL Interconnects
S. Lee, K. Diab, D. Tootaghaj, L. Cao, P. Sharma, A. Gavrilovska

Thursday 9 July 2026

Time

Room A

Room B

Room C

09:20-10:20

Cross-Layer Performance Optimization

THAC: Unlocking Performance in Parallel HPC Applications via UQ-Aware Automated Approximation
Z. Zhao, B. Wang, B. Yang, X. Chen, J. Liu, Q. Wang

Cross-Architecture Autotuning for Single-Source Heterogeneous Programming Models
H. Abram, N. Papadopoulou, J. Domke, M. Pericàs

Look Before You Leap: Precision Instruction Supply via SmartScout
X. Zhang, P. Qu, T. Zhang, F. Su, Z. Pan, Y. Zhang

(No sessions)

10:50-12:10

AI Inference

LayerScope: Predictive Cross-Layer Scheduling for Efficient Multi-Batch MoE Inference on Legacy Servers
E. Yu, D. Dong, Z. Zhang, Z. Bai, W. Yang, H. Wang, D. Li, Y. Wu, L. Xiangke

Aurora: A Disaggregated GPU-PNM-PIM System for High-Throughput Mixed-Length LLM Inference
H. Kim, S. Yu, M. Kim, J. Lee, H. Sung, E. Lee

HOPO: Accelerating Multimodal Neural Networks Inference via Holistic Parallelism Optimization
Y. Zheng, J. Sun, H. Li, G. Sun, J. Li

Latency-SLO-Aware Memory Offloading for Large Language Model Inference
C. Ma, H. Zhao, Z. Ye, Z. Yang, T. Fu, J. Han, J. Zhang, Y. Luo, X. Wang, Z. Wang, Y. Li, D. Zhou

GPU-Accelerated Query and Geometry

Parallel Query Processing through Optimal Key Grouping on GPU-Based B+-Trees
Z. Chen, J. Li, J. Meng, N. Pitaksirianan, Y. Tu, B. Zeng, C. Dong

X-HD: Fast Hausdorff Distance Computation with Ray Tracing
L. Geng, Z. Yuan, R. Lee, X. Zhang, F. Wang

Rethinking Collision Detection on GPU Ray Tracing Architecture
D. Mandarapu, I. Fuksman, A. Pelenitsyn, G. Bernstein, M. Kulkarni

Resilience and Error Detection

Not All Errors Are Equal: A Systematic Study of Error Propagation in Large Language Model Inference
Y. Huang, S. Di, G. Li

SpinTune: Improving the Reliability of Quantum Sensor Networks for Practical Quantum-Classical Utility
J. Ludmir, N. DiBrita, J. Han, P. Tirthak

Harnessing MPI mutations for AI error detection
A. Auville, T. Jammer, E. Petit, P. Castro, E. Saillard, M. Popov

StreamGuard: Low-Overhead Resilience for Real-time HPC Data Streams
H. Nguyen, B. Nicolae, T. Bicer, A. Gueroudji, M. Dorier, K. Chard, I. Foster

13:40-15:00

AI Kernels & Parallelism

HPMD: Enabling Hybrid Parallelism with Multi-Dimensional Adaptive DNN Training
G. Yun, Y. Choi

EPLoN: Exploiting Efficient Parallelism with Selective Rematerialization for Lightning Attention on Ascend NPU
H. Bao, Z. Su, A. Setyaev, S. Kamenev, A. Gneushev, K. Zhao, J. Xiao, H. Lin, A. Bistrigova, S. Buzykanov, E. Tetin, G. Tan, B. Liu, X. Zou, Z. Dong, C. Korikov, X. Yu, Z. Hu

DynSpAttn: Efficient Attention via Dual-Side Dynamic Sparsity on Sparse Tensor Cores
R. Fan, X. Yu, Z. Li, W. Luo, G. Gong, X. Chu

Three Birds, One Stone: Fast, Accurate-aware and Cost-Efficient Accelerator for Ternary LLM
W. Jung, J. Kang, S. Shin, H. Um, J. Lim, G. Koo, Y. Park, S. Park, T. Suh

Sparse & Tensor Kernels

Communication-Avoiding SpGEMM via Trident Partitioning on Hierarchical GPU Interconnect
J. Bellavita, L. Pichetti, T. Pasquali, F. Vella, G. Guidi

Ocean: Fast Estimation-Based Sparse General Matrix-Matrix Multiplication on GPU
Y. Li, G. Guidi

PolyKAN: A High-Performance and Universal GPU Operator Library for Polynomial Kolmogorov-Arnold Networks
M. Yu, H. Zhong, J. Jiang, D. Huang, Y. Lu

Energy-Aware Systems

DEFT: Joint Task Placement and DVFS for Energy-Efficient Multi-GPU Runtimes
J. Chen, M. Pericàs

Phase-aware Peak Power Reduction for Minimizing the Capital Expense of LLM Inference
S. Wu, Y. Ma, X. Wang

Exploiting Hybrid Energy Storage to Minimize the Carbon Footprint of AI Data Centers
S. Wu, X. Wang

The Performance-Power Frontier: A Model-Driven Approach to Energy-Aware Application Optimisation
S. Pasupuleti, S. Wright

15:30-17:10

Efficient Privacy Computing

MegaZK: A Memory Efficient GPU System Accelerating End-to-end Zero-Knowledge Proof
M. Li, Y. Yu, B. Wang, X. Fan, M. Gao, S. Deng

SumcheckPIM: An Efficient HBM-Based PIM Architecture for Linear Complexity Zero Knowledge Proofs
C. Kim, T. Kang, S. Shin, T. Suh, Y. Yang, G. Koo

GPIR: Enabling Practical Private Information Retrieval with GPUs
H. Ji, H. Yu, J. Kim, W. Choi, G. Suh, J. Ahn

Scaling Long-Sequence Homomorphic Encrypted Transformer Inference via Hybrid Parallelism on Multi-GPU Systems
Z. Gong, R. Ran, F. Yao, W. Wen

CipherSkip: Efficient Sparse Matrix Multiplication with FHE
W. Xiong, H. Zhou, Y. Ye, R. Jin, L. Xu

Numerical & Scientific Kernels

WindStencil: Unleashing GPU Potential for High-Order Stencil Computation in High-Performance Inviscid CFD Simulations
X. Zhang, H. Zhang, X. Liu, J. Li, R. Jin, J. Zhang, W. Yuan, S. Liang, Z. Lu

Cheetah: Optimizing Execution Pipelines for Matrix-Free Finite Element Operators on GPUs
J. Ren, H. Ltaief, S. Zampini, D. Keyes

AdaPolySI: Adaptive Polynomial Filtered Subspace Iteration for Hermitian Interior Eigenvalue Problems
Y. Ni, X. Xu, S. Li, J. Zhang, J. Chen, J. Wang, J. Roman

Non-Delayed Cholesky Factorization
Y. Luo, S. Zhang, W. Liu

Parallel Quadratic Selected Inversion in Quantum Transport Simulation
V. Maillou, M. Bollhofer, O. Schenk, A. Ziogas, M. Luisier

Quantum Computing

quEStab: Towards Scalable Quantum Circuit Simulation on Multi-GPU using an Extended Stabilizer Formalism
H. Shin, S. Lee, Y. Kim

EZCache: A Hierarchical Memory System for Zoned Neutral Atom Quantum Computers
J. Zhong, Y. Deng, H. Jiang, J. Feng

TuniQ: Autotuning Compilation Passes for Quantum Workloads at Scale for Effectiveness and Efficiency
M. Hasanat, J. Ludmir, T. Patel, R. Roy

C-3PQ: A Closeness Centrality-based Circuit Partitioner for Quantum Simulations
D. Popovici, H. Lee, N. Yoshioka, M. Ben, N. Ito, K. Klymko, D. Camps, A. Butko

Diagonal-Budgeted Trotterization for Efficient Quantum Hamiltonian Simulation
S. Chundury, B. Burgstahler, J. Li, I. Suh, F. Mueller