Research

Research Overview
My research focuses on memory and storage systems for datacenter-scale workloads. I build simulation and profiling tools to understand performance bottlenecks across the GPU memory hierarchy, interconnects, and emerging technologies like CXL — with the goal of guiding efficient system design for AI and scientific computing.
Current Research
Memory Hierarchy Optimization for LLM Inference
LLM inference is increasingly memory-bound: KV cache growth from long contexts, multi-turn interactions, and multi-agent workloads competes for scarce HBM, while communication overheads (MoE routing, all-reduce) are tightly coupled with parallelism policies.
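To make the KV cache pressure concrete, here is a back-of-the-envelope sketch of the standard size estimate: two tensors (K and V) per layer, per KV head, per token. The model shape below (32 layers, 8 KV heads, head dimension 128, fp16) is a hypothetical configuration chosen for illustration, not one taken from this work.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Estimate KV cache size in bytes for a decoder-only transformer.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage (an assumption for this sketch).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape, for illustration only.
LAYERS, KV_HEADS, HEAD_DIM = 32, 8, 128

per_token = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=1)
long_context = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, seq_len=32768)

print(f"per token:        {per_token / 2**10:.0f} KiB")
print(f"one 32K sequence: {long_context / 2**30:.0f} GiB")
```

Under these assumptions a single 32K-token sequence occupies 4 GiB, so a handful of concurrent long-context requests can consume a large share of one GPU's HBM before weights and activations are counted.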
I am building a high-fidelity discrete-event simulator that models the full GPU serving stack, including memory hierarchies (HBM/DRAM/disk), tensor parallelism, interconnects (PCIe/NVLink), and vLLM-style scheduling (continuous batching, PagedAttention), to enable system-level what-if analysis without exhaustive experiments on real hardware. The simulator is validated against measurements on real H100 GPUs, with 1.5% error on micro-kernels and component-level metrics comparable to those reported by nsys.
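For readers unfamiliar with the technique, the core of a discrete-event simulator is a time-ordered event queue. The following is a minimal generic sketch, not the simulator described above: it models a single FIFO resource (for example, one link or one GPU) serving requests one at a time. The names `Event` and `simulate` and the request format are hypothetical.

```python
import heapq
from collections import deque
from dataclasses import dataclass, field
from itertools import count


@dataclass(order=True)
class Event:
    time: float
    seq: int                      # tie-breaker for events at the same time
    kind: str = field(compare=False)
    req_id: int = field(compare=False)


def simulate(requests):
    """Simulate one FIFO resource.

    requests: list of (arrival_time, service_time) tuples.
    Returns a dict mapping request index to its finish time.
    """
    tie = count()
    events = []
    for rid, (arrival, _) in enumerate(requests):
        heapq.heappush(events, Event(arrival, next(tie), "arrive", rid))

    waiting = deque()             # requests queued for the resource
    busy = False
    finish = {}

    while events:
        ev = heapq.heappop(events)        # jump to the next event in time
        if ev.kind == "arrive":
            waiting.append(ev.req_id)
        else:                             # "done": the resource is free again
            finish[ev.req_id] = ev.time
            busy = False
        if not busy and waiting:
            rid = waiting.popleft()
            busy = True
            done_at = ev.time + requests[rid][1]
            heapq.heappush(events, Event(done_at, next(tie), "done", rid))
    return finish


print(simulate([(0.0, 2.0), (1.0, 2.0), (10.0, 1.0)]))
```

Because simulated time jumps directly from one event to the next, idle periods cost nothing to simulate, which is what makes large what-if sweeps cheaper than running on hardware. A serving-stack model would replace the single resource with batches, memory tiers, and interconnect transfers.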
Scientific Computing Workload Characterization
I study system-level bottlenecks in GPU-accelerated scientific applications to guide infrastructure design. My first work in this area focuses on AlphaFold3, systematically profiling its compute, memory, and scaling behavior across GPU configurations.
Publication:
- “AlphaFold3 Workload Characterization: A Comprehensive Analysis of Bottlenecks and Performance Scaling” - IISWC 2025
  - Authors: Jinpyo Kim, Mingi Kwon, Jishen Zhao
  - AFSysBench benchmark suite
Other Research Interests
- CXL and memory disaggregation — heterogeneous memory tiering, cache coherence, and pooling for AI/HPC workloads
- Non-volatile memory systems — crash consistency, write ordering, and wear leveling for persistent memory
- Interconnect characterization — CXL, NVLink-C2C, AMD Infinity Fabric (Heimdall benchmark suite)
Tools
- AFSysBench — AlphaFold3 workload profiling benchmark
- Heimdall — Cache-coherent heterogeneous system benchmark suite
