LIBRARY | SUJIN KANG

June 3, 2026

SAND: A New Programming Abstraction for Video-based Deep Learning

Solve the heavy preprocessing problem with abstraction & planning + priority scheduling for reuse

#paper-review #SOSP #2025

June 3, 2026

Tiga: Accelerating Geo-Distributed Transactions with Synchronized Clocks

#paper-review #SOSP #2025

June 3, 2026

Sleeping with One Eye Open: Fast, Sustainable Storage with Sandman

#paper-review #SOSP #2025

May 27, 2026

cache_ext: Customizing the Page Cache with eBPF

OS page cache eviction policy: only 1 -> supports multiple policies, FLEXIBILITY

#paper-review #SOSP #2025 #flexibility #eBPF

May 22, 2026

A large scale analysis of hundreds of in-memory cache clusters at Twitter

5 important facts about in-memory caching

#paper-review #OSDI #2020 #survey

May 20, 2026

Jenga: Effective Memory Management for Serving LLM with Heterogeneity

a memory allocation framework for managing heterogeneous embeddings by leveraging layer properties

#paper-review #SOSP #2025 #add-granularity

May 11, 2026

PrefillOnly: An Inference Engine for Prefill-only Workloads in Large Language Model Applications

We only need to store 1 layer. Let's not do unnecessary things and UTILIZE this condition.

#paper-review #SOSP #2025 #remove-unnecessary

May 6, 2026

IC-Cache: Efficient Large Language Model Serving via In-context Caching

In-context caching system: leverage caching to let a small LLM serve like a giant one.

#paper-review #SOSP #2025 #caching #off-loading

April 29, 2026

COpter: Efficient Large-Scale Resource-Allocation via Continual Optimization

Remove the overhead of round-based resource allocation -> treat it as a sequence of interconnected problems

#paper-review #SOSP #2025

April 22, 2026

Mitigating Application Resource Overload with Targeted Task Cancellation

Eliminate head-of-line-blocking requests to solve the resource overload problem.

#paper-review #SOSP #2025 #Head-of-Line-Blocking

February 6, 2026

Kinetic Modeling of Data Eviction in Cache

AET — composable, linear-time Miss Ratio Curve profiling via average eviction time sampling

#paper-review #ATC #2016

February 5, 2026

Sailor: Automating Distributed Training over Dynamic, Heterogeneous, and Geo-distributed Clusters

Co-optimize resource allocation and parallelization plan for heterogeneous GPU training with accurate simulation

#paper-review #SOSP #2025

January 28, 2026

Robust LLM Training Infrastructure at ByteDance

Automated failure diagnosis and recovery for large-scale LLM training — minimize unproductive time across 16K+ GPUs

#paper-review #SOSP #2025

January 15, 2026

Spirit: Fair Allocation of Interdependent Resources in Remote Memory Systems

Fair allocation when resources are interdependent — remote memory bandwidth, capacity, and compute interact

#paper-review #SOSP #2025

January 7, 2026

Oasis: Pooling PCIe Devices Over CXL to Boost Utilization

Use CXL shared memory as a message channel to pool PCIe devices (NICs, SSDs) across hosts in a CXL pod

#paper-review #SOSP #2025

October 23, 2025

Disentangling the Dual Role of NIC Receive Rings

Split Rx ring into allocation (Ax) and reception (Bx) rings → reduce I/O working set, improve throughput up to 37%

#paper-review #OSDI #2025

October 10, 2025

Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications

Static partitioning can't react quickly to load changes → use fine-grained control signals and core allocation instead

#paper-review #HPCA #2025

October 6, 2025

Extending Applications Safely and Efficiently

EIM model abstracts extension resources for fine-grained safety/interconnectedness tradeoffs; bpftime enforces it efficiently

#paper-review #OSDI #2025

October 5, 2025

Tile Size Selection Using Cache Organization and Data Layout

Select tile sizes that fit data working sets in cache, accounting for layout and associativity

#paper-review #PLDI

October 5, 2025

SOCK: Rapid Task Provisioning with Serverless-Optimized Containers

Lean containers + generalized Zygote provisioning + three-tier package-aware caching → 45× cold-start speedup

#paper-review #ATC #2018

October 5, 2025

Shared Address Translation Revisited

Share page tables across processes for shared libraries → reduce TLB overhead and page faults on Android

#paper-review #EUROSYS #2016

October 5, 2025

SEUSS: Skip Redundant Paths to Make Serverless Fast

Deploy serverless functions from unikernel snapshots to skip boot, runtime init, and import overhead — 51× throughput improvement

#paper-review #EUROSYS #2020

October 5, 2025

Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider

Characterize FaaS workloads → propose per-function keep-alive and pre-warm policies to cut cold starts

#paper-review #ATC

October 5, 2025

Replayable Execution Optimized for Page Sharing for a Managed Runtime Environment

Checkpoint with GC-compacted heap → share pages across containers → speedy restoration with fewer page faults

#paper-review #EUROSYS

October 5, 2025

Benchmarking, Analysis, and Optimization of Serverless Function Snapshots

Functions access stable working sets across invocations → prefetch from disk to cut cold-start latency by 3.7×

#paper-review #ASPLOS #2022

October 5, 2025

Prebaking Functions to Warm the Serverless Cold Start

The timing of the snapshot determines cold-start latency — prebake at the right execution point

#paper-review #Middleware

October 5, 2025

Reducing Minor Page Fault Overheads through Enhanced Page Walker

Hardware-software co-design offloads minor page fault critical path → 33× latency improvement, 6.6% runtime improvement

#paper-review #Journal

October 5, 2025

MEGA: Overcoming Traditional Problems with OS Huge Page Management

Analyze and address the fundamental problems with Linux huge page management — fragmentation, bloat, fault latency, non-swappability

#paper-review #SYSTOR #2019

October 5, 2025

Coordinated and Efficient Huge Page Management with Ingens

Treat memory contiguity as a first-class resource; track utilization and access frequency for principled huge page management

#paper-review #OSDI #2016

October 5, 2025

Memory Efficient Fork-based Checkpointing Mechanism for In-Memory Database Systems

Fork-based checkpointing with efficient copy-on-write management to minimize memory overhead for in-memory databases

#paper-review #SAC

October 5, 2025

FlashCube: Fast Provisioning of Serverless Functions with Streamlined Container Runtimes

Streamline container runtimes to eliminate unnecessary initialization overhead for fast serverless provisioning

#paper-review #PLOS

October 5, 2025

Parallelizing Packet Processing in Container Overlay Networks

Serialized softirqs on a single core bottleneck overlay networks → pipeline them across multiple cores with Falcon

#paper-review #EUROSYS

October 5, 2025

FaaSNet: Scalable and Fast Provisioning of Custom Serverless Container Runtimes at Alibaba Cloud Function Compute

Decentralize container provisioning across host VMs in function-tree structures to eliminate cold-start bottleneck

#paper-review #ATC #2021

October 5, 2025

Architectural Implications of Function-as-a-Service Computing

FaaS containerization brings up to 20× slowdown vs. native; cold start can exceed 10× a function's execution time

#paper-review #MICRO #2019

October 5, 2025

Cold Start Influencing Factors in Function as a Service

Programming language, package size, and memory/CPU settings significantly affect FaaS cold-start latency

#paper-review #UCC

October 5, 2025

Caladan: Mitigating Interference at Microsecond Timescales

Dedicated scheduler core + fast core allocation reacts to interference in microseconds — no slow hardware partitioning

#paper-review #OSDI #2020
