October 5, 2025

Caladan: Mitigating Interference at Microsecond Timescales

A dedicated scheduler core plus fast core allocation reacts to interference in microseconds, without relying on slow hardware partitioning.

Venue: OSDI 2020
Link: USENIX OSDI '20

Topic: Hardware resource partitioning reacts too slowly (tens of seconds) to interference between latency-critical and best-effort tasks. Caladan uses a dedicated scheduler core that monitors fine-grained control signals and adjusts core allocations in microseconds.

Summary

Data centers co-locate multiple tasks on one machine to raise utilization, but interference between tasks causes tail-latency spikes. Hardware partitioning is a poor fit in either form: static partitioning wastes utilization, and dynamic partitioning takes tens of seconds to converge. Caladan replaces partitioning with core allocation driven by real-time control signals, reacting to interference in microseconds.

Background

The interference problem

An operation's end-to-end response time is determined by the slowest of the individual requests it issues → tail latency matters.
Co-located tasks compete for: CPU cores, memory bandwidth, LLC, execution units.
Interference → latency spikes.
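
Worked example of why the tail dominates. This is a minimal sketch, assuming an operation issues several independent requests and finishes only when the slowest one returns. The function name and the 1% slow fraction are hypothetical, chosen only for illustration.

```python
def prob_hits_slow_path(fanout: int, slow_fraction: float) -> float:
    """Probability that at least one of `fanout` independent requests is slow."""
    return 1.0 - (1.0 - slow_fraction) ** fanout


# Assumed: 1% of individual requests are slow (for example, delayed by interference).
for fanout in (1, 10, 100):
    print(fanout, round(prob_hits_slow_path(fanout, 0.01), 3))
# prints:
# 1 0.01
# 10 0.096
# 100 0.634
```

Under these assumptions, a 1-in-100 slowdown per request affects roughly 63% of operations that issue 100 requests.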

Types of CPU interference

Hyperthreading interference (within one physical core): execution unit contention.
Memory bandwidth interference: affects all cores on the same physical CPU.
LLC interference: affects all cores on the same physical CPU.

Limitation of partitioning

Static partitioning: allocate enough for peak load → poor CPU utilization.
Dynamic partitioning: takes dozens of seconds to converge → misses latency spikes.
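
Back-of-the-envelope arithmetic for both limitations. All numbers and function names below are hypothetical and only illustrate the trade-off; they are not measurements from the paper.

```python
def static_utilization(mean_cores_busy: float, cores_reserved: int) -> float:
    """Average fraction of a statically reserved allocation doing useful work."""
    return mean_cores_busy / cores_reserved


def unprotected_fraction(burst_s: float, convergence_s: float) -> float:
    """Fraction of a load burst that passes before a dynamic partitioner converges."""
    return min(1.0, convergence_s / burst_s)


# Assumed: a task needs 16 cores at peak but keeps only 4 busy on average.
print(static_utilization(4, 16))        # 0.25
# Assumed: a 0.5 s burst versus a partitioner that needs 10 s to converge.
print(unprotected_fraction(0.5, 10.0))  # 1.0
```

With these assumed numbers, static partitioning leaves 75% of the reserved cores idle on average, and the dynamic partitioner converges only after the whole burst has already passed.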

Key Idea

Core allocation as the isolation mechanism

Instead of partitioning resources, use control signals to drive fast core allocation.
Core allocation avoids the reaction-time constraints of hardware partitioning.

Two challenges

Sensitivity: many interference types → need signals that accurately detect each type.
Scalability: must gather signals and adjust allocations faster than interference builds up.

Design

Dedicated scheduler core

A centralized core dedicated to collecting control signals and making allocation decisions.
Distinguishes Latency-Critical (LC) tasks from Best-Effort (BE) tasks.

Scheduler workflow

Collect fine-grained measurements: memory bandwidth usage, request processing times.
Detect interference: memory bandwidth saturation, hyperthreading contention.
Adjust core allocations: grant more cores to LC tasks experiencing queueing delays.
Take cores away from BE tasks until the interference is resolved.
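
The workflow above can be sketched as one pass of a control loop. This is a simplified model, not Caladan's implementation: `Task`, `scheduler_pass`, and both thresholds are hypothetical names and values, and the signals are passed in as plain numbers rather than read from hardware.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    latency_critical: bool
    cores: int
    queueing_delay_us: float = 0.0


# Hypothetical thresholds, chosen only for illustration.
QUEUE_DELAY_LIMIT_US = 10.0
MEM_BW_LIMIT_GBPS = 25.0


def scheduler_pass(tasks: list[Task], idle_cores: int, mem_bw_gbps: float) -> int:
    """Run one pass over the collected signals; return the idle cores left."""
    be_tasks = [t for t in tasks if not t.latency_critical]

    # Detect interference: here, only memory bandwidth saturation.
    if mem_bw_gbps > MEM_BW_LIMIT_GBPS:
        victim = next((t for t in be_tasks if t.cores > 0), None)
        if victim is not None:
            victim.cores -= 1  # restrict a BE task
            idle_cores += 1

    # Grant cores to LC tasks that report queueing delay.
    for task in tasks:
        if task.latency_critical and task.queueing_delay_us > QUEUE_DELAY_LIMIT_US:
            if idle_cores > 0:
                idle_cores -= 1
                task.cores += 1
            else:
                # No idle core: preempt one from a BE task if possible.
                victim = next((t for t in be_tasks if t.cores > 0), None)
                if victim is not None:
                    victim.cores -= 1
                    task.cores += 1
    return idle_cores


tasks = [Task("lc-service", True, 2, queueing_delay_us=50.0), Task("batch", False, 6)]
idle = scheduler_pass(tasks, idle_cores=0, mem_bw_gbps=30.0)
print([(t.name, t.cores) for t in tasks], idle)
```

In the real system this loop would repeat every few microseconds on the dedicated core; the sketch shows only the decision logic of a single pass.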

Controllers

Top-level core allocator: grants more cores to tasks experiencing queueing delays.
Memory Bandwidth Controller: uses available memory bandwidth while avoiding saturation.
Hyperthread Controller: detects hyperthreading interference → bans use of the sibling hyperthread until the current request finishes.
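
A minimal sketch of the two interference-specific controllers as described in these notes. The names, the per-step core adjustment, and both thresholds are assumptions for illustration; the paper's actual signals and policies are more detailed.

```python
from dataclasses import dataclass
from typing import Optional


def bandwidth_controller(mem_bw_gbps: float, be_cores: int, be_cores_wanted: int,
                         limit_gbps: float = 25.0) -> int:
    """Return the number of cores BE tasks may use after this step."""
    if mem_bw_gbps > limit_gbps:
        return max(0, be_cores - 1)                # back off before saturation
    return min(be_cores_wanted, be_cores + 1)      # use spare bandwidth


@dataclass
class HyperthreadPair:
    request_id: Optional[int] = None  # LC request in flight on one hyperthread
    elapsed_us: float = 0.0           # how long that request has been running
    banned_for: Optional[int] = None  # request whose sibling is currently banned


def hyperthread_controller(pair: HyperthreadPair, limit_us: float = 20.0) -> bool:
    """Return True while the sibling hyperthread must stay idle."""
    if pair.banned_for is not None and pair.banned_for != pair.request_id:
        pair.banned_for = None                     # that request finished: lift the ban
    if pair.request_id is not None and pair.elapsed_us > limit_us:
        pair.banned_for = pair.request_id          # request is running long: ban sibling
    return pair.banned_for is not None


pair = HyperthreadPair(request_id=7, elapsed_us=35.0)
print(hyperthread_controller(pair))   # True: request 7 exceeded the limit
pair.request_id, pair.elapsed_us = 8, 2.0
print(hyperthread_controller(pair))   # False: request 7 finished, ban lifted
```

Tying the ban to a request id keeps it from leaking onto the next request that lands on the same hyperthread.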

KSCHED: Fast and Scalable Scheduling

Linux kernel module for performing scheduling across many cores simultaneously in microseconds.
Shifts scheduling work from the scheduler core to the tasks’ cores.
Leverages hardware multicast IPIs to amortize interrupt sending cost.
Provides a fully asynchronous, non-blocking scheduler interface.

KSCHED workflow

Amortize interrupt cost via multicast IPI.
Offload scheduling work to task cores.
Non-blocking API: scheduler core can handle many inflight operations concurrently.
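
The three ideas can be modeled in user space as follows. This is a toy simulation, not the KSCHED kernel interface: the class, its methods, and the per-core mailbox are hypothetical names, and a counter stands in for real inter-processor interrupts.

```python
class KschedSketch:
    """Toy model: batched requests, one multicast IPI, work done on target cores."""

    def __init__(self) -> None:
        self.pending: dict[int, str] = {}   # core -> task, not yet published
        self.mailbox: dict[int, str] = {}   # core -> task, visible to target cores
        self.running: dict[int, str] = {}   # core -> task currently running
        self.ipis_sent = 0

    def request_run(self, core: int, task: str) -> None:
        """Non-blocking: record the request and return immediately."""
        self.pending[core] = task

    def flush(self) -> set[int]:
        """Publish all pending requests and model ONE multicast IPI for them."""
        if not self.pending:
            return set()
        targets = set(self.pending)
        self.mailbox.update(self.pending)
        self.pending.clear()
        self.ipis_sent += 1                 # one interrupt send, many targets
        return targets

    def handle_ipi(self, core: int) -> None:
        """Runs on the target core, which performs its own task switch."""
        task = self.mailbox.pop(core, None)
        if task is not None:
            self.running[core] = task


ksched = KschedSketch()
for core, task in [(1, "lc-a"), (2, "lc-a"), (3, "batch")]:
    ksched.request_run(core, task)          # scheduler core never waits here
for core in ksched.flush():
    ksched.handle_ipi(core)                 # in reality, runs in parallel per core
print(ksched.ipis_sent, ksched.running)
```

The scheduler core pays for one interrupt send regardless of how many cores it reassigns, and the switching cost is spread across the target cores.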

Insights

Core allocation is more effective than resource partitioning for achieving performance isolation under dynamic loads.
Defining the right control signals is the key to detecting and distinguishing different interference types quickly.

Meeting Notes

(to be filled)