Parallelizing Packet Processing in Container Overlay Networks
Serialized softirqs on a single core bottleneck overlay networks → pipeline them across multiple cores with Falcon
Venue: EuroSys
Topic: Container overlay networks cause significant throughput and latency degradation compared to physical networks. The main bottleneck is serialization of softirqs on a single core. Falcon pipelines softirqs across multiple cores to remove this bottleneck.
Summary
Overlay networks are widely adopted in production container environments but cause significant performance degradation (throughput and latency) compared to physical networks. Root cause: a large number of software interrupts (softirqs) associated with different network devices of a single flow are serialized on a single core. Falcon prevents this serialization via softirq pipelining, splitting, and dynamic balancing — enabling fine-grained, low-cost flow parallelization on multicore machines.
Background
Overlay networks and performance
- Container overlay networks (e.g., VXLAN-based) encapsulate packets and process them through additional virtual network devices.
- Each virtual device generates softirqs that must be processed.
- In single-flow scenarios, all these softirqs get serialized on one core → CPU bottleneck.
The bottleneck
- Large number of softirqs per flow, all processed on a single core.
- Core becomes a serialization point → limits throughput and increases latency.
Key Idea
Falcon: softirq pipelining
Distribute softirq processing for a single flow across multiple cores to prevent any one core from being overwhelmed.
Three core designs:
- Softirq pipelining: chain softirq processing stages across cores.
- Softirq splitting: divide softirq work within a stage across cores.
- Dynamic balancing: adapt the distribution based on load.
Design
- Falcon operates at the granularity of individual flows.
- Fine-grained parallelization: low overhead, no need to modify applications.
- Multicore machines: softirqs associated with different network devices of the same flow are dispatched to different cores.
Evaluation
- Significant throughput and latency improvements over baseline in-kernel networking for container overlay networks.
- Approaches physical network performance by eliminating the serialization bottleneck.
Meeting Notes
(to be filled)