cache_ext: Customizing the Page Cache with eBPF

OS page cache eviction policy: only one -> supports multiple policies (FLEXIBILITY)

Venue: SOSP 2025

Topic: cache eviction, flexibility, add layer, eBPF


Summary

OS page cache eviction policy: one-size-fits-all -> supports multiple policies (FLEXIBILITY). Provides an interface so that programmers don’t need to modify the kernel to use a different policy.

Contribution

Achieves compatibility with the current Linux environment + multiple page eviction strategies (even ones aware of application characteristics) by leveraging eBPF’s kfuncs & struct_ops


Background


Goal: allow developers to customize the page cache without modifying the kernel

flexible eBPF-based framework for the Linux page cache


Challenges

Scalability

Custom page cache policies must run with low overhead, because modern storage devices support millions of IOPS with low latency.

Flexibility

caching algorithms are very diverse & use complex data structures

Isolation & sharing

different applications’ policies must not interfere with each other

Security

dealing with invalid references which could lead to kernel crashes or security breaches


Design

High Level

Solve

Scalability: custom page cache policies must run with low overhead

eBPF-based policies run in the kernel: avoids expensive & frequent synchronization between the kernel and userspace.

Flexibility: should support caching algorithms, which are very diverse & use complex data structures

allows applications to define their own custom eviction policies

Isolation & Sharing: different applications’ policies must not interfere with each other

Identifies cgroups as a natural isolation boundary. Each cgroup can implement its own eviction policy without interfering with other cgroups.
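To make the per-cgroup idea concrete, here is a minimal userspace sketch (plain Python, not the kernel mechanism). `CgroupCache`, `lru_victim`, and `mru_victim` are hypothetical names for illustration; the only thing it tries to show is that each cgroup keeps its own page list and its own victim-selection rule, so one cgroup's policy never touches another cgroup's pages.

```python
from collections import OrderedDict

class CgroupCache:
    """Toy page cache for ONE cgroup with its own eviction rule (hypothetical)."""

    def __init__(self, capacity, pick_victim):
        self.capacity = capacity
        self.pick_victim = pick_victim  # callable: OrderedDict -> key to evict
        self.pages = OrderedDict()      # recency order, oldest first

    def access(self, page):
        if page in self.pages:
            self.pages.move_to_end(page)  # refresh recency on a hit
            return True
        if len(self.pages) >= self.capacity:
            del self.pages[self.pick_victim(self.pages)]
        self.pages[page] = True
        return False

def lru_victim(pages):
    return next(iter(pages))       # least recently used entry

def mru_victim(pages):
    return next(reversed(pages))   # most recently used entry

# Two cgroups, two independent policies.
caches = {
    "/db": CgroupCache(2, lru_victim),
    "/scan": CgroupCache(2, mru_victim),
}
trace = [("/db", "a"), ("/db", "b"), ("/scan", "x"),
         ("/scan", "y"), ("/db", "c"), ("/scan", "z")]
for cgroup, page in trace:
    caches[cgroup].access(page)

print(list(caches["/db"].pages))    # pages kept by the LRU cgroup
print(list(caches["/scan"].pages))  # pages kept by the MRU cgroup
```

The two caches see the same kind of access pattern but end up keeping different pages, which is the point of using the cgroup as the policy boundary.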

Security: dealing with invalid references which could lead to kernel crashes or security breaches

maintains a registry of valid page references, which is used to validate the page references returned by the user-defined policies.
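The registry idea can be sketched in a few lines. This is only an illustration under my own assumptions: `PageRegistry` and `buggy_policy` are made-up names, and page references are modeled as plain integers rather than kernel pointers. The point is that whatever the policy returns is checked against the set of known-valid references before anything is evicted.

```python
class PageRegistry:
    """Hypothetical registry of page references that are currently valid."""

    def __init__(self):
        self._valid = set()

    def register(self, page_id):
        self._valid.add(page_id)

    def unregister(self, page_id):
        self._valid.discard(page_id)

    def filter_candidates(self, candidates):
        # Keep only references the registry knows about; drop everything else.
        return [p for p in candidates if p in self._valid]

registry = PageRegistry()
for pid in (1, 2, 3):
    registry.register(pid)
registry.unregister(2)  # page 2 has already left the cache

def buggy_policy():
    # Returns a stale reference (2) and a reference that never existed (99).
    return [2, 3, 99]

safe = registry.filter_candidates(buggy_policy())
print(safe)
```

Only the reference that is still registered survives the check, so a stale or bogus reference from the policy is never dereferenced.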


Implementation

Policy Functions

allows applications to define custom eviction policies as policy functions
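A rough userspace analogy of "policy as a set of functions": the cache calls into a table of callbacks, and the policy keeps its own bookkeeping. The callback names below (`on_added`, `on_accessed`, `on_removed`, `select_victims`) are hypothetical and are not taken from the actual cache_ext interface; the sketch plugs in a simple FIFO policy.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class PolicyOps:
    """Hypothetical callback table that a policy fills in."""
    on_added: Callable[[int], None]
    on_accessed: Callable[[int], None]
    on_removed: Callable[[int], None]
    select_victims: Callable[[int], List[int]]

def make_fifo_policy() -> PolicyOps:
    queue = deque()  # policy-private state, oldest page first

    def on_added(page):
        queue.append(page)

    def on_accessed(page):
        pass  # FIFO ignores hits

    def on_removed(page):
        if page in queue:
            queue.remove(page)

    def select_victims(n):
        return list(queue)[:n]  # propose the n oldest pages

    return PolicyOps(on_added, on_accessed, on_removed, select_victims)

class ToyCache:
    def __init__(self, capacity, ops):
        self.capacity, self.ops, self.pages = capacity, ops, set()

    def access(self, page):
        if page in self.pages:
            self.ops.on_accessed(page)
            return True
        if len(self.pages) >= self.capacity:
            for victim in self.ops.select_victims(1):
                self.pages.discard(victim)
                self.ops.on_removed(victim)
        self.pages.add(page)
        self.ops.on_added(page)
        return False

cache = ToyCache(2, make_fifo_policy())
hits = [cache.access(p) for p in [1, 2, 1, 3, 1]]
print(hits, cache.pages)
```

Swapping in a different policy only means passing a different `PolicyOps`; `ToyCache` itself does not change.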

Application-Informed Eviction

lets applications use eviction algorithms tailored to their design.
the application tags its pages, and the policy assigns different priorities based on the tags
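A small sketch of tag-based priority, under my own assumptions: each page carries a `(priority, last_access)` pair set by the application, and the page names and priority values are invented for illustration. Victims are the lowest-priority pages first, with the oldest access time as the tie-breaker.

```python
def pick_victims(pages, n):
    """pages maps page name -> (priority, last_access).

    Lower priority is evicted first; among equal priorities,
    the page with the older last_access goes first.
    """
    ranked = sorted(pages, key=lambda name: pages[name])
    return ranked[:n]

# Hypothetical tags an application might assign to its own pages.
pages = {
    "index-block": (2, 10),     # keep as long as possible
    "data-block-1": (1, 12),
    "compaction-tmp": (0, 15),  # recently touched, but low priority
    "data-block-2": (1, 5),
}

victims = pick_victims(pages, 2)
print(victims)
```

Note that `compaction-tmp` is evicted first even though it was touched most recently; a pure recency policy would have kept it.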

Application-Informed Admission Filter

Acts like a bouncer: if something doesn’t need to be cached (e.g., large or streaming reads), it is not admitted to the cache.
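The "bouncer" can be sketched as a single predicate that runs before a page is inserted. The size threshold, the `is_streaming` flag, and the request names are all assumptions made for this example, not values from the paper.

```python
def should_admit(size_bytes, is_streaming, max_size=1 << 20):
    """Hypothetical admission rule: reject streaming or very large reads."""
    return not is_streaming and size_bytes <= max_size

requests = [
    ("small-random", 4096, False),
    ("big-scan", 8 << 20, False),   # larger than the 1 MiB threshold
    ("video-stream", 4096, True),   # flagged as streaming
]

cached = [name for name, size, streaming in requests
          if should_admit(size, streaming)]
print(cached)
```

Rejected requests are still served; they just do not occupy cache space that other pages could use.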


Evaluation

It is beneficial for applications to customize the page cache to match their workloads’ unique properties.


Related Work

eBPF policy customization in Linux

P2Cache

can’t support eviction policies that don’t rely on an LRU queue or that require multiple queues.

PageFlex

not eviction: swapping & prefetching

FetchBPF

not eviction: customize prefetching policies

sched_ext: eBPF framework for custom Linux scheduling policies

more performant for process scheduling than page eviction.

Page cache customization: Extensible kernels

allow applications to customize kernel interfaces and policies.

Page cache customization: Extensible file systems

lack of compatibility: wouldn’t work with existing Linux or legacy file systems.

Userspace caches

applications simply implement their own userspace cache and bypass the OS page cache with direct I/O

Linux: not very meaningful

Linux added customization options to its LRU policy and introduced the new MGLRU policy to replace the old one, but MGLRU sometimes underperforms the default LRU algorithm. -> no golden rule; we need finer-grained policies that follow the workload.


Inputs

no one-size-fits-all policy that improves all workloads

customization is necessary to maximize performance.

cgroups

a Linux mechanism that isolates resource usage for groups of processes

2 ways to access page cache

Pages in the page cache can be accessed through memory mappings (in which case the page table entry access bit will be set) or through file-based interfaces (e.g., read()).
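For reference, the two access paths look like this from userspace. It is a small sketch using Python's standard `os` and `mmap` modules on a throwaway temp file; it only shows the two interfaces and does not observe the access bit or any other kernel state.

```python
import mmap
import os
import tempfile

# Create a small scratch file to read back in two different ways.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello page cache")
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    # Path 1: file-based interface (read() system call).
    via_read = os.read(fd, 5)

    # Path 2: memory mapping; the bytes are accessed through the mapping.
    with mmap.mmap(fd, 0, access=mmap.ACCESS_READ) as m:
        via_mmap = m[:5]
finally:
    os.close(fd)
    os.unlink(path)

print(via_read, via_mmap)
```

Both paths return the same bytes; the difference that matters for eviction is how the kernel learns that the page was accessed.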

folio

page container that can hold multiple pages

LRU is notoriously bad for scan-like access patterns.
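A tiny simulation shows why. The trace and cache size below are made up for illustration: a loop over 5 pages with room for only 4. LRU always evicts exactly the page that will be needed next, while MRU (evict the most recently used page) keeps most of the loop resident.

```python
from collections import OrderedDict

def hit_count(trace, capacity, evict_mru):
    """Count hits for a toy cache; evict_mru=False gives LRU, True gives MRU."""
    cache = OrderedDict()  # oldest access first, newest access last
    hits = 0
    for page in trace:
        if page in cache:
            hits += 1
            cache.move_to_end(page)
        else:
            if len(cache) >= capacity:
                # last=False pops the least recently used entry,
                # last=True pops the most recently used entry.
                cache.popitem(last=evict_mru)
            cache[page] = True
    return hits

trace = list(range(5)) * 4  # scan pages 0..4 four times in a row
lru_hits = hit_count(trace, capacity=4, evict_mru=False)
mru_hits = hit_count(trace, capacity=4, evict_mru=True)
print(lru_hits, mru_hits)
```

On this trace LRU gets no hits at all, even though the cache holds 4 of the 5 pages at any time.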

madvise & fadvise

They sometimes don’t work well: their actual behavior is highly dependent on the kernel implementation and can yield unexpected results.
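For reference, this is roughly what giving such a hint looks like from userspace, using Python's `os.posix_fadvise` wrapper (available on Unix only, hence the `hasattr` guard). The file and sizes are placeholders. As noted above, these calls are only hints, so the sketch makes no claim about what the kernel actually does with them.

```python
import os
import tempfile

# Scratch file standing in for application data.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * 8192)
    path = f.name

fd = os.open(path, os.O_RDONLY)
try:
    has_fadvise = hasattr(os, "posix_fadvise")
    if has_fadvise:
        # Hint: the file will be read sequentially (length 0 = to end of file).
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_SEQUENTIAL)

    data = os.read(fd, 8192)

    if has_fadvise:
        # Hint: the cached pages for this range are no longer needed.
        os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
finally:
    os.close(fd)
    os.unlink(path)

print(len(data))
```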

sandbox: a restricted environment where code can run safely without affecting the rest of the system

eBPF

allows userspace-supplied programs to run in a sandbox within the Linux kernel in a safe and controlled manner.

struct_ops: exposes an interface of function-pointer callbacks to userspace.

struct_ops makes it much easier to introduce new user-defined policies by minimizing complex verifier changes.

kfuncs: eBPF programs interact with the kernel via kfuncs

Userspace-offload overhead

Offloading decisions to userspace can be expensive. The assumption “userspace == fast” is not true.

many selection policies can be implemented using linked lists

either exactly or approximately, where the policy iterates over one or more lists and evicts items based on a calculated per-item score.
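A minimal sketch of "iterate over a list and evict by score", using a `deque` as a stand-in for a linked list. `evict_by_score`, the frequency table, and the scan window are hypothetical; scanning only the first few entries is what makes this an approximate LFU rather than an exact one.

```python
from collections import deque

def evict_by_score(items, score, n_scan, n_evict):
    """Scan the first n_scan items and evict the n_evict lowest-scoring ones."""
    scanned = [items[i] for i in range(min(n_scan, len(items)))]
    victims = sorted(scanned, key=score)[:n_evict]
    for victim in victims:
        items.remove(victim)
    return victims

# Hypothetical per-page access counts used as the score (LFU-like).
freq = {"a": 5, "b": 1, "c": 3, "d": 0}
pages = deque(["a", "b", "c", "d"])

victims = evict_by_score(pages, lambda p: freq[p], n_scan=3, n_evict=1)
print(victims, list(pages))
```

Page `d` has the lowest count overall but sits outside the scan window, so `b` is evicted instead. That is the accuracy-for-cost trade-off of a bounded scan.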

cgroup isolation

common pattern of deploying modern applications via containers: isolate each application in its own memory cgroup.

Eviction policies

S3-FIFO, Least Hit Density, Multi-Generational LRU, LFU, MRU, FIFO


Questions

customization can be dangerous when it is applied in arbitrary situations

maybe we can use it with memory allocation (something requires multiple policies)

it’s from decades ago; maybe there have been some changes since then that we can examine.


Thoughts.

there is no “one-size-fits-all” policy

there is no one-size-fits-all policy that improves all workloads – customization is necessary to maximize performance. -> maybe this also supports our ideas. Controlling only the cores while leaving cache and memory bandwidth uncontrolled is not okay.

Better finishing OS textbook… my file system knowledge is not enough

the kernel provides madvise() and fadvise() system calls. These interfaces allow userspace applications to give hints to the kernel about how to handle certain ranges of memory or files.

But I think I read that in most cases LRU works fine in real-world situations. Hmm.

With ChatGPT, there is now literally no excuse for saying “I couldn’t understand the paper”.. hahaha

Finding a blank space in academia & doing a good implementation

So it’s not really a new idea. Everyone knows about Linux’s single LRU eviction strategy, and there were earlier attempts with eBPF. But they either failed to handle heterogeneity or did not handle page cache eviction.

So they implemented the layer that links eBPF and custom eviction policy management, kind of from scratch, using struct_ops and kfuncs.

eBPF is a relatively new technology. And I guess it is only quite recently that LRU has really stopped working well. (Actually I still doubt this a bit, but anyway…) I guess they caught the right timing and made a good implementation.