May 27, 2026

cache_ext: Customizing the Page Cache with eBPF

OS page cache evicition policy: only 1 -> supports multiple policy, FLEXIBILITY

Venue: SOSP 2025 doi

Topic: cache eviction, flexibility, add layer, eBPF

Summary

OS page cache evicition policy: one-size-fits-all -> supports multiple policy, FLEXIBILITY provide interface so that programmers don’t need to modify kernel to use different policy

Contribution

achieve compatibility with current linux environment + multiple page eviction strategies (even app characteristics aware) by leveraging eBPF’s kfuncs & struct_ops

Background

OS page cache reduces excessive accesses to storage.
OS page cache eviction policy is: one-size-fits-all (approximate LRU) -> performs poor
Applying different eviction policies is difficult in the page cache: modifying kernel code is complex.

Goal: allows developers to customize the page cache without modifying the kernel

flexible eBPF-based framework for the Linux page cache

Challenges

Scalability

custom page cache policies must run with low overhead. modern storage devices support millions of IOPS with low latency

Flexibility

caching algorithms are very diverse & use complex data structures

different applications’ policies do not interfere with each other

Security

dealing with invalid references which could lead to kernel crashes or security breaches

Design

High Level

allows users to run custom eviction policy functions (implemented as eBPF functions in the kernel)
The policy functions
1. are triggered by particular events (e.g., folio eviction, access, admission)
2. operate on a user-specified number of variable-sized eviction lists (store pointers to folios managed by the policy)
The policy functions determine which folios to admit or evict to and from the lists based on metadata (e.g., access frequency, recency, which thread accessed the folio), which is stored in eBPF maps.
Upon eviction, cache_ext runs a user-defined eviction function to propose a set of eviction candidates for the kernel to evict.

Solve

Scalability: custom page cache policies must run with low overhead

eBPF-based policies run in the kernel: avoids expensive & frequent synchronization between the kernel and userspace.

Flexibility: should supports caching algorithms: very diverse & use complex data structures

allows applications to define

one or more variable-sized lists of pages
a set of policy functions (e.g., admission, eviction) that operate on these lists, which can be used to express a wide range of eviction policies.

identify cgroups as a natural isolation boundary. each cgroup implement its own eviction policy without interfering with other cgroups

Security: dealing with invalid references which could lead to kernel crashes or security breaches

maintains a registry of valid page references, which is used to validate the page references returned by the user-defined policies.

Implementation

takes advantage of eBPF: a Linux (and Windows) supported runtime that allows safely running application code inside the kernel.
by implementing policies in-kernel, cache_ext can delegate all policy decisions to custom policies and enforce them directly.
consider multi-tenant isolation & application-informed policies.
used struct_ops and new kfunc APIs in accordance with modern eBPF practices.

Policy Functions

allows applications to define custom eviction policies as policy functions

policy functions: a set of eBPF programs that trace caching events and determine which folios to evict from the page cache.
triggered by five events: policy initialization, request for eviction, folio admission, folio access, folio removal.
implemented using eBPF’s struct_ops kernel interface

Application-Informed Eviction

applications to use eviction algorithms tailored to their design.: tag the pages, and differentiate priority

Application-Informed Admission Filter

act like a bouncer. if something doesn’t need to be cached, (large, streaming) then don’t save that to cache

Evaluation

it is beneficial for applications to customize the page cache to match their workloads’ unique properties,

achieve up to 70% higher throughput
58% lower tail latency.

eBPF policy customization in Linux

P2Cache

can’t support eviction policies that don’t rely on LRU queue or require multiple queues.

PageFlex

not eviction: swapping & prefetching

FetchBPF

not eviction: customize prefetching policies

sched_ext: eBPF framework for custom Linux scheduling policies

more performant for process scheduling than page eviction.

number of processes on a machine < pages in memory
process scheduling occurs less frequently than page cache events (e.g., page access).

Page cache customization: Extensible kernels

allow applications to customize kernel interfaces and policies.

ex) VINO, SPIN These OS designs never achieved widespread use.

Page cache customization: Extensible file systems

lack of compatibility: wouldn’t work with existing Linux or legacy file systems.

ex) ACFS, XN, libOS, Bento

Userspace caches

: simply implement their own userspace cache, and bypass the OS page cache with direct I/O

ex) TriCache many popular data systems still rely on the page cache

Linux: not very meaningful

add customization options to its LRU policy: add new MGLRU policy to replace the old one MGLRU sometimes underperforms the default LRU algorithm. -> no golden rule. we need more granularity following the workloads.

Inputs

no one-size-fits-all policy that improves all workloads

customization is necessary to maximize performance.

cgroups

a Linux mechanism that isolate resource usage for groups of processes

2 ways to access page cache

Pages in the page cache can be accessed through memory mappings (in which case the page table entry access bit will be set) or through file-based interfaces (e.g., read()).

folio

page container that can hold multiple pages

LRU is notoriously bad for scan-like access patterns.

madvise & fadvise

don’t work well sometimes: their actual behavior is highly dependent on the kernel implementation, can yield unexpected results.

sandbox: a restricted environment where code can run safely without affecting the rest of the system

eBPF

allows userspace functions to run in a sandbox within the Linux kernel in a safe and controlled manner.

how: eBPF runs “userspace-written logic in the kernel” by compiling it into restricted bytecode, verifying it for safety, and executing it in a tightly controlled sandbox with limited memory access and predefined hook points.
usage

observability, security, scheduling, I/O acceleration, customizing page cache behavior, customizing memory management policies such as prefetching

eBPF programs characteristics

eBPF programs are verified by the kernel before they can be run, ensuring, for example, that the programs do not contain illegal memory accesses, and that they will terminate within a fixed number of instructions.

struct_ops: exposes an interface of function-pointer callbacks to userspace.

struct_ops makes it much easier to introduce new user-defined policies by minimizing complex verifier changes.

has been used for: TCP congestion control algorithms, FUSE BPF filesystems, correcting HID device behavior, packet scheduling, and by sched_ext, an eBPF scheduling framework.

kfuncs: eBPF programs interact with the kernel via kfuncs

kfuncs: do not necessarily have a stable interface
helper: eBPF programs also interact with the kernel through helper functions, which are considered a stable interface
kfuncs are simpler to implement and are more flexible.

Userspace-offload overhead

can be expensive. not true : userspace == fast

many of selection policies can be implemeted using linked lists

either exactly or approximately using linked lists, where the policy iterates over one or more lists and evicts items based on a calculated per-item score.

cgroup isolation

common pattern of deploying modern applications via containers: isolate each application in its own memory cgroup.

Eviction policies

S3-FIFO, Least Hit Density, Multi-Generational LRU, LFU, MRU, FIFO

Questions

customization can be dangerous when it happens with arbitary situation

maybe we can use it with memory allocation (something requires multiple policies)

it’s decades ago maybe there would be some changes that we can examine.

Thoughts.

there is no “one-size-fits-all” policy

there is no one-size-fits-all policy that improves all workloads – customization is necessary to maximize performance. -> maybe this one also helps our ideas too. only controlling core and let cache and membw freely is not okay.

Better finishing OS textbook… my file system knowledge is not enough

the kernel provides madvise() and fadvise() system calls. These interfaces allow userspace applications to give hints to the kernel about how to handle certain ranges of memory or files.

But I think I read that in most case LRU works in real world situation. HHmmmmmmmm

with chatgpt, now there is literally no excuse that saying “I couldn’t understand the paper”.. hahaha

Finding a blank space in academia & doing good implemention

so it’s not like new idea. everyone knows linux’s single LRU eviction strategy, and there were tries with eBPF. But they failed handling heterogeniousity or not handling page cache eviction.

so they implemented the layer to link eBPF and customed eviction policy management, kinda from scratch using struct_ops and kfuncs.

eBPF is relatively new tech. And i guess it would be quite recent that LRU really doesn’t severely work well. (actually i still a bit doubt but anyway…) Guess they succeed to catch the right timing and made a good implementation.