October 5, 2025

Serverless in the Wild: Characterizing and Optimizing the Serverless Workload at a Large Cloud Provider

Characterize FaaS workloads → propose per-function keep-alive and pre-warm policies to cut cold starts

Venue: ATC

Topic: Characterize real-world FaaS workloads and propose a practical resource management policy that reduces cold starts while using no more resources than current fixed keep-alive policies.

Summary

AWS Lambda and Azure Functions use a fixed keep-alive policy (10 and 20 minutes respectively) that ignores individual function invocation patterns, leading to cold starts and resource waste. After characterizing production FaaS workloads, a per-function keep-alive policy is proposed that adapts to actual invocation frequency and enables pre-warming just before predicted invocations.

Background

Cold starts in FaaS

Cloud functions run inside containers. A cold start occurs when no idle container is available for a function invocation.
FaaS providers avoid immediate container termination to enable warm starts for the next invocation.
Tension: avoidance of idling vs. on-demand scaling → more cold starts are unavoidable.

Current keep-alive policies

AWS Lambda: keeps containers alive for 10 minutes after last execution.
Azure Functions: keeps containers alive for 20 minutes.
Both are simple, fixed policies that disregard actual invocation frequency and patterns.
- Poor behavior for infrequently invoked functions (wastes resources) and for bursty functions (misses pre-warm opportunities).

Key Idea

Observation from production characterization

Functions show diverse invocation patterns — no single keep-alive duration fits all.
Some workloads have clear, predictable invocation patterns.

Proposed policy

Per-function keep-alive: set a different keep-alive duration per function based on its actual invocation frequency and pattern.
Pre-warming: for workloads with clear invocation patterns, the histogram predicts when to pre-warm a container just before the next invocation — turning a cold start into a warm start.

Implementation

Maintain a small histogram of recent inter-invocation times per function.
From the histogram: determine optimal keep-alive duration and pre-warm timing.

Evaluation

Policy consumes no more resources than current fixed policies while significantly reducing cold-start frequency.
For workloads with stable patterns, pre-warming effectively eliminates cold starts.

Meeting Notes

(to be filled)