SAND: A New Programming Abstraction for Video-based Deep Learning
Venue: SOSP 2025
Topic: Video-based Deep Learning, abstraction, scheduling, redundancy elimination
Summary
Solves the heavy video-preprocessing problem with a storage-level abstraction, materialization planning, and priority scheduling that reuse intermediate objects.
Background
Preprocessing for video-based deep learning is difficult because processing compressed video data is complex and becomes the bottleneck.
Complex Implementation: Pipelines
Preprocessing pipelines impose a heavy implementation burden: they take about 2x as much code as model training.
Overhead
Normally, the CPU handles preprocessing, but its overhead is up to 6.5x higher than in the GPU case -> bottleneck of the entire process
Offload to GPU
Only partially reduces the overhead, and reduces the GPU memory available for training
Resources
Repeated decoding
- The preprocessing workflow (from decoding to augmentation) must be repeated for each video in every epoch.
- This is because frames decoded in previous epochs are rarely reusable, and decoded frames are never reused within the same epoch.
- Decoding frames that are never used: due to video codec dependencies, extracting the required frames requires decoding many additional frames that are immediately discarded.
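The cost of codec dependencies can be sketched with a small calculation. This assumes a simplified GOP structure (a keyframe every `gop_size` frames, every other frame depending only on its predecessor, no B-frames); real codecs are more complex, and the function names are hypothetical, not from SAND.

```python
def frames_to_decode(targets, gop_size):
    """Return the sorted frame indices that must be decoded to obtain `targets`,
    assuming a keyframe every `gop_size` frames and that each non-key frame
    depends only on its immediate predecessor (no B-frames)."""
    needed = set()
    for t in targets:
        keyframe = (t // gop_size) * gop_size  # nearest keyframe at or before t
        needed.update(range(keyframe, t + 1))  # decode keyframe..t in order
    return sorted(needed)


def wasted_decodes(targets, gop_size):
    """Number of decoded frames that are discarded because nobody requested them."""
    return len(frames_to_decode(targets, gop_size)) - len(set(targets))


# Sample 4 frames with stride 8 from a video with a keyframe every 32 frames.
sampled = [5, 13, 21, 29]
print(len(frames_to_decode(sampled, gop_size=32)))  # 30 frames decoded (0..29)
print(wasted_decodes(sampled, gop_size=32))         # 26 of them are thrown away
```

Under these assumptions, sampling 4 frames costs 30 decodes, and the whole cost is paid again in the next epoch when a different sample is drawn.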
Effect
GPU underutilization -> drastically reduced training throughput
Root cause: lack of system-level support for sharing decoded objects across independent jobs.
Related Work / Existing Solutions
Existing solutions only partially mitigate the problem and remain inefficient and resource-constrained.
Programming abstraction for video analysis
Operates only at the application level
Image preprocessing overhead
Does not address the video-specific overhead: the repeated decoding problem
Streamed video pipeline
Provides only basic platform-level capabilities and does not handle the iterative (multi-epoch) nature of training
Solution: Storage-level abstraction
Provides a file system that exposes handles to critical objects in the video training pipeline.
Complex Implementation: Pipelines -> abstraction
Abstracting away the pipeline reduces developers’ workload (simplified preprocessing management): no need to build complex preprocessing pipelines or maintain relationships between data objects.
Preprocessing takes fewer than 10 lines of code…
Overhead -> System Level Optimization
Solves the CPU-side problem by leveraging system-level object reuse -> eliminates redundant computation and enables system-wide decisions for caching and scheduling data processing
Caching
Caches intermediate objects for reuse across jobs
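A minimal sketch of reusing decoded objects across jobs, assuming a single process, an in-memory store, and LRU eviction under a byte budget. SAND's actual cache sits at the storage level and its eviction policy is not covered in these notes; `SharedObjectCache` and `get_or_compute` are hypothetical names.

```python
from collections import OrderedDict


class SharedObjectCache:
    """LRU cache of intermediate objects (e.g., decoded frames), shared by all
    jobs and bounded by a total byte budget."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries = OrderedDict()  # key -> (value, size_in_bytes)
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute, size_of):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key][0]
        self.misses += 1
        value = compute()
        size = size_of(value)
        if size <= self.capacity:  # objects larger than the budget are not cached
            # Evict least recently used entries until the new object fits.
            while self.used + size > self.capacity:
                _, (_, old_size) = self.entries.popitem(last=False)
                self.used -= old_size
            self.entries[key] = (value, size)
            self.used += size
        return value


# Two hyperparameter-search trials read the same frames of the same video.
decode_calls = []

def decode(video, idx):
    decode_calls.append((video, idx))
    return bytes(100)  # stand-in for a decoded frame

cache = SharedObjectCache(capacity_bytes=10_000)
for job in ("trial_a", "trial_b"):
    for idx in range(4):
        cache.get_or_compute(("vid0", idx), lambda: decode("vid0", idx), len)
print(len(decode_calls), cache.hits)  # 4 4
```

The second trial is served entirely from the cache, which is the cross-job sharing that the root-cause line says is missing at the system level.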
Design
1. Abstracts the preprocessing workflow with “views”
view
A high-level abstraction representing virtual objects that encapsulate the intermediate stages of video preprocessing
2. Constructs a view materialization plan (models the video preprocessing workflow)
Generates an abstract view dependency graph per task, which serves as a blueprint -> constructs a materialization plan as a concrete graph for each chunk of k epochs, pruning the graph so materialization fits under the storage limit
3. Reduces redundant decoding operations by reusing materialized objects, acting like a well-functioning cache
4. Parallelizes tasks and uses priority-based scheduling
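These notes do not spell out SAND's planning algorithm, so the following is only a sketch of the general idea under stated assumptions: views form a dependency chain (e.g., decoded clip -> resized -> augmented), each view has a hypothetical compute cost and storage size, and the number of tasks in a k-epoch chunk that read each view is known. A swapped-in greedy heuristic (saved cost per byte, a standard knapsack approximation) then prunes the graph down to the views worth materializing under a storage limit. All names and numbers are illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class View:
    name: str
    parent: Optional["View"]  # view this one is derived from (None = raw video)
    cost: float               # compute cost to derive from the parent (e.g., ms)
    size: int                 # bytes needed to store the materialized view
    uses: int                 # tasks in this k-epoch chunk that read the view

    def chain_cost(self):
        """Cost to compute this view from the raw video if nothing is cached."""
        return self.cost + (self.parent.chain_cost() if self.parent else 0.0)


def plan_materialization(views, storage_limit):
    """Greedily pick views to materialize, highest saved cost per byte first.

    Materializing a view read by `uses` tasks saves (uses - 1) recomputations
    of its chain. Interactions between nested views are ignored for brevity."""
    def benefit(v):
        return (v.uses - 1) * v.chain_cost()

    candidates = [v for v in views if v.uses > 1 and v.size > 0 and benefit(v) > 0]
    candidates.sort(key=lambda v: benefit(v) / v.size, reverse=True)
    chosen, used = [], 0
    for v in candidates:
        if used + v.size <= storage_limit:  # prune views that do not fit
            chosen.append(v.name)
            used += v.size
    return chosen


decoded = View("decoded", None, cost=50.0, size=600, uses=3)
resized = View("resized", decoded, cost=5.0, size=150, uses=3)
augmented = View("augmented", resized, cost=2.0, size=150, uses=1)
print(plan_materialization([decoded, resized, augmented], storage_limit=500))
# ['resized']
```

With a 500-byte limit, only the small resized view fits, and it already saves the full decode-and-resize chain for two of its three readers; the per-epoch augmented view is read once, so it is never worth storing.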
Evaluation
SAND achieves significant improvements in training time and GPU utilization compared to both CPU and GPU preprocessing baselines.
Compared to the CPU baseline, SAND improves training time by up to 10.2× and GPU utilization by up to 12.3× in hyperparameter search.
Compared to the GPU baseline, SAND improves training time by up to 2.8× and GPU utilization by up to 2.9× in hyperparameter search.
Inputs
1. Running the GPU consumes more energy than running the CPU
2. Creativity comes from knowing prior work
SAND borrows the “view” concept from databases.
3. Having a plan is always better than not having one
Strategic planning of preprocessing sequences maximizes the reuse of intermediate objects across multiple tasks.
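A minimal sketch of reuse-aware ordering, assuming each task request names the single intermediate object it needs. The priority used here (number of waiting consumers) is a hypothetical choice for illustration, not SAND's documented scheduling policy.

```python
import heapq
from collections import defaultdict


def reuse_first_order(requests):
    """Order (task, object) requests so that all consumers of an object run
    back to back, most-shared objects first.

    Each object is then produced once and read by every waiting task while it
    is still in the cache. Ties are broken by object name for determinism."""
    consumers = defaultdict(list)
    for task, obj in requests:
        consumers[obj].append(task)
    # Max-heap on consumer count (heapq is a min-heap, so negate the count).
    heap = [(-len(tasks), obj) for obj, tasks in consumers.items()]
    heapq.heapify(heap)
    order = []
    while heap:
        _, obj = heapq.heappop(heap)
        order.extend((task, obj) for task in consumers[obj])
    return order


requests = [("t1", "clip_a"), ("t2", "clip_b"), ("t3", "clip_a"),
            ("t4", "clip_a"), ("t5", "clip_b"), ("t6", "clip_c")]
print([task for task, _ in reuse_first_order(requests)])
# ['t1', 't3', 't4', 't2', 't5', 't6']
```

For a static batch a plain sort would give the same order; a heap is used because it also lets new requests be pushed while earlier ones are still being served.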
Thoughts
- Have good APIs.
- Literally “solve” the problem: e.g., SAND shifts developers’ focus from data processing to model development.