User requests
NVIDIA Dynamo · control plane: KV-Aware Router, SLO Planner, Disaggregated Serving
Backend engine: vLLM (one device / instance), SGLang (one device / instance), TensorRT-LLM (NVIDIA GPU only), OpenInfer runtime (routes the whole fleet)
Your fleet: CPU (x86 · ARM), GPU (any vendor), NPU (accelerators)
Same Dynamo control plane. Swap the backend and watch how much of the fleet it can actually reach.