One OpenInfer instance routes across the whole fleet under Dynamo

User requests

NVIDIA Dynamo · control plane
KV-Aware Router · SLO Planner · Disaggregated Serving

Backend engine
vLLM: one device / instance
SGLang: one device / instance
TensorRT-LLM: NVIDIA GPU only
OpenInfer runtime: routes the whole fleet

Your fleet
CPU: x86 · ARM
GPU: any vendor
NPU: accelerators

Same Dynamo control plane. Swap the backend and watch how much of the fleet it can actually reach.
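
One way to read the diagram is as a small reachability table: each backend label says how many of the fleet's device classes a single instance can serve. The Python sketch below is a toy model of that reading, not code from Dynamo or OpenInfer. The backend names, labels, and device classes are copied from the figure. The function names, the `pinned_to` parameter, and the interpretation of "one device / instance" as "one device class per instance" are assumptions made for illustration.

```python
# Toy model of the diagram. Nothing here calls Dynamo, OpenInfer, or any
# real backend API; it only encodes the labels shown in the figure.

FLEET = ("CPU", "GPU", "NPU")  # device classes listed under "Your fleet"

# Reach labels copied verbatim from the "Backend engine" row.
REACH_LABEL = {
    "vLLM": "one device / instance",
    "SGLang": "one device / instance",
    "TensorRT-LLM": "NVIDIA GPU only",
    "OpenInfer runtime": "routes the whole fleet",
}


def reachable_classes(backend: str, pinned_to: str = "GPU") -> set:
    """Device classes a single instance of `backend` can serve.

    Assumption: "one device / instance" means the instance is pinned to one
    device class, chosen here by the hypothetical `pinned_to` argument.
    """
    label = REACH_LABEL[backend]
    if label == "routes the whole fleet":
        return set(FLEET)
    if label == "NVIDIA GPU only":
        # Counts the GPU class as reached, although the label restricts it
        # to NVIDIA hardware within that class.
        return {"GPU"}
    return {pinned_to}


def coverage(backend: str, pinned_to: str = "GPU") -> float:
    """Fraction of fleet device classes reached by one instance."""
    return len(reachable_classes(backend, pinned_to)) / len(FLEET)


if __name__ == "__main__":
    for name in REACH_LABEL:
        classes = sorted(reachable_classes(name))
        print(f"{name:18s} {len(classes)}/{len(FLEET)}  {', '.join(classes)}")
```

Swapping the key passed to `reachable_classes` mirrors swapping the backend in the diagram: the control plane stays the same and only the set of reachable device classes changes.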