Incoming request
OpenInfer · the inference OS
What the scheduler weighs
Free memory
KV residency
Queue depth
Prompt size
Latency target
Prefill / decode
Placed on your silicon
CPU
idle host cores
GPU
any vendor
NPU
accelerators
Per request, in microseconds: weigh the inputs, place it on the cheapest silicon that meets the target.