Vertical Disaggregation: Maximizing Model Throughput on Heterogeneous Silicon
tl;dr: We served Qwen 3.5 27B on two AWS instance types (Intel Xeon with an Nvidia GPU, and AMD EPYC with an Nvidia L40S) using a dynamic scheduler with a custom runtime. By co-executing...





