DGX Spark multi-node vLLM inference setup: a 4-node ConnectX-7 ring, a custom NCCL 2.28.9 build to work around a bug in the NCCL 2.29.2 Docker image, and RDMA tuned for L2-only routing.
A 4-node DGX Spark cluster running vLLM over ConnectX-7, using a custom NCCL 2.28.9 build because the NCCL 2.29.2 Docker image has a known bug. RDMA is restricted to L2 via NCCL_IB_DISABLE=1; the topology constraints are real and are not documented.
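A minimal sketch of how a launcher might enforce these settings on each node. It assumes workers are started from Python with an explicitly built environment, and that the installed NCCL version string is obtained some other way (not shown). The helper names (`parse_nccl_version`, `check_nccl_version`, `nccl_env`) and the interface name are hypothetical. `NCCL_IB_DISABLE`, `NCCL_SOCKET_IFNAME`, and `NCCL_DEBUG` are real NCCL environment variables.

```python
import os
from typing import Dict, Mapping, Optional, Tuple

# Release with the Docker image bug described above.
BAD_NCCL_VERSIONS = {(2, 29, 2)}
# Custom build used to work around it.
PINNED_NCCL_VERSION = (2, 28, 9)


def parse_nccl_version(text: str) -> Tuple[int, int, int]:
    """Parse a dotted version string such as '2.28.9' into (major, minor, patch)."""
    major, minor, patch = (int(part) for part in text.strip().split("."))
    return (major, minor, patch)


def check_nccl_version(version: Tuple[int, int, int]) -> None:
    """Refuse to start on the known-bad release or on anything but the pinned build."""
    if version in BAD_NCCL_VERSIONS:
        raise RuntimeError(f"NCCL {version} is the known-bad release")
    if version != PINNED_NCCL_VERSION:
        raise RuntimeError(f"expected NCCL {PINNED_NCCL_VERSION}, found {version}")


def nccl_env(socket_ifname: str, base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with the NCCL overrides applied."""
    env = dict(os.environ if base is None else base)
    # Same setting as the text above; disables NCCL's IB/RoCE verbs transport.
    env["NCCL_IB_DISABLE"] = "1"
    # Pin NCCL to one network interface (the name passed in is a placeholder).
    env["NCCL_SOCKET_IFNAME"] = socket_ifname
    # Make NCCL log its version and selected transport at init, for verification.
    env["NCCL_DEBUG"] = "INFO"
    return env
```

For example, `nccl_env("enp1s0f0np0")` returns a copy of the current environment with the overrides applied. It could be passed as `env=` to `subprocess.Popen` when starting each node's worker. The interface name is a placeholder, and with `NCCL_DEBUG=INFO` the startup log shows which NCCL version and transport each rank actually picked up.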
In progress: making the multi-node ring stable enough to serve as the primary local inference backend for the agent fleet.