Advanced · Distributed Systems · Go · AI Infrastructure

Building an AI Inference Network

Turn the machines you own into one OpenAI-compatible AI cluster. This course unpacks the stack behind Quorum: a Go + Wails desktop app, mDNS LAN discovery, a WebSocket relay for cross-network routing, an OpenAI-compatible streaming API, pluggable local model backends, and distributed inference. You read real Go code and configuration, and each mechanism is also runnable as a JavaScript model.

10 lessons · ~2.5 hours

1. The Cluster

What an inference distribution network is
Why you'd turn the machines you already own into one OpenAI-compatible cluster — and the four pieces that make it work.
Scheduling, retry & fallback
How a request reaches a holder, and what happens when that holder is slow, busy, or offline.

2. The Desktop App

Wails: Go backend, web UI, one binary
How a single desktop binary runs a Go core and a React frontend that call each other directly — no HTTP server in between.
The local control plane UI
A local-first React UI fed by live events from the Go core — fleet nodes, models, and cloud status without polling.

3. The Network

LAN discovery with mDNS
How nodes on the same network find each other automatically — multicast announcements, heartbeats, and pruning the dead.
The WebSocket relay
Why cross-network routing needs a relay, and how it registers nodes, tracks presence, and forwards messages between them.
The OpenAI-compatible streaming API
Serving /v1/chat/completions with server-sent events so any OpenAI client works unchanged.

4. Inference & Ops

Local backends behind one interface
Wrapping Ollama, LM Studio, llama.cpp, MLX, and vLLM behind a single driver so the cluster doesn't care which one runs.
Distributed inference (model sharding)
Splitting one model across several machines with llama.cpp RPC when it won't fit on a single GPU.
Deploying & one-click self-update
Running the relay on Linux with systemd, shipping the dashboard, and updating the desktop app without a reinstall.