Edge compute grid
This content is for the 1.0 version. Switch to the latest version for up-to-date documentation.
The idea
Add remote GPU machines (behind NAT, outside the server’s own network, including reused ex-crypto-mining cards) to the cluster as compute nodes that offload vision-model inference (YOLO, Vision One), transcoding, or plugin workers from the media server itself. The pool is encrypted, tolerant of nodes dropping in and out, and private to the operator. The comparison point is “vast.ai or Salad, but a private pool running your own models and configs.”
Verdict
Feasible for compute offload. Not feasible for serving live viewers. The pull-based, outbound-agent, encrypted-overlay pattern is proven at scale elsewhere (Salad: 1M+ nodes, io.net, Aethir on consumer RTX cards). Two things make it a natural fit for StreamHub specifically:
- Vision One is already architected as a VRAM-sized master/slave GPU cluster.
- StreamHub already has a node registry, heartbeats, and capacity reporting (see Cluster).
The Grid concept is essentially those two things, made NAT-safe and tolerant of nodes churning.
The one workload that fundamentally cannot run this way is a live SFU edge that serves viewers directly — that requires a public IP for inbound WebRTC media. Viewer-facing delivery stays on the origin/edge/CDN path described in Distribution.
Why NAT isn’t a blocker
A NAT’d machine has no public IP of its own, and the design assumes ports can’t be forwarded to it, so it has to initiate every connection outward. Two channels cover everything the Grid needs:
- Control — the agent holds a persistent outbound WebSocket/gRPC connection to the coordinator. This is exactly how StreamHub’s own cluster heartbeat works today, and how GitHub self-hosted runners and Salad nodes work. Jobs get pushed down this channel.
- Media/data — reached by pull, which is inherently NAT-friendly:
| Workload | How the agent pulls it |
|---|---|
| Vision One occupancy | HLS, via cv2.VideoCapture/FFmpeg — already how it works |
| Crowd / demography analysis | Camera snapshots over HTTPS |
| People counting | Recorded MP4s from the app’s S3 bucket |
| Sub-second inference | WHEP/WebRTC (ICE/STUN/TURN already traverse NAT) |
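The split between the two channels can be sketched as follows: the control channel carries only a small job descriptor, and the agent then fetches the media itself using the pull method from the table. This is a minimal sketch, not StreamHub's actual agent. The `Job` fields, the workload keys, the transport names, and the `receive_job`/`handlers` callables are all hypothetical stand-ins for the real control connection and pull clients.

```python
from dataclasses import dataclass

# Hypothetical workload keys mapped to the pull methods listed in the table above.
PULL_TRANSPORT = {
    "occupancy": "hls",
    "crowd": "https-snapshot",
    "people_counting": "s3-object",
    "sub_second": "whep",
}


@dataclass(frozen=True)
class Job:
    job_id: str
    workload: str
    source: str  # URL or object key that the agent fetches itself


def plan_pull(job: Job) -> str:
    """Return the pull transport for a job, or raise if the workload is unknown."""
    try:
        return PULL_TRANSPORT[job.workload]
    except KeyError:
        raise ValueError(f"unsupported workload: {job.workload}") from None


def run_agent(receive_job, handlers) -> list:
    """Process jobs until the control channel closes.

    receive_job: callable returning the next Job, or None when the channel closes
                 (a stand-in for the persistent outbound WebSocket/gRPC connection).
    handlers:    dict mapping a transport name to a callable that pulls `source`.
    """
    results = []
    while (job := receive_job()) is not None:
        transport = plan_pull(job)
        results.append((job.job_id, handlers[transport](job.source)))
    return results
```

Every connection in this loop is opened by the agent: it reads jobs from a channel it dialed, and each handler pulls from a URL or bucket, so nothing has to reach into the NAT'd network.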
A self-hosted encrypted overlay is the proposed unifier: Netbird or headscale (both WireGuard-based),
or Nebula (which uses its own Noise-based protocol rather than WireGuard). Agents join a private
encrypted network with no open ports, NAT hole-punching and relay fallback are handled by the mesh,
and every node behaves as if it were on the LAN. This would also let Vision One’s current
master→slave design run unmodified: the master polls each slave’s /cluster/health, so it needs the
slave to be reachable, and over the overlay the master would reach the “slave” as if it were local.
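As an illustration of the master-side poll running unchanged over the overlay, the sketch below fetches /cluster/health from a node's overlay address. Only the /cluster/health path comes from the text above. The overlay IP and port in the usage note, the JSON response body, and the `opener` parameter (added so the function can be exercised without a network) are assumptions.

```python
import json
import urllib.request
from typing import Callable, Optional


def poll_health(
    base_url: str,
    timeout: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> Optional[dict]:
    """Fetch <base_url>/cluster/health and return the parsed JSON body.

    Returns None when the node is unreachable or replies with something that
    is not valid JSON, so the caller can treat it as unhealthy.
    """
    url = base_url.rstrip("/") + "/cluster/health"
    try:
        with opener(url, timeout=timeout) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and socket timeouts are OSError subclasses;
        # JSON and Unicode decoding errors are ValueError subclasses.
        return None
```

A call such as `poll_health("http://100.64.0.7:8000")` would target a hypothetical overlay address exactly as it would a LAN address. The master needs no NAT-specific code because the mesh handles reachability.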
Security posture
- Identity: a per-agent revocable token, extending the cluster’s existing X-Cluster-Token, moving toward per-agent certificates (mTLS).
- In transit: WireGuard (ChaCha20-Poly1305) end to end; media itself pulled over HTTPS/DTLS-SRTP.
- At rest: no persistence on the compute node — ephemeral frames, encrypted scratch space, wiped at job end.
- Trust boundary: a compute node has to decrypt frames to run inference on them, so true zero-knowledge end-to-end encryption is fundamentally incompatible with this design. The Grid is therefore scoped as a private pool of machines the operator controls (home, office, trusted peers) — not a public marketplace. Residual risk is mitigated by routing only non-sensitive streams to lower-trust nodes, per tenant.
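One way to picture the per-agent revocable token is a coordinator-side registry that stores only a digest of each agent's token and can drop a single agent without touching the others. This is a sketch under assumptions: the `TokenRegistry` class and its method names are hypothetical and say nothing about how X-Cluster-Token is implemented today.

```python
import hashlib
import hmac
import secrets


class TokenRegistry:
    """Hypothetical coordinator-side store of per-agent tokens."""

    def __init__(self) -> None:
        # agent_id -> SHA-256 hex digest of the token; the raw token is never stored.
        self._digests = {}

    def issue(self, agent_id: str) -> str:
        """Create (or rotate) the token for one agent and return it once."""
        token = secrets.token_urlsafe(32)
        self._digests[agent_id] = hashlib.sha256(token.encode()).hexdigest()
        return token

    def revoke(self, agent_id: str) -> None:
        """Invalidate one agent's token without affecting any other agent."""
        self._digests.pop(agent_id, None)

    def verify(self, agent_id: str, token: str) -> bool:
        """Check a presented token using a constant-time comparison."""
        expected = self._digests.get(agent_id)
        if expected is None:
            return False
        presented = hashlib.sha256(token.encode()).hexdigest()
        return hmac.compare_digest(expected, presented)
```

Keying tokens by agent is what makes revocation cheap: a lost or decommissioned machine is removed with one `revoke` call, whereas a single shared cluster token would have to be rotated on every node.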
Reusing old mining GPUs
- Ampere-era ex-miners (RTX 3060 12GB / 3070 / 3080 / 3090 24GB) are a strong fit for YOLO/Vision One inference: CUDA, tensor cores, and enough VRAM that a 3090 can run several models at once.
- Mining-only cards (P106-100, CMP) have no display output but CUDA still works, so headless inference is fine; the older architecture and lower VRAM push them toward lighter models (YOLO n/s).
- AMD (RX 580-class) needs ROCm or onnxruntime-directml, either of which is finicky on these cards, so light use only.
- Transcoding: consumer GPUs now allow up to 8 concurrent NVENC sessions (a driver change since January 2024, no hack required), and inference has no such session cap, so a reused 30-series card can do real transcode work and run inference alongside it, limited only by VRAM and compute.
- Honest caveat: home power cost, heat, and flaky residential power/internet mean these nodes will churn. The scheduling design has to assume that from the start.
Scheduling — mostly already built
- StreamHub’s cluster already has the nodes registry, heartbeats, stats_json capacity reporting, and node status (active/draining/disabled); see Cluster.
- Vision One’s LoadBalancer already sizes max_streams from free VRAM, does weighted round-robin assignment, and fails back with grace and idempotent database writes, so it is churn-safe by design.
- The Grid router, in this design, is those two extended: agent-initiated registration (so NAT doesn’t matter), capacity awareness (GPU model, VRAM, TFLOPS), region/latency awareness, and job typing (which model, which camera, transcode vs. inference), with reassignment on a stale heartbeat, the same staleness mechanism the cluster already uses.
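The three ingredients above (VRAM-sized capacity, capacity-weighted assignment, reassignment on a stale heartbeat) can be sketched together. This is not Vision One's LoadBalancer. The per-stream VRAM figure and the staleness threshold are assumed constants, the `Node` fields are hypothetical, and a least-loaded-relative-to-capacity pick stands in for weighted round-robin.

```python
from dataclasses import dataclass, field

STALE_AFTER_S = 30.0        # assumption: heartbeat age after which a node counts as gone
VRAM_PER_STREAM_MB = 1500   # assumption: VRAM budget for one inference stream


@dataclass
class Node:
    node_id: str
    free_vram_mb: int        # free VRAM reported at registration, before Grid streams
    last_heartbeat: float    # timestamp of the most recent heartbeat
    status: str = "active"   # active / draining / disabled
    streams: set = field(default_factory=set)

    @property
    def max_streams(self) -> int:
        return self.free_vram_mb // VRAM_PER_STREAM_MB


def is_alive(node: Node, now: float) -> bool:
    return now - node.last_heartbeat <= STALE_AFTER_S


def assign(stream_id: str, nodes: list, now: float):
    """Place a stream on the live, active node with the most spare capacity."""
    candidates = [
        n for n in nodes
        if n.status == "active" and is_alive(n, now) and len(n.streams) < n.max_streams
    ]
    if not candidates:
        return None
    # Lowest load fraction first; node_id breaks ties deterministically.
    best = min(candidates, key=lambda n: (len(n.streams) / n.max_streams, n.node_id))
    best.streams.add(stream_id)
    return best.node_id


def reassign_stale(nodes: list, now: float) -> dict:
    """Move every stream off nodes whose heartbeat went stale; return the new placements."""
    moved = {}
    for node in nodes:
        if is_alive(node, now):
            continue
        for stream_id in sorted(node.streams):
            node.streams.discard(stream_id)
            moved[stream_id] = assign(stream_id, nodes, now)
    return moved
```

Draining nodes keep their existing streams in this sketch and simply stop receiving new ones. Only a stale heartbeat triggers a move, and a stream with no live capacity available is reported as `None` so the caller can retry it later.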
What fits and what doesn’t
| Fits (pull + push) | Doesn’t fit through NAT |
|---|---|
| Vision One occupancy / people-counting / crowd / demography | A live SFU edge serving viewers (needs inbound public UDP 7882) |
| YOLO and any needsWorker plugin | Acting as a public CDN edge (needs inbound): do the compute, deliver via a real CDN |
| Transcode / ABR ladder (NVENC), pushing HLS to the origin/S3 | |
| VOD / recording post-processing (async, idempotent) | |
| Egress/recording (pull WHEP, record, push to S3) | |
A useful shorthand: the Grid can pre-transcode or package content for a CDN, but the actual public delivery still has to happen on a real edge (Bunny/Cloudflare) or a P2P layer — a NAT’d node can never be a public cache.
How this differs from vast.ai / Salad
Those are public marketplaces renting out strangers’ GPUs to run arbitrary containers. The Grid concept is a private pool of the operator’s own machines, running the operator’s own models (Vision One) and configs, integrated directly with StreamHub’s job system: simpler trust, tighter integration, no rental cost. It is closer to a self-hosted Salad, or a Folding@home for an operator’s own CCTV analytics.
Phased plan (if built)
- P1: Agent + mesh. A compute agent (Docker or a static binary) plus a self-hosted WireGuard overlay and an outbound capacity heartbeat into the existing cluster registry. Prove it with one job: Vision One occupancy running on a remote/NAT’d GPU, pulling StreamHub HLS, reporting events back via an HMAC webhook to a dashboard overlay.
- P2 — Grid router. Capacity-, region-, and trust-aware, churn-tolerant job assignment across agents, folding in Vision One’s existing balancer. Job types: vision models, transcode, VOD.
- P3 — Transcode/VOD offload. Move the ABR ladder and recording post-processing onto Grid nodes (NVENC), relieving the origin.
- P4 — Fleet UX. A one-liner agent installer (matching the node installer’s UX), a Grid tab in the dashboard (GPUs, jobs, health), revocable tokens.
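The HMAC webhook mentioned in P1 could look roughly like the sketch below: the agent signs a timestamp plus the event body with a shared secret, and the receiver recomputes the signature and rejects old or altered requests. The header names, the allowed clock skew, and the event fields are assumptions, not an existing StreamHub format.

```python
import hashlib
import hmac
import json
import time
from typing import Optional

MAX_SKEW_S = 300  # assumption: reject events more than five minutes old


def _signature(secret: bytes, timestamp: str, body: bytes) -> str:
    return hmac.new(secret, timestamp.encode() + b"." + body, hashlib.sha256).hexdigest()


def sign_event(secret: bytes, event: dict, now: Optional[float] = None):
    """Serialize an event and return (body, headers) ready to POST."""
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
    timestamp = str(int(time.time() if now is None else now))
    headers = {
        "X-Grid-Timestamp": timestamp,                             # hypothetical header
        "X-Grid-Signature": _signature(secret, timestamp, body),   # hypothetical header
    }
    return body, headers


def verify_event(secret: bytes, body: bytes, headers: dict, now: Optional[float] = None) -> bool:
    """Receiver side: check freshness, then compare signatures in constant time."""
    timestamp = headers.get("X-Grid-Timestamp", "")
    signature = headers.get("X-Grid-Signature", "")
    if not timestamp.isdigit():
        return False
    current = time.time() if now is None else now
    if abs(current - int(timestamp)) > MAX_SKEW_S:
        return False
    expected = _signature(secret, timestamp, body)
    return hmac.compare_digest(expected.encode(), signature.encode())
```

Including the timestamp in the signed data means a captured request cannot be replayed once the skew window has passed, which matters for nodes reporting over residential links.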
Bottom line: the concept is realistic specifically because the target workloads are pull-based and most of the scheduler already exists (StreamHub’s cluster registry plus Vision One’s balancer). If built, it would turn spare or old GPUs into cheap parallel capacity for the vision-AI features already on the roadmap — without ever needing those GPUs to accept inbound connections.