
Edge compute grid

Add remote GPU machines — behind NAT, outside the server’s own network, including reused ex-crypto-mining cards — to the cluster as compute nodes that offload vision-model inference (YOLO, Vision One), transcoding, or plugin workers from the media server itself. Encrypted, tolerant of nodes dropping in and out, private to the operator. The comparison point is “vast.ai or Salad, but a private pool running your own models and configs.”

Feasible for compute offload. Not feasible for serving live viewers. The pull-based, outbound-agent, encrypted-overlay pattern is proven at scale elsewhere (Salad: 1M+ nodes, io.net, Aethir on consumer RTX cards). Two things make it a natural fit for StreamHub specifically:

  • Vision One is already architected as a VRAM-sized master/slave GPU cluster.
  • StreamHub already has a node registry, heartbeats, and capacity reporting (see Cluster).

The Grid concept is essentially those two things, made NAT-safe and tolerant of nodes churning.

The one workload that fundamentally cannot run this way is a live SFU edge that serves viewers directly — that requires a public IP for inbound WebRTC media. Viewer-facing delivery stays on the origin/edge/CDN path described in Distribution.

A NAT’d machine has no public IP and can’t have ports forwarded to it, so it has to initiate every connection outward. Two channels cover everything the Grid needs:

  • Control — the agent holds a persistent outbound WebSocket/gRPC connection to the coordinator. This is exactly how StreamHub’s own cluster heartbeat works today, and how GitHub self-hosted runners and Salad nodes work. Jobs get pushed down this channel.
  • Media/data — reached by pull, which is inherently NAT-friendly:
| Workload | How the agent pulls it |
| --- | --- |
| Vision One occupancy | HLS, via cv2.VideoCapture/FFmpeg (already how it works) |
| Crowd / demography analysis | Camera snapshots over HTTPS |
| People counting | Recorded MP4s from the app’s S3 bucket |
| Sub-second inference | WHEP/WebRTC (ICE/STUN/TURN already traverse NAT) |
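
The control channel can be sketched from the agent's side. This is a minimal illustration and not StreamHub's actual agent: the `GpuInfo` fields, the message shape, and the backoff constants are all assumptions, and the real capacity payload would be whatever the cluster's stats_json expects. The transport (WebSocket or gRPC) is left out on purpose; the sketch only shows what an agent might send over its outbound connection and how it might pace re-dials after the link drops.

```python
import json
import time
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class GpuInfo:
    # Hypothetical capacity fields; not taken from StreamHub's schema.
    model: str
    vram_total_mb: int
    vram_free_mb: int


def build_heartbeat(agent_id: str, gpus: List[GpuInfo],
                    now: Optional[float] = None) -> str:
    """Serialize one heartbeat frame for the outbound control channel."""
    payload = {
        "type": "heartbeat",
        "agent_id": agent_id,
        "sent_at": time.time() if now is None else now,
        "gpus": [asdict(g) for g in gpus],
    }
    return json.dumps(payload, sort_keys=True)


def reconnect_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff (no jitter) before re-dialing the coordinator."""
    return min(cap, base * (2 ** attempt))
```

Because the agent always dials out, a dropped residential link costs only a reconnect: the coordinator never needs to know the agent's address. A production agent would probably add random jitter to the delay so that many nodes do not reconnect at the same instant.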

A self-hosted encrypted overlay is the proposed unifier: WireGuard-based Netbird or headscale, or Nebula, which plays the same role over its own Noise-based protocol. Agents join a private encrypted network with no open ports, the mesh handles NAT hole-punching and relay fallback, and every node behaves as if it were on the LAN. This would also let Vision One’s current master→slave design run unmodified over the overlay. Today the master polls each slave’s /cluster/health, so the slave must be reachable from the master; on the overlay, the master would reach the “slave” as if it were local.
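
Because the overlay gives every node a routable private address, the master's health poll needs no NAT-specific logic. The following is a minimal sketch, not Vision One's actual code. It assumes a healthy slave answers `GET /cluster/health` with HTTP 200; the response body format is not described here, so only the status code is checked. The function name and timeout are hypothetical.

```python
import urllib.request


def slave_is_healthy(base_url: str, timeout_s: float = 3.0) -> bool:
    """Return True if GET <base_url>/cluster/health answers with HTTP 200."""
    url = base_url.rstrip("/") + "/cluster/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all
        # OSError subclasses: treat any of them as "not healthy".
        return False
```

On the overlay, the master would call this with the slave's mesh address, for example `slave_is_healthy("http://100.64.0.7:8000")`, where both the address and the port are made up for illustration.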

  • Identity: a per-agent revocable token, extending the cluster’s existing X-Cluster-Token, moving toward per-agent certificates (mTLS).
  • In transit: WireGuard (ChaCha20-Poly1305) end to end; media itself pulled over HTTPS/DTLS-SRTP.
  • At rest: no persistence on the compute node — ephemeral frames, encrypted scratch space, wiped at job end.
  • Trust boundary: a compute node has to decrypt frames to run inference on them, so true zero-knowledge end-to-end encryption is fundamentally incompatible with this design. The Grid is therefore scoped as a private pool of machines the operator controls (home, office, trusted peers) — not a public marketplace. Residual risk is mitigated by routing only non-sensitive streams to lower-trust nodes, per tenant.
  • Ampere-era ex-miners (RTX 3060 12GB / 3070 / 3080 / 3090 24GB) are a strong fit for YOLO/Vision One inference — CUDA, tensor cores, and enough VRAM that a 3090 can run several models at once.
  • Mining-only cards (P106-100, CMP) have no display output but CUDA still works, so headless inference is fine; the older architecture and lower VRAM push them toward lighter models (YOLO n/s).
  • AMD (RX580-class) needs ROCm/onnxruntime-DirectML, which is finicky — light use only.
  • Transcoding: consumer GPUs now allow up to 8 concurrent NVENC sessions (a driver change since January 2024, no hack required). Inference has no such session cap and is limited only by VRAM and compute, so a reused 30-series card can do real transcode work and run inference at the same time.
  • Honest caveat: home power cost, heat, and flaky residential power/internet mean these nodes will churn. The scheduling design has to assume that from the start.
  • StreamHub’s cluster already has the nodes registry, heartbeats, stats_json capacity reporting, and node status (active/draining/disabled) — see Cluster.
  • Vision One’s LoadBalancer already sizes max_streams from free VRAM, does weighted round-robin assignment, and fails back with grace and idempotent database writes — churn-safe by design.
  • The Grid router, in this design, extends those two with agent-initiated registration (so NAT doesn’t matter), capacity awareness (GPU model, VRAM, TFLOPS), region/latency awareness, and job typing (which model, which camera, transcode vs. inference). Jobs are reassigned on a stale heartbeat, the same staleness mechanism the cluster already uses.
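
The scheduling ideas in the list above (stream capacity sized from VRAM, per-tenant trust routing, and reassignment on a stale heartbeat) can be combined in a short sketch. This is not Vision One's LoadBalancer or StreamHub's cluster code. Every name and constant is hypothetical, the trust scale is invented, and a simple "most spare capacity" pick stands in for the weighted round-robin the real balancer uses.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

STALE_AFTER_S = 30.0        # assumed heartbeat timeout
VRAM_PER_STREAM_MB = 1500   # assumed VRAM footprint of one stream


@dataclass
class Node:
    node_id: str
    vram_mb: int            # free VRAM reported before any Grid jobs were placed
    trust: int              # hypothetical scale: higher means more trusted
    last_seen: float        # unix time of the last heartbeat
    jobs: List[str] = field(default_factory=list)

    def max_streams(self) -> int:
        return self.vram_mb // VRAM_PER_STREAM_MB

    def is_stale(self, now: float) -> bool:
        return now - self.last_seen > STALE_AFTER_S


def pick_node(nodes: List[Node], min_trust: int, now: float) -> Optional[Node]:
    """Pick the live, sufficiently trusted node with the most spare slots."""
    candidates = [
        n for n in nodes
        if not n.is_stale(now)
        and n.trust >= min_trust
        and len(n.jobs) < n.max_streams()
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda n: n.max_streams() - len(n.jobs))


def reassign_stale(nodes: List[Node], job_trust: Dict[str, int],
                   now: float) -> List[str]:
    """Move jobs off stale nodes; return job ids that could not be placed."""
    unplaced = []
    for node in nodes:
        if not node.is_stale(now):
            continue
        orphaned, node.jobs = node.jobs, []
        for job_id in orphaned:
            target = pick_node(nodes, job_trust[job_id], now)
            if target is None:
                unplaced.append(job_id)
            else:
                target.jobs.append(job_id)
    return unplaced
```

Running `reassign_stale` on every heartbeat tick keeps churn cheap: a node that drops off simply loses its jobs to whichever live node has room and meets each job's trust floor. Jobs that fit nowhere are returned to the caller to queue or retry.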
| Fits (pull + push) | Doesn’t fit through NAT |
| --- | --- |
| Vision One occupancy / people-counting / crowd / demography | A live SFU edge serving viewers (needs inbound public UDP 7882) |
| YOLO and any needsWorker plugin | Acting as a public CDN edge (needs inbound): do the compute, deliver via a real CDN |
| Transcode / ABR ladder (NVENC), pushing HLS to the origin/S3 | |
| VOD / recording post-processing (async, idempotent) | |
| Egress/recording (pull WHEP, record, push to S3) | |

A useful shorthand: the Grid can pre-transcode or package content for a CDN, but the actual public delivery still has to happen on a real edge (Bunny/Cloudflare) or a P2P layer — a NAT’d node can never be a public cache.

vast.ai and Salad are public marketplaces that rent out strangers’ GPUs to run arbitrary containers. The Grid concept is a private pool of the operator’s own machines, running the operator’s own models (Vision One) and configs, integrated directly with StreamHub’s job system: simpler trust, tighter integration, no rental cost. It is closer to a self-hosted Salad, or a Folding@home for an operator’s own CCTV analytics.

  1. P1 — Agent + mesh. A compute agent (Docker or a static binary) plus a self-hosted WireGuard overlay and an outbound capacity heartbeat into the existing cluster registry. Prove it with one job: Vision One occupancy running on a remote/NAT’d GPU, pulling StreamHub HLS, reporting events back via an HMAC webhook to a dashboard overlay.
  2. P2 — Grid router. Capacity-, region-, and trust-aware, churn-tolerant job assignment across agents, folding in Vision One’s existing balancer. Job types: vision models, transcode, VOD.
  3. P3 — Transcode/VOD offload. Move the ABR ladder and recording post-processing onto Grid nodes (NVENC), relieving the origin.
  4. P4 — Fleet UX. A one-liner agent installer (matching the node installer’s UX), a Grid tab in the dashboard (GPUs, jobs, health), revocable tokens.
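
The HMAC webhook named in P1 could look roughly like the sketch below. The event shape, the header name, and the per-agent secret are assumptions for illustration; only the HMAC-SHA256 signing and constant-time comparison are standard practice.

```python
import hashlib
import hmac
import json


def sign_event(secret: bytes, body: bytes) -> str:
    """Return the hex HMAC-SHA256 of the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify_event(secret: bytes, body: bytes, signature: str) -> bool:
    """Constant-time check that the signature matches the body."""
    expected = sign_event(secret, body)
    return hmac.compare_digest(expected, signature)


# Agent side: serialize once, sign exactly those bytes, send both.
event = {"camera": "lobby", "occupancy": 7}  # hypothetical event shape
body = json.dumps(event, separators=(",", ":")).encode("utf-8")
headers = {"X-Grid-Signature": sign_event(b"per-agent-secret", body)}  # hypothetical header
```

The receiver should verify against the raw bytes it received, before parsing the JSON. Parsing and re-serializing can change whitespace or key order and break the signature. Using a separate secret per agent fits the revocable-token model: revoking one agent invalidates only its events.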

Bottom line: the concept is realistic specifically because the target workloads are pull-based and most of the scheduler already exists (StreamHub’s cluster registry plus Vision One’s balancer). If built, it would turn spare or old GPUs into cheap parallel capacity for the vision-AI features already on the roadmap — without ever needing those GPUs to accept inbound connections.