
Vision One (computer-vision analytics)

Vision One (internally “StoreHub Vision One”) is a production, multi-model computer-vision framework for retail CCTV analytics: roughly 13k lines of Python covering occupancy tracking, people counting, crowd detection, and demographics. It is a separate codebase from StreamHub. This page describes the plan to make StreamHub the media, camera-registry, and dashboard layer underneath Vision One’s analytics, so that the pair together form a self-hosted retail-intelligence platform, something neither Ant Media nor Wowza offers natively.

This is a replacement path rather than a greenfield build: Vision One’s real-time (occupancy) model already pulls HLS from an Ant Media box today, and its three input modalities (HLS, recorded MP4, and snapshots) map directly onto what StreamHub already produces.

| Model | Stack | Input | Output |
| --- | --- | --- | --- |
| Occupancy (real-time daemon) | YOLO11m + ByteTrack | HLS stream, read via cv2.VideoCapture | zone-based occupancy sessions → Postgres; dwell-time webhook alerts; its own /metrics + MJPEG debug view |
| People counting (nightly batch, 3-step) | YOLO11m → YOLOv8m fallback + BoT-SORT | recorded video (pulled from a records API as MP4) | line-crossing IN/OUT counts → Postgres; a load_completed webhook |
| Crowd (cron, per-minute) | YOLO26m | camera snapshots (CDN) | agglomeration alerts + an annotated image |
| Demography (cron, every 2 min) | InsightFace (genderage) + HSEmotion (emotion), onnxruntime | snapshots | age/gender/emotion estimates → Postgres |
| Benchmark | YOLO26m / YOLOv8m | | GPU/CPU performance report |

Architecturally, each model is an isolated OS process with file-based heartbeats (no Redis dependency), behind a FastAPI control surface, with a master/slave GPU cluster mode for the two models that support distribution (occupancy, people counting).
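
As an illustration of the file-based heartbeat idea, here is a minimal sketch using only the standard library. The file name, record fields, and the 15-second staleness threshold are assumptions made for the example, not Vision One’s actual format.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional


def write_heartbeat(path: Path, model: str, now: Optional[float] = None) -> None:
    """Atomically write a heartbeat record for one model process."""
    record = {
        "model": model,  # hypothetical field names
        "pid": os.getpid(),
        "ts": time.time() if now is None else now,
    }
    # Write to a temp file in the same directory, then rename over the target,
    # so a supervisor never reads a half-written file.
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(record, fh)
    os.replace(tmp, path)


def is_alive(path: Path, max_age_s: float = 15.0, now: Optional[float] = None) -> bool:
    """Return True if the heartbeat file exists, parses, and is recent enough."""
    try:
        record = json.loads(path.read_text())
        ts = float(record["ts"])
    except (FileNotFoundError, ValueError, KeyError, TypeError):
        return False
    current = time.time() if now is None else now
    return (current - ts) <= max_age_s
```

A supervisor could poll `is_alive` for each model’s file and restart any process whose heartbeat has gone stale, which is what makes the design work without Redis or any other shared service.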

| Vision One needs | StreamHub already produces |
| --- | --- |
| An HLS stream per camera (occupancy) | HLS egress: /hls/<app>/<room>/index.m3u8 |
| Recorded MP4s per camera/day (people counting) | per-app S3 recordings (egress writes directly to the app’s bucket) |
| Camera snapshots (crowd, demography) | ws-mjpeg frame.jpg + egress snapshots |
| Camera + zone/line configuration, per tenant | per-app config.yaml plus the ingress/ws-ingest registry (StreamHub apps map to tenants) |
| A place to push events and visualize them | HMAC-signed webhooks + player-overlay plugins on the dashboard |

Vision One’s zone geometry is already normalized to 0–1 coordinates, and its counting line is similarly normalized (position, length, orientation) — both drop directly onto StreamHub player-overlay coordinates without conversion.
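
To make the normalized-coordinate point concrete, the sketch below assumes a zone is stored as a list of (x, y) polygon vertices in the 0–1 range; the source only states that geometry is normalized, so the polygon representation and the function names are assumptions. It shows that the same zone can be tested against a normalized detection point and scaled to any player size only at render time.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_pixels(zone: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale normalized vertices to pixel coordinates for a given frame size."""
    return [(round(x * width), round(y * height)) for x, y in zone]


def contains(zone: List[Point], p: Point) -> bool:
    """Ray-casting point-in-polygon test, done entirely in normalized space."""
    x, y = p
    inside = False
    n = len(zone)
    for i in range(n):
        x1, y1 = zone[i]
        x2, y2 = zone[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# Hypothetical zone covering the middle of the frame.
zone = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
print(to_pixels(zone, 1920, 1080))  # [(480, 270), (1440, 270), (1440, 810), (480, 810)]
print(contains(zone, (0.5, 0.5)))   # True
```

Because nothing here depends on the camera resolution, the same zone definition would serve both the analytics side and an overlay drawn at whatever size the player happens to be.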

StreamHub owns the media plane, camera registry, and dashboard. Vision One owns the analytics plane. Two data paths plus one event bridge:

  1. Occupancy → a StreamHub worker plugin (real-time). This is the best fit: the occupancy model is already a standalone daemon process with file heartbeats — exactly the shape of StreamHub’s needsWorker plugin contract (see the plugin framework in the architecture docs). A StreamHub plugin would spawn it per app on the GPU node and inject:
    • the app’s HLS egress URL, overriding Vision One’s stream-source config, and
    • the device and zone list, sourced from the app’s config.yaml instead of Vision One’s own management API.
  2. People counting → a nightly job over StreamHub S3 (batch). Replace Vision One’s Records API call with an S3-prefix listing that returns the same [{ record_url: <mp4> }] shape its downloader expects. Counts continue to flow into Postgres exactly as they do today.
  3. Events → StreamHub → overlays. Point Vision One’s alert/webhook URLs at a StreamHub ingest endpoint. StreamHub re-signs the events with HMAC and fans them out to both external webhook consumers and player-overlay plugins on the dashboard (live occupancy zones, footfall counters, crowd heatmaps, demographic cards).
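
The S3-prefix lister in path 2 could look roughly like the following. Only the `[{ record_url: <mp4> }]` output shape comes from the text above; the key layout (`recordings/<room>/<YYYY-MM-DD>/`), the base URL, and the function names are hypothetical. The keys are passed in as a plain iterable so the sketch stays runnable without credentials; in practice they would come from an S3 list-objects call against the app’s bucket.

```python
from typing import Dict, Iterable, List


def record_prefix(room: str, day: str) -> str:
    """Hypothetical key layout: recordings/<room>/<YYYY-MM-DD>/<file>.mp4"""
    return f"recordings/{room}/{day}/"


def list_records(
    keys: Iterable[str], base_url: str, room: str, day: str
) -> List[Dict[str, str]]:
    """Return MP4 recordings for one camera/day in the shape the downloader expects."""
    prefix = record_prefix(room, day)
    matching = sorted(
        k for k in keys if k.startswith(prefix) and k.lower().endswith(".mp4")
    )
    base = base_url.rstrip("/")
    return [{"record_url": f"{base}/{k}"} for k in matching]
```

Keeping the response shape identical is what lets the people-counting downloader stay untouched: only the source of the list changes.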

Both systems would share the GPU node described in Capacity planning — StreamHub handles ingest/SFU and the control plane; the GPU node runs the vision models.

The glue — three thin adapters, no algorithm changes

  1. Device/zone config source — feed Vision One its device list from StreamHub apps instead of its current management API.
  2. S3 record lister — a small endpoint or prefix-lister matching the shape Vision One’s downloader already expects, backed by StreamHub’s per-app bucket.
  3. HMAC webhook bridge — a StreamHub receiver that accepts Vision One’s (currently unsigned) events and re-emits them signed to the rest of StreamHub’s webhook/overlay pipeline.
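
The re-signing step of adapter 3 can be sketched with the standard library. StreamHub’s actual signature scheme is not described here, so HMAC-SHA256 over a canonical JSON body, hex encoding, and the function names are all assumptions for illustration.

```python
import hashlib
import hmac
import json
from typing import Tuple


def sign_event(event: dict, secret: bytes) -> Tuple[bytes, str]:
    """Serialize an unsigned event and compute an HMAC-SHA256 signature over it."""
    # Canonical serialization so sender and receiver hash identical bytes.
    body = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify(body: bytes, signature: str, secret: bytes) -> bool:
    """Check a received body against its signature in constant time."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)
```

The bridge would call `sign_event` on each incoming Vision One event and forward the body together with the signature (for example in a request header) to downstream consumers, which run `verify` with the shared secret. Signing the exact bytes that are sent, rather than a re-serialized copy, avoids mismatches caused by key ordering or whitespace.
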

  • Config coupling: cameras and zones currently come from Vision One’s own management API; this needs to move to StreamHub’s app config (adapter #1, above).
  • Persistence coupling: the models write directly to a multi-tenant Postgres database today. Either share that database, or add an event-only output mode (a CSV test-mode sink already exists as a model for this) so StreamHub owns storage instead.
  • Webhook contract: Vision One’s events are currently unsigned JSON, so a bridge or a small signing patch is needed to fit StreamHub’s HMAC webhook contract.
  • Tenancy mapping: Vision One keys everything by tenant_id plus a legacy store/domain ID, which needs an explicit mapping to StreamHub’s app/room identity.
  • GPU and lifecycle ownership: if the model runs as a StreamHub worker, StreamHub’s plugin supervisor would own start/stop; Vision One’s existing one-GPU-per-cluster-node assumption still applies on top of that.
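
One way to make the tenancy mapping explicit is a lookup table from Vision One’s keys to a StreamHub app/room pair, from which the HLS egress URL follows. The table contents, the host name, and the function names below are hypothetical; only the `/hls/<app>/<room>/index.m3u8` path pattern comes from this page.

```python
from typing import Dict, NamedTuple, Tuple


class StreamHubTarget(NamedTuple):
    app: str
    room: str


# Hypothetical mapping: (tenant_id, legacy store ID) -> StreamHub identity.
TENANT_MAP: Dict[Tuple[str, str], StreamHubTarget] = {
    ("tenant-a", "store-001"): StreamHubTarget(app="acme", room="entrance"),
}


def resolve(tenant_id: str, store_id: str) -> StreamHubTarget:
    """Look up the StreamHub app/room for a Vision One tenant/store pair."""
    try:
        return TENANT_MAP[(tenant_id, store_id)]
    except KeyError:
        # Fail loudly: an unmapped camera should not silently read another stream.
        raise LookupError(f"no StreamHub mapping for {tenant_id}/{store_id}") from None


def hls_url(base: str, target: StreamHubTarget) -> str:
    """Build the HLS egress URL for a mapped camera."""
    return f"{base.rstrip('/')}/hls/{target.app}/{target.room}/index.m3u8"
```

Keeping the mapping as data rather than a naming convention means a store can be renamed on either side without breaking the link between the two systems.
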

  1. P1: Occupancy live (medium effort). A worker plugin wraps the occupancy daemon; StreamHub HLS and app-config zones go in, HMAC webhooks come out, with a zone overlay on the player. This phase alone replaces the current Ant Media dependency with StreamHub as the media source.
  2. P2: People counting batch (medium effort). S3 record lister plus a nightly job; footfall overlays and an /apps/:app/analytics endpoint.
  3. P3: Crowd + demography (medium effort). Snapshot feed from StreamHub; demographics and crowd cards as dashboard panel plugins.
  4. P4: Unify (large effort). Fold Vision One’s model-runner into StreamHub’s worker framework so there’s a single control plane, GPU scheduler, and dashboard for both.

Positioning: if built, this turns StreamHub into a retail-intelligence product (retail, fuel, parking, quick-service) on top of primitives that already exist in the platform — plugins, per-app config, S3 recordings, and signed webhooks — paired with a vision suite that’s already running in production elsewhere.