Vision One (computer-vision analytics)

What this is

Vision One (internally “StoreHub Vision One”) is a production, multi-model computer-vision framework for retail CCTV analytics: roughly 13k lines of Python covering occupancy tracking, people counting, crowd detection, and demographics. It’s a separate codebase from StreamHub. This page describes the plan to make StreamHub the media, camera-registry, and dashboard layer underneath Vision One’s analytics, so that together the pair form a self-hosted retail-intelligence platform, something neither Ant Media nor Wowza offers natively.

The reason this is a replacement path, not greenfield: Vision One’s real-time model already pulls HLS from an Ant Media box today. Its three input modalities — HLS, recorded MP4, and snapshots — map directly onto what StreamHub already produces.

The five models

Model	Stack	Input	Output
Occupancy (real-time daemon)	YOLO11m + ByteTrack	HLS stream, read via `cv2.VideoCapture`	zone-based occupancy sessions → Postgres; dwell-time webhook alerts; its own `/metrics` + MJPEG debug view
People counting (nightly batch, 3-step)	YOLO11m → YOLOv8m fallback + BoT-SORT	recorded video (pulled from a records API as MP4)	line-crossing IN/OUT counts → Postgres; a `load_completed` webhook
Crowd (cron, per-minute)	YOLO26m	camera snapshots (CDN)	agglomeration alerts + an annotated image
Demography (cron, every 2 min)	InsightFace (genderage) + HSEmotion (emotion), onnxruntime	snapshots	age/gender/emotion estimates → Postgres
Benchmark	YOLO26m / YOLOv8m	—	GPU/CPU performance report

Architecturally, each model is an isolated OS process with file-based heartbeats (no Redis dependency), behind a FastAPI control surface, with a master/slave GPU cluster mode for the two models that support distribution (occupancy, people counting).
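The file-based heartbeat pattern can be sketched in a few lines. This is a minimal illustration, not Vision One’s actual code: the heartbeat directory, the JSON payload, the 30-second staleness threshold, and the `write_heartbeat`/`is_alive` names are all assumptions. Each model process rewrites its file periodically, and a supervisor treats a missing or stale file as a dead process.

```python
import json
import os
import tempfile
import time
from pathlib import Path
from typing import Optional

# Hypothetical location: one JSON heartbeat file per model process.
HEARTBEAT_DIR = Path(tempfile.gettempdir()) / "vision-one-heartbeats"


def write_heartbeat(model: str, pid: int, directory: Path = HEARTBEAT_DIR) -> Path:
    """Record that `model` is alive: write a temp file, then rename it into place."""
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / f"{model}.json"
    tmp = directory / f"{model}.json.tmp"
    tmp.write_text(json.dumps({"model": model, "pid": pid, "ts": time.time()}))
    os.replace(tmp, target)  # atomic when both paths are on the same filesystem
    return target


def is_alive(model: str, max_age_s: float = 30.0,
             directory: Path = HEARTBEAT_DIR, now: Optional[float] = None) -> bool:
    """True if the model's heartbeat file exists and is at most `max_age_s` old."""
    try:
        beat = json.loads((directory / f"{model}.json").read_text())
    except (FileNotFoundError, json.JSONDecodeError):
        return False
    current = time.time() if now is None else now
    return current - beat["ts"] <= max_age_s
```

Writing to a temp file and renaming it means a reader never sees a half-written heartbeat, and the only shared dependency between processes is the filesystem.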

Why StreamHub fits (input/output mapping)

Vision One needs	StreamHub already produces
An HLS stream per camera (occupancy)	HLS egress: `/hls/<app>/<room>/index.m3u8`
Recorded MP4s per camera/day (people counting)	per-app S3 recordings (egress writes directly to the app’s bucket)
Camera snapshots (crowd, demography)	ws-mjpeg `frame.jpg` + egress snapshots
Camera + zone/line configuration, per tenant	per-app `config.yaml` plus the ingress/ws-ingest registry — StreamHub apps map to tenants
A place to push events and visualize them	HMAC-signed webhooks + player-overlay plugins on the dashboard

Vision One’s zone geometry is already normalized to 0–1 coordinates, and its counting line is similarly normalized (position, length, orientation) — both drop directly onto StreamHub player-overlay coordinates without conversion.
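Because zones and the counting line are both stored in 0–1 space, an overlay only has to scale them to the player’s rendered size at draw time. The sketch below assumes zones are lists of `(x, y)` points and reads “position, length, orientation” as a center point, a length measured as a fraction of the frame dimension the line runs along, and a horizontal/vertical flag. That encoding, `CountingLine`, and the helper names are hypothetical, not Vision One’s actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Pixel = Tuple[int, int]


def zone_to_pixels(zone: List[Point], width: int, height: int) -> List[Pixel]:
    """Scale a normalized (0-1) polygon to the player's rendered size."""
    return [(round(x * width), round(y * height)) for x, y in zone]


@dataclass
class CountingLine:
    # Hypothetical encoding of "position, length, orientation":
    # center point in 0-1 space, length as a fraction of the frame
    # dimension the line runs along, and a horizontal/vertical flag.
    cx: float
    cy: float
    length: float
    orientation: str  # "horizontal" or "vertical"

    def endpoints(self, width: int, height: int) -> Tuple[Pixel, Pixel]:
        """Pixel endpoints of the line for drawing on the overlay."""
        half = self.length / 2
        if self.orientation == "horizontal":
            a, b = (self.cx - half, self.cy), (self.cx + half, self.cy)
        elif self.orientation == "vertical":
            a, b = (self.cx, self.cy - half), (self.cx, self.cy + half)
        else:
            raise ValueError(f"unknown orientation: {self.orientation!r}")
        start, end = zone_to_pixels([a, b], width, height)
        return start, end
```

Keeping the stored geometry resolution-independent means one zone definition serves both a full-size player and a small dashboard tile.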

Integration architecture

StreamHub owns the media plane, camera registry, and dashboard. Vision One owns the analytics plane. Two data paths plus one event bridge:

Occupancy → a StreamHub worker plugin (real-time). This is the best fit: the occupancy model is already a standalone daemon process with file heartbeats, which is exactly the shape of StreamHub’s `needsWorker` plugin contract (see the plugin framework in the architecture docs). A StreamHub plugin would spawn it per app on the GPU node and inject:
- the app’s HLS egress URL, overriding Vision One’s stream-source config, and
- the device and zone list, sourced from the app’s `config.yaml` instead of Vision One’s own management API.
People counting → a nightly job over StreamHub S3 (batch). Replace Vision One’s records API call with an S3-prefix listing that returns the same `[{ record_url: <mp4> }]` shape its downloader expects. Counts continue to flow into Postgres exactly as they do today.
Events → StreamHub → overlays. Point Vision One’s alert/webhook URLs at a StreamHub ingest endpoint. StreamHub re-signs the events with HMAC and fans them out to both external webhook consumers and player-overlay plugins on the dashboard (live occupancy zones, footfall counters, crowd heatmaps, demographic cards).
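The occupancy worker’s injection step, together with the device/zone config source adapter, can be sketched as a function that turns a parsed app `config.yaml` into a device list and hands it to the daemon at spawn time. This is a hedged sketch: the `cameras:` schema, the `VISION_DEVICES_JSON` environment variable, and all function names are assumptions, since how the daemon actually reads its stream-source and device config isn’t specified here. Only the HLS path format comes from StreamHub’s egress layout.

```python
import json
import os
import subprocess
from typing import Any, Dict, List


def hls_url(hls_base: str, app: str, room: str) -> str:
    """StreamHub HLS egress path: /hls/<app>/<room>/index.m3u8."""
    return f"{hls_base.rstrip('/')}/hls/{app}/{room}/index.m3u8"


def devices_from_app_config(app: str, app_config: Dict[str, Any],
                            hls_base: str) -> List[Dict[str, Any]]:
    """Derive a device list from a parsed config.yaml.

    Assumes a hypothetical `cameras:` section where each camera names its
    room and carries normalized (0-1) zone polygons.
    """
    return [
        {
            "device_id": f"{app}/{cam['room']}",
            "stream_url": hls_url(hls_base, app, cam["room"]),
            "zones": cam.get("zones", []),
        }
        for cam in app_config.get("cameras", [])
    ]


def worker_env(app: str, app_config: Dict[str, Any], hls_base: str,
               gpu: int = 0) -> Dict[str, str]:
    """Environment for the spawned daemon; VISION_DEVICES_JSON is a hypothetical name."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu)  # pin the daemon to a single GPU
    env["VISION_DEVICES_JSON"] = json.dumps(devices_from_app_config(app, app_config, hls_base))
    return env


def spawn_occupancy_worker(cmd: List[str], app: str, app_config: Dict[str, Any],
                           hls_base: str) -> subprocess.Popen:
    """Start one occupancy daemon per app; `cmd` is whatever launches the daemon."""
    return subprocess.Popen(cmd, env=worker_env(app, app_config, hls_base))
```

Passing the whole device list through the environment leaves the daemon’s zone logic untouched; only the source of its configuration changes.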

Both systems would share the GPU node described in Capacity planning — StreamHub handles ingest/SFU and the control plane; the GPU node runs the vision models.

The glue — three thin adapters, no algorithm changes

Device/zone config source — feed Vision One its device list from StreamHub apps instead of its current management API.
S3 record lister — a small endpoint or prefix-lister matching the shape Vision One’s downloader already expects, backed by StreamHub’s per-app bucket.
HMAC webhook bridge — a StreamHub receiver that accepts Vision One’s (currently unsigned) events and re-emits them signed to the rest of StreamHub’s webhook/overlay pipeline.
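The S3 record lister stays small if the shaping logic is separated from the S3 call. In this sketch, the `<room>/<YYYY-MM-DD>/` key layout, the function names, and the use of presigned URLs are assumptions; only the `[{"record_url": ...}]` output shape comes from the people-counting path described above. The boto3 calls (`list_objects_v2` paginator, `generate_presigned_url`) are standard, but whether StreamHub’s buckets should be read this way depends on its deployment.

```python
from datetime import date
from typing import Callable, Dict, Iterable, List


def record_prefix(room: str, day: date) -> str:
    # Hypothetical key layout inside the app's bucket: <room>/<YYYY-MM-DD>/...
    return f"{room}/{day.isoformat()}/"


def records_from_keys(keys: Iterable[str], url_for: Callable[[str], str]) -> List[Dict[str, str]]:
    """Shape S3 keys into the [{"record_url": <mp4>}] list the downloader expects."""
    return [{"record_url": url_for(k)} for k in sorted(keys) if k.lower().endswith(".mp4")]


def list_records_s3(bucket: str, room: str, day: date,
                    expires_s: int = 3600) -> List[Dict[str, str]]:
    """Wire the lister to S3 (boto3 is imported lazily so the pure helpers need no AWS deps)."""
    import boto3

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=record_prefix(room, day))
        for obj in page.get("Contents", [])
    ]
    return records_from_keys(
        keys,
        lambda k: s3.generate_presigned_url(
            "get_object", Params={"Bucket": bucket, "Key": k}, ExpiresIn=expires_s
        ),
    )
```

Because `records_from_keys` takes a URL function, the same lister could hand out plain CDN URLs instead of presigned ones without touching the downloader.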

Gaps to close

Config coupling: cameras and zones currently come from Vision One’s own management API — needs to move to StreamHub’s app config (adapter #1, above).
Persistence coupling: the models write directly to a multi-tenant Postgres database today. Either share that database, or add an event-only output mode (a CSV test-mode sink already exists as a model for this) so StreamHub owns storage instead.
Webhook contract: Vision One’s events are currently unsigned JSON, so a bridge or a small signing patch is needed to fit StreamHub’s HMAC webhook contract.
Tenancy mapping: Vision One keys everything by tenant_id + a legacy store/domain ID — this needs an explicit mapping to StreamHub’s app/room identity.
GPU and lifecycle ownership: as a StreamHub worker, StreamHub’s plugin supervisor would own start/stop — Vision One’s existing one-GPU-per-cluster-node assumption still applies on top of that.
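The webhook-contract and tenancy-mapping gaps meet in one place, the bridge receiver: it parses Vision One’s unsigned JSON, resolves `tenant_id` plus the legacy store ID to a StreamHub app/room, and re-signs the result. The `X-StreamHub-Signature: sha256=<hex>` header format, the `store_id` field name, the `TENANCY` table, and the function names are hypothetical stand-ins for StreamHub’s actual webhook contract, which this page doesn’t spell out.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, Tuple

# Hypothetical explicit mapping: (tenant_id, legacy store/domain ID) -> (app, room).
TENANCY: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("tenant-7", "store-0031"): ("acme-retail", "entrance"),
}


def sign(body: bytes, secret: bytes) -> str:
    """HMAC-SHA256 over the raw body, hex-encoded (header format is an assumption)."""
    return "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()


def verify(body: bytes, header: str, secret: bytes) -> bool:
    """Constant-time check a consumer would run on the re-emitted event."""
    return hmac.compare_digest(sign(body, secret), header)


def bridge(raw_event: bytes, secret: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Accept an unsigned Vision One event, attach StreamHub identity, and re-sign it."""
    event: Dict[str, Any] = json.loads(raw_event)
    key = (str(event["tenant_id"]), str(event["store_id"]))
    if key not in TENANCY:
        raise KeyError(f"no StreamHub app/room mapped for {key}")
    app, room = TENANCY[key]
    out = {**event, "app": app, "room": room}
    # Serialize once and sign exactly these bytes, so verifiers check what was sent.
    body = json.dumps(out, separators=(",", ":"), sort_keys=True).encode()
    headers = {"Content-Type": "application/json", "X-StreamHub-Signature": sign(body, secret)}
    return body, headers
```

Signing the exact serialized bytes avoids spurious mismatches from key ordering or whitespace, and an unmapped tenant fails loudly instead of being routed to a guessed room.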

Phased plan

P1 — Occupancy live (medium effort). A worker plugin wraps the occupancy daemon; StreamHub HLS and app-config zones go in, HMAC webhooks come out, with a zone overlay on the player. This phase alone replaces the current Ant Media dependency with StreamHub as the media source.
P2 — People counting batch (medium effort). S3 record lister plus a nightly job; footfall overlays and an /apps/:app/analytics endpoint.
P3 — Crowd + demography (medium effort). Snapshot feed from StreamHub; demographics and crowd cards as dashboard panel plugins.
P4 — Unify (large effort). Fold Vision One’s model-runner into StreamHub’s worker framework so there’s a single control plane, GPU scheduler, and dashboard for both.

Positioning: if built, this turns StreamHub into a retail-intelligence product (retail, fuel, parking, quick-service) on top of primitives that already exist in the platform — plugins, per-app config, S3 recordings, and signed webhooks — paired with a vision suite that’s already running in production elsewhere.