Vision One (computer-vision analytics)
What this is
Vision One (internally “StoreHub Vision One”) is a production, multi-model computer-vision framework for retail CCTV analytics: roughly 13k lines of Python covering occupancy tracking, people counting, crowd detection, and demographics. It’s a separate codebase from StreamHub. This page describes the plan to make StreamHub the media, camera-registry, and dashboard layer underneath Vision One’s analytics. Together, the pair would form a self-hosted retail-intelligence platform, which neither Ant Media nor Wowza offers natively.
The reason this is a replacement path, not greenfield: Vision One’s real-time model already pulls HLS from an Ant Media box today. Its three input modalities — HLS, recorded MP4, and snapshots — map directly onto what StreamHub already produces.
The five models
| Model | Stack | Input | Output |
|---|---|---|---|
| Occupancy (real-time daemon) | YOLO11m + ByteTrack | HLS stream, read via `cv2.VideoCapture` | zone-based occupancy sessions → Postgres; dwell-time webhook alerts; its own `/metrics` + MJPEG debug view |
| People counting (nightly batch, 3-step) | YOLO11m → YOLOv8m fallback + BoT-SORT | recorded video (pulled from a records API as MP4) | line-crossing IN/OUT counts → Postgres; a `load_completed` webhook |
| Crowd (cron, per-minute) | YOLO26m | camera snapshots (CDN) | agglomeration alerts + an annotated image |
| Demography (cron, every 2 min) | InsightFace (genderage) + HSEmotion (emotion), onnxruntime | snapshots | age/gender/emotion estimates → Postgres |
| Benchmark | YOLO26m / YOLOv8m | — | GPU/CPU performance report |
Architecturally, each model is an isolated OS process with file-based heartbeats (no Redis dependency), behind a FastAPI control surface, with a master/slave GPU cluster mode for the two models that support distribution (occupancy, people counting).
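The file-based heartbeat idea can be sketched in a few lines. This is a minimal illustration, not Vision One’s implementation. It assumes a heartbeat is simply a file whose modification time the model process refreshes, and that a supervisor treats a stale file as a dead process. The file name and the 30-second threshold are hypothetical.

```python
import tempfile
import time
from pathlib import Path
from typing import Optional


def write_heartbeat(path: Path) -> None:
    """Called periodically by a model process; touching the file refreshes its mtime."""
    path.touch(exist_ok=True)


def is_alive(path: Path, max_age_s: float, now: Optional[float] = None) -> bool:
    """Treat a process as alive if its heartbeat file was refreshed within max_age_s seconds."""
    try:
        mtime = path.stat().st_mtime
    except FileNotFoundError:
        return False  # never started, or the file was cleaned up
    current = time.time() if now is None else now
    return (current - mtime) <= max_age_s


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        hb = Path(d) / "occupancy.heartbeat"  # hypothetical file name
        print(is_alive(hb, max_age_s=30))      # False: nothing written yet
        write_heartbeat(hb)
        print(is_alive(hb, max_age_s=30))      # True: just refreshed
        print(is_alive(hb, max_age_s=30, now=time.time() + 60))  # False: 60 s old
```

Liveness checks of this kind need only a shared filesystem, which is why they require no Redis.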
Why StreamHub fits (input/output mapping)
| Vision One needs | StreamHub already produces |
|---|---|
| An HLS stream per camera (occupancy) | HLS egress: `/hls/<app>/<room>/index.m3u8` |
| Recorded MP4s per camera/day (people counting) | per-app S3 recordings (egress writes directly to the app’s bucket) |
| Camera snapshots (crowd, demography) | ws-mjpeg `frame.jpg` + egress snapshots |
| Camera + zone/line configuration, per tenant | per-app `config.yaml` plus the ingress/ws-ingest registry; StreamHub apps map to tenants |
| A place to push events and visualize them | HMAC-signed webhooks + player-overlay plugins on the dashboard |
Vision One’s zone geometry is already normalized to 0–1 coordinates, and its counting line is similarly normalized (position, length, orientation) — both drop directly onto StreamHub player-overlay coordinates without conversion.
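Normalized geometry is resolution-independent, which makes this hand-off simple. A zone defined once in 0–1 space can be scaled to any frame or player size at render time, and containment tests can run in normalized space directly. The sketch below assumes a zone is a list of `(x, y)` vertices. Vision One’s actual zone schema may differ. The ray-casting test is a standard technique and is not necessarily the one Vision One uses.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def to_pixels(zone: List[Point], width: int, height: int) -> List[Tuple[int, int]]:
    """Scale a normalized (0-1) polygon to pixel coordinates for a given frame or player size."""
    return [(round(x * width), round(y * height)) for x, y in zone]


def point_in_zone(p: Point, zone: List[Point]) -> bool:
    """Even-odd ray-casting test, evaluated directly in normalized coordinates."""
    x, y = p
    inside = False
    n = len(zone)
    for i in range(n):
        x1, y1 = zone[i]
        x2, y2 = zone[(i + 1) % n]
        # Only edges that straddle the horizontal line through p can be crossed.
        if (y1 > y) != (y2 > y):
            # x coordinate where that edge meets the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:  # crossing lies to the right of p
                inside = not inside
    return inside


checkout = [(0.1, 0.2), (0.5, 0.2), (0.5, 0.8), (0.1, 0.8)]  # hypothetical zone
print(to_pixels(checkout, 1920, 1080))  # [(192, 216), (960, 216), (960, 864), (192, 864)]
print(to_pixels(checkout, 1280, 720))   # [(128, 144), (640, 144), (640, 576), (128, 576)]
print(point_in_zone((0.3, 0.5), checkout))  # True
```

The same `checkout` zone renders correctly on a 1080p camera and a 720p player without any stored conversion.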
Integration architecture
StreamHub owns the media plane, camera registry, and dashboard. Vision One owns the analytics plane. Two data paths plus one event bridge:
- Occupancy → a StreamHub worker plugin (real-time). This is the best fit. The occupancy model is already a standalone daemon process with file heartbeats, which matches StreamHub’s `needsWorker` plugin contract (see the plugin framework in the architecture docs). A StreamHub plugin would spawn it per app on the GPU node and inject:
  - the app’s HLS egress URL, overriding Vision One’s stream-source config, and
  - the device and zone list, sourced from the app’s `config.yaml` instead of Vision One’s own management API.
- People counting → a nightly job over StreamHub S3 (batch). Replace Vision One’s Records API call with an S3-prefix listing that returns the same `[{ record_url: <mp4> }]` shape its downloader expects. Counts continue to flow into Postgres exactly as they do today.
- Events → StreamHub → overlays. Point Vision One’s alert/webhook URLs at a StreamHub ingest endpoint. StreamHub signs the (currently unsigned) events with HMAC and fans them out to external webhook consumers and to player-overlay plugins on the dashboard (live occupancy zones, footfall counters, crowd heatmaps, demographic cards).
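The S3-prefix listing for people counting could look roughly like the sketch below. It assumes the following:

- a hypothetical key layout inside the app’s bucket, `<room>/<YYYY-MM-DD>/<file>.mp4`;
- that Vision One’s downloader needs only the `record_url` field;
- that the keys have already been listed.

In a real deployment, the keys would come from an S3 `ListObjectsV2` call filtered by `Prefix` (for example, boto3’s `list_objects_v2` paginator). Private buckets would likely need presigned URLs rather than the plain concatenation shown here.

```python
from typing import Dict, Iterable, List
from urllib.parse import quote


def list_records(keys: Iterable[str], room: str, day: str, bucket_url: str) -> List[Dict[str, str]]:
    """Return one camera-day of recordings in the [{"record_url": ...}] shape.

    Assumes a hypothetical key layout inside the app's bucket:
        <room>/<YYYY-MM-DD>/<file>.mp4
    """
    prefix = f"{room}/{day}/"  # trailing slash stops "cam-01" from matching "cam-010"
    base = bucket_url.rstrip("/")
    return [
        {"record_url": f"{base}/{quote(key)}"}  # quote() keeps "/" but escapes spaces etc.
        for key in sorted(keys)                # chronological if file names sort by time
        if key.startswith(prefix) and key.endswith(".mp4")
    ]


keys = [
    "cam-01/2024-05-01/0900.mp4",
    "cam-01/2024-05-01/1000.mp4",
    "cam-01/2024-05-01/0900.jpg",  # snapshot, skipped
    "cam-02/2024-05-01/0900.mp4",  # other camera, skipped
    "cam-01/2024-05-02/0900.mp4",  # other day, skipped
]
print(list_records(keys, "cam-01", "2024-05-01", "https://recordings.example.com/shop1/"))
```

Because the output matches the shape the downloader already consumes, the counting pipeline itself would not need to change.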
Both systems would share the GPU node described in Capacity planning — StreamHub handles ingest/SFU and the control plane; the GPU node runs the vision models.
The glue — three thin adapters, no algorithm changes
- Device/zone config source — feed Vision One its device list from StreamHub apps instead of its current management API.
- S3 record lister — a small endpoint or prefix-lister matching the shape Vision One’s downloader already expects, backed by StreamHub’s per-app bucket.
- HMAC webhook bridge — a StreamHub receiver that accepts Vision One’s (currently unsigned) events and re-emits them signed to the rest of StreamHub’s webhook/overlay pipeline.
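The signing step of the webhook bridge is small. The sketch below uses HMAC-SHA256 over the exact bytes sent, plus a constant-time check on the consumer side. The header name `X-StreamHub-Signature`, the `sha256=` prefix, and the event fields are assumptions for illustration. They are not StreamHub’s documented webhook contract.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, Tuple


def resign_event(event: Dict[str, Any], secret: bytes) -> Tuple[bytes, Dict[str, str]]:
    """Serialize an unsigned Vision One event and attach an HMAC-SHA256 signature.

    The header name and "sha256=" prefix are hypothetical, not a documented contract.
    """
    # Compact, key-sorted JSON so the signed bytes are deterministic.
    body = json.dumps(event, separators=(",", ":"), sort_keys=True).encode("utf-8")
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    headers = {
        "Content-Type": "application/json",
        "X-StreamHub-Signature": f"sha256={signature}",
    }
    return body, headers


def verify(body: bytes, header_value: str, secret: bytes) -> bool:
    """Consumer side: recompute the HMAC over the raw body and compare in constant time."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, header_value)


event = {"type": "dwell_alert", "zone": "checkout", "dwell_s": 312}  # hypothetical payload
body, headers = resign_event(event, b"app-secret")
print(verify(body, headers["X-StreamHub-Signature"], b"app-secret"))  # True
```

Consumers must verify against the raw request body, not a re-serialized copy, because any change in whitespace or key order changes the signature.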
Gaps to close
- Config coupling: cameras and zones currently come from Vision One’s own management API — needs to move to StreamHub’s app config (adapter #1, above).
- Persistence coupling: the models write directly to a multi-tenant Postgres database today. Either share that database, or add an event-only output mode (a CSV test-mode sink already exists as a model for this) so StreamHub owns storage instead.
- Webhook contract: Vision One’s events are currently unsigned JSON, so a bridge or a small signing patch is needed to fit StreamHub’s HMAC webhook contract.
- Tenancy mapping: Vision One keys everything by `tenant_id` + a legacy store/domain ID. This needs an explicit mapping to StreamHub’s app/room identity.
- GPU and lifecycle ownership: as a StreamHub worker, StreamHub’s plugin supervisor would own start/stop. Vision One’s existing one-GPU-per-cluster-node assumption still applies on top of that.
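The config-coupling, tenancy, and GPU-ownership gaps meet in one place: what the plugin supervisor passes to an occupancy worker when it starts it. The sketch below rests on several assumptions:

- the app’s `config.yaml` has already been parsed into a dict with a hypothetical `cameras` list;
- the worker accepts its device list through a hypothetical `VISION_DEVICES_JSON` environment variable;
- the worker has a hypothetical `occupancy_daemon` entry point.

`CUDA_VISIBLE_DEVICES` is the standard CUDA variable for restricting which GPU a process sees.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class WorkerSpec:
    """What a plugin supervisor needs to start one worker (hypothetical shape)."""
    argv: List[str]
    env: Dict[str, str] = field(default_factory=dict)


def build_occupancy_worker(app: str, app_config: Dict[str, Any], hls_base: str, gpu_index: int) -> WorkerSpec:
    """Translate one StreamHub app's parsed config.yaml into an occupancy worker launch spec."""
    devices = []
    for cam in app_config.get("cameras", []):  # hypothetical config.yaml key
        room = cam["room"]
        devices.append({
            # Explicit app/room identity in place of Vision One's tenant_id + store ID.
            "device_id": f"{app}/{room}",
            # StreamHub HLS egress replaces Vision One's own stream-source config.
            "stream_url": f"{hls_base.rstrip('/')}/hls/{app}/{room}/index.m3u8",
            # Zones are already normalized 0-1, so they pass through unchanged.
            "zones": cam.get("zones", []),
        })
    env = {
        "CUDA_VISIBLE_DEVICES": str(gpu_index),      # pin the worker to a single GPU
        "VISION_DEVICES_JSON": json.dumps(devices),  # hypothetical override variable
    }
    return WorkerSpec(argv=["python", "-m", "occupancy_daemon"], env=env)  # hypothetical entry point
```

A supervisor could then start the worker with `subprocess.Popen(spec.argv, env={**os.environ, **spec.env})` and stop it with `terminate()`. This keeps start/stop in StreamHub while leaving the daemon’s internals untouched.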
Phased plan
- P1 — Occupancy live (medium effort). A worker plugin wraps the occupancy daemon; StreamHub HLS and app-config zones go in, HMAC webhooks come out, with a zone overlay on the player. This phase alone replaces the current Ant Media dependency with StreamHub as the media source.
- P2 — People counting batch (medium effort). S3 record lister plus a nightly job; footfall overlays and an `/apps/:app/analytics` endpoint.
- P3 — Crowd + demography (medium effort). Snapshot feed from StreamHub; demographics and crowd cards as dashboard panel plugins.
- P4 — Unify (large effort). Fold Vision One’s model-runner into StreamHub’s worker framework so there’s a single control plane, GPU scheduler, and dashboard for both.
Positioning: if built, this turns StreamHub into a retail-intelligence product (retail, fuel, parking, quick-service) on top of primitives that already exist in the platform — plugins, per-app config, S3 recordings, and signed webhooks — paired with a vision suite that’s already running in production elsewhere.