AWS EC2 testing

Three time-boxed proof-of-concepts on real AWS EC2 (us-east-1), run to certify per-size cluster capacity, measure NVIDIA T4 transcode/CV performance, and validate a full S3 recording round-trip against real AWS S3. Total cost: ~$0.58 across all three. Full methodology, exact CLI commands, and raw results live in the repo at streamhub-docs/operations/AWS-POC.md; this page is the condensed how-to.

Launching and installing

  1. Resolve the AMI at launch time rather than hardcoding an AMI ID, since IDs differ per region and change whenever Canonical publishes a refreshed image:

    AMI_ID=$(aws ssm get-parameters --region us-east-1 \
    --names /aws/service/canonical/ubuntu/server/24.04/stable/current/amd64/hvm/ebs-gp3/ami-id \
    --query 'Parameters[0].Value' --output text)
  2. Security group: the installer preflights TCP 22, 80, 443, 1935 (RTMP), 7880/7881 (LiveKit signaling and WebRTC over TCP) and 8080 (WHIP), plus UDP 7882 (WebRTC media), so open all of them. Keep 6379 (Redis, cluster coordination) private: a self-referencing security group rule (source = the SG itself) lets every node in that group reach the others' Redis without ever exposing it publicly.

  3. No Elastic IPs. Use the auto-assigned public IP (--associate-public-ip-address) — it dies with the instance, so nothing outlives teardown by accident.

  4. Install with the published one-liner:

    # origin — non-interactive, cluster-ready
    curl -fsSL https://www.streamhub.studio/install.sh | sudo bash -s -- \
    --non-interactive --no-tls \
    --domain <origin-public-ip> \
    --cluster-redis-bind <origin-private-ip>
    # each edge — join by token
    curl -fsSL https://www.streamhub.studio/install.sh | sudo bash -s -- \
    --join --master-token <clt_...> \
    --master-ip <origin-private-ip> --master-url http://<origin-private-ip> \
    --node-name <edge-name>

    See Quick install for every origin flag and Join a cluster for the day-1 edge flow in detail.
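Steps 2 and 3 can be scripted. The sketch below is a dry run: it only prints the AWS CLI calls so they can be reviewed before being piped to bash. The function names, the t3.small default, the SUBNET_ID argument and restricting SSH to a CIDR you pass in are assumptions for illustration; the key pair name streamhub-poc and the Project=streamhub-poc tag match the teardown section.

```shell
#!/usr/bin/env bash
# Dry-run sketch for steps 2 and 3: print (never execute) the AWS CLI calls.
# Function names, the t3.small default and the SUBNET_ID argument are assumptions.
TAG_SPEC='Tags=[{Key=Project,Value=streamhub-poc}]'

# print_sg_rules SG_ID SSH_CIDR: ingress rules for one security group
print_sg_rules() {
  local sg=${1:?security group id required} ssh_cidr=${2:?ssh cidr required}
  local port
  # SSH only from the CIDR passed in (for example your own /32)
  echo "aws ec2 authorize-security-group-ingress --group-id $sg --protocol tcp --port 22 --cidr $ssh_cidr"
  # Public TCP: HTTP, HTTPS, RTMP, LiveKit 7880-7881, WHIP
  for port in 80 443 1935 7880-7881 8080; do
    echo "aws ec2 authorize-security-group-ingress --group-id $sg --protocol tcp --port $port --cidr 0.0.0.0/0"
  done
  # WebRTC media over UDP
  echo "aws ec2 authorize-security-group-ingress --group-id $sg --protocol udp --port 7882 --cidr 0.0.0.0/0"
  # Redis stays private: self-referencing rule, so only members of this SG connect
  echo "aws ec2 authorize-security-group-ingress --group-id $sg --protocol tcp --port 6379 --source-group $sg"
}

# print_run_instance AMI_ID SG_ID SUBNET_ID [INSTANCE_TYPE]
print_run_instance() {
  local ami=${1:?ami id required} sg=${2:?security group id required}
  local subnet=${3:?subnet id required} type=${4:-t3.small}
  # Auto-assigned public IP (no Elastic IP); instance and its volumes tagged at creation
  echo "aws ec2 run-instances --region us-east-1 --image-id $ami --instance-type $type" \
    "--key-name streamhub-poc --subnet-id $subnet --security-group-ids $sg" \
    "--associate-public-ip-address" \
    "--tag-specifications 'ResourceType=instance,$TAG_SPEC' 'ResourceType=volume,$TAG_SPEC'"
}
```

For example, print_sg_rules "$SG_ID" 203.0.113.7/32 prints eight rules, and appending | bash applies them once reviewed. Printing instead of executing keeps the commands auditable and makes it harder to launch something untagged by accident.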

Cluster sizing — certified per instance size

Certified by load-testing real instances in a 5-node cluster (1 origin plus t3.small, t3.medium, t3.large and c5.large edges) rather than reading the spec sheet; only t3.small was pushed all the way to collapse:

| Instance | vCPU / RAM | Reliable concurrent RTMP ingest | Room-composite HLS/recording? |
|---|---|---|---|
| t3.small | 2 / 2 GB | ~5–6 sessions; collapses at 12 (loadavg 35) | No |
| t3.medium, t3.large, c5.large | 2 / 4–8 GB | not pushed to collapse | No (all 2 vCPU) |
| c5.xlarge | 4 / 8 GB | not the bottleneck | Yes, exactly 1 concurrent (~1.5 GB RSS, 3 of 4 vCPU) |

Room-composite egress (HLS-live and Chrome-based recording) needs ≥4 vCPU — LiveKit egress refuses below that (minimumCpu: 4). None of the 2-vCPU edge sizes above can serve composite HLS or recording regardless of RAM; use track/track-composite (ffmpeg, no Chrome — see Capacity planning) for small edges instead. Pure WebRTC room-serving is cheap everywhere: LiveKit sat at ~5% CPU serving 10 viewers on a t3.small-hosted room, at 13–15 fps.
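Because egress refuses room-composite jobs below 4 vCPU, a node-side preflight can fail fast before composite HLS or recording is enabled on an edge. The helper below is hypothetical and only mirrors the minimumCpu: 4 rule; LiveKit's own check stays authoritative. Note that nproc reports the CPUs available to the calling process, which can be fewer than the instance's vCPUs inside a CPU-limited container.

```shell
#!/usr/bin/env bash
# Hypothetical preflight: can this node host room-composite egress
# (Chrome-based HLS-live / recording)? Mirrors the minimumCpu: 4 rule.
MIN_COMPOSITE_VCPU=4

# composite_egress_ok [VCPUS]  (defaults to the CPUs visible to this process)
composite_egress_ok() {
  local vcpus=${1:-$(nproc)}
  if (( vcpus >= MIN_COMPOSITE_VCPU )); then
    echo "ok: $vcpus vCPU; measured capacity was exactly 1 composite job on a 4-vCPU c5.xlarge"
    return 0
  fi
  echo "no: $vcpus vCPU < $MIN_COMPOSITE_VCPU; use track/track-composite egress on this node" >&2
  return 1
}
```

The gate is deliberately binary: the PoC measured one composite job per 4-vCPU node and nothing in between, so the sketch does not try to extrapolate a jobs-per-vCPU figure.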

At the cluster level, LiveKit's own load-based allocator placed 15 simultaneous streams correctly across all 5 nodes. When an edge holding 7 rooms was hard-killed, playback for those rooms recovered in under 2 minutes, but publishers ingesting through that edge were dropped and did not fail over: ingest is pinned to the node where the RTMP/WHIP session was opened.

GPU transcode — NVIDIA T4 (g4dn.xlarge, spot)

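Stock Ubuntu 24.04 ffmpeg already includes the h264_nvenc, hevc_nvenc and av1_nvenc encoders, which become usable once the NVIDIA driver is installed (nvidia-driver-580 from the stock repo). The T4 is a Turing-generation GPU, so its NVENC encodes H.264 and HEVC only; av1_nvenc needs an Ada Lovelace-generation GPU (such as the L4) or later. GPU passthrough to containers needs nvidia-container-toolkit plus gpus: all (a ready-to-uncomment override in docker-compose.yml); confirm it with GET /api/v1/system/gpu (see the Transcoding / GPU section of server config).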
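A listed encoder is not proof that it works: ffmpeg loads the NVIDIA encode library at runtime, so the nvenc encoders appear in ffmpeg -encoders even on a host with no GPU. The sketch below (function names hypothetical) separates the two checks: it parses the encoder list, then encodes two seconds of a synthetic 720p test pattern with each NVENC encoder. On a T4, expect the av1_nvenc attempt to fail.

```shell
#!/usr/bin/env bash
# list_nvenc_encoders: read `ffmpeg -hide_banner -encoders` output on stdin
# and print the names of the NVENC encoders it lists (column 2 of each row).
list_nvenc_encoders() {
  awk '$2 ~ /_nvenc$/ { print $2 }'
}

# nvenc_smoke_test ENCODER: encode 2 s of a synthetic 720p pattern, discard it.
# Succeeds only if the driver and the GPU can actually run that encoder.
nvenc_smoke_test() {
  ffmpeg -hide_banner -loglevel error \
    -f lavfi -i testsrc2=size=1280x720:rate=30 -t 2 \
    -c:v "$1" -f null -
}

# probe_gpu_encoders: run both checks on this host (needs ffmpeg in PATH)
probe_gpu_encoders() {
  local enc
  for enc in $(ffmpeg -hide_banner -encoders 2>/dev/null | list_nvenc_encoders); do
    if nvenc_smoke_test "$enc" 2>/dev/null; then
      echo "$enc: works"
    else
      echo "$enc: listed but not usable on this GPU/driver"
    fi
  done
}
```

Running probe_gpu_encoders on the host, and again inside the container once gpus: all is enabled, distinguishes a missing driver from missing passthrough.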

| Workload | CPU | GPU (T4) | Gain |
|---|---|---|---|
| 1080p→720p transcode (single job) | 2.12x realtime (libx264) | 4.6x realtime (NVENC) | ~2.2x |
| 1080p→720p transcode, 10 concurrent jobs | n/a | each ≥1.67x realtime at 42% GPU util | Ceiling was CPU-side software decode, not the GPU; extrapolated ~16 jobs with full on-GPU decode+encode |
| deface/CenterFace @640×360 (face blur) | 52.1 ms/frame | 12.0 ms/frame | ~4.3x |
| YOLOv8n inference | 59.4 ms/frame | 8.9 ms/frame | ~6.7x |

CUDA execution was verified active (not silently falling back), and CPU fallback was verified graceful when CUDA is unavailable.
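The Gain column is a plain ratio of the two measurements: GPU over CPU for the realtime multiples (higher is better), CPU over GPU for per-frame latency (lower is better). Inverting a per-frame latency also gives a single-stream frame-rate ceiling, under the assumption, not measured in the PoC, that frames are processed one at a time with no batching or parallel workers.

```shell
#!/usr/bin/env bash
# Recompute the GPU table's Gain column and per-frame throughput with awk.
# ratio A B -> A/B to one decimal; fps MS -> frames per second at MS ms/frame
ratio() { awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", a / b }'; }
fps()   { awk -v ms="$1" 'BEGIN { printf "%.1f\n", 1000 / ms }'; }

# Realtime multiples: higher is better, so gain = GPU / CPU
echo "transcode gain: $(ratio 4.6 2.12)x"
# Per-frame latency: lower is better, so gain = CPU / GPU
echo "deface gain:    $(ratio 52.1 12.0)x"
echo "YOLOv8n gain:   $(ratio 59.4 8.9)x"
# Sequential frame-rate ceilings (one frame at a time, no batching)
echo "deface:  CPU $(fps 52.1) fps, T4 $(fps 12.0) fps"
echo "YOLOv8n: CPU $(fps 59.4) fps, T4 $(fps 8.9) fps"
```

It prints gains of 2.2x, 4.3x and 6.7x, matching the table. Under the one-frame-at-a-time assumption, CPU deface tops out near 19 fps, below a 30 fps source, while the T4 reaches about 83 fps.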

S3 recording round-trip

The recording round-trip was validated against real AWS S3 using a bucket-scoped IAM user whose inline policy names only that bucket (never account root keys):

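# bucket: name it streamhub-poc-* so the teardown audit can find it
aws s3 mb s3://<bucket> --region us-east-1
# scoped IAM: create user + inline policy limited to one bucket, then one access key
aws iam create-user --user-name streamhub-poc-s3 --tags Key=Project,Value=streamhub-poc
aws iam put-user-policy --user-name streamhub-poc-s3 \
--policy-name streamhub-poc-bucket-only --policy-document file://policy.json
read -r AKID SECRET < <(aws iam create-access-key --user-name streamhub-poc-s3 \
--query 'AccessKey.[AccessKeyId,SecretAccessKey]' --output text)
# point the app at it ($BASE = StreamHub API base URL, $TOKEN = API bearer token)
curl -X PUT "$BASE/apps/live/s3" -H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{"provider":"aws","bucket":"<bucket>","region":"us-east-1","endpoint":"","key":"'"$AKID"'","secret":"'"$SECRET"'"}'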
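The put-user-policy call reads a policy.json that is not shown on this page. A minimal sketch of a bucket-only policy follows. The action list is an assumption (read, write and delete objects, abort multipart uploads, list the one bucket), not taken from StreamHub's documentation, so trim or extend it to what your build actually calls; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# write_bucket_policy BUCKET [OUT]: write an inline IAM policy that names
# only BUCKET. The action set is an assumption; adjust it to your build.
write_bucket_policy() {
  local bucket=${1:?bucket name required} out=${2:-policy.json}
  cat >"$out" <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ThisBucketOnly",
      "Effect": "Allow",
      "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Sid": "ObjectsInThisBucketOnly",
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:AbortMultipartUpload"],
      "Resource": "arn:aws:s3:::${bucket}/*"
    }
  ]
}
EOF
}
```

Bucket-level actions and object-level actions need separate statements because they match different ARNs: the bucket itself versus bucket/*. Calling write_bucket_policy with your bucket name before put-user-policy produces the file it expects.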

Tested flow: RTMP publish → POST /recording/startstop → VOD reaches ready → object present in the bucket → presigned URL returns 200 → valid MP4 (H.264 + AAC). GET /apps/:app/s3 never echoes the key or secret back. For the full S3 config schema, see the s3 field reference in the per-app config.yaml docs.
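The last three checks of that flow can be scripted. In this sketch (function names hypothetical) the presigned URL comes from aws s3 presign, run with credentials that can read the bucket; a URL handed out by StreamHub itself can be checked the same way. A presigned URL is signed for GET, so the status check downloads the object rather than sending a HEAD request.

```shell
#!/usr/bin/env bash
# has_h264_and_aac: read ffprobe codec names (one per line) on stdin and
# succeed only if both an H.264 video and an AAC audio stream are present.
has_h264_and_aac() {
  local codecs
  codecs=" $(tr ',' '\n' | sed '/^$/d' | sort -u | tr '\n' ' ')"
  [[ $codecs == *" h264 "* && $codecs == *" aac "* ]]
}

# verify_recording BUCKET KEY: presign, fetch, and probe one recorded object.
verify_recording() {
  local bucket=${1:?bucket required} key=${2:?object key required} url status
  url=$(aws s3 presign "s3://$bucket/$key" --expires-in 300) || return 1
  # Presigned URLs are signed for GET, so download and discard the body
  status=$(curl -s -o /dev/null -w '%{http_code}' "$url")
  [[ $status == 200 ]] || { echo "presigned GET returned $status" >&2; return 1; }
  ffprobe -v error -show_entries stream=codec_name -of csv=p=0 "$url" \
    | has_h264_and_aac || { echo "not H.264 + AAC" >&2; return 1; }
  echo "ok: s3://$bucket/$key"
}
```

Parsing the codec list in its own function keeps the one piece of logic that does not need AWS, curl or ffprobe testable offline.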

Teardown and leak audit

Every PoC resource is tagged Project=streamhub-poc at creation (the S3 and IAM audits match on the streamhub-poc name prefix instead), deleted in reverse order as soon as results are captured, and confirmed gone with an audit query. This is what keeps repeated EC2 testing cheap and leak-free:

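aws ec2 terminate-instances --instance-ids <id...> # DeleteOnTermination:true on the root volume
# the security group can't be deleted while terminating instances still use it
aws ec2 wait instance-terminated --instance-ids <id...>
aws ec2 delete-security-group --group-id <sg-id>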
aws ec2 delete-key-pair --key-name streamhub-poc
aws s3 rm s3://<bucket> --recursive && aws s3 rb s3://<bucket>
aws iam delete-access-key --user-name streamhub-poc-s3 --access-key-id <akid>
aws iam delete-user-policy --user-name streamhub-poc-s3 --policy-name streamhub-poc-bucket-only
aws iam delete-user --user-name streamhub-poc-s3
# audit: every query below must come back with an empty list
aws ec2 describe-instances --filters "Name=tag:Project,Values=streamhub-poc" \
"Name=instance-state-name,Values=pending,running,stopping,stopped"
aws ec2 describe-security-groups --filters "Name=tag:Project,Values=streamhub-poc"
aws s3api list-buckets --query "Buckets[?starts_with(Name,'streamhub-poc')].Name"
aws iam list-users --query "Users[?starts_with(UserName,'streamhub-poc')].UserName"
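The audit above is read by eye. A scriptable variant follows (helper names hypothetical): each query switches to --output text so that "nothing left" means no output at all, a failed AWS call counts as a failure instead of a clean result, and the streamhub-poc key pair is audited by name as well.

```shell
#!/usr/bin/env bash
# assert_empty LABEL OUTPUT: pass when OUTPUT is blank or the CLI's "None",
# otherwise report what is left over and fail.
assert_empty() {
  local label=$1 raw=$2
  local squeezed=${raw//[[:space:]]/}
  if [[ -z $squeezed || $squeezed == None ]]; then
    echo "ok    $label"
    return 0
  fi
  echo "LEAK  $label: $raw" >&2
  return 1
}

# audit_query LABEL CMD...: run one query; a failed call is never read as "empty".
audit_query() {
  local label=$1 out
  shift
  out=$("$@") || { echo "ERROR $label: query failed" >&2; return 1; }
  assert_empty "$label" "$out"
}

# run_audit: the teardown audit in text form; returns non-zero on any leftover.
run_audit() {
  local rc=0
  audit_query instances aws ec2 describe-instances \
    --filters Name=tag:Project,Values=streamhub-poc \
    Name=instance-state-name,Values=pending,running,stopping,stopped \
    --query 'Reservations[].Instances[].InstanceId' --output text || rc=1
  audit_query security-groups aws ec2 describe-security-groups \
    --filters Name=tag:Project,Values=streamhub-poc \
    --query 'SecurityGroups[].GroupId' --output text || rc=1
  audit_query key-pairs aws ec2 describe-key-pairs \
    --filters Name=key-name,Values=streamhub-poc \
    --query 'KeyPairs[].KeyName' --output text || rc=1
  audit_query buckets aws s3api list-buckets \
    --query "Buckets[?starts_with(Name,'streamhub-poc')].Name" --output text || rc=1
  audit_query iam-users aws iam list-users \
    --query "Users[?starts_with(UserName,'streamhub-poc')].UserName" --output text || rc=1
  return "$rc"
}
```

run_audit keeps going after the first leak so one run reports everything left behind, and its exit status makes it usable as the final step of a teardown script.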

No Elastic IPs were allocated in any of the three PoCs, so there’s no separate IP-release step.