# YOLO11 detection agent
YOLO11 detections come from `argus-agent`, a Python worker that joins the mission’s LiveKit room as a bot participant, pulls drone video tracks, runs YOLO11 inference on every frame, and publishes bounding boxes back to the room as data-channel messages. The webapp doesn’t run YOLO in the browser; it just consumes the agent’s output.
## Agent dispatch
When a mission activates with AI detection enabled in its feature set, the agent auto-dispatches to the mission’s room (see agent rooms). The agent:
- Subscribes to every drone video track (`VIDEO_DRONE_TRACK_ID`).
- Runs YOLO11 inference per frame, typically 20-30 fps depending on resolution.
- For each detected object, runs BoT-SORT for short-term tracking (reset on occlusion) and DINOv2 + FAISS for persistent re-ID (stable across occlusions).
- Publishes detections on the data channel with participant identity = agent’s LiveKit identity.
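On the receiving side, the webapp has to pick the agent’s messages out of the room’s data channel. A minimal sketch of that filtering step, assuming the agent publishes under a fixed identity and JSON-encodes its payloads (`AGENT_IDENTITY` and `decodeAgentMessage` are illustrative names, not the webapp’s actual API):

```typescript
// Hypothetical identity the agent joins the room with; an assumption
// for this sketch, not a documented constant.
const AGENT_IDENTITY = 'argus-agent';

interface DetectionMessage {
  type: 'detections';
  participantIdentity: string; // which drone's track the boxes belong to
  detections: unknown[];
}

// Decode a raw data-channel payload, keeping only 'detections' messages
// that were sent by the agent; everything else returns null.
function decodeAgentMessage(
  payload: Uint8Array,
  senderIdentity: string,
): DetectionMessage | null {
  if (senderIdentity !== AGENT_IDENTITY) return null; // ignore other participants
  const parsed = JSON.parse(new TextDecoder().decode(payload));
  return parsed?.type === 'detections' ? (parsed as DetectionMessage) : null;
}
```

Filtering on sender identity first means other participants’ data-channel traffic never reaches the JSON parser.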
## What the webapp consumes
`LiveKitDetectionsService` subscribes to data-channel messages of shape:
```ts
{
  type: 'detections',
  participantIdentity: string,   // which drone's track
  detections: Array<{
    left: number,                // 0-1 normalised
    top: number,
    right: number,
    bottom: number,
    trackingId: string,          // short-term BoT-SORT id
    persistentId: string,        // DINOv2 re-ID, 'p_abcd' format
    confidences: Record<string, number>, // class -> confidence
  }>
}
```

The drone-stream tile reads this stream and renders each detection as a labelled bounding box. Colour coding defaults:
| Class | Colour |
|---|---|
| person | red |
| vehicle | orange |
| animal | cyan |
| other | green |
You can switch to single-colour mode in the drone settings drawer → AI tab, which offers Auto colouring vs Single colouring (with a custom colour picker).
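Putting the message shape and the colour table together, rendering one box reduces to two small steps: scale the normalised coordinates to the video element’s pixel size, and pick a colour. A sketch under those assumptions (function names and colour strings are ours, not the webapp’s actual code):

```typescript
// Mirrors the detection shape published on the data channel.
interface Detection {
  left: number; top: number; right: number; bottom: number;
  trackingId: string; persistentId: string;
  confidences: Record<string, number>;
}

// Default class colours from the table above; unlisted classes count as "other".
const CLASS_COLOURS: Record<string, string> = {
  person: 'red',
  vehicle: 'orange',
  animal: 'cyan',
};

// Single colouring mode wins when set; otherwise Auto colouring falls
// back to the class table, then to green for "other".
function boxColour(className: string, singleColour?: string): string {
  return singleColour ?? CLASS_COLOURS[className] ?? 'green';
}

// Convert the agent's 0-1 normalised box into pixel coordinates for a
// canvas/SVG overlay sized to the video element.
function toPixelBox(det: Detection, videoWidth: number, videoHeight: number) {
  return {
    x: det.left * videoWidth,
    y: det.top * videoHeight,
    width: (det.right - det.left) * videoWidth,
    height: (det.bottom - det.top) * videoHeight,
  };
}
```

Because the agent publishes normalised coordinates, the same detection renders correctly at any tile size; only the overlay’s width/height inputs change.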
## Persistent tracking IDs
The `persistentId` field is what makes YOLO11 detections useful beyond a single frame:
- Tracking ID (`trackingId`) resets every time an object is lost (occlusion, leaving frame).
- Persistent ID (`persistentId`, shape `p_xxxx`) uses DINOv2 feature embeddings + FAISS approximate-nearest-neighbour search to re-identify the same object when it re-appears, so a person walking behind a tree and back out keeps the same `p_xxxx` ID.
The drone-stream tile’s overlay shows the persistent ID in the bounding-box tooltip when available, letting operators confirm “that’s the same person I tagged two minutes ago.”
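Because `trackingId` churns on every occlusion while `persistentId` survives them, any client-side bookkeeping should key on the persistent ID. A minimal sketch of such a first-seen registry, which is our illustration rather than the webapp’s actual store:

```typescript
// Records when each persistentId was first observed so the tooltip can
// report how long ago this object was first identified. Short-term
// trackingIds are deliberately ignored here: they reset on occlusion.
class PersistentIdRegistry {
  private firstSeen = new Map<string, number>();

  // Returns milliseconds since this persistentId was first observed
  // (0 on its first sighting).
  observe(persistentId: string, nowMs: number): number {
    if (!this.firstSeen.has(persistentId)) {
      this.firstSeen.set(persistentId, nowMs);
    }
    return nowMs - this.firstSeen.get(persistentId)!;
  }
}
```

A detection that re-appears with a fresh `trackingId` but the same `p_xxxx` ID reports a non-zero age, which is exactly the “same person as two minutes ago” signal the tooltip needs.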
## What the webapp does NOT expose
- Per-class filter UI — you can’t say “only show vehicles” in the UI. The agent publishes all classes; the browser renders all classes. Planned.
- Confidence threshold beyond the per-drone slider — the drawer’s 0-1 slider is the only threshold control. The agent’s own pre-filter isn’t exposed.
- Model selection — the agent deploys with a fixed checkpoint (YOLO11-L default, YOLO11-SAR for SAR-tuned orgs). Changing checkpoint requires agent redeployment.
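Since the per-drone slider is the only threshold control, the browser-side filter plausibly reduces to checking a detection’s best class confidence against the slider value. A sketch under that assumption (the function is illustrative, not the webapp’s actual filter):

```typescript
interface Scored {
  confidences: Record<string, number>; // class -> confidence, as published
}

// Keep a detection only if its highest class confidence clears the
// per-drone slider threshold (0-1). An empty confidences map scores 0.
function passesThreshold(det: Scored, threshold: number): boolean {
  const best = Math.max(0, ...Object.values(det.confidences));
  return best >= threshold;
}
```

Note this is purely a display filter: the agent still publishes every detection above its own (unexposed) pre-filter.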
## Cost / metering
The agent runs on GPU instances (NVIDIA L4-class). Inference minutes are metered per operation and visible under Admin → Organisation → Usage. When the mission completes, the agent releases — no standing GPU cost.