YOLO11 detection agent

YOLO11 detections come from argus-agent — a Python worker that joins the mission’s LiveKit room as a bot participant, pulls drone video tracks, runs YOLO11 inference on every frame, and publishes bounding boxes back to the room as data-channel messages. The webapp doesn’t run YOLO in the browser; it just consumes the agent’s output.

Agent dispatch

When a mission activates with AI detection enabled in its feature set, the agent auto-dispatches to the mission’s room (see agent rooms). The agent:

  • Subscribes to every drone video track (VIDEO_DRONE_TRACK_ID).
  • Runs YOLO11 per-frame, typically 20-30 fps depending on resolution.
  • For each detected object, runs BoT-SORT for short-term tracking (reset on occlusion) and DINOv2 + FAISS for persistent re-ID (stable across occlusions).
  • Publishes detections on the data channel with participant identity = agent’s LiveKit identity.

What the webapp consumes

LiveKitDetectionsService subscribes to data-channel messages of shape:

{
  type: 'detections',
  participantIdentity: string,       // which drone's track
  detections: Array<{
    left: number,                    // 0-1 normalised
    top: number,
    right: number,
    bottom: number,
    trackingId: string,              // short-term BoT-SORT id
    persistentId: string,            // DINOv2 re-ID, 'p_abcd' format
    confidences: Record<string, number>, // class -> confidence
  }>
}
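A minimal sketch of how a client could decode and type-guard these messages. The names `DetectionMessage` and `decodeDetections` are illustrative, not the actual LiveKitDetectionsService API:

```typescript
// Shape of the agent's data-channel payload, mirroring the schema above.
interface Detection {
  left: number;   // 0-1 normalised
  top: number;
  right: number;
  bottom: number;
  trackingId: string;
  persistentId: string;
  confidences: Record<string, number>;
}

interface DetectionMessage {
  type: 'detections';
  participantIdentity: string;
  detections: Detection[];
}

// LiveKit delivers data-channel payloads as Uint8Array; decode and
// type-guard before use. Returns null for non-detection messages.
function decodeDetections(payload: Uint8Array): DetectionMessage | null {
  const msg = JSON.parse(new TextDecoder().decode(payload));
  if (msg?.type !== 'detections' || !Array.isArray(msg.detections)) return null;
  return msg as DetectionMessage;
}
```

In a real client this decoding would run inside a data-received handler on the LiveKit room, keyed by the agent's participant identity.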

The drone-stream tile reads this stream and renders each detection as a labelled bounding box. Colour coding defaults:

Class     Colour
person    red
vehicle   orange
animal    cyan
other     green

You can switch to single-colour mode in the drone settings drawer’s AI tab, which offers Auto colouring or Single colouring (with a custom colour picker).
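A sketch of the default class-to-colour mapping with a single-colour override. The function and option names here are assumptions, not the webapp’s actual code:

```typescript
const DEFAULT_COLOURS: Record<string, string> = {
  person: 'red',
  vehicle: 'orange',
  animal: 'cyan',
  other: 'green',
};

interface ColourOptions {
  mode: 'auto' | 'single';   // mirrors Auto vs Single colouring
  singleColour?: string;     // custom picker value in single mode
}

// Pick the box colour for a detection given its per-class confidences.
function boxColour(
  confidences: Record<string, number>,
  opts: ColourOptions,
): string {
  if (opts.mode === 'single') return opts.singleColour ?? 'green';
  // Auto mode: colour by the highest-confidence class, falling back
  // to the 'other' colour for unrecognised classes.
  const topClass = Object.entries(confidences)
    .sort((a, b) => b[1] - a[1])[0]?.[0];
  return DEFAULT_COLOURS[topClass ?? 'other'] ?? DEFAULT_COLOURS.other;
}
```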

Persistent tracking IDs

The persistentId field is what makes YOLO11 detections useful beyond a single frame:

  • Tracking ID (trackingId) resets every time an object is lost (occlusion, leaving frame).
  • Persistent ID (persistentId, shape p_xxxx) uses DINOv2 feature embeddings + FAISS approximate-nearest-neighbour search to re-identify the same object when it re-appears — so a person walking behind a tree and back out keeps the same p_xxxx ID.
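The re-ID step can be pictured as nearest-neighbour search over feature embeddings. Below is a toy sketch: brute-force cosine similarity stands in for FAISS, and the similarity threshold is an assumption — the agent’s actual parameters are not exposed:

```typescript
// Toy re-ID gallery: persistent ID -> stored feature embedding.
// In the agent this is a FAISS index over DINOv2 embeddings;
// here a brute-force cosine search stands in for it.
const gallery = new Map<string, number[]>();
let nextId = 0;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Assign a persistent ID: reuse the best gallery match above the
// threshold, otherwise mint a new 'p_xxxx'-style ID.
function assignPersistentId(embedding: number[], threshold = 0.8): string {
  let bestId: string | null = null;
  let bestSim = -1;
  for (const [id, ref] of gallery) {
    const sim = cosine(embedding, ref);
    if (sim > bestSim) { bestSim = sim; bestId = id; }
  }
  if (bestId !== null && bestSim >= threshold) return bestId;
  const id = `p_${(nextId++).toString(16).padStart(4, '0')}`;
  gallery.set(id, embedding);
  return id;
}
```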

The drone-stream tile’s overlay displays persistent IDs on the bounding box tooltip when available — this lets operators correlate “that’s the same person I tagged two minutes ago.”
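A small sketch of how an overlay could keep first-seen timestamps per persistent ID so the tooltip can convey “same person as two minutes ago”. This is illustrative only, not the tile’s actual implementation:

```typescript
// First-seen timestamp per persistent ID, so a tooltip can show
// how long ago an object was first identified.
const firstSeen = new Map<string, number>();

function tooltipLabel(persistentId: string, nowMs: number): string {
  const t0 = firstSeen.get(persistentId);
  if (t0 === undefined) {
    firstSeen.set(persistentId, nowMs);
    return `${persistentId} (new)`;
  }
  const mins = Math.floor((nowMs - t0) / 60_000);
  return `${persistentId} (first seen ${mins} min ago)`;
}
```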

What the webapp does NOT expose

  • Per-class filter UI — you can’t say “only show vehicles” in the UI. The agent publishes all classes and the browser renders all of them; a per-class filter is planned.
  • Confidence threshold beyond the per-drone slider — the drawer’s 0-1 slider is the only threshold control. The agent’s own pre-filter isn’t exposed.
  • Model selection — the agent deploys with a fixed checkpoint (YOLO11-L default, YOLO11-SAR for SAR-tuned orgs). Changing checkpoint requires agent redeployment.
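Because the per-drone slider is the only threshold control, any filtering happens client-side. A sketch of applying it, assuming the slider value is compared against a detection’s highest class confidence:

```typescript
interface Det {
  confidences: Record<string, number>; // class -> confidence
}

// Keep only detections whose best class confidence clears the
// per-drone slider threshold (0-1).
function applyThreshold<T extends Det>(dets: T[], threshold: number): T[] {
  return dets.filter(
    (d) => Math.max(...Object.values(d.confidences)) >= threshold,
  );
}
```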

Cost / metering

The agent runs on GPU instances (NVIDIA L4-class). Inference minutes are metered per operation and visible under Admin → Organisation → Usage. When the mission completes, the agent leaves the room and its GPU instance is released — there is no standing GPU cost.