YOLO11 detection agent

YOLO11 detections come from argus-agent — a Python worker that joins the mission’s LiveKit room as a bot participant, pulls drone video tracks, runs YOLO11 inference on every frame, and publishes bounding boxes back to the room as data-channel messages. The webapp doesn’t run YOLO in the browser; it just consumes the agent’s output.

Agent dispatch

When a mission activates with AI detection enabled in its feature set, the agent auto-dispatches to the mission’s room (see agent rooms). The agent:

  • Subscribes to every drone video track (VIDEO_DRONE_TRACK_ID).
  • Runs YOLO11 per-frame, typically 20-30 fps depending on resolution.
  • For each detected object, runs BoT-SORT for short-term tracking (reset on occlusion) and DINOv2 + FAISS for persistent re-ID (stable across occlusions).
  • Publishes detections on the data channel with participant identity = agent’s LiveKit identity.

What the webapp consumes

LiveKitDetectionsService subscribes to data-channel messages of shape:

{
  type: 'detections',
  participantIdentity: string,       // which drone's track
  detections: Array<{
    left: number,                    // 0-1 normalised
    top: number,
    right: number,
    bottom: number,
    trackingId: string,              // short-term BoT-SORT id
    persistentId: string,            // DINOv2 re-ID, 'p_abcd' format
    confidences: Record<string, number>, // class -> confidence
  }>
}
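A minimal sketch of how a client could decode and type-guard these messages. The names `DetectionMessage` and `decodeDetections` are illustrative, not the actual LiveKitDetectionsService API:

```typescript
// Shape of the agent's data-channel payload, mirroring the schema above.
interface Detection {
  left: number;   // 0-1 normalised
  top: number;
  right: number;
  bottom: number;
  trackingId: string;
  persistentId: string;
  confidences: Record<string, number>;
}

interface DetectionMessage {
  type: 'detections';
  participantIdentity: string;
  detections: Detection[];
}

// LiveKit delivers data-channel payloads as Uint8Array; decode and
// type-guard before use. Returns null for non-detection messages.
function decodeDetections(payload: Uint8Array): DetectionMessage | null {
  const msg = JSON.parse(new TextDecoder().decode(payload));
  if (msg?.type !== 'detections' || !Array.isArray(msg.detections)) return null;
  return msg as DetectionMessage;
}
```

In a real client this decoding would run inside a data-received handler on the LiveKit room, keyed by the agent's participant identity.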

The drone-stream tile reads this stream and renders each detection as a labelled bounding box. Colour coding defaults:

Class     Colour
person    red
vehicle   orange
animal    cyan
other     green

You can switch to single-colour mode in the drone settings drawer’s AI tab, which offers Auto colouring or Single colouring (with a custom colour picker).
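A sketch of the default class-to-colour mapping with a single-colour override. The function and option names here are assumptions, not the webapp’s actual code:

```typescript
const DEFAULT_COLOURS: Record<string, string> = {
  person: 'red',
  vehicle: 'orange',
  animal: 'cyan',
  other: 'green',
};

interface ColourOptions {
  mode: 'auto' | 'single';   // mirrors Auto vs Single colouring
  singleColour?: string;     // custom picker value in single mode
}

// Pick the box colour for a detection given its per-class confidences.
function boxColour(
  confidences: Record<string, number>,
  opts: ColourOptions,
): string {
  if (opts.mode === 'single') return opts.singleColour ?? 'green';
  // Auto mode: colour by the highest-confidence class, falling back
  // to the 'other' colour for unrecognised classes.
  const topClass = Object.entries(confidences)
    .sort((a, b) => b[1] - a[1])[0]?.[0];
  return DEFAULT_COLOURS[topClass ?? 'other'] ?? DEFAULT_COLOURS.other;
}
```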

Persistent tracking IDs

The persistentId field is what makes YOLO11 detections useful beyond a single frame:

  • Tracking ID (trackingId) resets every time an object is lost (occlusion, leaving frame).
  • Persistent ID (persistentId, shape p_xxxx) uses DINOv2 feature embeddings + FAISS approximate-nearest-neighbour search to re-identify the same object when it re-appears — so a person walking behind a tree and back out keeps the same p_xxxx ID.
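The re-ID step can be pictured as nearest-neighbour search over feature embeddings. Below is a toy sketch: brute-force cosine similarity stands in for FAISS, and the similarity threshold is an assumption — the agent’s actual parameters are not exposed:

```typescript
// Toy re-ID gallery: persistent ID -> stored feature embedding.
// In the agent this is a FAISS index over DINOv2 embeddings;
// here a brute-force cosine search stands in for it.
const gallery = new Map<string, number[]>();
let nextId = 0;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Assign a persistent ID: reuse the best gallery match above the
// threshold, otherwise mint a new 'p_xxxx'-style ID.
function assignPersistentId(embedding: number[], threshold = 0.8): string {
  let bestId: string | null = null;
  let bestSim = -1;
  for (const [id, ref] of gallery) {
    const sim = cosine(embedding, ref);
    if (sim > bestSim) { bestSim = sim; bestId = id; }
  }
  if (bestId !== null && bestSim >= threshold) return bestId;
  const id = `p_${(nextId++).toString(16).padStart(4, '0')}`;
  gallery.set(id, embedding);
  return id;
}
```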

The drone-stream tile’s overlay displays persistent IDs on the bounding box tooltip when available — this lets operators correlate “that’s the same person I tagged two minutes ago.”
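A small sketch of how an overlay could keep first-seen timestamps per persistent ID so the tooltip can convey “same person as two minutes ago”. This is illustrative only, not the tile’s actual implementation:

```typescript
// First-seen timestamp per persistent ID, so a tooltip can show
// how long ago an object was first identified.
const firstSeen = new Map<string, number>();

function tooltipLabel(persistentId: string, nowMs: number): string {
  const t0 = firstSeen.get(persistentId);
  if (t0 === undefined) {
    firstSeen.set(persistentId, nowMs);
    return `${persistentId} (new)`;
  }
  const mins = Math.floor((nowMs - t0) / 60_000);
  return `${persistentId} (first seen ${mins} min ago)`;
}
```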

What the webapp does NOT expose

  • Per-class filter UI — you can’t say “only show vehicles” in the UI. The agent publishes all classes and the browser renders all of them; a per-class filter is planned.
  • Confidence threshold beyond the per-drone slider — the drawer’s 0-1 slider is the only threshold control. The agent’s own pre-filter isn’t exposed.
  • Model selection — the agent deploys with a fixed checkpoint (YOLO11-L default, YOLO11-SAR for SAR-tuned orgs). Changing checkpoint requires agent redeployment.
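Because the per-drone slider is the only threshold control, any filtering happens client-side. A sketch of applying it, assuming the slider value is compared against a detection’s highest class confidence:

```typescript
interface Det {
  confidences: Record<string, number>; // class -> confidence
}

// Keep only detections whose best class confidence clears the
// per-drone slider threshold (0-1).
function applyThreshold<T extends Det>(dets: T[], threshold: number): T[] {
  return dets.filter(
    (d) => Math.max(...Object.values(d.confidences)) >= threshold,
  );
}
```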

Cost / metering

The agent runs on GPU instances (NVIDIA L4-class). Inference minutes are metered per operation and visible under Admin → Organisation → Usage. When the mission completes, the agent leaves the room and its GPU instance is released — there is no standing GPU cost.