
Voice commands

ARGUS includes a voice-command detector that watches every finalised PTT transcription for specific phrase patterns. When a match is found, the corresponding command fires immediately — drop a flag, trigger RTH, raise the master caution, mark a spotted target.

Unlike the speech-recognition stubs found in many apps, this feature is fully implemented: the service lives in voice-command-detector.service.ts and is wired into the transcription stream produced during any PTT burst while the mission has transcription enabled.

Turning it on

  1. The mission must have Transcription enabled in its feature set (see the create-mission Features step). Without transcription, there’s no text stream to match against.
  2. In the operation console, open the settings drawer and turn on Voice Commands. Four sub-toggles let you enable or disable individual emergency and notification responses.

Once on, every PTT release produces a transcription that flows through the detector; any recognised phrase becomes a DetectedVoiceCommand event.
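The two prerequisites above act as a gate in front of the detector. The sketch below is illustrative only: `MissionFeatures`, `ConsoleSettings`, `onPttRelease`, and the toy detector are hypothetical names, not the real service code.

```typescript
// Hypothetical shapes; the real mission and settings objects are not documented on this page.
interface MissionFeatures {
  transcription: boolean; // "Transcription" in the create-mission Features step
}

interface ConsoleSettings {
  voiceCommands: boolean; // "Voice Commands" toggle in the settings drawer
}

// Runs detection on a finalised PTT transcription only when both prerequisites hold.
// Returns the detected command name, or undefined when nothing ran or nothing matched.
function onPttRelease(
  transcription: string,
  features: MissionFeatures,
  settings: ConsoleSettings,
  detect: (text: string) => string | undefined,
): string | undefined {
  if (!features.transcription) return undefined; // no text stream to match against
  if (!settings.voiceCommands) return undefined; // detector switched off
  return detect(transcription);
}

// Toy detector for this example only: recognises "rth".
const toyDetect = (text: string) =>
  text.toLowerCase().includes("rth") ? "RETURN_TO_HOME" : undefined;

console.log(onPttRelease("RTH now", { transcription: true }, { voiceCommands: true }, toyDetect)); // RETURN_TO_HOME
console.log(onPttRelease("RTH now", { transcription: true }, { voiceCommands: false }, toyDetect)); // undefined
```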

The 17 commands

Phrases match as case-insensitive substring matches against the full transcription text. You don’t need to say only the phrase — saying “okay team, launch drones to sector 4” matches LAUNCH_DRONES via the “launch drones” substring.
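A minimal sketch of that matching rule, assuming a plain lowercase-then-`includes` check. `matchesAnyPhrase` is a hypothetical helper, not an export of voice-command-detector.service.ts.

```typescript
// Case-insensitive substring match: returns the first phrase found anywhere
// in the transcription, or undefined if none appears.
function matchesAnyPhrase(transcription: string, phrases: readonly string[]): string | undefined {
  const text = transcription.toLowerCase();
  return phrases.find((phrase) => text.includes(phrase.toLowerCase()));
}

// LAUNCH_DRONES phrases, copied from the table below.
const launchPhrases = ["launch drone", "launch drones", "deploy drone", "deploy drones"];

console.log(matchesAnyPhrase("Okay team, launch drones to sector 4", launchPhrases)); // launch drone
```

Under pure substring matching, the shorter “launch drone” already matches any sentence containing “launch drones”, so the plural variants never change the outcome; they only document intent.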

Mission control

| Command | Phrases that trigger it |
|---|---|
| LAUNCH_DRONES | “launch drone”, “launch drones”, “deploy drone”, “deploy drones” |
| RETURN_TO_HOME | “return to home”, “return home”, “come back”, “rth” |
| LAND_DRONE | “land the drone”, “land drone”, “initiate landing” |
| CANCEL_MISSION | “cancel mission”, “abort mission”, “cancel operation” |
| HOLD_POSITION | “hold position”, “hover”, “stay there” |
| SEARCH_AREA | “search area”, “search this area”, “begin search”, “start search” |
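The mission-control table maps naturally onto a record keyed by command name. In this sketch the phrases are copied from the table; `MISSION_PHRASES` and `detectMissionCommand` are hypothetical names, and checking commands in table order is an assumption about precedence.

```typescript
type MissionCommand =
  | "LAUNCH_DRONES" | "RETURN_TO_HOME" | "LAND_DRONE"
  | "CANCEL_MISSION" | "HOLD_POSITION" | "SEARCH_AREA";

// Phrase lists copied from the mission-control table.
const MISSION_PHRASES: Record<MissionCommand, readonly string[]> = {
  LAUNCH_DRONES: ["launch drone", "launch drones", "deploy drone", "deploy drones"],
  RETURN_TO_HOME: ["return to home", "return home", "come back", "rth"],
  LAND_DRONE: ["land the drone", "land drone", "initiate landing"],
  CANCEL_MISSION: ["cancel mission", "abort mission", "cancel operation"],
  HOLD_POSITION: ["hold position", "hover", "stay there"],
  SEARCH_AREA: ["search area", "search this area", "begin search", "start search"],
};

// Returns the first command (in table order) with a phrase found in the text.
function detectMissionCommand(transcription: string): MissionCommand | undefined {
  const text = transcription.toLowerCase();
  for (const command of Object.keys(MISSION_PHRASES) as MissionCommand[]) {
    if (MISSION_PHRASES[command].some((phrase) => text.includes(phrase))) return command;
  }
  return undefined;
}

console.log(detectMissionCommand("All units, hold position")); // HOLD_POSITION
console.log(detectMissionCommand("Search the ridge up north")); // RETURN_TO_HOME
```

If the service matches exactly as described, short phrases carry the most risk: “rth” also occurs inside ordinary words such as “north”, “earth”, and “worth”, which is why the second example resolves to RETURN_TO_HOME rather than to nothing.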

Camera / media

| Command | Phrases |
|---|---|
| TAKE_PICTURE | “take picture”, “take a picture”, “take photo”, “capture image” |
| START_RECORDING | “start recording”, “begin recording”, “start video” |
| STOP_RECORDING | “stop recording”, “end recording”, “stop video” |
| ZOOM_IN | “zoom in” |
| ZOOM_OUT | “zoom out” |

Situational reporting

| Command | Phrases |
|---|---|
| PERSON_SPOTTED | “person spotted”, “i see someone”, “person detected”, “found someone”, “target spotted” |
| ANIMAL_SPOTTED | “animal spotted”, “animal detected”, “i see an animal” |
| VEHICLE_SPOTTED | “vehicle spotted”, “vehicle detected”, “car spotted” |
| DROP_FLAG | “drop flag”, “mark location”, “drop marker”, “mark this”, “flag this” |
| ALL_CLEAR | “all clear”, “area clear”, “sector clear” |

Emergency (multilingual)

The EMERGENCY command is the broadest — 43 phrase variants across English, Spanish, and Arabic transliteration. Every single one triggers the same event:

  • English: “emergency”, “mayday”, “sos”, “help needed”, “help me”, “distress”, “man down”, “officer down”, “under fire”, “being attacked”, “require assistance”, “need backup”, “need help”, “send help”, …
  • Spanish: “emergencia”, “emergencia sos”, “ayuda”, “auxilio”, “socorro”, …
  • Arabic (transliterated): “musaada”, “tawari”, “najda”, “istighatha”, …

Because the list is substring-matched, the detector is very forgiving — a shouted single word (“mayday!”) inside a burst of panicked speech still fires.
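The multilingual list works like every other command, only longer. This sketch uses a small subset of the phrases published above (the full 43-variant list is not reproduced); `EMERGENCY_SUBSET` and `isEmergency` are hypothetical names.

```typescript
// Subset of the EMERGENCY phrases listed on this page.
const EMERGENCY_SUBSET: readonly string[] = [
  "emergency", "mayday", "sos", "man down", "need backup", // English
  "emergencia", "ayuda", "auxilio", "socorro",             // Spanish
  "musaada", "tawari", "najda", "istighatha",              // Arabic (transliterated)
];

// True if any emergency phrase appears anywhere in the transcription.
// Punctuation needs no special handling: "mayday!" still contains "mayday".
function isEmergency(transcription: string): boolean {
  const text = transcription.toLowerCase();
  return EMERGENCY_SUBSET.some((phrase) => text.includes(phrase));
}

console.log(isEmergency("MAYDAY! Mayday, taking fire")); // true
console.log(isEmergency("¡Ayuda, por favor!"));          // true
console.log(isEmergency("Routine check, all good"));     // false
```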

What fires on each command

Each command emits a DetectedVoiceCommand event with:

  • command — the canonical name (e.g. RETURN_TO_HOME).
  • label — human-readable label for UI display.
  • transcription — the full transcription text the match was found in.
  • peerId — the peer that produced the transcription.
  • commsId — the comms channel ID the burst belonged to.
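The event fields above can be written as a TypeScript interface. The field names come from this page; the field types, the `toLabel` formatting, and `makeEvent` are assumptions, so the real labels shown in the UI may differ.

```typescript
// The 17 canonical command names from the tables above.
type VoiceCommandName =
  | "LAUNCH_DRONES" | "RETURN_TO_HOME" | "LAND_DRONE" | "CANCEL_MISSION" | "HOLD_POSITION" | "SEARCH_AREA"
  | "TAKE_PICTURE" | "START_RECORDING" | "STOP_RECORDING" | "ZOOM_IN" | "ZOOM_OUT"
  | "PERSON_SPOTTED" | "ANIMAL_SPOTTED" | "VEHICLE_SPOTTED" | "DROP_FLAG" | "ALL_CLEAR"
  | "EMERGENCY";

interface DetectedVoiceCommand {
  command: VoiceCommandName; // canonical name
  label: string;             // human-readable label for UI display
  transcription: string;     // full text the match was found in
  peerId: string;            // peer that produced the transcription
  commsId: string;           // comms channel the burst belonged to
}

// Hypothetical label formatting: "RETURN_TO_HOME" -> "Return to home".
function toLabel(command: VoiceCommandName): string {
  const text = command.toLowerCase().split("_").join(" ");
  return text.charAt(0).toUpperCase() + text.slice(1);
}

function makeEvent(command: VoiceCommandName, transcription: string, peerId: string, commsId: string): DetectedVoiceCommand {
  return { command, label: toLabel(command), transcription, peerId, commsId };
}

const evt = makeEvent("RETURN_TO_HOME", "Alpha two, return to home now", "peer-7", "comms-1");
console.log(evt.label); // Return to home
```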

That event is dispatched to:

  1. The PTT transcription bubble — shows a voice-command badge beside the transcription (icon record_voice_over, label = command name).
  2. The mission timeline — an entry voice.command with the full payload.
  3. Command-specific handlers (only if the relevant sub-toggle is on in the settings drawer):
    • EMERGENCY → master-caution critical alert + optional gimbal-aim + optional distress-beacon overlay on the map.
    • DROP_FLAG → opens the flag picker at the speaker’s mapped location (if known).
    • PERSON_SPOTTED / ANIMAL_SPOTTED / VEHICLE_SPOTTED → auto-flag with the matching type.
    • LAUNCH_DRONES / RETURN_TO_HOME / LAND_DRONE / HOLD_POSITION / CANCEL_MISSION → commander-scoped: non-commanders see a “command heard” toast, but nothing fires without the commander bit.
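The dispatch rules above can be modelled as a small pure function. This is a hypothetical sketch: `dispatchTargets`, the target labels, and the single `handlerEnabled` flag (standing in for whichever settings-drawer sub-toggle applies) are assumptions. Commands without a handler listed above, such as the camera commands, are treated here as badge-and-timeline only; this page does not describe their handlers.

```typescript
type Target = "bubble-badge" | "timeline" | "handler" | "command-heard-toast";

// Commands that have a command-specific handler listed above.
const HANDLED = new Set<string>([
  "EMERGENCY", "DROP_FLAG", "PERSON_SPOTTED", "ANIMAL_SPOTTED", "VEHICLE_SPOTTED",
  "LAUNCH_DRONES", "RETURN_TO_HOME", "LAND_DRONE", "HOLD_POSITION", "CANCEL_MISSION",
]);

// Drone-control commands that only fire for the commander.
const COMMANDER_SCOPED = new Set<string>([
  "LAUNCH_DRONES", "RETURN_TO_HOME", "LAND_DRONE", "HOLD_POSITION", "CANCEL_MISSION",
]);

function dispatchTargets(command: string, handlerEnabled: boolean, isCommander: boolean): Target[] {
  // Every detected command reaches the transcription bubble and the timeline.
  const targets: Target[] = ["bubble-badge", "timeline"];
  if (!HANDLED.has(command) || !handlerEnabled) return targets;
  // Non-commanders get only a "command heard" toast for commander-scoped commands.
  targets.push(COMMANDER_SCOPED.has(command) && !isCommander ? "command-heard-toast" : "handler");
  return targets;
}

console.log(dispatchTargets("RETURN_TO_HOME", true, false).join(", ")); // bubble-badge, timeline, command-heard-toast
console.log(dispatchTargets("DROP_FLAG", false, true).join(", "));      // bubble-badge, timeline
```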

Reliability + latency

  • Latency: live transcription is polled roughly every 500 ms, and detection runs on every finalised burst. You’ll typically see the badge within a second of releasing PTT.
  • Accuracy: depends entirely on transcription quality. Clear speech with a headset mic gives roughly 95%; radio-quality audio with background noise gives roughly 70–80%.
  • False positives: with substring matching, “don’t mark this” contains “mark this” and would match DROP_FLAG. The emergency phrase list is the most prone to false positives; keep emergency responses disabled in training scenarios where someone might say “mayday” as an example.
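The negation case can be reproduced with the DROP_FLAG phrases alone. This is a hypothetical illustration of plain substring matching (`findPhrase` is not a real export): negation is ignored, while a phrase split up by another word does not match.

```typescript
// DROP_FLAG phrases, copied from the situational-reporting table.
const DROP_FLAG_PHRASES: readonly string[] = ["drop flag", "mark location", "drop marker", "mark this", "flag this"];

// Returns the first phrase found anywhere in the transcription, ignoring case.
function findPhrase(transcription: string, phrases: readonly string[]): string | undefined {
  const text = transcription.toLowerCase();
  return phrases.find((phrase) => text.includes(phrase));
}

// The negated sentence still contains "mark this", so it would trigger DROP_FLAG.
console.log(findPhrase("Don't mark this one, it's a decoy", DROP_FLAG_PHRASES)); // mark this
// "mark ... this" with a word in between is not a substring match.
console.log(findPhrase("Mark the spot by this tree", DROP_FLAG_PHRASES)); // undefined
```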

Voice aliases (not yet configurable in-app)

The command list is hard-coded in voice-command-detector.service.ts. There’s no user-facing “voice aliases” page; adding a new phrase or language requires a code change + redeploy. This is deliberate for v1 — free-form custom phrases increased false-positive rates in early tests — but a curated “add phrase” UI is planned.

If you urgently need a custom phrase for a field op, email support with the phrase + the command it should map to and we can ship a hotfix.

What voice commands can’t do

  • No continuous listening — the detector only runs on transcribed PTT bursts. Whispering into your mic without holding PTT doesn’t match anything.
  • No chained commands: “RTH and drop flag” will match either RETURN_TO_HOME or DROP_FLAG (whichever substring is found first), not both. Use two separate PTT bursts for two commands.
  • No speaker identification — the detector doesn’t know who said what beyond the peer ID. If two operators say “RTH” at the same time, you’ll get two events.
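Both limits fall out of a first-match detector that runs once per burst. In this hypothetical sketch the phrase subsets come from this page, but the names and the search order (table order, so RETURN_TO_HOME is checked before DROP_FLAG) are assumptions; the real service may resolve a chained phrase to the other command.

```typescript
interface DetectedVoiceCommand {
  command: string;
  transcription: string;
  peerId: string;
}

// Phrase subsets copied from the tables above, checked in this order.
const PHRASES: ReadonlyArray<[string, readonly string[]]> = [
  ["RETURN_TO_HOME", ["return to home", "return home", "come back", "rth"]],
  ["DROP_FLAG", ["drop flag", "mark location", "drop marker", "mark this", "flag this"]],
];

// One transcription yields at most one event: the first command with a matching phrase.
function detect(transcription: string, peerId: string): DetectedVoiceCommand | undefined {
  const text = transcription.toLowerCase();
  for (const [command, phrases] of PHRASES) {
    if (phrases.some((phrase) => text.includes(phrase))) return { command, transcription, peerId };
  }
  return undefined;
}

// Chained request: only one command comes back.
console.log(detect("RTH and drop flag", "peer-a")?.command); // RETURN_TO_HOME

// Two operators, two bursts: one event each, distinguishable only by peerId.
const events = [detect("RTH", "peer-a"), detect("rth", "peer-b")].filter(
  (e): e is DetectedVoiceCommand => e !== undefined,
);
console.log(events.map((e) => e.peerId).join(", ")); // peer-a, peer-b
```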

Privacy

  • Transcription runs in the cloud agent (see transcription).
  • The detector itself is purely client-side — no audio or transcription text is sent anywhere by the detection step.
  • Emitted events go to the mission timeline (persisted) and the transcription bubble (ephemeral).
  • If your org has auto-redact PII enabled on transcription, voice commands still fire but the transcription shown in the bubble has redactions applied.