
Voice

Nimbus supports local voice input and output: speak a query, get a spoken response. The speech-to-text engine is Whisper.cpp running in a child process on your machine; text-to-speech uses the OS-native synthesiser on each platform. No audio leaves your device.

Prerequisites

Whisper binary. Install whisper.cpp and ensure whisper-cli (or the older build name main) is on your PATH. Alternatively, set whisper_path in the [voice] section of nimbus.toml to an absolute path.

The binary resolution order is:

  1. whisper_path in nimbus.toml
  2. NIMBUS_WHISPER_PATH environment variable
  3. whisper-cli on PATH
  4. main on PATH (older Whisper.cpp build name)
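The resolution order above can be sketched as a small helper with an injectable PATH lookup. Names here are illustrative, not the Gateway's actual implementation:

```python
import shutil

def resolve_whisper_binary(config_path=None, env_path=None, which=shutil.which):
    """Return the first Whisper binary found, following the documented order.

    config_path: whisper_path from the [voice] section of nimbus.toml, if set.
    env_path:    NIMBUS_WHISPER_PATH, if set.
    which:       PATH lookup function (injectable for testing).
    """
    if config_path:
        return config_path
    if env_path:
        return env_path
    # Fall back to PATH: the current binary name first, then the legacy one.
    for name in ("whisper-cli", "main"):
        found = which(name)
        if found:
            return found
    return None
```

If nothing resolves, the function returns `None`, which is where the Gateway would report a configuration error.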

TTS engine. Required per OS:

Platform | Engine | Install
---------|--------|--------
macOS | say | Built in — no action needed
Windows | PowerShell SAPI (System.Speech.Synthesis.SpeechSynthesizer) | Built in — no action needed
Linux | espeak-ng (preferred) or spd-say | sudo apt install espeak-ng / sudo dnf install espeak-ng
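The per-OS engine choice can be sketched as a small dispatcher. The PowerShell one-liner is standard SAPI usage, but the exact invocation and quoting here are assumptions, not the Gateway's actual code:

```python
import sys

def native_tts_cmd(text, platform=sys.platform, have_espeak=True):
    """Build the native TTS command line for the given OS."""
    if platform == "darwin":
        return ["say", text]
    if platform.startswith("win"):
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
            f".Speak('{text}')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    # Linux: espeak-ng preferred, spd-say as the fallback.
    return ["espeak-ng", text] if have_espeak else ["spd-say", text]
```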

Wake-word loop (optional). The wake-word detector records audio chunks via ffmpeg. Install ffmpeg and ensure it is on your PATH if you plan to use the always-on wake-word mode.

Enabling voice

Voice is off by default (enabled = false). Enable it in Tauri Settings → Voice, or edit nimbus.toml directly and restart the Gateway:

[voice]
enabled = true
wake_word = "hey nimbus"
whisper_model = "base.en"
# Uncomment if whisper-cli is not on PATH:
# whisper_path = "/usr/local/bin/whisper-cli"
# Optional: separate, lighter model for the always-on wake-word loop:
# wake_word_whisper_model = "tiny.en"
# Optional: Piper TTS for higher-quality speech output:
# piper_path = "/usr/local/bin/piper"
# piper_model = "/usr/local/share/piper/en_US-amy-medium.onnx"

After saving, run nimbus stop && nimbus start to apply the change.

Push-to-talk

Hold the configured global hotkey, speak your query, then release. From key press to spoken response:

  1. The Gateway opens a temporary WAV file and records from the default audio input device via ffmpeg.
  2. On key release, recording stops and Whisper transcribes the file.
  3. The transcribed text is sent to the agent as a normal query.
  4. The agent’s response streams back as text and is also spoken aloud via the native TTS engine.
  5. The temporary WAV file is deleted.
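Steps 1 and 2 boil down to two subprocess invocations. A sketch, assuming per-OS ffmpeg input flags (the Gateway's actual device selection may differ) and whisper.cpp's standard -m/-f options:

```python
def record_cmd(wav_path, platform="linux"):
    """ffmpeg command for step 1; input flags are per-OS assumptions."""
    inputs = {
        "linux":  ["-f", "alsa", "-i", "default"],
        "darwin": ["-f", "avfoundation", "-i", ":0"],
        "win32":  ["-f", "dshow", "-i", "audio=default"],
    }
    return ["ffmpeg", "-y", *inputs[platform], wav_path]

def transcribe_cmd(whisper_bin, model_path, wav_path):
    """whisper.cpp command for step 2: -m selects the model, -f the input file."""
    return [whisper_bin, "-m", model_path, "-f", wav_path, "--no-timestamps"]
```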

The Tauri UI and CLI both emit a voice.microphoneActive notification when the microphone opens and closes, so any status bar or indicator can reflect the current state.

Wake-word mode

When wake_word is set in nimbus.toml, a background detector loop runs continuously while voice is enabled. The loop:

  1. Records a 2-second audio chunk from the default input device.
  2. Runs a silence check via ffmpeg’s silencedetect filter — silent chunks skip Whisper entirely to keep CPU usage low.
  3. If the chunk is not silent, Whisper (using wake_word_whisper_model, default tiny.en) transcribes it.
  4. If the transcript contains the wake word (case-insensitive substring match), the detector fires. The loop then pauses for 3 seconds to prevent double-firing, then resumes.
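The silence check and wake-word match can be sketched as two small predicates. Treating any silencedetect hit in a chunk as "silent" is a simplification of whatever threshold the real detector applies:

```python
def is_silent(ffmpeg_stderr):
    """Step 2: ffmpeg's silencedetect filter logs 'silence_start' lines on
    stderr when it finds silence; skip Whisper when one appears."""
    return "silence_start" in ffmpeg_stderr

def wake_word_fired(transcript, wake_word="hey nimbus"):
    """Step 4: case-insensitive substring match on the transcript."""
    return wake_word.lower() in transcript.lower()
```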

The default wake phrase is "hey nimbus". You can change it:

[voice]
enabled = true
wake_word = "okay nimbus"
wake_word_whisper_model = "tiny.en" # keep lightweight for always-on use

When a push-to-talk transcription starts, the wake-word loop is automatically paused so the two audio loops cannot contend for the input device. It resumes when transcription finishes.
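The pause/resume handshake can be sketched with a shared flag; the class and method names here are illustrative, not the Gateway's API:

```python
import threading

class WakeWordGate:
    """Pauses the wake-word loop while push-to-talk owns the microphone."""

    def __init__(self):
        self._idle = threading.Event()
        self._idle.set()  # wake-word loop may record initially

    def begin_push_to_talk(self):
        self._idle.clear()  # wake-word loop must skip recording

    def end_push_to_talk(self):
        self._idle.set()    # wake-word loop may resume

    def wake_loop_may_record(self):
        return self._idle.is_set()
```

The wake-word loop checks the gate before opening the input device, so the two loops never contend for it.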

Piper TTS

Piper is an optional, locally-run neural TTS engine that produces more natural speech than the OS-native synthesisers. To enable it:

  1. Download the Piper binary and a voice model (.onnx file) from the Piper releases page.

  2. Set piper_path and piper_model in nimbus.toml:

    [voice]
    enabled = true
    piper_path = "/usr/local/bin/piper"
    piper_model = "/usr/local/share/piper/en_US-amy-medium.onnx"

When both are configured, the Gateway uses Piper for TTS instead of the native synthesiser. Piper pipes raw audio to afplay (macOS), aplay (Linux), or PowerShell SoundPlayer (Windows).
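A sketch of the Piper invocation and player choice: --model and --output-raw are Piper CLI flags (raw PCM on stdout, text on stdin), but the player selection is only a mirror of the list above, and the exact player invocations are assumptions:

```python
import sys

def piper_cmd(piper_path, model_path):
    """Piper invocation: text on stdin, raw PCM audio on stdout."""
    return [piper_path, "--model", model_path, "--output-raw"]

def piper_player(platform=sys.platform):
    """Pick the audio player Piper's output is handed to, per OS."""
    if platform == "darwin":
        return "afplay"
    if platform.startswith("win"):
        return "powershell"  # drives System.Media.SoundPlayer
    return "aplay"
```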

Platform notes

On first microphone access, macOS displays a system permission dialog. Grant access via System Settings → Privacy & Security → Microphone → Nimbus. The say command is used for TTS and requires no extra permission.

Windows may prompt for microphone access under Settings → Privacy & security → Microphone → Allow apps to access your microphone. The Gateway runs as a desktop application, so the prompt appears once and the setting persists. TTS uses PowerShell’s built-in System.Speech.Synthesis.SpeechSynthesizer — no additional install required.

On Linux, the wake-word loop and push-to-talk both record via ffmpeg using the ALSA input device (-f alsa -i default). Ensure:

  • pulseaudio or pipewire is running.
  • Your user is in the audio group: sudo usermod -aG audio $USER (re-log for the group to take effect).
  • ffmpeg is installed: sudo apt install ffmpeg / sudo dnf install ffmpeg.

If espeak-ng is not found on PATH, the Gateway falls back to spd-say. Install at least one: sudo apt install espeak-ng.

Performance

Latency is bounded by Whisper inference speed on your CPU or GPU. As a rough guide on a modern laptop CPU (no GPU acceleration):

Model | Size | Typical 5 s clip
------|------|-----------------
tiny.en | ~39 MB | ~50–100 ms
base.en | ~74 MB | ~100–300 ms
small.en | ~244 MB | ~400–900 ms
medium.en | ~769 MB | ~1.5–3 s

Use tiny.en for the wake-word loop (wake_word_whisper_model) and base.en or small.en for full transcription (whisper_model). Larger models improve accuracy for accented speech and technical vocabulary.

Disabling voice

Toggle voice off in Tauri Settings → Voice, or set enabled = false in nimbus.toml and restart the Gateway. When disabled, the wake-word loop stops immediately and the microphone is released. Push-to-talk hotkey handling is also suspended.