Voice
Nimbus supports local voice input and output: speak a query, get a spoken response. The speech-to-text engine is Whisper.cpp running in a child process on your machine; text-to-speech uses the OS-native synthesiser on each platform. No audio leaves your device.
Privacy invariant
Prerequisites
Whisper binary. Install whisper.cpp and ensure whisper-cli (or the older build name main) is on your PATH. Alternatively, set whisper_path in the [voice] section of nimbus.toml to an absolute path.
The binary resolution order is:
- whisper_path in nimbus.toml
- NIMBUS_WHISPER_PATH environment variable
- whisper-cli on PATH
- main on PATH (older Whisper.cpp build name)
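The lookup order above can be sketched in a few lines of Python. This is an illustrative reimplementation rather than the Gateway's actual code: the function name resolve_whisper_binary and its parameters are hypothetical, and voice_config stands in for the parsed [voice] table.

```python
import os
import shutil
from typing import Callable, Mapping, Optional


def resolve_whisper_binary(
    voice_config: Mapping[str, str],
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first Whisper binary found, following the documented order."""
    # 1. whisper_path from the [voice] section of nimbus.toml
    configured = voice_config.get("whisper_path")
    if configured:
        return configured
    # 2. NIMBUS_WHISPER_PATH environment variable
    from_env = env.get("NIMBUS_WHISPER_PATH")
    if from_env:
        return from_env
    # 3. whisper-cli on PATH, then 4. main on PATH (older build name)
    for name in ("whisper-cli", "main"):
        found = which(name)
        if found:
            return found
    return None
```

Calling resolve_whisper_binary({}) on your own machine is a quick way to see which binary a PATH-based lookup would pick up; a result of None means none of the four sources matched.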
TTS engine. Required per OS:
| Platform | Engine | Install |
|---|---|---|
| macOS | say | Built in — no action needed |
| Windows | PowerShell SAPI (System.Speech.Synthesis.SpeechSynthesizer) | Built in — no action needed |
| Linux | espeak-ng (preferred) or spd-say | sudo apt install espeak-ng / sudo dnf install espeak-ng |
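As a rough illustration of the table above, the following Python sketch builds the command line for each platform's native engine. It is an assumption about how such a dispatch might look, not the Gateway's implementation; native_tts_command is a hypothetical name, and the PowerShell one-liner is one common way to drive System.Speech.

```python
import shutil
import sys
from typing import Callable, List, Optional


def native_tts_command(
    text: str,
    platform: str = sys.platform,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Build an argv list that speaks `text` with the OS-native engine."""
    if platform == "darwin":
        return ["say", text]
    if platform.startswith("win"):
        # Single quotes are doubled to escape them inside a PowerShell string.
        escaped = text.replace("'", "''")
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
            f".Speak('{escaped}')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    # Linux: prefer espeak-ng, fall back to spd-say.
    for engine in ("espeak-ng", "spd-say"):
        if which(engine):
            return [engine, text]
    raise RuntimeError("No TTS engine found: install espeak-ng or spd-say")
```

Passing the result to subprocess.run is enough to confirm that the engine for your platform is installed and audible.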
Wake-word loop (optional). The wake-word detector records audio chunks
via ffmpeg. Install ffmpeg and ensure it is on your PATH if you plan to
use the always-on wake-word mode.
Enable voice
Voice is off by default (enabled = false). Enable it in
Tauri Settings → Voice, or edit nimbus.toml directly and restart the
Gateway:
    [voice]
    enabled = true
    wake_word = "hey nimbus"
    whisper_model = "base.en"

    # Uncomment if whisper-cli is not on PATH:
    # whisper_path = "/usr/local/bin/whisper-cli"

    # Optional: separate, lighter model for the always-on wake-word loop:
    # wake_word_whisper_model = "tiny.en"

    # Optional: Piper TTS for higher-quality speech output:
    # piper_path = "/usr/local/bin/piper"
    # piper_model = "/usr/local/share/piper/en_US-amy-medium.onnx"

After saving, run nimbus stop && nimbus start to apply the change.
Push-to-talk
Hold the configured global hotkey, speak your query, then release. The Gateway handles the request in these steps:
- While you hold the key, the Gateway opens a temporary WAV file and records from the default audio input device via ffmpeg.
- On key release, recording stops and Whisper transcribes the file.
- The transcribed text is sent to the agent as a normal query.
- The agent’s response streams back as text and is also spoken aloud via the native TTS engine.
- The temporary WAV file is deleted.
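The steps above can be sketched as a single function whose four stages are passed in as callables. This is a simplified model under stated assumptions: push_to_talk and its parameter names are hypothetical, the reply is returned whole rather than streamed, and the real recording and transcription calls (ffmpeg and Whisper) are left to the caller.

```python
import os
import tempfile
from typing import Callable


def push_to_talk(
    record_until_release: Callable[[str], None],
    transcribe: Callable[[str], str],
    ask_agent: Callable[[str], str],
    speak: Callable[[str], None],
) -> str:
    """Run one push-to-talk cycle and return the agent's reply."""
    fd, wav_path = tempfile.mkstemp(suffix=".wav")
    os.close(fd)  # the recorder reopens the path itself
    try:
        record_until_release(wav_path)  # blocks while the hotkey is held
        query = transcribe(wav_path)    # Whisper runs after key release
        reply = ask_agent(query)        # sent as a normal query
        speak(reply)                    # spoken via the TTS engine
        return reply
    finally:
        # The temporary recording is removed even if a step fails.
        if os.path.exists(wav_path):
            os.remove(wav_path)
```

Putting the cleanup in a finally block means a failed transcription or agent call still leaves no audio file behind, which fits the page's statement that no audio leaves your device.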
The Tauri UI and CLI both emit a voice.microphoneActive notification when the
microphone opens and closes, so any status bar or indicator can reflect the
current state.
Wake word
When wake_word is set in nimbus.toml, a background detector loop runs continuously while voice is enabled. The loop:
- Records a 2-second audio chunk from the default input device.
- Runs a silence check via ffmpeg’s silencedetect filter; silent chunks skip Whisper entirely to keep CPU usage low.
- If the chunk is not silent, Whisper (using wake_word_whisper_model, default tiny.en) transcribes it.
- If the transcript contains the wake word (case-insensitive substring match), the detector fires. The loop then pauses for 3 seconds to prevent double-firing, then resumes.
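One iteration of the loop described above might look like the following sketch. The names is_silent, contains_wake_word, and detector_step are hypothetical, and the 0.9 silence ratio is an illustrative threshold rather than a documented Nimbus value. The parser relies on the silence_duration figures that ffmpeg's silencedetect filter writes to stderr.

```python
import re
from typing import Callable

COOLDOWN_SECONDS = 3.0  # pause after a detection, per the steps above

# ffmpeg's silencedetect filter reports each silent span on stderr in this form.
_SILENCE_DURATION = re.compile(r"silence_duration:\s*([0-9.]+)")


def is_silent(ffmpeg_stderr: str, chunk_seconds: float = 2.0, ratio: float = 0.9) -> bool:
    """Treat a chunk as silent when detected silence covers most of it."""
    total = sum(float(m) for m in _SILENCE_DURATION.findall(ffmpeg_stderr))
    return total >= chunk_seconds * ratio


def contains_wake_word(transcript: str, wake_word: str) -> bool:
    """Case-insensitive substring match."""
    return wake_word.casefold() in transcript.casefold()


def detector_step(
    ffmpeg_stderr: str,
    transcribe: Callable[[], str],
    wake_word: str,
    on_detect: Callable[[], None],
    sleep: Callable[[float], None],
) -> bool:
    """Process one recorded chunk; return True if the wake word fired."""
    if is_silent(ffmpeg_stderr):
        return False         # skip Whisper entirely
    if not contains_wake_word(transcribe(), wake_word):
        return False
    on_detect()
    sleep(COOLDOWN_SECONDS)  # prevent double-firing
    return True
```

Because the match is a plain substring test, punctuation that Whisper inserts inside the phrase (for example a comma after "hey") would prevent a match in this sketch.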
The default wake phrase is "hey nimbus". You can change it:
    [voice]
    enabled = true
    wake_word = "okay nimbus"
    wake_word_whisper_model = "tiny.en"  # keep lightweight for always-on use

When a push-to-talk transcription starts, the wake-word loop is automatically paused so the two audio loops cannot contend for the input device. It resumes when transcription finishes.
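The pause-and-resume behaviour can be modelled with a threading.Event that the wake-word loop checks before recording each chunk. This is one possible design, shown only to make the coordination concrete; the WakeLoopGate class is hypothetical and nothing here is taken from the Gateway's source.

```python
import threading
from contextlib import contextmanager
from typing import Optional


class WakeLoopGate:
    """Lets push-to-talk pause the wake-word loop while it owns the microphone."""

    def __init__(self) -> None:
        self._may_run = threading.Event()
        self._may_run.set()  # the loop is allowed to run initially

    @contextmanager
    def paused(self):
        """Pause the loop for the duration of a push-to-talk transcription."""
        self._may_run.clear()
        try:
            yield
        finally:
            self._may_run.set()  # resume even if transcription fails

    def wait_until_allowed(self, timeout: Optional[float] = None) -> bool:
        """Called by the wake-word loop before it records each chunk."""
        return self._may_run.wait(timeout)
```

In this sketch the push-to-talk handler wraps its work in "with gate.paused():", and the detector thread calls gate.wait_until_allowed() at the top of each iteration. Nested pauses are not supported.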
Higher-quality TTS with Piper
Piper is an optional, locally run neural TTS engine that produces more natural speech than the OS-native synthesisers. To enable it:
- Download the Piper binary and a voice model (.onnx file) from the Piper releases page.
- Set piper_path and piper_model in nimbus.toml:

      [voice]
      enabled = true
      piper_path = "/usr/local/bin/piper"
      piper_model = "/usr/local/share/piper/en_US-amy-medium.onnx"

When both are configured, the Gateway uses Piper for TTS instead of the native
synthesiser. Piper pipes raw audio to afplay (macOS), aplay (Linux), or
PowerShell SoundPlayer (Windows).
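The "both settings present" rule can be expressed as a small helper. The sketch below is an assumption about the selection logic, with piper_command as a hypothetical name. The --model and --output-raw flags are Piper CLI options; Piper reads the text to speak from standard input.

```python
from typing import List, Mapping, Optional


def piper_command(voice_config: Mapping[str, str]) -> Optional[List[str]]:
    """Return the Piper argv when both settings are present, else None."""
    path = voice_config.get("piper_path")
    model = voice_config.get("piper_model")
    if path and model:
        # Piper reads text on stdin; --output-raw streams raw PCM to stdout.
        return [path, "--model", model, "--output-raw"]
    return None  # caller falls back to the OS-native synthesiser
```

The raw stream then needs a player that knows the sample format. Piper's README, for example, pipes it to aplay -r 22050 -f S16_LE -t raw - for a 22.05 kHz voice; the correct rate depends on the voice model you downloaded.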
Per-OS notes
macOS
On first microphone access, macOS displays a system permission dialog. Grant
access via System Settings → Privacy & Security → Microphone → Nimbus.
The say command is used for TTS and requires no extra permission.
Windows
Windows may prompt for microphone access under Settings → Privacy &
security → Microphone → Allow apps to access your microphone. The Gateway
runs as a desktop application, so the prompt appears once and the setting
persists. TTS uses PowerShell’s built-in System.Speech.Synthesis.SpeechSynthesizer
— no additional install required.
Linux
The wake-word loop and push-to-talk both record via ffmpeg using the alsa input device (audio=default). Ensure:
- pulseaudio or pipewire is running.
- Your user is in the audio group: sudo usermod -aG audio $USER (log out and back in for the group change to take effect).
- ffmpeg is installed: sudo apt install ffmpeg / sudo dnf install ffmpeg.
If espeak-ng is not found on PATH, the Gateway falls back to spd-say.
Install at least one: sudo apt install espeak-ng.
Latency notes
Latency is bounded by Whisper inference speed on your CPU or GPU. As a rough guide on a modern laptop CPU (no GPU acceleration):
| Model | Parameters | Typical transcription time (5 s clip) |
|---|---|---|
| tiny.en | ~39 M | ~50–100 ms |
| base.en | ~74 M | ~100–300 ms |
| small.en | ~244 M | ~400–900 ms |
| medium.en | ~769 M | ~1.5–3 s |
Use tiny.en for the wake-word loop (wake_word_whisper_model) and
base.en or small.en for full transcription (whisper_model). Larger
models improve accuracy for accented speech and technical vocabulary.
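To see where your own hardware falls, you can time the Whisper binary directly on a short recording. The helper below is a generic wall-clock timer, not a Nimbus tool; time_command is a hypothetical name and the paths in the comment are placeholders. The -m (model) and -f (input file) flags are whisper.cpp CLI options.

```python
import subprocess
import time
from typing import List, Sequence


def time_command(argv: Sequence[str], runs: int = 3) -> List[float]:
    """Run a command several times and return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, capture_output=True)
        timings.append(time.perf_counter() - start)
    return timings


# Example with placeholder paths:
#   time_command(["whisper-cli", "-m", "models/ggml-base.en.bin", "-f", "clip.wav"])
```

Each run includes process start-up and model loading as well as inference, so the measured figures may be higher than pure inference time.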
Disable voice
Toggle voice off in Tauri Settings → Voice, or set enabled = false in
nimbus.toml and restart the Gateway. When disabled, the wake-word loop stops
immediately and the microphone is released. Push-to-talk hotkey handling is
also suspended.