
Voice

Nimbus supports local voice input and output: speak a query, get a spoken response. The speech-to-text engine is Whisper.cpp running in a child process on your machine; text-to-speech uses the OS-native synthesiser on each platform. No audio leaves your device.

Prerequisites

Whisper binary. Install whisper.cpp and ensure whisper-cli (or the older build name main) is on your PATH. Alternatively, set whisper_path in the [voice] section of nimbus.toml to an absolute path.

The binary resolution order is:

  1. whisper_path in nimbus.toml
  2. NIMBUS_WHISPER_PATH environment variable
  3. whisper-cli on PATH
  4. main on PATH (older Whisper.cpp build name)
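The resolution order above can be sketched as a small helper with an injectable PATH lookup. Names here are illustrative, not the Gateway's actual implementation:

```python
import shutil

def resolve_whisper_binary(config_path=None, env_path=None, which=shutil.which):
    """Return the first Whisper binary found, following the documented order.

    config_path: whisper_path from the [voice] section of nimbus.toml, if set.
    env_path:    NIMBUS_WHISPER_PATH, if set.
    which:       PATH lookup function (injectable for testing).
    """
    if config_path:
        return config_path
    if env_path:
        return env_path
    # Fall back to PATH: the current binary name first, then the legacy one.
    for name in ("whisper-cli", "main"):
        found = which(name)
        if found:
            return found
    return None
```

If nothing resolves, the function returns `None`, which is where the Gateway would report a configuration error.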

TTS engine. Required per OS:

Platform | Engine | Install
---------|--------|--------
macOS | say | Built in — no action needed
Windows | PowerShell SAPI (System.Speech.Synthesis.SpeechSynthesizer) | Built in — no action needed
Linux | espeak-ng (preferred) or spd-say | sudo apt install espeak-ng / sudo dnf install espeak-ng
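The per-OS engine choice can be sketched as a small dispatcher. The PowerShell one-liner is standard SAPI usage, but the exact invocation and quoting here are assumptions, not the Gateway's actual code:

```python
import sys

def native_tts_cmd(text, platform=sys.platform, have_espeak=True):
    """Build the native TTS command line for the given OS."""
    if platform == "darwin":
        return ["say", text]
    if platform.startswith("win"):
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
            f".Speak('{text}')"
        )
        return ["powershell", "-NoProfile", "-Command", script]
    # Linux: espeak-ng preferred, spd-say as the fallback.
    return ["espeak-ng", text] if have_espeak else ["spd-say", text]
```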

Wake-word loop (optional). The wake-word detector records audio chunks via ffmpeg. Install ffmpeg and ensure it is on your PATH if you plan to use the always-on wake-word mode.

Enabling voice

Voice is off by default (enabled = false). Enable it in Tauri Settings → Voice, or edit nimbus.toml directly and restart the Gateway:

[voice]
enabled = true
wake_word = "hey nimbus"
whisper_model = "base.en"
# Uncomment if whisper-cli is not on PATH:
# whisper_path = "/usr/local/bin/whisper-cli"
# Optional: separate, lighter model for the always-on wake-word loop:
# wake_word_whisper_model = "tiny.en"
# Optional: Piper TTS for higher-quality speech output:
# piper_path = "/usr/local/bin/piper"
# piper_model = "/usr/local/share/piper/en_US-amy-medium.onnx"

After saving, run nimbus stop && nimbus start to apply the change.

Push-to-talk

Hold the configured global hotkey, speak your query, then release. From key press to spoken response:

  1. The Gateway opens a temporary WAV file and records from the default audio input device via ffmpeg.
  2. On key release, recording stops and Whisper transcribes the file.
  3. The transcribed text is sent to the agent as a normal query.
  4. The agent’s response streams back as text and is also spoken aloud via the native TTS engine.
  5. The temporary WAV file is deleted.
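Steps 1 and 2 boil down to two subprocess invocations. A sketch, assuming per-OS ffmpeg input flags (the Gateway's actual device selection may differ) and whisper.cpp's standard -m/-f options:

```python
def record_cmd(wav_path, platform="linux"):
    """ffmpeg command for step 1; input flags are per-OS assumptions."""
    inputs = {
        "linux":  ["-f", "alsa", "-i", "default"],
        "darwin": ["-f", "avfoundation", "-i", ":0"],
        "win32":  ["-f", "dshow", "-i", "audio=default"],
    }
    return ["ffmpeg", "-y", *inputs[platform], wav_path]

def transcribe_cmd(whisper_bin, model_path, wav_path):
    """whisper.cpp command for step 2: -m selects the model, -f the input file."""
    return [whisper_bin, "-m", model_path, "-f", wav_path, "--no-timestamps"]
```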

The Tauri UI and CLI both emit a voice.microphoneActive notification when the microphone opens and closes, so any status bar or indicator can reflect the current state.

Wake-word mode

When wake_word is set in nimbus.toml, a background detector loop runs continuously while voice is enabled. The loop:

  1. Records a 2-second audio chunk from the default input device.
  2. Runs a silence check via ffmpeg’s silencedetect filter — silent chunks skip Whisper entirely to keep CPU usage low.
  3. If the chunk is not silent, Whisper (using wake_word_whisper_model, default tiny.en) transcribes it.
  4. If the transcript contains the wake word (case-insensitive substring match), the detector fires. The loop then pauses for 3 seconds to prevent double-firing, then resumes.
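The silence check and wake-word match can be sketched as two small predicates. Treating any silencedetect hit in a chunk as "silent" is a simplification of whatever threshold the real detector applies:

```python
def is_silent(ffmpeg_stderr):
    """Step 2: ffmpeg's silencedetect filter logs 'silence_start' lines on
    stderr when it finds silence; skip Whisper when one appears."""
    return "silence_start" in ffmpeg_stderr

def wake_word_fired(transcript, wake_word="hey nimbus"):
    """Step 4: case-insensitive substring match on the transcript."""
    return wake_word.lower() in transcript.lower()
```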

The default wake phrase is "hey nimbus". You can change it:

[voice]
enabled = true
wake_word = "okay nimbus"
wake_word_whisper_model = "tiny.en" # keep lightweight for always-on use

When a push-to-talk transcription starts, the wake-word loop is automatically paused so the two audio loops cannot contend for the input device. It resumes when transcription finishes.
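The pause/resume handshake can be sketched with a shared flag; the class and method names here are illustrative, not the Gateway's API:

```python
import threading

class WakeWordGate:
    """Pauses the wake-word loop while push-to-talk owns the microphone."""

    def __init__(self):
        self._idle = threading.Event()
        self._idle.set()  # wake-word loop may record initially

    def begin_push_to_talk(self):
        self._idle.clear()  # wake-word loop must skip recording

    def end_push_to_talk(self):
        self._idle.set()    # wake-word loop may resume

    def wake_loop_may_record(self):
        return self._idle.is_set()
```

The wake-word loop checks the gate before opening the input device, so the two loops never contend for it.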

Piper TTS

Piper is an optional, locally-run neural TTS engine that produces more natural speech than the OS-native synthesisers. To enable it:

  1. Download the Piper binary and a voice model (.onnx file) from the Piper releases page.

  2. Set piper_path and piper_model in nimbus.toml:

    [voice]
    enabled = true
    piper_path = "/usr/local/bin/piper"
    piper_model = "/usr/local/share/piper/en_US-amy-medium.onnx"

When both are configured, the Gateway uses Piper for TTS instead of the native synthesiser. Piper pipes raw audio to afplay (macOS), aplay (Linux), or PowerShell SoundPlayer (Windows).
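A sketch of the Piper invocation and player choice: --model and --output-raw are Piper CLI flags (raw PCM on stdout, text on stdin), but the player selection is only a mirror of the list above, and the exact player invocations are assumptions:

```python
import sys

def piper_cmd(piper_path, model_path):
    """Piper invocation: text on stdin, raw PCM audio on stdout."""
    return [piper_path, "--model", model_path, "--output-raw"]

def piper_player(platform=sys.platform):
    """Pick the audio player Piper's output is handed to, per OS."""
    if platform == "darwin":
        return "afplay"
    if platform.startswith("win"):
        return "powershell"  # drives System.Media.SoundPlayer
    return "aplay"
```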

Platform notes

On first microphone access, macOS displays a system permission dialog. Grant access via System Settings → Privacy & Security → Microphone → Nimbus. The say command is used for TTS and requires no extra permission.

Windows may prompt for microphone access under Settings → Privacy & security → Microphone → Allow apps to access your microphone. The Gateway runs as a desktop application, so the prompt appears once and the setting persists. TTS uses PowerShell’s built-in System.Speech.Synthesis.SpeechSynthesizer — no additional install required.

On Linux, the wake-word loop and push-to-talk both record via ffmpeg using the ALSA input device (-f alsa -i default). Ensure:

  • pulseaudio or pipewire is running.
  • Your user is in the audio group: sudo usermod -aG audio $USER (re-log for the group to take effect).
  • ffmpeg is installed: sudo apt install ffmpeg / sudo dnf install ffmpeg.

If espeak-ng is not found on PATH, the Gateway falls back to spd-say. Install at least one: sudo apt install espeak-ng.

Performance

Latency is bounded by Whisper inference speed on your CPU or GPU. As a rough guide on a modern laptop CPU (no GPU acceleration):

Model | Size | Typical 5 s clip
------|------|-----------------
tiny.en | ~39 MB | ~50–100 ms
base.en | ~74 MB | ~100–300 ms
small.en | ~244 MB | ~400–900 ms
medium.en | ~769 MB | ~1.5–3 s

Use tiny.en for the wake-word loop (wake_word_whisper_model) and base.en or small.en for full transcription (whisper_model). Larger models improve accuracy for accented speech and technical vocabulary.

Disabling voice

Toggle voice off in Tauri Settings → Voice, or set enabled = false in nimbus.toml and restart the Gateway. When disabled, the wake-word loop stops immediately and the microphone is released. Push-to-talk hotkey handling is also suspended.