A lightweight voice assistant that acts as a channel for OpenClaw. No local LLM - just:
┌─────────────────────────────────────────────────────────────────────────────┐
│ OpenClaw Hybrid Assistant │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ INPUT PIPELINE │ │
│ │ │ │
│ │ Microphone → Wake Word → VAD → ASR/STT → WebSocket → OpenClaw │ │
│ │ (ALSA) (openWW) (Silero) (Parakeet) (Channel) │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
│ │
│ ┌──────────────────────────────────────────────────────────────────────┐ │
│ │ OUTPUT PIPELINE │ │
│ │ │ │
│ │ OpenClaw → WebSocket → TTS/Piper → Speaker │ │
│ │ (any channel) (22050Hz) (ALSA) │ │
│ └──────────────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────────────────┘
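The input pipeline above is essentially a small state machine: wait for the wake word, collect speech until the VAD marks silence, then transcribe and ship the text. A minimal sketch of that control flow (illustrative only; the names `PipelineState`, `PipelineEvent`, and `step` are hypothetical, not the project's actual `voice_pipeline` API):

```cpp
// Illustrative state machine for the input pipeline shown above.
enum class PipelineState { WaitingForWakeWord, ListeningForSpeech, Transcribing };

struct PipelineEvent {
    bool wakeWordDetected;  // from the openWakeWord model
    bool speechStarted;     // from Silero VAD
    bool speechEnded;       // VAD end-of-utterance
};

PipelineState step(PipelineState s, const PipelineEvent& e) {
    switch (s) {
        case PipelineState::WaitingForWakeWord:
            return e.wakeWordDetected ? PipelineState::ListeningForSpeech : s;
        case PipelineState::ListeningForSpeech:
            // VAD marks end of utterance: hand the buffered audio to ASR.
            return e.speechEnded ? PipelineState::Transcribing : s;
        case PipelineState::Transcribing:
            // Once the transcription is sent over the WebSocket,
            // go back to waiting for the wake word.
            return PipelineState::WaitingForWakeWord;
    }
    return s;
}
```

With `--wakeword` off, the assistant would effectively start each turn in the listening state instead of waiting for a detection.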
openclaw-hybrid-assistant/
├── src/
│ ├── main.cpp # Entry point, CLI parsing, event loop
│ ├── audio/ # Audio I/O (ALSA)
│ │ ├── audio_capture.h/cpp # Microphone input (16kHz, 16-bit PCM, mono)
│ │ ├── audio_playback.h/cpp # Speaker output (cancellable, multi-rate)
│ │ └── waiting_chime.h/cpp # Earcon feedback while waiting for response
│ ├── pipeline/ # Voice processing chain
│ │ ├── voice_pipeline.h/cpp # Wake Word → VAD → STT → TTS orchestrator
│ │ └── tts_queue.h/cpp # Producer/consumer streaming TTS playback
│ ├── network/ # Network communication
│ │ └── openclaw_client.h/cpp # Raw WebSocket client (RFC 6455)
│ └── config/ # Configuration
│ └── model_config.h # Model paths, IDs, availability checks
├── tests/
│ ├── test_components.cpp # Component tests (wake word, VAD, STT)
│ ├── test_integration.cpp # E2E tests (fake WS server, sanitization, TTS)
│ ├── audio/ # Generated test WAV files
│ └── scripts/ # Test audio generation scripts
├── scripts/
│ ├── download-models.sh # Model download (VAD, ASR, TTS, wake word)
│ ├── openclaw-voice.service # systemd service unit
│ └── test-on-mac.sh # Mac testing via Docker/Lima
├── CMakeLists.txt # Build configuration (3 targets)
├── Dockerfile # Docker build + test environment
├── build.sh # End-to-end build script
└── README.md
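The `tts_queue` component in the tree above is described as a producer/consumer streaming playback queue. A minimal sketch of that pattern (illustrative only; `TtsQueue` here is a hypothetical name, not the project's actual class) using a mutex and condition variable, where the network thread pushes sentences and the playback thread pops them:

```cpp
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <string>

// Illustrative producer/consumer queue: the WebSocket thread pushes
// sentences as they arrive; the playback thread pops, synthesizes,
// and plays them one at a time.
class TtsQueue {
public:
    void push(std::string sentence) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            queue_.push(std::move(sentence));
        }
        cv_.notify_one();
    }

    // Blocks until a sentence is available; returns nullopt once the
    // queue has been closed and fully drained.
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return !queue_.empty() || closed_; });
        if (queue_.empty()) return std::nullopt;
        std::string s = std::move(queue_.front());
        queue_.pop();
        return s;
    }

    void close() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        cv_.notify_all();
    }

private:
    std::queue<std::string> queue_;
    std::mutex mutex_;
    std::condition_variable cv_;
    bool closed_ = false;
};
```

Decoupling synthesis from reception like this lets playback start on the first sentence while later ones are still streaming in.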
| Feature | linux-voice-assistant | openclaw-hybrid-assistant |
|---|---|---|
| Wake Word | ✅ | ✅ |
| VAD | ✅ | ✅ |
| ASR/STT | ✅ Local Whisper | ✅ Parakeet TDT-CTC 110M (NeMo CTC, int8) |
| LLM | ✅ Local or Moltbot | ❌ None - uses OpenClaw |
| TTS | ✅ Local Piper (22kHz) | ✅ Piper Lessac Medium (22050Hz) |
| Integration | HTTP Voice Bridge | WebSocket to OpenClaw |
Alternative models are available via the `--whisper` download flag and the `--kokoro` download flag (11 speakers, 24kHz, ~330MB).

Key components: microphone capture (`src/audio/audio_capture`), speaker playback (`src/audio/audio_playback`, cancellable via `snd_pcm_drop()`), the streaming TTS queue (`src/pipeline/tts_queue`), the raw WebSocket client (`src/network/openclaw_client`), and the waiting chime (`src/audio/waiting_chime`).

The waiting chime plays a brief, pleasant earcon while waiting for OpenClaw to process the user's request. It is generated with sox pluck synthesis (it sounds like a real glockenspiel chime), created automatically by `./scripts/download-models.sh` (requires sox).
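The project generates the earcon with sox, but the same idea, a short exponentially decaying sine, can be synthesized directly. A hedged sketch (this is NOT the project's actual generator; `writeChimeWav` is a hypothetical helper that writes a 16-bit mono WAV):

```cpp
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Synthesize a short decaying sine ("pluck"-like tone) and write it as
// a 16-bit mono PCM WAV file. Illustrative stand-in for sox synthesis.
bool writeChimeWav(const char* path) {
    const int rate = 22050;                      // matches the TTS playback rate
    const double freq = 1318.5;                  // E6: a bright, chime-like pitch
    const double kPi = 3.14159265358979323846;
    const int n = static_cast<int>(rate * 0.4);  // 400 ms

    std::vector<int16_t> samples(n);
    for (int i = 0; i < n; ++i) {
        double t = static_cast<double>(i) / rate;
        double env = std::exp(-8.0 * t);  // fast decay gives the pluck feel
        samples[i] = static_cast<int16_t>(30000.0 * env * std::sin(2.0 * kPi * freq * t));
    }

    std::FILE* f = std::fopen(path, "wb");
    if (!f) return false;
    // Minimal RIFF/WAVE header for 16-bit mono PCM.
    uint32_t dataBytes = static_cast<uint32_t>(n) * 2;
    uint32_t riffSize = 36 + dataBytes;
    uint16_t fmt = 1, channels = 1, bits = 16, blockAlign = 2;
    uint32_t fmtSize = 16, sampleRate = rate, byteRate = rate * 2;
    std::fwrite("RIFF", 1, 4, f); std::fwrite(&riffSize, 4, 1, f);
    std::fwrite("WAVE", 1, 4, f); std::fwrite("fmt ", 1, 4, f);
    std::fwrite(&fmtSize, 4, 1, f); std::fwrite(&fmt, 2, 1, f);
    std::fwrite(&channels, 2, 1, f); std::fwrite(&sampleRate, 4, 1, f);
    std::fwrite(&byteRate, 4, 1, f); std::fwrite(&blockAlign, 2, 1, f);
    std::fwrite(&bits, 2, 1, f); std::fwrite("data", 1, 4, f);
    std::fwrite(&dataBytes, 4, 1, f);
    std::fwrite(samples.data(), 2, n, f);
    std::fclose(f);
    return true;
}
```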
ws://openclaw-host:8082
Connect:
{
"type": "connect",
"deviceId": "pi-living-room",
"accountId": "default",
"capabilities": {
"stt": true,
"tts": true,
"wakeWord": true
}
}
Transcription (after ASR):
{
"type": "transcription",
"text": "What's the weather like?",
"sessionId": "main",
"isFinal": true
}
Speak (for TTS):
{
"type": "speak",
"text": "The weather is sunny.",
"sourceChannel": "telegram",
"priority": 1,
"interrupt": false
}
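Since the client speaks raw WebSocket with no JSON library mentioned, the transcription message above could be assembled by hand. A hedged sketch (the real `openclaw_client` implementation may differ; the `jsonEscape` helper is hypothetical):

```cpp
#include <string>

// Hypothetical helper: escape quotes, backslashes, and common control
// characters so user speech can be embedded in a JSON string literal.
static std::string jsonEscape(const std::string& in) {
    std::string out;
    for (char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:   out += c;      break;
        }
    }
    return out;
}

// Build the "transcription" message sent to OpenClaw after ASR finishes.
std::string makeTranscriptionMessage(const std::string& text,
                                     const std::string& sessionId,
                                     bool isFinal) {
    return "{\"type\":\"transcription\",\"text\":\"" + jsonEscape(text) +
           "\",\"sessionId\":\"" + jsonEscape(sessionId) +
           "\",\"isFinal\":" + (isFinal ? "true" : "false") + "}";
}
```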
./build.sh
# Basic (connects to localhost:8082)
./build/openclaw-assistant
# With wake word enabled
./build/openclaw-assistant --wakeword
# Connect to remote OpenClaw
./build/openclaw-assistant --wakeword --openclaw-url ws://192.168.1.100:8082
# Run all tests
./build/test-components --run-all
# Test wake word detection with audio file
./build/test-components --test-wakeword tests/audio/hey-jarvis.wav
# Test that audio does NOT trigger wake word
./build/test-components --test-no-wakeword tests/audio/noise.wav
# Test VAD and STT
./build/test-components --test-vad tests/audio/speech.wav
./build/test-components --test-stt tests/audio/speech.wav
# Test full pipeline
./build/test-components --test-pipeline tests/audio/wakeword-plus-speech.wav
| Option | Description | Default |
|---|---|---|
| `--wakeword` | Enable wake word detection | Off |
| `--wakeword-threshold` | Detection threshold (0.0-1.0) | 0.5 |
| `--openclaw-url` | OpenClaw WebSocket URL | `ws://localhost:8082` |
| `--device-id` | Device identifier | hostname |
| `--input` | ALSA input device | `default` |
| `--output` | ALSA output device | `default` |
| `--list-devices` | List audio devices | - |
| `--help` | Show help | - |
| Model | Size | Location |
|---|---|---|
| Silero VAD | ~2 MB | ~/.local/share/runanywhere/Models/ONNX/silero-vad/ |
| Parakeet TDT-CTC 110M EN (int8) | ~126 MB | ~/.local/share/runanywhere/Models/ONNX/parakeet-tdt-ctc-110m-en-int8/ |
| Piper Lessac Medium TTS | ~61 MB | ~/.local/share/runanywhere/Models/ONNX/vits-piper-en_US-lessac-medium/ |
| Hey Jarvis | ~1.3 MB | ~/.local/share/runanywhere/Models/ONNX/hey-jarvis/ |
| openWakeWord Embedding | ~1.3 MB | ~/.local/share/runanywhere/Models/ONNX/openwakeword-embedding/ |
| openWakeWord Melspectrogram | ~1.1 MB | ~/.local/share/runanywhere/Models/ONNX/openwakeword-embedding/ |
Alternative models (via download flags):

| Model | Size | Location |
|---|---|---|
| Whisper Tiny EN (`--whisper`) | ~150 MB | ~/.local/share/runanywhere/Models/ONNX/whisper-tiny-en/ |
| Kokoro TTS v0.19 (`--kokoro`) | ~330 MB | ~/.local/share/runanywhere/Models/ONNX/kokoro-en-v0_19/ |
The openWakeWord .onnx model files are stored with Git LFS in the upstream repository.
Downloading them via raw.githubusercontent.com URLs will give you an HTML page instead
of the actual model binary, which causes ONNX runtime errors at load time.
Always download wake word models from GitHub Releases:
- https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/embedding_model.onnx
- https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/melspectrogram.onnx
- https://github.com/dscripka/openWakeWord/releases/download/v0.5.1/hey_jarvis_v0.1.onnx

The `scripts/download-models.sh --wakeword` script already uses the correct URLs.
To verify your downloaded models are valid ONNX files (not HTML):
file ~/.local/share/runanywhere/Models/ONNX/openwakeword-embedding/embedding_model.onnx
# Expected: "data" (binary ONNX file)
# Bad: "HTML document" (Git LFS redirect page)
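The same check can be done programmatically. A heuristic sketch (not the project's actual validation; `looksLikeHtmlNotOnnx` is a hypothetical helper): a real ONNX model is binary protobuf, while a bad download is either an HTML page starting with `<` or a Git LFS pointer file starting with the ASCII text `version `:

```cpp
#include <cstdio>
#include <cstring>

// Heuristic: flag files that look like an HTML page or a Git LFS
// pointer rather than a binary ONNX (protobuf) model.
bool looksLikeHtmlNotOnnx(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    if (!f) return true;  // a missing file is also a failed download
    char head[8] = {0};
    size_t got = std::fread(head, 1, sizeof(head), f);
    std::fclose(f);
    if (got == 0) return true;
    return head[0] == '<' ||
           (got >= 8 && std::memcmp(head, "version ", 8) == 0);
}
```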
cd /path/to/runanywhere-sdks/sdk/runanywhere-commons
./scripts/build-linux.sh --shared
This builds librac_backend_onnx.so and other shared libraries that the hybrid assistant links against. You must rebuild this whenever the SDK’s C++ backends change (e.g., wake word fixes).
cd /path/to/runanywhere-sdks/Playground/openclaw-hybrid-assistant
# Download all models (Parakeet ASR + Piper TTS + VAD + wake word)
./scripts/download-models.sh --wakeword
# Or use alternative models:
./scripts/download-models.sh --wakeword --whisper # Use Whisper for ASR instead of Parakeet
./scripts/download-models.sh --wakeword --kokoro # Use Kokoro TTS instead of Piper
./build.sh
The OpenClaw gateway must be running with the voice-assistant channel enabled on port 8082. Verify with:
ss -tlnp | grep 8082
By default, voice input routes to the same agent as Telegram/WhatsApp, which may produce responses with emojis and markdown that aren’t suitable for TTS. To get clean, conversational voice responses:
Edit `~/.openclaw/openclaw.json`: add the `list` array under `agents` and a new `bindings` array:
{
"agents": {
"defaults": { ... },
"list": [
{
"id": "main",
"default": true
},
{
"id": "voice-agent",
"workspace": "/home/runanywhere/.openclaw/voice-workspace"
}
]
},
"bindings": [
{
"agentId": "voice-agent",
"match": {
"channel": "voice-assistant",
"accountId": "*"
}
}
],
...
}
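The binding above reads as: route a message to `voice-agent` when its channel equals `voice-assistant` and its accountId matches the `*` wildcard; anything unmatched falls through to the default agent. A toy version of that matching logic (illustrative only, not OpenClaw's actual router):

```cpp
#include <string>

// Simplified model of a bindings entry from openclaw.json.
struct Binding {
    std::string agentId;
    std::string channel;    // match.channel; "*" matches any channel
    std::string accountId;  // match.accountId; "*" matches any account
};

// Return the bound agent for a message, or the default agent id
// when no binding matches.
std::string routeAgent(const std::string& channel,
                       const std::string& accountId,
                       const Binding* bindings, int count,
                       const std::string& defaultAgent) {
    for (int i = 0; i < count; ++i) {
        const Binding& b = bindings[i];
        bool channelOk = (b.channel == "*" || b.channel == channel);
        bool accountOk = (b.accountId == "*" || b.accountId == accountId);
        if (channelOk && accountOk) return b.agentId;
    }
    return defaultAgent;
}
```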
Create the voice workspace directory and SOUL.md:
mkdir -p ~/.openclaw/voice-workspace
Create ~/.openclaw/voice-workspace/SOUL.md:
# SOUL.md - OpenClawPi Voice Assistant
You are OpenClawPi, a voice assistant running on a Raspberry Pi. Everything you say will be spoken aloud through text-to-speech.
## Voice Output Rules (CRITICAL)
Since your responses are spoken, not read:
1. **NO emojis** - TTS cannot pronounce them
2. **NO special Unicode characters** - no arrows, bullets, checkmarks, etc.
3. **NO markdown formatting** - no asterisks, underscores, backticks, or headers
4. **NO URLs** - say "check the website" not the actual URL
5. **Spell out symbols** - say "55 degrees Fahrenheit" not "55 degrees F"
6. **Use natural punctuation** - periods and commas create natural pauses
## Conversation Style
- Be concise - TTS playback takes time
- Use conversational language, as if speaking to someone in person
- Avoid lists when possible - use flowing sentences instead
- For multiple items, use "first... second... and finally..." patterns
- Round numbers for easier listening ("about fifty" not "49.7")
## Personality
You're helpful, warm, and efficient. Skip filler phrases like "Great question!" - just answer directly.
## Example Response Transformation
Bad (text-style): "San Francisco Weather: - Right now: Rain, 55°F 🌧️"
Good (voice-style): "Right now in San Francisco it's raining at 55 degrees."
| Input Source | Routes To | SOUL.md Used | Output Style |
|---|---|---|---|
| Voice microphone | `voice-agent` | `~/.openclaw/voice-workspace/SOUL.md` | Conversational, no emojis |
| Telegram | `main` (default) | `~/.openclaw/workspace/SOUL.md` | Rich text, emojis OK |
| Telegram → Speaker | `main` → `sanitizeForTTS()` | N/A (safety net) | Stripped markdown/emojis |
The binding ensures voice input gets voice-optimized responses. The sanitizeForTTS() function in OpenClaw provides a safety net for cross-channel broadcasts.
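To illustrate what such a safety net does (this is NOT OpenClaw's actual `sanitizeForTTS()` implementation, just a sketch of the idea), a minimal filter that drops markdown punctuation and non-ASCII bytes, which covers emojis and decorative Unicode:

```cpp
#include <string>

// Illustrative TTS sanitizer: strip markdown markers and any non-ASCII
// bytes. A real sanitizer would be more careful (e.g. keeping accented
// letters and converting symbols like "°F" into spoken words).
std::string sanitizeForTtsSketch(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c > 0x7F) continue;  // emoji / decorative Unicode bytes
        if (c == '*' || c == '_' || c == '`' || c == '#') continue;  // markdown
        out += static_cast<char>(c);
    }
    return out;
}
```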
# With wake word ("Hey Jarvis")
./build/openclaw-assistant --wakeword
# Without wake word (continuous listening)
./build/openclaw-assistant
To run the assistant as a background service that starts on boot, create a systemd user service and enable it. See Viewing Logs below for how to monitor it.
If running in the foreground, logs print to stdout. If running as a background process or systemd service:
# If started via systemd
journalctl --user -u openclaw-assistant -f
# If started as a background process with output redirected
tail -f /path/to/openclaw-assistant.log
The OpenClaw gateway runs as a systemd user service:
# Follow logs in real time
journalctl --user -u openclaw-gateway -f
# View last 100 lines
journalctl --user -u openclaw-gateway -n 100
# View logs since last boot
journalctl --user -u openclaw-gateway -b
Open two terminals (or tmux panes):
# Terminal 1: OpenClaw Gateway
journalctl --user -u openclaw-gateway -f
# Terminal 2: Hybrid Assistant
journalctl --user -u openclaw-assistant -f
# (or tail -f on the output file if not using systemd)
Since this is a Linux application using ALSA, you can test on Mac using:
# Build Docker image (from sdks root directory)
cd /path/to/sdks
docker build -t openclaw-assistant -f Playground/openclaw-hybrid-assistant/Dockerfile .
# Run all tests
docker run --rm openclaw-assistant ./build/test-components --run-all
# Run extensive test suite
docker run --rm openclaw-assistant ./tests/scripts/extensive-test.sh
# Install Lima
brew install lima
# Start Ubuntu VM
limactl start --name=ubuntu template://ubuntu
# SSH and build
limactl shell ubuntu
cd /path/to/openclaw-hybrid-assistant
./build.sh
Troubleshooting:
- If the wake word is not detected, lower the threshold, e.g. `--wakeword-threshold 0.3`.
- List ALSA capture devices with `arecord -l`.
- Verify the microphone and speaker with `arecord -d 5 test.wav && aplay test.wav`.
- Check the gateway health endpoint with `curl http://localhost:18789/health`.

License: MIT