
OpenClaw Hybrid Assistant

A lightweight voice assistant that acts as a channel for OpenClaw. There is no local LLM: wake word detection, VAD, speech-to-text, and TTS run on-device, and all reasoning is handled by OpenClaw over a WebSocket connection.

Architecture

┌─────────────────────────────────────────────────────────────────────────────┐
│                        OpenClaw Hybrid Assistant                             │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                         INPUT PIPELINE                                │   │
│  │                                                                       │   │
│  │   Microphone → Wake Word → VAD → ASR/STT  → WebSocket → OpenClaw    │   │
│  │    (ALSA)     (openWW)  (Silero) (Parakeet)            (Channel)    │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
│                                                                              │
│  ┌──────────────────────────────────────────────────────────────────────┐   │
│  │                        OUTPUT PIPELINE                                │   │
│  │                                                                       │   │
│  │   OpenClaw → WebSocket → TTS/Piper → Speaker                        │   │
│  │  (any channel)          (22050Hz)    (ALSA)                          │   │
│  └──────────────────────────────────────────────────────────────────────┘   │
└─────────────────────────────────────────────────────────────────────────────┘
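Conceptually, the input pipeline is a small state machine: idle until the wake word fires, recording while VAD reports speech, then transcribe-and-send once silence is detected. A minimal sketch of that control flow (hypothetical names, not the project's actual classes):

```cpp
#include <string>

// Sketch of the input pipeline's control flow: wake word gates recording,
// VAD silence ends an utterance, which is then handed to ASR.
enum class State { WaitingForWake, Recording };

struct Pipeline {
    State state = State::WaitingForWake;
    int silent_frames = 0;
    static constexpr int kEndOfSpeechFrames = 30;  // ~1 s at 32 ms/frame

    // Feed one audio frame's detector results; returns true when an
    // utterance is complete and ready for ASR + WebSocket send.
    bool onFrame(bool wakeWordHit, bool vadSpeech) {
        switch (state) {
        case State::WaitingForWake:
            if (wakeWordHit) { state = State::Recording; silent_frames = 0; }
            return false;
        case State::Recording:
            silent_frames = vadSpeech ? 0 : silent_frames + 1;
            if (silent_frames >= kEndOfSpeechFrames) {
                state = State::WaitingForWake;  // hand buffered audio to ASR
                return true;
            }
            return false;
        }
        return false;
    }
};
```

The frame count threshold is an illustrative placeholder; the real end-of-speech timeout is whatever the VAD stage configures.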

Project Structure

openclaw-hybrid-assistant/
├── src/
│   ├── main.cpp                          # Entry point, CLI parsing, event loop
│   ├── audio/                            # Audio I/O (ALSA)
│   │   ├── audio_capture.h/cpp           #   Microphone input (16kHz, 16-bit PCM, mono)
│   │   ├── audio_playback.h/cpp          #   Speaker output (cancellable, multi-rate)
│   │   └── waiting_chime.h/cpp           #   Earcon feedback while waiting for response
│   ├── pipeline/                         # Voice processing chain
│   │   ├── voice_pipeline.h/cpp          #   Wake Word → VAD → STT → TTS orchestrator
│   │   └── tts_queue.h/cpp               #   Producer/consumer streaming TTS playback
│   ├── network/                          # Network communication
│   │   └── openclaw_client.h/cpp         #   Raw WebSocket client (RFC 6455)
│   └── config/                           # Configuration
│       └── model_config.h                #   Model paths, IDs, availability checks
├── tests/
│   ├── test_components.cpp               # Component tests (wake word, VAD, STT)
│   ├── test_integration.cpp              # E2E tests (fake WS server, sanitization, TTS)
│   ├── audio/                            # Generated test WAV files
│   └── scripts/                          # Test audio generation scripts
├── scripts/
│   ├── download-models.sh                # Model download (VAD, ASR, TTS, wake word)
│   ├── openclaw-voice.service            # systemd service unit
│   └── test-on-mac.sh                    # Mac testing via Docker/Lima
├── CMakeLists.txt                        # Build configuration (3 targets)
├── Dockerfile                            # Docker build + test environment
├── build.sh                              # End-to-end build script
└── README.md
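The network layer speaks raw RFC 6455 rather than pulling in a WebSocket library. The core of that protocol on the client side is frame masking: every client-to-server frame sets the MASK bit and XORs the payload with a 4-byte key. A minimal sketch of building a masked text frame (short payloads only; not the project's actual code):

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Build a single-frame, masked WebSocket text message (RFC 6455, section 5.2).
// Handles payloads shorter than 126 bytes only, for brevity.
std::vector<uint8_t> buildTextFrame(const std::string& payload,
                                    const uint8_t key[4]) {
    std::vector<uint8_t> frame;
    frame.push_back(0x81);                        // FIN=1, opcode=1 (text)
    frame.push_back(0x80 |                        // MASK=1 + 7-bit length
                    static_cast<uint8_t>(payload.size()));
    frame.insert(frame.end(), key, key + 4);      // 4-byte masking key
    for (size_t i = 0; i < payload.size(); ++i)   // payload XOR'd with key
        frame.push_back(static_cast<uint8_t>(payload[i]) ^ key[i % 4]);
    return frame;
}
```

Longer payloads use the 126/127 extended-length encodings, and a real client also has to perform the HTTP Upgrade handshake first.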

Key Differences from linux-voice-assistant

| Feature | linux-voice-assistant | openclaw-hybrid-assistant |
|---|---|---|
| Wake Word | | ✅ openWakeWord ("Hey Jarvis") |
| VAD | | ✅ Silero VAD |
| ASR/STT | ✅ Local Whisper | ✅ Parakeet TDT-CTC 110M (NeMo CTC, int8) |
| LLM | ✅ Local or Moltbot | ❌ None - uses OpenClaw |
| TTS | ✅ Local Piper (22kHz) | ✅ Piper Lessac Medium (22050Hz) |
| Integration | HTTP Voice Bridge | WebSocket to OpenClaw |

Components

1. Wake Word Detector

2. Voice Activity Detection (VAD)

3. Speech-to-Text (ASR)

4. Text-to-Speech (TTS)

5. Audio Capture (src/audio/audio_capture)

6. Audio Playback (src/audio/audio_playback)

7. TTS Queue (src/pipeline/tts_queue)

8. OpenClaw Client (src/network/openclaw_client)

9. Waiting Feedback (src/audio/waiting_chime)

Plays a brief, pleasant earcon sound while waiting for OpenClaw to process the user’s request:

Generated automatically by ./scripts/download-models.sh (requires sox).

10. Barge-in Support
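Component 7's streaming playback is a standard bounded producer/consumer pattern: the network thread enqueues synthesized sentences as they arrive, the audio thread dequeues and plays them, and barge-in (component 10) clears the queue and cancels the current clip. A minimal sketch with hypothetical names, not the project's actual tts_queue implementation:

```cpp
#include <condition_variable>
#include <deque>
#include <mutex>
#include <optional>
#include <string>

// Sketch of a streaming TTS queue: the producer pushes synthesized
// sentences, the consumer pops and plays them; clear() models barge-in.
class TtsQueue {
public:
    void push(std::string sentence) {
        { std::lock_guard<std::mutex> lk(mu_); q_.push_back(std::move(sentence)); }
        cv_.notify_one();
    }
    // Blocks until an item is available or the queue is closed;
    // nullopt means "stop playing".
    std::optional<std::string> pop() {
        std::unique_lock<std::mutex> lk(mu_);
        cv_.wait(lk, [&] { return !q_.empty() || closed_; });
        if (q_.empty()) return std::nullopt;
        std::string s = std::move(q_.front());
        q_.pop_front();
        return s;
    }
    void clear() {                         // barge-in: drop pending speech
        std::lock_guard<std::mutex> lk(mu_);
        q_.clear();
    }
    void close() {
        { std::lock_guard<std::mutex> lk(mu_); closed_ = true; }
        cv_.notify_all();
    }
private:
    std::mutex mu_;
    std::condition_variable cv_;
    std::deque<std::string> q_;
    bool closed_ = false;
};
```

In the real pipeline, barge-in also has to cancel the clip already handed to ALSA, which is why audio_playback is cancellable.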

OpenClaw WebSocket Protocol

Connection

ws://openclaw-host:8082

Messages: Assistant → OpenClaw

Connect:

{
  "type": "connect",
  "deviceId": "pi-living-room",
  "accountId": "default",
  "capabilities": {
    "stt": true,
    "tts": true,
    "wakeWord": true
  }
}

Transcription (after ASR):

{
  "type": "transcription",
  "text": "What's the weather like?",
  "sessionId": "main",
  "isFinal": true
}
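Assembling these messages requires nothing beyond basic JSON string building, plus escaping quotes and backslashes in the transcript. A minimal sketch (hypothetical helpers, not the project's code):

```cpp
#include <string>

// Escape characters that would break a JSON string literal.
std::string jsonEscape(const std::string& s) {
    std::string out;
    for (char c : s) {
        if (c == '"' || c == '\\') out += '\\';
        out += c;
    }
    return out;
}

// Build the "transcription" message sent to OpenClaw after ASR finishes.
std::string makeTranscription(const std::string& text,
                              const std::string& sessionId) {
    return "{\"type\":\"transcription\",\"text\":\"" + jsonEscape(text) +
           "\",\"sessionId\":\"" + jsonEscape(sessionId) +
           "\",\"isFinal\":true}";
}
```

A production client would also escape control characters (newlines, tabs) per the JSON grammar.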

Messages: OpenClaw → Assistant

Speak (for TTS):

{
  "type": "speak",
  "text": "The weather is sunny.",
  "sourceChannel": "telegram",
  "priority": 1,
  "interrupt": false
}
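On the receiving side, the interrupt flag decides whether the new utterance preempts whatever is currently playing or queues up behind it. A sketch of that dispatch logic, assuming a hypothetical player handle (the vector stands in for the TTS queue and the flag for cancelling ALSA playback):

```cpp
#include <string>
#include <vector>

// Sketch of "speak" message handling: interrupt=true preempts current
// audio, interrupt=false appends behind whatever is already queued.
struct Player {
    std::vector<std::string> queue;
    bool playing = false;

    void handleSpeak(const std::string& text, bool interrupt) {
        if (interrupt) {
            queue.clear();      // drop pending speech
            playing = false;    // stand-in for cancelling active playback
        }
        queue.push_back(text);
    }
};
```

The priority field could additionally order non-interrupting messages, but that ordering policy is not shown here.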

Quick Start

Prerequisites

Build

./build.sh

Run

# Basic (connects to localhost:8082)
./build/openclaw-assistant

# With wake word enabled
./build/openclaw-assistant --wakeword

# Connect to remote OpenClaw
./build/openclaw-assistant --wakeword --openclaw-url ws://192.168.1.100:8082

Test Components

# Run all tests
./build/test-components --run-all

# Test wake word detection with audio file
./build/test-components --test-wakeword tests/audio/hey-jarvis.wav

# Test that audio does NOT trigger wake word
./build/test-components --test-no-wakeword tests/audio/noise.wav

# Test VAD and STT
./build/test-components --test-vad tests/audio/speech.wav
./build/test-components --test-stt tests/audio/speech.wav

# Test full pipeline
./build/test-components --test-pipeline tests/audio/wakeword-plus-speech.wav

Configuration

Command Line Options

| Option | Description | Default |
|---|---|---|
| --wakeword | Enable wake word detection | Off |
| --wakeword-threshold | Detection threshold (0.0-1.0) | 0.5 |
| --openclaw-url | OpenClaw WebSocket URL | ws://localhost:8082 |
| --device-id | Device identifier | hostname |
| --input | ALSA input device | "default" |
| --output | ALSA output device | "default" |
| --list-devices | List audio devices | - |
| --help | Show help | - |
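Parsing the options above takes nothing more than a linear scan of argv. A minimal sketch (hypothetical struct; the real parser lives in main.cpp):

```cpp
#include <cstdlib>
#include <string>

// Sketch of CLI parsing for the options listed above, with their defaults.
struct Options {
    bool wakeword = false;
    float wakewordThreshold = 0.5f;
    std::string openclawUrl = "ws://localhost:8082";
    std::string input = "default";
    std::string output = "default";
};

Options parseArgs(int argc, const char* argv[]) {
    Options o;
    for (int i = 1; i < argc; ++i) {
        std::string a = argv[i];
        if (a == "--wakeword") o.wakeword = true;
        else if (a == "--wakeword-threshold" && i + 1 < argc)
            o.wakewordThreshold = std::strtof(argv[++i], nullptr);
        else if (a == "--openclaw-url" && i + 1 < argc) o.openclawUrl = argv[++i];
        else if (a == "--input" && i + 1 < argc) o.input = argv[++i];
        else if (a == "--output" && i + 1 < argc) o.output = argv[++i];
    }
    return o;
}
```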

Models Required

| Model | Size | Location |
|---|---|---|
| Silero VAD | ~2 MB | ~/.local/share/runanywhere/Models/ONNX/silero-vad/ |
| Parakeet TDT-CTC 110M EN (int8) | ~126 MB | ~/.local/share/runanywhere/Models/ONNX/parakeet-tdt-ctc-110m-en-int8/ |
| Piper Lessac Medium TTS | ~61 MB | ~/.local/share/runanywhere/Models/ONNX/vits-piper-en_US-lessac-medium/ |
| Hey Jarvis | ~1.3 MB | ~/.local/share/runanywhere/Models/ONNX/hey-jarvis/ |
| openWakeWord Embedding | ~1.3 MB | ~/.local/share/runanywhere/Models/ONNX/openwakeword-embedding/ |
| openWakeWord Melspectrogram | ~1.1 MB | ~/.local/share/runanywhere/Models/ONNX/openwakeword-embedding/ |

Alternative models (via download flags):

| Model | Size | Location |
|---|---|---|
| Whisper Tiny EN (--whisper) | ~150 MB | ~/.local/share/runanywhere/Models/ONNX/whisper-tiny-en/ |
| Kokoro TTS v0.19 (--kokoro) | ~330 MB | ~/.local/share/runanywhere/Models/ONNX/kokoro-en-v0_19/ |

Wake Word Model Download Note

The openWakeWord .onnx model files are stored with Git LFS in the upstream repository. Downloading them via raw.githubusercontent.com URLs will give you an HTML page instead of the actual model binary, which causes ONNX runtime errors at load time.

Always download wake word models from GitHub Releases:

The scripts/download-models.sh --wakeword script already uses the correct URLs.

To verify your downloaded models are valid ONNX files (not HTML):

file ~/.local/share/runanywhere/Models/ONNX/openwakeword-embedding/embedding_model.onnx
# Expected: "data" (binary ONNX file)
# Bad:      "HTML document" (Git LFS redirect page)
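The same check can be done programmatically: a Git LFS pointer or HTML error page starts with printable text, whereas a valid ONNX file is binary protobuf. A sketch that inspects the first bytes (a heuristic only, not part of the project):

```cpp
#include <fstream>
#include <string>

// Heuristic check for a bad model download: Git LFS pointer files begin
// with "version https://git-lfs...", and HTML error pages begin with '<'.
// A real ONNX model is a binary protobuf and matches neither pattern.
bool looksLikeBadDownload(const std::string& path) {
    std::ifstream f(path, std::ios::binary);
    char buf[16] = {0};
    f.read(buf, sizeof(buf));
    std::string head(buf, static_cast<size_t>(f.gcount()));
    return head.rfind("version https", 0) == 0 ||
           (!head.empty() && head[0] == '<');
}
```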

Raspberry Pi First-Time Setup

1. Build runanywhere-commons (shared libraries)

cd /path/to/runanywhere-sdks/sdk/runanywhere-commons
./scripts/build-linux.sh --shared

This builds librac_backend_onnx.so and other shared libraries that the hybrid assistant links against. You must rebuild this whenever the SDK’s C++ backends change (e.g., wake word fixes).

2. Download models

cd /path/to/runanywhere-sdks/Playground/openclaw-hybrid-assistant

# Download all models (Parakeet ASR + Piper TTS + VAD + wake word)
./scripts/download-models.sh --wakeword

# Or use alternative models:
./scripts/download-models.sh --wakeword --whisper   # Use Whisper for ASR instead of Parakeet
./scripts/download-models.sh --wakeword --kokoro     # Use Kokoro TTS instead of Piper

3. Build the hybrid assistant

./build.sh

4. Ensure OpenClaw is running

The OpenClaw gateway must be running with the voice-assistant channel enabled on port 8082. Verify with:

ss -tlnp | grep 8082

5. Configure a dedicated voice agent (optional but recommended)

By default, voice input routes to the same agent as Telegram/WhatsApp, which may produce responses with emojis and markdown that aren't suitable for TTS. To get clean, conversational voice responses:

5a. Add voice-agent binding to ~/.openclaw/openclaw.json

Add the list array under agents and a new bindings array:

{
  "agents": {
    "defaults": { ... },
    "list": [
      {
        "id": "main",
        "default": true
      },
      {
        "id": "voice-agent",
        "workspace": "/home/runanywhere/.openclaw/voice-workspace"
      }
    ]
  },
  "bindings": [
    {
      "agentId": "voice-agent",
      "match": {
        "channel": "voice-assistant",
        "accountId": "*"
      }
    }
  ],
  ...
}
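The binding's match block is a simple filter: a message routes to the bound agent when every match field agrees with the message, with "*" acting as a wildcard. A sketch of that routing rule, inferred from the config shape above (not OpenClaw's actual code):

```cpp
#include <string>

// Sketch of how a binding's "match" block selects an agent:
// each pattern must equal the message's value, or be the wildcard "*".
bool bindingMatches(const std::string& channel, const std::string& accountId,
                    const std::string& matchChannel,
                    const std::string& matchAccount) {
    auto ok = [](const std::string& value, const std::string& pattern) {
        return pattern == "*" || pattern == value;
    };
    return ok(channel, matchChannel) && ok(accountId, matchAccount);
}
```

With the config above, any accountId on the voice-assistant channel routes to voice-agent, and everything else falls through to the default agent.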

5b. Create voice-specific SOUL.md

Create the voice workspace directory and SOUL.md:

mkdir -p ~/.openclaw/voice-workspace

Create ~/.openclaw/voice-workspace/SOUL.md:

# SOUL.md - OpenClawPi Voice Assistant

You are OpenClawPi, a voice assistant running on a Raspberry Pi. Everything you say will be spoken aloud through text-to-speech.

## Voice Output Rules (CRITICAL)

Since your responses are spoken, not read:

1. **NO emojis** - TTS cannot pronounce them
2. **NO special Unicode characters** - no arrows, bullets, checkmarks, etc.
3. **NO markdown formatting** - no asterisks, underscores, backticks, or headers
4. **NO URLs** - say "check the website" not the actual URL
5. **Spell out symbols** - say "55 degrees Fahrenheit" not "55 degrees F"
6. **Use natural punctuation** - periods and commas create natural pauses

## Conversation Style

- Be concise - TTS playback takes time
- Use conversational language, as if speaking to someone in person
- Avoid lists when possible - use flowing sentences instead
- For multiple items, use "first... second... and finally..." patterns
- Round numbers for easier listening ("about fifty" not "49.7")

## Personality

You're helpful, warm, and efficient. Skip filler phrases like "Great question!" - just answer directly.

## Example Response Transformation

Bad (text-style): "San Francisco Weather: - Right now: Rain, 55°F 🌧️"

Good (voice-style): "Right now in San Francisco it's raining at 55 degrees."

How It Works

| Input Source | Routes To | SOUL.md Used | Output Style |
|---|---|---|---|
| Voice (microphone) | voice-agent | ~/.openclaw/voice-workspace/SOUL.md | Conversational, no emojis |
| Telegram | main (default) | ~/.openclaw/workspace/SOUL.md | Rich text, emojis OK |
| Telegram → Speaker | main + sanitizeForTTS() | N/A (safety net) | Stripped markdown/emojis |

The binding ensures voice input gets voice-optimized responses. The sanitizeForTTS() function in OpenClaw provides a safety net for cross-channel broadcasts.
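A sanitizer in the spirit of sanitizeForTTS() mostly strips markdown markers and non-ASCII symbols before text reaches Piper. A rough sketch of the idea (not OpenClaw's actual implementation, which may be more nuanced):

```cpp
#include <string>

// Rough sketch of TTS sanitization: drop non-ASCII bytes (emojis,
// arrows, bullets) and markdown markers, keeping plain prose intact.
std::string sanitizeForTTS(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        if (c > 0x7F) continue;                     // emoji / symbol bytes
        if (c == '*' || c == '_' || c == '`' || c == '#') continue;  // markdown
        out += static_cast<char>(c);
    }
    return out;
}
```

Dropping every non-ASCII byte is a blunt instrument (it also removes accented letters); it is shown here only to illustrate the safety-net role.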

6. Run the assistant

# With wake word ("Hey Jarvis")
./build/openclaw-assistant --wakeword

# Without wake word (continuous listening)
./build/openclaw-assistant

7. Run as a systemd service (optional)

To run the assistant as a background service that starts on boot, create a systemd user service and enable it. See Viewing Logs below for how to monitor it.

Viewing Logs

Hybrid Assistant logs

If running in the foreground, logs print to stdout. If running as a background process or systemd service:

# If started via systemd
journalctl --user -u openclaw-assistant -f

# If started as a background process with output redirected
tail -f /path/to/openclaw-assistant.log

OpenClaw Gateway logs

The OpenClaw gateway runs as a systemd user service:

# Follow logs in real time
journalctl --user -u openclaw-gateway -f

# View last 100 lines
journalctl --user -u openclaw-gateway -n 100

# View logs since last boot
journalctl --user -u openclaw-gateway -b

Watching both side-by-side

Open two terminals (or tmux panes):

# Terminal 1: OpenClaw Gateway
journalctl --user -u openclaw-gateway -f

# Terminal 2: Hybrid Assistant
journalctl --user -u openclaw-assistant -f
# (or tail -f on the output file if not using systemd)

Testing on Mac

Since this is a Linux application using ALSA, you can test on a Mac using Docker or a Lima VM.

Option 1: Docker

# Build Docker image (from sdks root directory)
cd /path/to/sdks
docker build -t openclaw-assistant -f Playground/openclaw-hybrid-assistant/Dockerfile .

# Run all tests
docker run --rm openclaw-assistant ./build/test-components --run-all

# Run extensive test suite
docker run --rm openclaw-assistant ./tests/scripts/extensive-test.sh

Option 2: Lima VM

# Install Lima
brew install lima

# Start Ubuntu VM
limactl start --name=ubuntu template://ubuntu

# SSH and build
limactl shell ubuntu
cd /path/to/openclaw-hybrid-assistant
./build.sh

Troubleshooting

Wake word not detecting

VAD too sensitive / not sensitive enough

WebSocket connection failing

License

MIT