Linux Voice Assistant

A complete on-device voice AI pipeline for Linux (Raspberry Pi 5, x86_64, ARM64). All inference runs locally — no cloud, no API keys.

Pipeline: Wake Word -> VAD -> STT -> LLM -> TTS

Architecture

Microphone (ALSA)
    │
    ▼
Wake Word Detection (openWakeWord / "Hey Jarvis")  [optional]
    │
    ▼
Voice Activity Detection (Silero VAD)
    │  Buffers speech, detects silence timeout
    ▼
Speech-to-Text (Whisper Tiny EN)
    │
    ▼
Large Language Model (Qwen2.5 0.5B Q4)
    │
    ▼
Text-to-Speech (Piper Lessac Medium)
    │
    ▼
Speaker (ALSA)

Project Structure

linux-voice-assistant/
├── src/
│   ├── main.cpp                    # Entry point, CLI parsing, main loop
│   ├── audio/
│   │   ├── audio_capture.h/cpp     # ALSA mic input (16kHz, 16-bit PCM, mono)
│   │   └── audio_playback.h/cpp    # ALSA speaker output (multi-rate)
│   ├── pipeline/
│   │   └── voice_pipeline.h/cpp    # Full pipeline: VAD -> STT -> LLM -> TTS
│   └── config/
│       └── model_config.h          # Model paths, IDs, availability checks
├── tests/
│   └── test_pipeline.cpp           # Feed WAV file through pipeline (no mic needed)
├── scripts/
│   └── download-models.sh          # Download all required models
├── CMakeLists.txt                  # Build configuration
├── build.sh                        # End-to-end build script
└── README.md

Quick Start

Prerequisites

A C++ toolchain (GCC or Clang), CMake, and the ALSA development headers (packaged as libasound2-dev on Debian-based systems, including Raspberry Pi OS).

Build and Run

# 1. Build everything (SDK + models + assistant)
./build.sh

# 2. Run the voice assistant
./build/voice-assistant

# With wake word detection:
./build/voice-assistant --wakeword

Manual Build

# Step 1: Download Sherpa-ONNX
cd ../../sdk/runanywhere-commons
./scripts/linux/download-sherpa-onnx.sh

# Step 2: Build runanywhere-commons
./scripts/build-linux.sh --shared

# Step 3: Download models
cd ../../Playground/linux-voice-assistant
./scripts/download-models.sh

# Step 4: Build
mkdir -p build && cd build
cmake ..
cmake --build . -j$(nproc)

# Step 5: Run
./voice-assistant

Models

| Component | Model                      | Size    | Framework     |
|-----------|----------------------------|---------|---------------|
| VAD       | Silero VAD                 | ~2 MB   | ONNX          |
| STT       | Whisper Tiny EN            | ~150 MB | ONNX (Sherpa) |
| LLM       | Qwen2.5 0.5B Q4            | ~500 MB | llama.cpp     |
| TTS       | Piper Lessac Medium        | ~65 MB  | ONNX (Sherpa) |
| Wake Word | openWakeWord “Hey Jarvis”  | ~20 MB  | ONNX          |

Download models:

# Required models (VAD, STT, LLM, TTS)
./scripts/download-models.sh

# Optional: Wake word model
./scripts/download-models.sh --wakeword

# Select a different LLM:
./scripts/download-models.sh --model qwen3-1.7b
./scripts/download-models.sh --model llama-3.2-3b
./scripts/download-models.sh --model qwen3-4b
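
After downloading, it's worth a quick sanity check that everything landed. The sketch below shows one way to do that; the file names are illustrative assumptions, not the actual paths (those are defined in src/config/model_config.h):

```shell
# Sketch: verify expected model files exist under a models/ directory.
# NOTE: these file names are assumptions for illustration; the real
# paths live in src/config/model_config.h.
check_models() {
  dir=$1
  status=0
  for f in silero_vad.onnx whisper-tiny-en.onnx \
           qwen2.5-0.5b-q4.gguf piper-lessac-medium.onnx; do
    if [ ! -e "$dir/$f" ]; then
      echo "missing: $f"
      status=1
    fi
  done
  return "$status"
}
# Usage: check_models models && echo "all models present"
```

Wiring a check like this into build.sh turns a cryptic startup failure into an actionable message.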

Usage

# Basic usage (always listening)
./build/voice-assistant

# With wake word ("Hey Jarvis" to activate)
./build/voice-assistant --wakeword

# Select audio devices
./build/voice-assistant --list-devices
./build/voice-assistant --input hw:1,0 --output hw:0,0

# Test pipeline with a WAV file (no microphone needed)
./build/test-pipeline path/to/audio.wav
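
If no recording is handy, a throwaway WAV in the pipeline's expected capture format (16 kHz, 16-bit PCM, mono) can be synthesized from the shell. This sketch writes one second of silence; real speech exercises the pipeline far better, but silence is enough to confirm the plumbing runs end to end:

```shell
# Sketch: write 1 s of 16 kHz, 16-bit PCM mono silence to test.wav,
# matching the capture format used by audio_capture.
out=test.wav
rate=16000; bits=16; ch=1; secs=1
data=$(( rate * ch * bits / 8 * secs ))           # 32000 bytes of samples

# Emit little-endian 16/32-bit integers as raw bytes (octal escapes via %b).
le16() { printf '%b' "$(printf '\\0%03o\\0%03o' $(( $1 & 255 )) $(( $1 / 256 & 255 )))"; }
le32() { printf '%b' "$(printf '\\0%03o\\0%03o\\0%03o\\0%03o' \
  $(( $1 & 255 )) $(( $1 / 256 & 255 )) $(( $1 / 65536 & 255 )) $(( $1 / 16777216 & 255 )))"; }

{
  printf 'RIFF'; le32 $(( 36 + data )); printf 'WAVEfmt '
  le32 16; le16 1; le16 "$ch"                     # PCM header: format=1, channels
  le32 "$rate"; le32 $(( rate * ch * bits / 8 ))  # sample rate, byte rate
  le16 $(( ch * bits / 8 )); le16 "$bits"         # block align, bits per sample
  printf 'data'; le32 "$data"
  dd if=/dev/zero bs="$data" count=1 2>/dev/null  # the silent samples
} > "$out"
```

Then feed the result to the pipeline: ./build/test-pipeline test.wav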

Components

Audio Capture (src/audio/audio_capture)

ALSA microphone input at 16 kHz, 16-bit PCM, mono: the format the rest of the pipeline consumes.

Audio Playback (src/audio/audio_playback)

ALSA speaker output with multi-rate support, so synthesized speech can be played at the TTS model's native sample rate.

Voice Pipeline (src/pipeline/voice_pipeline)

Orchestrates the full flow: buffers audio while the VAD detects speech, transcribes once the silence timeout fires, sends the transcript to the LLM, and speaks the response through the TTS engine.

Model Config (src/config/model_config)

Central definition of model paths and IDs, with availability checks so missing models can be reported up front.

Troubleshooting

“ALSA: Cannot open audio device”

Run ./build/voice-assistant --list-devices to see which ALSA devices exist, then pass explicit names with --input/--output (for example, hw:1,0 for a USB microphone). The system-level arecord -l and aplay -l listings help confirm the hardware is visible at all; another process holding the device (e.g. PulseAudio or PipeWire) can also cause this error.

“Models are missing”

Re-run ./scripts/download-models.sh (and add --wakeword if you start the assistant with --wakeword). The expected paths are defined in src/config/model_config.h.

No audio output

Verify the output device with --list-devices and select it explicitly with --output. Also check that the device is not muted or at zero volume in alsamixer.

Slow LLM response on Raspberry Pi

The default Qwen2.5 0.5B Q4 model is the fastest option; the larger optional models (qwen3-1.7b, llama-3.2-3b, qwen3-4b) trade latency for quality and can be very slow on a Pi. Adequate cooling also matters, since thermal throttling slows inference noticeably.
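
For the audio-device problems above, a quick diagnostic pass can be sketched like this (arecord and aplay come from the alsa-utils package; the sketch degrades gracefully when they are absent):

```shell
# Sketch: summarize the ALSA capture/playback devices visible to the system.
report() {  # $1: label, $2: ALSA lister command (arecord or aplay)
  if command -v "$2" >/dev/null 2>&1; then
    devices=$("$2" -l 2>/dev/null | grep '^card' || true)
    echo "$1: ${devices:-none found}"
  else
    echo "$1: $2 not available (install alsa-utils)"
  fi
}
report "capture"  arecord
report "playback" aplay
```

If a device shows up here but the assistant still cannot open it, pass its hw:card,device name explicitly via --input or --output.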