A complete on-device voice AI pipeline for Linux (Raspberry Pi 5, x86_64, ARM64). All inference runs locally — no cloud, no API keys.
Pipeline: Wake Word -> VAD -> STT -> LLM -> TTS
```
Microphone (ALSA)
        │
        ▼
Wake Word Detection (openWakeWord / "Hey Jarvis")  [optional]
        │
        ▼
Voice Activity Detection (Silero VAD)
        │  Buffers speech, detects silence timeout
        ▼
Speech-to-Text (Whisper Tiny EN)
        │
        ▼
Large Language Model (Qwen2.5 0.5B Q4)
        │
        ▼
Text-to-Speech (Piper Lessac Medium)
        │
        ▼
Speaker (ALSA)
```
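Conceptually, each stage consumes the previous stage's output. The shell sketch below only models that hand-off with placeholder stubs (the real stages run ONNX and llama.cpp inference in C++); the literal strings are illustrative, not real model output:

```bash
# Stub model of one voice turn: every function body is a placeholder.
vad() { cat; }                           # would gate audio on detected speech
stt() { echo "what time is it"; }        # would run Whisper via Sherpa-ONNX
llm() { echo "The time is 12:00."; }     # would run Qwen2.5 via llama.cpp
tts() { printf '%s\n' "$1"; }            # would synthesize audio via Piper

transcript=$(printf 'pcm-frames' | vad | stt)
reply=$(llm "$transcript")
tts "$reply"                             # prints "The time is 12:00."
```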
```
linux-voice-assistant/
├── src/
│   ├── main.cpp                  # Entry point, CLI parsing, main loop
│   ├── audio/
│   │   ├── audio_capture.h/cpp   # ALSA mic input (16 kHz, 16-bit PCM, mono)
│   │   └── audio_playback.h/cpp  # ALSA speaker output (multi-rate)
│   ├── pipeline/
│   │   └── voice_pipeline.h/cpp  # Full pipeline: VAD -> STT -> LLM -> TTS
│   └── config/
│       └── model_config.h        # Model paths, IDs, availability checks
├── tests/
│   └── test_pipeline.cpp         # Feed WAV file through pipeline (no mic needed)
├── scripts/
│   └── download-models.sh        # Download all required models
├── CMakeLists.txt                # Build configuration
├── build.sh                      # End-to-end build script
└── README.md
```
```bash
# 0. Install the ALSA development headers
sudo apt install libasound2-dev

# 1. Build everything (SDK + models + assistant)
./build.sh

# 2. Run the voice assistant
./build/voice-assistant

# With wake word detection:
./build/voice-assistant --wakeword
```
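If the assistant exits immediately after a successful build, the usual cause is missing models. A minimal pre-flight sketch (the model directory names below are assumptions, not a documented contract; adjust them to match what `download-models.sh` actually fetched):

```bash
# Report missing model entries under a given directory (names are assumed).
check_models() {
    dir=$1
    missing=0
    for m in silero-vad whisper-tiny-en qwen2.5-0.5b piper-lessac-medium; do
        if [ ! -e "$dir/$m" ]; then
            echo "missing: $m"
            missing=$((missing + 1))
        fi
    done
    echo "$missing model(s) missing"
}

check_models "${MODEL_DIR:-$HOME/.local/share/runanywhere/Models}"
```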
```bash
# Step 1: Download Sherpa-ONNX
cd ../../sdk/runanywhere-commons
./scripts/linux/download-sherpa-onnx.sh

# Step 2: Build runanywhere-commons
./scripts/build-linux.sh --shared

# Step 3: Download models
cd ../../Playground/linux-voice-assistant
./scripts/download-models.sh

# Step 4: Build
mkdir -p build && cd build
cmake ..
cmake --build . -j$(nproc)

# Step 5: Run
./voice-assistant
```
| Component | Model | Size | Framework |
|---|---|---|---|
| VAD | Silero VAD | ~2 MB | ONNX |
| STT | Whisper Tiny EN | ~150 MB | ONNX (Sherpa) |
| LLM | Qwen2.5 0.5B Q4 | ~500 MB | llama.cpp |
| TTS | Piper Lessac Medium | ~65 MB | ONNX (Sherpa) |
| Wake Word | openWakeWord "Hey Jarvis" | ~20 MB | ONNX |
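Summing the approximate sizes above gives the disk budget (wake word only if enabled):

```bash
# Disk budget from the table's approximate sizes, in MB.
vad=2; stt=150; llm=500; tts=65; wake=20
default=$((vad + stt + llm + tts))
echo "default set: ~${default} MB; with wake word: ~$((default + wake)) MB"
```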
Download models:

```bash
# Required models (VAD, STT, LLM, TTS)
./scripts/download-models.sh

# Optional: wake word model
./scripts/download-models.sh --wakeword

# Select a different LLM:
./scripts/download-models.sh --model qwen3-1.7b
./scripts/download-models.sh --model llama-3.2-3b
./scripts/download-models.sh --model qwen3-4b
```
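The `--model` flag presumably maps a short name to a concrete model download. A hypothetical dispatcher showing that shape (the identifiers on the right are made up for illustration; the real `download-models.sh` mapping may differ):

```bash
# Hypothetical dispatch: short --model flag -> model identifier.
model_id() {
    case "$1" in
        qwen3-0.6b)   echo "qwen3-0.6b-q4" ;;
        qwen3-1.7b)   echo "qwen3-1.7b-q4" ;;
        llama-3.2-3b) echo "llama-3.2-3b-q4" ;;
        qwen3-4b)     echo "qwen3-4b-q4" ;;
        *) echo "unknown model: $1" >&2; return 1 ;;
    esac
}

model_id qwen3-1.7b   # prints "qwen3-1.7b-q4"
```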
```bash
# Basic usage (always listening)
./build/voice-assistant

# With wake word ("Hey Jarvis" to activate)
./build/voice-assistant --wakeword

# Select audio devices
./build/voice-assistant --list-devices
./build/voice-assistant --input hw:1,0 --output hw:0,0

# Test pipeline with a WAV file (no microphone needed)
./build/test-pipeline path/to/audio.wav
```
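If no suitable recording is at hand, `test-pipeline` can be exercised with a synthetic file. The sketch below writes one second of 16 kHz, 16-bit, mono silence, matching the capture format above; the 44-byte WAV header is assembled by hand with little-endian fields:

```bash
# Write 1 s of 16 kHz 16-bit mono silence as silence-16k.wav (44 + 32000 bytes).
{
    printf 'RIFF'
    printf '\044\175\000\000'   # RIFF chunk size: 36 + 32000 = 32036
    printf 'WAVEfmt '
    printf '\020\000\000\000'   # fmt chunk size: 16
    printf '\001\000'           # audio format: 1 (PCM)
    printf '\001\000'           # channels: 1 (mono)
    printf '\200\076\000\000'   # sample rate: 16000
    printf '\000\175\000\000'   # byte rate: 16000 * 2 = 32000
    printf '\002\000'           # block align: 2
    printf '\020\000'           # bits per sample: 16
    printf 'data'
    printf '\000\175\000\000'   # data chunk size: 32000
    dd if=/dev/zero bs=32000 count=1 2>/dev/null
} > silence-16k.wav
```

Then `./build/test-pipeline silence-16k.wav` should run the full pipeline; expect an empty transcript, since the input is silence.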
Key source locations: audio capture (`src/audio/audio_capture`), audio playback (`src/audio/audio_playback`), the pipeline itself (`src/pipeline/voice_pipeline`, which drives `rac_voice_agent_transcribe`, `rac_voice_agent_process_voice_turn`, and `rac_voice_agent_synthesize_speech`), and model configuration (`src/config/model_config`). Models are stored under `~/.local/share/runanywhere/Models/`.

Troubleshooting:

**"ALSA: Cannot open audio device"**
List devices with `aplay -l` (output) and `arecord -l` (input), then select one explicitly, e.g. `--input hw:1,0`.

**"Models are missing"**
Run `./scripts/download-models.sh` to download all required models; for the wake word model, run `./scripts/download-models.sh --wakeword`.

**No audio output**
Check mixer levels with `alsamixer` and verify playback with `speaker-test -D default -c 2`.

**Slow LLM response on Raspberry Pi**
Switch to a smaller LLM:

```bash
./scripts/download-models.sh --model qwen3-0.6b
```
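To go from the `arecord -l` / `aplay -l` listings to the `hw:CARD,DEV` strings that `--input`/`--output` expect, a small filter helps (the parsing assumes the usual `card N: ... device M: ...` line format of those tools):

```bash
# Extract hw:CARD,DEV strings from arecord -l / aplay -l style output.
to_hw() {
    sed -n 's/^card \([0-9][0-9]*\):.* device \([0-9][0-9]*\):.*/hw:\1,\2/p'
}

# Demo with a canned line; in practice: arecord -l | to_hw
printf 'card 1: Device [USB Audio], device 0: USB Audio [USB Audio]\n' | to_hw
# prints "hw:1,0"
```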