Talk to your Mac, query your docs, no cloud required.
RCLI (RunAnywhere Command Line Interface) is a complete STT + LLM + TTS pipeline running on Apple Silicon with Metal GPU acceleration. 43 macOS actions via voice or text. Local RAG over your documents. Sub-200ms end-to-end latency. No cloud, no API keys.
Table of Contents
- Demo
- Install
- Quick Start
- Features
- Supported Models
- Benchmarks
- Architecture
- Build from Source
- Contributing
- License
Demo
Real-time screen recordings on Apple Silicon — no cloud, no edits, no tricks.
Install
macOS only — Apple Silicon (M1 or later), macOS 13+.
Homebrew
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads default models (~1GB, one-time)
Or install with one command
curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
Installs Homebrew (if needed), downloads RCLI, and fetches the default AI models (~1GB).
Quick Start
rcli # interactive TUI (push-to-talk + text)
rcli listen # continuous voice mode, always listening
rcli ask "open Safari" # one-shot text command
rcli ask "create a note called Meeting Notes"
rcli ask "play some jazz on Spotify"
Run rcli actions to see all 43 available macOS actions, or rcli --help for the full CLI reference.
Features
Voice Pipeline
A complete STT + LLM + TTS pipeline running on the Metal GPU, coordinated across three concurrent threads:
- VAD — Silero voice activity detection, filters silence in real-time
- STT — Zipformer streaming (live mic) + Whisper/Parakeet offline (batch)
- LLM — Qwen3 / LFM2 / Qwen3.5 with system prompt KV caching and Flash Attention
- TTS — Double-buffered sentence-level synthesis (next sentence synthesizes while current plays)
- Tool Calling — Fully LLM-driven with model-native tool call formats (Qwen3 <tool_call>, LFM2 <|tool_call_start|>, etc.)
- Multi-turn Memory — Sliding window conversation history with token-budget trimming to fit context
macOS Actions
Control your Mac by voice or text. The LLM routes intent to 43 actions executed locally via AppleScript and shell commands. Actions can be individually enabled/disabled (persisted across sessions) via the Actions panel or CLI.
| Category | Actions |
|---|---|
| Productivity | create_note, create_reminder, run_shortcut |
| Communication | send_message, facetime_call, facetime_audio |
| Media | play_on_spotify, play_apple_music, play_pause_music, next_track, previous_track, set_music_volume, get_now_playing |
| System | open_app, quit_app, set_volume, toggle_dark_mode, lock_screen, screenshot, search_files, open_settings, open_url, get_battery, get_wifi, get_ip_address, get_uptime, get_disk_usage |
| Window | close_window, minimize_window, fullscreen_window, get_frontmost_app, list_apps |
| Web / Nav | search_web, search_youtube, get_browser_url, get_browser_tabs, open_maps |
| Clipboard | clipboard_read, clipboard_write |
RAG (Retrieval-Augmented Generation)
Index local documents and query them by voice or text. Hybrid retrieval combining vector search (USearch HNSW) and BM25 full-text search, fused via Reciprocal Rank Fusion. Retrieval latency is ~4ms over 5K+ chunks. Supports PDF, DOCX, and plain text files.
rcli rag ingest ~/Documents/notes # index a directory
rcli rag query "What were the key decisions?"
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
In the TUI, drag a file or folder from Finder into the terminal to auto-index it, then ask questions immediately.
Interactive TUI
A terminal dashboard built with FTXUI, featuring push-to-talk, live hardware monitoring, performance metrics, model management, and an actions browser.
| Key | Action |
|---|---|
| SPACE | Start / stop push-to-talk voice recording |
| M | Models panel — browse, download, hot-swap LLM/STT/TTS without restart |
| A | Actions panel — browse, enable/disable, run macOS actions |
| B | Benchmarks panel — run STT, LLM, TTS, E2E benchmarks |
| R | RAG panel — ingest documents, clear index |
| D | Cleanup panel — delete unused models to free disk |
| T | Toggle tool call trace — see every tool call and result inline |
| ESC | Stop processing / close panel / quit |
Drag a file or folder into the terminal to auto-index it for RAG.
Tool Call Trace
Press T in the TUI to toggle tool call tracing. When enabled, every tool call the LLM makes is shown inline in the chat — the tool name, arguments passed, and the execution result (success/fail + output). This is useful for understanding how the LLM routes your requests, debugging action failures, and evaluating tool-calling performance across different models.
> open Safari
~ [TRACE] Tool call: open_app({"app_name": "Safari"})
~ [TRACE] open_app -> OK: {"success": true, "output": "Opened Safari"}
RCLI: Done! Safari is now open.
Use rcli bench --suite tools to benchmark tool-calling accuracy and latency for the active LLM, or rcli bench --all-llm --suite tools to compare across all installed models.
Supported Models
RCLI ships with a default model set (~1GB via rcli setup) and supports 20 models across 5 modalities. All models run locally on Apple Silicon with Metal GPU. Use rcli models to download, switch, or remove any model.
LLM
| Model | Provider | Size | Speed | License | Features |
|---|---|---|---|---|---|
| LFM2 1.2B Tool | Liquid AI | 731 MB | ~180 t/s | LFM Open | tool calling, default |
| LFM2 350M | Liquid AI | 219 MB | ~350 t/s | LFM Open | fastest inference, 128K context |
| LFM2.5 1.2B Instruct | Liquid AI | 731 MB | ~180 t/s | LFM Open | 128K context |
| LFM2 2.6B | Liquid AI | 1.5 GB | ~120 t/s | LFM Open | stronger conversational, 128K context |
| Qwen3 0.6B | Alibaba Qwen | 456 MB | ~250 t/s | Apache 2.0 | ultra-fast, smallest footprint |
| Qwen3.5 0.8B | Alibaba Qwen | 600 MB | ~220 t/s | Apache 2.0 | Qwen3.5 generation |
| Qwen3.5 2B | Alibaba Qwen | 1.2 GB | ~150 t/s | Apache 2.0 | good all-rounder |
| Qwen3 4B | Alibaba Qwen | 2.5 GB | ~80 t/s | Apache 2.0 | smart reasoning |
| Qwen3.5 4B | Alibaba Qwen | 2.7 GB | ~75 t/s | Apache 2.0 | best small model, 262K context |
TTS
| Voice | Provider | Size | Speakers | License | Features |
|---|---|---|---|---|---|
| Piper Lessac | Rhasspy | 60 MB | 1 | MIT | fast, clear English, default |
| Piper Amy | Rhasspy | 60 MB | 1 | MIT | warm female voice |
| KittenTTS Nano | KittenML | 90 MB | 8 | Apache 2.0 | 8 voices (4M/4F), lightweight |
| Matcha LJSpeech | Matcha-TTS | 100 MB | 1 | MIT | HiFi-GAN vocoder |
| Kokoro English v0.19 | Hexgrad | 310 MB | 11 | Apache 2.0 | best English quality |
| Kokoro Multi-lang v1.1 | Hexgrad | 500 MB | 103 | Apache 2.0 | 103 speakers, Chinese + English |
STT
| Model | Provider | Size | Accuracy | License | Features |
|---|---|---|---|---|---|
| Zipformer | k2-fsa | 50 MB | Good | Apache 2.0 | streaming (live mic), default |
| Whisper base.en | OpenAI | 140 MB | ~5% WER | MIT | offline, English, default |
| Parakeet TDT 0.6B v3 | NVIDIA | 640 MB | ~1.9% WER | CC-BY-4.0 | 25 languages, auto-punctuation |
VAD and Embeddings
| Model | Provider | Modality | Size | License |
|---|---|---|---|---|
| Silero VAD | Silero | Voice Activity Detection | 0.6 MB | MIT |
| Snowflake Arctic Embed S | Snowflake | Text Embeddings (RAG) | 34 MB | Apache 2.0 |
Defaults (installed by rcli setup)
rcli setup downloads ~1GB: LFM2 1.2B Tool (LLM), Zipformer + Whisper base.en (STT), Piper Lessac (TTS), Silero (VAD), and Snowflake Arctic Embed S (embeddings).
Model Commands
rcli models # interactive model management (all modalities)
rcli models llm # jump to LLM management
rcli upgrade-llm # guided LLM upgrade
rcli upgrade-stt # upgrade to Parakeet TDT (~1.9% WER)
rcli voices # browse, download, switch TTS voices
rcli cleanup # remove unused models to free disk space
rcli info # show engine info and installed models
Models are stored in ~/Library/RCLI/models/. Active model selection persists across launches in ~/Library/RCLI/config.
Benchmarks
All measurements on Apple M3 Max (14-core CPU, 30-core GPU, 36 GB unified memory).
| Component | Metric | Value |
|---|---|---|
| STT | Avg latency | 43.7 ms |
| STT | Real-time factor | 0.022x |
| LLM | Time to first token | 22.5 ms |
| LLM | Generation throughput | 159.6 tok/s |
| TTS | Avg latency | 150.6 ms |
| RAG | Hybrid retrieval | 3.82 ms |
| E2E | Voice-in to audio-out | 131 ms |
rcli bench # run all benchmarks
rcli bench --suite llm # LLM only
rcli bench --suite tools # tool-calling accuracy and latency
rcli bench --all-llm --suite llm # compare all installed LLMs
rcli bench --all-llm --suite tools # compare tool calling across LLMs
rcli bench --output results.json # export to JSON
Suites: stt, llm, tts, e2e, tools, rag, memory, all.
Architecture
Mic → VAD → STT → [RAG] → LLM → TTS → Speaker
                           │
                    Tool Calling → 43 macOS Actions
                           │
                    [Tool Trace] → TUI (optional)
Three dedicated threads in live mode, synchronized via condition variables:
| Thread | Role |
|---|---|
| STT | Captures mic audio, runs VAD, detects speech endpoints |
| LLM | Receives transcribed text, generates tokens, dispatches tool calls |
| TTS | Queues sentences from LLM, double-buffered playback |
Design decisions:
- 64 MB pre-allocated memory pool — no runtime malloc during inference
- Lock-free ring buffers — zero-copy audio transfer between threads
- System prompt KV caching — reuses llama.cpp KV cache across queries
- Sentence-level TTS scheduling — next sentence synthesizes while current plays
- Hardware profiling at startup — detects P/E cores, Metal GPU, RAM for optimal config
- Filtered tool definitions — top-k relevance scoring limits tool context for small LLMs
- Token-budget conversation trimming — fits history to context window, evicts oldest turns
- Live model hot-swap — switch LLM at runtime without restarting the pipeline
Project Structure
src/
engines/ STT, LLM, TTS, VAD, embedding engine wrappers, model profiles
pipeline/ Orchestrator, sentence detector, text sanitizer
rag/ Vector index, BM25, hybrid retriever, document processor
core/ Types, ring buffer, memory pool, hardware profiler
audio/ CoreAudio mic/speaker I/O
tools/ Tool calling engine with JSON schema definitions
bench/ Benchmark harness (STT, LLM, TTS, E2E, tools, RAG, memory)
actions/ 43 macOS action implementations (AppleScript + shell)
api/ C API (rcli_api.h) — public engine interface
cli/ TUI dashboard (FTXUI), CLI commands
models/ Model registries (LLM, TTS, STT) with on-demand download
scripts/ setup.sh, download_models.sh, package.sh
Formula/ Homebrew formula (self-hosted tap)
Build from Source
git clone https://github.com/RunanywhereAI/RCLI.git && cd RCLI
bash scripts/setup.sh # clone llama.cpp + sherpa-onnx
bash scripts/download_models.sh # download models (~1GB)
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(sysctl -n hw.ncpu)
./rcli
Dependencies
All vendored or CMake-fetched. No external package manager required.
| Dependency | Purpose |
|---|---|
| llama.cpp | LLM + embedding inference with Metal GPU |
| sherpa-onnx | STT / TTS / VAD via ONNX Runtime |
| USearch | HNSW vector index for RAG |
| FTXUI | Terminal UI library |
| CoreAudio, Metal, Accelerate, IOKit | macOS system frameworks |
Requires CMake 3.15+ and Apple Clang (C++17).
CLI Reference
rcli             Interactive TUI (push-to-talk + text + trace)
rcli listen      Continuous voice mode (always listening)
rcli ask         One-shot text command

Powered by RunAnywhere, Inc.