Talk to your Mac, query your docs, no cloud required.
RCLI (RunAnywhere Command Line Interface) is a complete STT + LLM + TTS pipeline running on Apple Silicon with Metal GPU acceleration. 43 macOS actions via voice or text. Local RAG over your documents. Sub-200ms end-to-end latency. No cloud, no API keys.
Table of Contents
- Demo
- Install
- Quick Start
- Features
- Supported Models
- Benchmarks
- Architecture
- Build from Source
- Contributing
- License
Demo
Real-time screen recordings on Apple Silicon — no cloud, no edits, no tricks.
Install
macOS only — Apple Silicon (M1 or later), macOS 13+.
Homebrew
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup # downloads default models (~1GB, one-time)
Or install with one command
curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
Installs Homebrew (if needed), downloads RCLI, and fetches the default AI models (~1GB).
Quick Start
rcli # interactive TUI (push-to-talk + text)
rcli listen # continuous voice mode, always listening
rcli ask "open Safari" # one-shot text command
rcli ask "create a note called Meeting Notes"
rcli ask "play some jazz on Spotify"
Run rcli actions to see all 43 available macOS actions, or rcli --help for the full CLI reference.
Features
Voice Pipeline
A complete STT + LLM + TTS pipeline running on the Metal GPU, coordinated across three concurrent threads:
- VAD — Silero voice activity detection, filters silence in real-time
- STT — Zipformer streaming (live mic) + Whisper/Parakeet offline (batch)
- LLM — Qwen3 / LFM2 / Qwen3.5 with system prompt KV caching and Flash Attention
- TTS — Double-buffered sentence-level synthesis (next sentence synthesizes while current plays)
- Tool Calling — Fully LLM-driven with model-native tool call formats (Qwen3 <tool_call>, LFM2 <|tool_call_start|>, etc.)
- Multi-turn Memory — Sliding window conversation history with token-budget trimming to fit context
macOS Actions
Control your Mac by voice or text. The LLM routes intent to 43 actions executed locally via AppleScript and shell commands. Actions can be individually enabled/disabled (persisted across sessions) via the Actions panel or CLI.
| Category | Actions |
|---|---|
| Productivity | create_note, create_reminder, run_shortcut |
| Communication | send_message, facetime_call, facetime_audio |
| Media | play_on_spotify, play_apple_music, play_pause_music, next_track, previous_track, set_music_volume, get_now_playing |
| System | open_app, quit_app, set_volume, toggle_dark_mode, lock_screen, screenshot, search_files, open_settings, open_url, get_battery, get_wifi, get_ip_address, get_uptime, get_disk_usage |
| Window | close_window, minimize_window, fullscreen_window, get_frontmost_app, list_apps |
| Web / Nav | search_web, search_youtube, get_browser_url, get_browser_tabs, open_maps |
| Clipboard | clipboard_read, clipboard_write |
RAG (Retrieval-Augmented Generation)
Index local documents and query them by voice or text. Hybrid retrieval combining vector search (USearch HNSW) and BM25 full-text search, fused via Reciprocal Rank Fusion. Retrieval latency is ~4ms over 5K+ chunks. Supports PDF, DOCX, and plain text files.
rcli rag ingest ~/Documents/notes # index a directory
rcli rag query "What were the key decisions?"
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
In the TUI, drag a file or folder from Finder into the terminal to auto-index it, then ask questions immediately.
Interactive TUI
A terminal dashboard built with FTXUI, featuring push-to-talk, live hardware monitoring, performance metrics, model management, and an actions browser.
| Key | Action |
|---|---|
| SPACE | Start / stop push-to-talk voice recording |
| M | Models panel — browse, download, hot-swap LLM/STT/TTS without restart |
| A | Actions panel — browse, enable/disable, run macOS actions |
| B | Benchmarks panel — run STT, LLM, TTS, E2E benchmarks |
| R | RAG panel — ingest documents, clear index |
| D | Cleanup panel — delete unused models to free disk |
| T | Toggle tool call trace — see every tool call and result inline |
| ESC | Stop processing / close panel / quit |
Drag a file or folder into the terminal to auto-index it for RAG.
Tool Call Trace
Press T in the TUI to toggle tool call tracing. When enabled, every tool call the LLM makes is shown inline in the chat — the tool name, arguments passed, and the execution result (success/fail + output). This is useful for understanding how the LLM routes your requests, debugging action failures, and evaluating tool-calling performance across different models.
> open Safari
~ [TRACE] Tool call: open_app({"app_name": "Safari"})
~ [TRACE] open_app -> OK: {"success": true, "output": "Opened Safari"}
RCLI: Done! Safari is now open.
Use rcli bench --suite tools to benchmark tool-calling accuracy and latency for the active LLM, or rcli bench --all-llm --suite tools to compare across all installed models.
Supported Models
RCLI ships with a default model set (~1GB via rcli setup) and supports 20 models across 5 modalities. All models run locally on Apple Silicon with Metal GPU. Use rcli models to download, switch, or remove any model.
LLM
| Model | Provider | Size | Speed | License | Features |
|---|---|---|---|---|---|
| LFM2 1.2B Tool | Liquid AI | 731 MB | ~180 t/s | LFM Open | tool calling, default |
| LFM2 350M | Liquid AI | 219 MB | ~350 t/s | LFM Open | fastest inference, 128K context |
| LFM2.5 1.2B Instruct | Liquid AI | 731 MB | ~180 t/s | LFM Open | 128K context |
| LFM2 2.6B | Liquid AI | 1.5 GB | ~120 t/s | LFM Open | stronger conversational, 128K context |
| Qwen3 0.6B | Alibaba Qwen | 456 MB | ~250 t/s | Apache 2.0 | ultra-fast, smallest footprint |
| Qwen3.5 0.8B | Alibaba Qwen | 600 MB | ~220 t/s | Apache 2.0 | Qwen3.5 generation |
| Qwen3.5 2B | Alibaba Qwen | 1.2 GB | ~150 t/s | Apache 2.0 | good all-rounder |
| Qwen3 4B | Alibaba Qwen | 2.5 GB | ~80 t/s | Apache 2.0 | smart reasoning |
| Qwen3.5 4B | Alibaba Qwen | 2.7 GB | ~75 t/s | Apache 2.0 | best small model, 262K context |
TTS
| Voice | Provider | Size | Speakers | License | Features |
|---|---|---|---|---|---|
| Piper Lessac | Rhasspy | 60 MB | 1 | MIT | fast, clear English, default |
| Piper Amy | Rhasspy | 60 MB | 1 | MIT | warm female voice |
| KittenTTS Nano | KittenML | 90 MB | 8 | Apache 2.0 | 8 voices (4M/4F), lightweight |
| Matcha LJSpeech | Matcha-TTS | 100 MB | 1 | MIT | HiFi-GAN vocoder |
| Kokoro English v0.19 | Hexgrad | 310 MB | 11 | Apache 2.0 | best English quality |
| Kokoro Multi-lang v1.1 | Hexgrad | 500 MB | 103 | Apache 2.0 | 103 speakers, Chinese + English |
STT
| Model | Provider | Size | Accuracy | License | Features |
|---|---|---|---|---|---|
| Zipformer | k2-fsa | 50 MB | Good | Apache 2.0 | streaming (live mic), default |
| Whisper base.en | OpenAI | 140 MB | ~5% WER | MIT | offline, English, default |
| Parakeet TDT 0.6B v3 | NVIDIA | 640 MB | ~1.9% WER | CC-BY-4.0 | 25 languages, auto-punctuation |
VAD and Embeddings
| Model | Provider | Modality | Size | License |
|---|---|---|---|---|
| Silero VAD | Silero | Voice Activity Detection | 0.6 MB | MIT |
| Snowflake Arctic Embed S | Snowflake | Text Embeddings (RAG) | 34 MB | Apache 2.0 |
Defaults (installed by rcli setup)
rcli setup downloads ~1GB: LFM2 1.2B Tool (LLM), Zipformer + Whisper base.en (STT), Piper Lessac (TTS), Silero (VAD), and Snowflake Arctic Embed S (embeddings).
Model Commands
rcli models # interactive model management (all modalities)
rcli models llm # jump to LLM management
rcli upgrade-llm # guided LLM upgrade
rcli upgrade-stt # upgrade to Parakeet TDT (~1.9% WER)
rcli voices # browse, download, switch TTS voices
rcli cleanup # remove unused models to free disk space
rcli info # show engine info and installed models
Models are stored in ~/Library/RCLI/models/. Active model selection persists across launches in ~/Library/RCLI/config.
Benchmarks
All measurements on Apple M3 Max (14-core CPU, 30-core GPU, 36 GB unified memory).
| Component | Metric | Value |
|---|---|---|
| STT | Avg latency | 43.7 ms |
| STT | Real-time factor | 0.022x |
| LLM | Time to first token | 22.5 ms |
| LLM | Generation throughput | 159.6 tok/s |
| TTS | Avg latency | 150.6 ms |
| RAG | Hybrid retrieval | 3.82 ms |
| E2E | Voice-in to audio-out | 131 ms |
rcli bench # run all benchmarks
rcli bench --suite llm # LLM only
rcli bench --suite tools # tool-calling accuracy and latency
rcli bench --all-llm --suite llm # compare all installed LLMs
rcli bench --all-llm --suite tools # compare tool calling across LLMs
rcli bench --output results.json # export to JSON
Suites: stt, llm, tts, e2e, tools, rag, memory, all.
Architecture
Mic → VAD → STT → [RAG] → LLM → TTS → Speaker
                           │
                    Tool Calling → 43 macOS Actions
                           │
                    [Tool Trace] → TUI (optional)
Three dedicated threads in live mode, synchronized via condition variables:
| Thread | Role |
|---|---|
| STT | Captures mic audio, runs VAD, detects speech endpoints |
| LLM | Receives transcribed text, generates tokens, dispatches tool calls |
| TTS | Queues sentences from LLM, double-buffered playback |
Design decisions:
- 64 MB pre-allocated memory pool — no runtime malloc during inference
- Lock-free ring buffers — zero-copy audio transfer between threads
- System prompt KV caching — reuses llama.cpp KV cache across queries
- Sentence-level TTS scheduling — next sentence synthesizes while current plays
- Hardware profiling at startup — detects P/E cores, Metal GPU, RAM for optimal config
- Filtered tool definitions — top-k relevance scoring limits tool context for small LLMs
- Token-budget conversation trimming — fits history to context window, evicts oldest turns
- Live model hot-swap — switch LLM at runtime without restarting the pipeline
Project Structure
src/
engines/ STT, LLM, TTS, VAD, embedding engine wrappers, model profiles
pipeline/ Orchestrator, sentence detector, text sanitizer
rag/ Vector index, BM25, hybrid retriever, document processor
core/ Types, ring buffer, memory pool, hardware profiler
audio/ CoreAudio mic/speaker I/O
tools/ Tool calling engine with JSON schema definitions
bench/ Benchmark harness (STT, LLM, TTS, E2E, tools, RAG, memory)
actions/ 43 macOS action implementations (AppleScript + shell)
api/ C API (rcli_api.h) — public engine interface
cli/ TUI dashboard (FTXUI), CLI commands
models/ Model registries (LLM, TTS, STT) with on-demand download
scripts/ setup.sh, download_models.sh, package.sh
Formula/ Homebrew formula (self-hosted tap)
Build from Source
git clone https://github.com/RunanywhereAI/RCLI.git && cd RCLI
bash scripts/setup.sh # clone llama.cpp + sherpa-onnx
bash scripts/download_models.sh # download models (~1GB)
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(sysctl -n hw.ncpu)
./rcli
Dependencies
All vendored or CMake-fetched. No external package manager required.
| Dependency | Purpose |
|---|---|
| llama.cpp | LLM + embedding inference with Metal GPU |
| sherpa-onnx | STT / TTS / VAD via ONNX Runtime |
| USearch | HNSW vector index for RAG |
| FTXUI | Terminal UI library |
| CoreAudio, Metal, Accelerate, IOKit | macOS system frameworks |
Requires CMake 3.15+ and Apple Clang (C++17).
CLI Reference
rcli             Interactive TUI (push-to-talk + text + trace)
rcli listen      Continuous voice mode (always listening)
rcli ask         One-shot text command

Powered by RunAnywhere, Inc.