RCLI

On-device voice AI for macOS — STT, LLM, TTS, 43 actions, and local RAG. 100% local, zero cloud.

Talk to your Mac, query your docs, no cloud required.

macOS Apple Silicon C++17 Local MIT
Liquid AI Qwen Whisper Parakeet Piper KittenTTS Kokoro

RCLI (RunAnywhere Command Line Interface) is a complete STT + LLM + TTS pipeline running on Apple Silicon with Metal GPU. 43 macOS actions via voice or text. Local RAG over your documents. Sub-200ms end-to-end latency. No cloud, no API keys.

Demo

Real-time screen recordings on Apple Silicon — no cloud, no edits, no tricks.

Voice Conversation
Talk naturally — RCLI listens, understands, and responds on-device.

Voice Conversation Demo
🔊 Click for full video with audio
App Control
Control Spotify, adjust volume — 43 macOS actions by voice.

App Control Demo
🔊 Click for full video with audio
Models & Benchmarks
Browse models, hot-swap LLMs, run benchmarks — all from the TUI.

Models & Benchmarks Demo
🔊 Click for full video with audio
Document Intelligence (RAG)
Ingest docs, ask questions by voice — ~4ms hybrid retrieval.

RAG Demo
🔊 Click for full video with audio

Install

macOS only — Apple Silicon (M1 or later), macOS 13+.

Homebrew

```
brew tap RunanywhereAI/rcli https://github.com/RunanywhereAI/RCLI.git
brew install rcli
rcli setup     # downloads default models (~1GB, one-time)
```

Or install with one command:

```
curl -fsSL https://raw.githubusercontent.com/RunanywhereAI/RCLI/main/install.sh | bash
```

Installs Homebrew (if needed), downloads RCLI, and fetches the default AI models (~1GB).

Quick Start

```
rcli                             # interactive TUI (push-to-talk + text)
rcli listen                      # continuous voice mode, always listening
rcli ask "open Safari"           # one-shot text command
rcli ask "create a note called Meeting Notes"
rcli ask "play some jazz on Spotify"
```

Run rcli actions to see all 43 available macOS actions, or rcli --help for the full CLI reference.

Features

Voice Pipeline

A complete STT → LLM → TTS pipeline runs on the Metal GPU with three concurrent threads; see Architecture below for the thread breakdown.

macOS Actions

Control your Mac by voice or text. The LLM routes intent to 43 actions executed locally via AppleScript and shell commands. Actions can be individually enabled/disabled (persisted across sessions) via the Actions panel or CLI.

| Category | Actions |
| --- | --- |
| Productivity | create_note, create_reminder, run_shortcut |
| Communication | send_message, facetime_call, facetime_audio |
| Media | play_on_spotify, play_apple_music, play_pause_music, next_track, previous_track, set_music_volume, get_now_playing |
| System | open_app, quit_app, set_volume, toggle_dark_mode, lock_screen, screenshot, search_files, open_settings, open_url, get_battery, get_wifi, get_ip_address, get_uptime, get_disk_usage |
| Window | close_window, minimize_window, fullscreen_window, get_frontmost_app, list_apps |
| Web / Nav | search_web, search_youtube, get_browser_url, get_browser_tabs, open_maps |
| Clipboard | clipboard_read, clipboard_write |
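Under the hood, each action boils down to an AppleScript or shell invocation. A toy dispatcher sketch (hypothetical — the real routing happens in C++, and these commands are illustrative, not the actual implementations) shows the mapping for a few actions; it echoes the command instead of running it:

```shell
# Toy dispatcher: maps a tool name to the command an action would run.
# Echoes the command rather than executing it, so it is safe to try anywhere.
dispatch() {
  case "$1" in
    open_app)    echo "osascript -e 'tell application \"$2\" to activate'" ;;
    set_volume)  echo "osascript -e 'set volume output volume $2'" ;;
    screenshot)  echo "screencapture ~/Desktop/screenshot.png" ;;
    *)           echo "unknown action: $1" >&2; return 1 ;;
  esac
}

dispatch open_app Safari
# prints: osascript -e 'tell application "Safari" to activate'
```

In RCLI itself the LLM emits a tool call (e.g. open_app with an app_name argument), and the action layer executes the corresponding AppleScript or shell command locally.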

RAG (Retrieval-Augmented Generation)

Index local documents and query them by voice or text. Hybrid retrieval combining vector search (USearch HNSW) and BM25 full-text search, fused via Reciprocal Rank Fusion. Retrieval latency is ~4ms over 5K+ chunks. Supports PDF, DOCX, and plain text files.

```
rcli rag ingest ~/Documents/notes         # index a directory
rcli rag query "What were the key decisions?"
rcli ask --rag ~/Library/RCLI/index "summarize the project plan"
```

In the TUI, drag a file or folder from Finder into the terminal to auto-index it, then ask questions immediately.
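The Reciprocal Rank Fusion step above can be illustrated with a small self-contained sketch (hypothetical doc IDs; RCLI's actual fusion runs in C++ — this just shows how RRF with the conventional constant k = 60 merges two ranked lists):

```shell
# Two ranked lists of chunk IDs: one from vector search, one from BM25.
printf 'docA\ndocB\ndocC\n' > /tmp/vec_ranks.txt
printf 'docB\ndocC\ndocA\n' > /tmp/bm25_ranks.txt

# RRF: score(d) = sum over lists of 1 / (k + rank), with k = 60.
# FNR is the 1-based rank of the doc within the current file.
awk '{ score[$1] += 1 / (60 + FNR) } END { for (d in score) printf "%s %.6f\n", d, score[d] }' \
    /tmp/vec_ranks.txt /tmp/bm25_ranks.txt | sort -k2 -rn
```

docB comes out on top because it sits near the head of both lists, which is exactly the behavior that makes RRF a robust way to combine vector and keyword rankings without tuning score scales.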

Interactive TUI

A terminal dashboard built with FTXUI with push-to-talk, live hardware monitoring, performance metrics, model management, and an actions browser.

| Key | Action |
| --- | --- |
| SPACE | Start / stop push-to-talk voice recording |
| M | Models panel — browse, download, hot-swap LLM/STT/TTS without restart |
| A | Actions panel — browse, enable/disable, run macOS actions |
| B | Benchmarks panel — run STT, LLM, TTS, E2E benchmarks |
| R | RAG panel — ingest documents, clear index |
| D | Cleanup panel — delete unused models to free disk |
| T | Toggle tool call trace — see every tool call and result inline |
| ESC | Stop processing / close panel / quit |

Drag a file or folder into the terminal to auto-index it for RAG.

Tool Call Trace

Press T in the TUI to toggle tool call tracing. When enabled, every tool call the LLM makes is shown inline in the chat — the tool name, arguments passed, and the execution result (success/fail + output). This is useful for understanding how the LLM routes your requests, debugging action failures, and evaluating tool-calling performance across different models.

```
> open Safari
  ~ [TRACE] Tool call: open_app({"app_name": "Safari"})
  ~ [TRACE] open_app -> OK: {"success": true, "output": "Opened Safari"}
  RCLI: Done! Safari is now open.
```

Use rcli bench --suite tools to benchmark tool-calling accuracy and latency for the active LLM, or rcli bench --all-llm --suite tools to compare across all installed models.

Supported Models

RCLI ships with a default model set (~1GB via rcli setup) and supports 20 models across 5 modalities. All models run locally on Apple Silicon with Metal GPU. Use rcli models to download, switch, or remove any model.

LLM

| Model | Provider | Size | Speed | License | Features |
| --- | --- | --- | --- | --- | --- |
| LFM2 1.2B Tool | Liquid AI | 731 MB | ~180 t/s | LFM Open | tool calling, default |
| LFM2 350M | Liquid AI | 219 MB | ~350 t/s | LFM Open | fastest inference, 128K context |
| LFM2.5 1.2B Instruct | Liquid AI | 731 MB | ~180 t/s | LFM Open | 128K context |
| LFM2 2.6B | Liquid AI | 1.5 GB | ~120 t/s | LFM Open | stronger conversational, 128K context |
| Qwen3 0.6B | Alibaba Qwen | 456 MB | ~250 t/s | Apache 2.0 | ultra-fast, smallest footprint |
| Qwen3.5 0.8B | Alibaba Qwen | 600 MB | ~220 t/s | Apache 2.0 | Qwen3.5 generation |
| Qwen3.5 2B | Alibaba Qwen | 1.2 GB | ~150 t/s | Apache 2.0 | good all-rounder |
| Qwen3 4B | Alibaba Qwen | 2.5 GB | ~80 t/s | Apache 2.0 | smart reasoning |
| Qwen3.5 4B | Alibaba Qwen | 2.7 GB | ~75 t/s | Apache 2.0 | best small model, 262K context |

TTS

| Voice | Provider | Size | Speakers | License | Features |
| --- | --- | --- | --- | --- | --- |
| Piper Lessac | Rhasspy | 60 MB | 1 | MIT | fast, clear English, default |
| Piper Amy | Rhasspy | 60 MB | 1 | MIT | warm female voice |
| KittenTTS Nano | KittenML | 90 MB | 8 | Apache 2.0 | 8 voices (4M/4F), lightweight |
| Matcha LJSpeech | Matcha-TTS | 100 MB | 1 | MIT | HiFi-GAN vocoder |
| Kokoro English v0.19 | Hexgrad | 310 MB | 11 | Apache 2.0 | best English quality |
| Kokoro Multi-lang v1.1 | Hexgrad | 500 MB | 103 | Apache 2.0 | 103 speakers, Chinese + English |

STT

| Model | Provider | Size | Accuracy | License | Features |
| --- | --- | --- | --- | --- | --- |
| Zipformer | k2-fsa | 50 MB | Good | Apache 2.0 | streaming (live mic), default |
| Whisper base.en | OpenAI | 140 MB | ~5% WER | MIT | offline, English, default |
| Parakeet TDT 0.6B v3 | NVIDIA | 640 MB | ~1.9% WER | CC-BY-4.0 | 25 languages, auto-punctuation |

VAD and Embeddings

| Model | Provider | Modality | Size | License |
| --- | --- | --- | --- | --- |
| Silero VAD | Silero | Voice Activity Detection | 0.6 MB | MIT |
| Snowflake Arctic Embed S | Snowflake | Text Embeddings (RAG) | 34 MB | Apache 2.0 |

Defaults (installed by rcli setup)

rcli setup downloads ~1GB: LFM2 1.2B Tool (LLM), Zipformer + Whisper base.en (STT), Piper Lessac (TTS), Silero (VAD), and Snowflake Arctic Embed S (embeddings).

Model Commands

```
rcli models                  # interactive model management (all modalities)
rcli models llm              # jump to LLM management
rcli upgrade-llm             # guided LLM upgrade
rcli upgrade-stt             # upgrade to Parakeet TDT (~1.9% WER)
rcli voices                  # browse, download, switch TTS voices
rcli cleanup                 # remove unused models to free disk space
rcli info                    # show engine info and installed models
```

Models are stored in ~/Library/RCLI/models/. Active model selection persists across launches in ~/Library/RCLI/config.

Benchmarks

All measurements on Apple M3 Max (14-core CPU, 30-core GPU, 36 GB unified memory).

| Component | Metric | Value |
| --- | --- | --- |
| STT | Avg latency | 43.7 ms |
| STT | Real-time factor | 0.022x |
| LLM | Time to first token | 22.5 ms |
| LLM | Generation throughput | 159.6 tok/s |
| TTS | Avg latency | 150.6 ms |
| RAG | Hybrid retrieval | 3.82 ms |
| E2E | Voice-in to audio-out | 131 ms |

```
rcli bench                          # run all benchmarks
rcli bench --suite llm              # LLM only
rcli bench --suite tools            # tool-calling accuracy and latency
rcli bench --all-llm --suite llm    # compare all installed LLMs
rcli bench --all-llm --suite tools  # compare tool calling across LLMs
rcli bench --output results.json    # export to JSON
```

Suites: stt, llm, tts, e2e, tools, rag, memory, all.
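For context on the real-time factor above: RTF is processing time divided by audio duration, so 0.022x at 43.7 ms average STT latency implies utterances of roughly two seconds (the ~2 s figure is our inference from the table, not a published detail):

```shell
# RTF = processing_time / audio_duration (both in ms; 2000 ms is an assumed
# utterance length chosen to match the published 0.022x figure).
awk 'BEGIN { printf "RTF: %.3f\n", 43.7 / 2000 }'
# prints: RTF: 0.022
```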

Architecture

```
Mic → VAD → STT → [RAG] → LLM → TTS → Speaker
                            |
                     Tool Calling → 43 macOS Actions
                            |
                     [Tool Trace] → TUI (optional)
```

Three dedicated threads in live mode, synchronized via condition variables:

| Thread | Role |
| --- | --- |
| STT | Captures mic audio, runs VAD, detects speech endpoints |
| LLM | Receives transcribed text, generates tokens, dispatches tool calls |
| TTS | Queues sentences from LLM, double-buffered playback |
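
As a rough analogy (a toy sketch in shell, not the actual C++ implementation), a Unix pipeline gives the same hand-off pattern: each stage starts consuming as soon as the previous stage emits a line, so all three run concurrently:

```shell
# Toy stand-ins for the three stages; names and formats are illustrative only.
stt() { while read -r chunk; do echo "text:$chunk"; done; }
llm() { while read -r text; do echo "reply[$text]"; done; }
tts() { while read -r sent; do echo "speak($sent)"; done; }

# Output flows through as soon as each line is ready, stage by stage.
printf 'hello\nworld\n' | stt | llm | tts
# prints:
#   speak(reply[text:hello])
#   speak(reply[text:world])
```

In RCLI itself the stages are C++ threads synchronized with condition variables, which is what lets the TTS thread begin speaking the first sentence while the LLM is still generating the rest.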

Design decisions:

Project Structure

```
src/
  engines/     STT, LLM, TTS, VAD, embedding engine wrappers, model profiles
  pipeline/    Orchestrator, sentence detector, text sanitizer
  rag/         Vector index, BM25, hybrid retriever, document processor
  core/        Types, ring buffer, memory pool, hardware profiler
  audio/       CoreAudio mic/speaker I/O
  tools/       Tool calling engine with JSON schema definitions
  bench/       Benchmark harness (STT, LLM, TTS, E2E, tools, RAG, memory)
  actions/     43 macOS action implementations (AppleScript + shell)
  api/         C API (rcli_api.h) — public engine interface
  cli/         TUI dashboard (FTXUI), CLI commands
  models/      Model registries (LLM, TTS, STT) with on-demand download
scripts/       setup.sh, download_models.sh, package.sh
Formula/       Homebrew formula (self-hosted tap)
```

Build from Source

```
git clone https://github.com/RunanywhereAI/RCLI.git && cd RCLI
bash scripts/setup.sh              # clone llama.cpp + sherpa-onnx
bash scripts/download_models.sh    # download models (~1GB)
mkdir -p build && cd build
cmake .. -DCMAKE_BUILD_TYPE=Release
cmake --build . -j$(sysctl -n hw.ncpu)
./rcli
```

Dependencies

All vendored or CMake-fetched. No external package manager required.

| Dependency | Purpose |
| --- | --- |
| llama.cpp | LLM + embedding inference with Metal GPU |
| sherpa-onnx | STT / TTS / VAD via ONNX Runtime |
| USearch | HNSW vector index for RAG |
| FTXUI | Terminal UI library |
| CoreAudio, Metal, Accelerate, IOKit | macOS system frameworks |

Requires CMake 3.15+ and Apple Clang (C++17).

CLI Reference

```
rcli                         Interactive TUI (push-to-talk + text + trace)
rcli listen                  Continuous voice mode (always listening)
rcli ask <text>              One-shot text command
rcli actions [name]          List actions or show detail for one
rcli action <name> [json]    Execute an action directly
rcli rag ingest <path>       Index documents for RAG
rcli rag query <text>        Query indexed documents
rcli rag status              Show index info
rcli models [llm|stt|tts]    Manage AI models
rcli voices                  Manage TTS voices
rcli upgrade-llm             Download a larger LLM
rcli upgrade-stt             Download Parakeet TDT
rcli bench [--suite ...]     Run benchmarks
rcli mic-test                Test microphone audio levels
rcli cleanup                 Remove unused models
rcli setup                   Download default models (~1GB)
rcli info                    Show engine info and installed models

Options:
  --models <dir>     Models directory (default: ~/Library/RCLI/models)
  --rag <index>      Load RAG index for document-grounded answers
  --gpu-layers <n>   GPU layers for LLM (default: 99 = all)
  --ctx-size <n>     LLM context size (default: 4096)
  --no-speak         Text output only (no TTS playback)
  --verbose, -v      Show debug logs
  --suite <name>     Benchmark suite: stt, llm, tts, e2e, tools, rag, memory, all
  --all-llm          Benchmark all installed LLM models
  --all-tts          Benchmark all installed TTS voices
  --output <file>    Export benchmark results to JSON
```

Contributing

Contributions are welcome. See [CONTRIBUTING.md](/RCLI/CONTRIBUTING.md) for build instructions, an architecture overview, and how to add new actions, models, or voices.

License

MIT License. See [LICENSE](/RCLI/LICENSE) for details.

Powered by RunAnywhere, Inc.