On-device AI for React Native. Run LLMs, Speech-to-Text, Text-to-Speech, and Voice AI locally with privacy-first, offline-capable inference.
| Component | Minimum | Recommended |
|---|---|---|
| React Native | 0.71+ | 0.74+ |
| iOS | 15.1+ | 17.0+ |
| Android | API 24 (7.0+) | API 28+ |
| Node.js | 18+ | 20+ |
| Xcode | 15+ | 16+ |
| Android Studio | Hedgehog+ | Latest |
| RAM | 3GB | 6GB+ for 7B models |
| Storage | Variable | Models: 200MB–8GB |
Apple Silicon devices (M1/M2/M3, A14+) and Android devices with 6GB+ RAM are recommended. Metal GPU acceleration provides 3-5x speedup on iOS.
This SDK uses a modular multi-package architecture. Install only the packages you need:
| Package | Description | Required |
|---|---|---|
| `@runanywhere/core` | Core SDK infrastructure, public API, events, model registry | Yes |
| `@runanywhere/llamacpp` | LlamaCPP backend for LLM text generation (GGUF models) | For LLM |
| `@runanywhere/onnx` | ONNX Runtime backend for STT/TTS (Whisper, Piper) | For Voice |
# Install all packages (LLM + Voice)
npm install @runanywhere/core @runanywhere/llamacpp @runanywhere/onnx
# or
yarn add @runanywhere/core @runanywhere/llamacpp @runanywhere/onnx

# LLM only
npm install @runanywhere/core @runanywhere/llamacpp

# Voice (STT/TTS) only
npm install @runanywhere/core @runanywhere/onnx
cd ios && pod install && cd ..
No additional setup required. Native libraries are automatically downloaded during the Gradle build.
import { RunAnywhere, SDKEnvironment, ModelCategory } from '@runanywhere/core';
import { LlamaCPP } from '@runanywhere/llamacpp';
import { ONNX, ModelArtifactType } from '@runanywhere/onnx';
// Initialize SDK (development mode - no API key needed)
await RunAnywhere.initialize({
environment: SDKEnvironment.Development,
});
// Register LlamaCpp module and add LLM models
LlamaCPP.register();
await LlamaCPP.addModel({
id: 'smollm2-360m-q8_0',
name: 'SmolLM2 360M Q8_0',
url: 'https://huggingface.co/prithivMLmods/SmolLM2-360M-GGUF/resolve/main/SmolLM2-360M.Q8_0.gguf',
memoryRequirement: 500_000_000,
});
// Register ONNX module and add STT/TTS models
ONNX.register();
await ONNX.addModel({
id: 'sherpa-onnx-whisper-tiny.en',
name: 'Sherpa Whisper Tiny (ONNX)',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.SpeechRecognition,
artifactType: ModelArtifactType.TarGzArchive,
memoryRequirement: 75_000_000,
});
console.log('SDK initialized');
// Download model with progress tracking
await RunAnywhere.downloadModel('smollm2-360m-q8_0', (progress) => {
console.log(`Download: ${(progress.progress * 100).toFixed(1)}%`);
});
// Load model into memory
const modelInfo = await RunAnywhere.getModelInfo('smollm2-360m-q8_0');
if (modelInfo?.localPath) {
await RunAnywhere.loadModel(modelInfo.localPath);
}
// Check if model is loaded
const isLoaded = await RunAnywhere.isModelLoaded();
console.log('Model loaded:', isLoaded);
// Simple chat
const response = await RunAnywhere.chat('What is the capital of France?');
console.log(response); // "Paris is the capital of France."
// With options
const result = await RunAnywhere.generate(
'Explain quantum computing in simple terms',
{
maxTokens: 200,
temperature: 0.7,
systemPrompt: 'You are a helpful assistant.',
}
);
console.log('Response:', result.text);
console.log('Speed:', result.performanceMetrics.tokensPerSecond, 'tok/s');
console.log('Latency:', result.latencyMs, 'ms');
const streamResult = await RunAnywhere.generateStream(
'Write a short poem about AI',
{ maxTokens: 150 }
);
// Display tokens in real-time
for await (const token of streamResult.stream) {
process.stdout.write(token);
}
// Get final metrics
const metrics = await streamResult.result;
console.log('\nSpeed:', metrics.performanceMetrics.tokensPerSecond, 'tok/s');
// Download and load STT model
await RunAnywhere.downloadModel('sherpa-onnx-whisper-tiny.en');
const sttModel = await RunAnywhere.getModelInfo('sherpa-onnx-whisper-tiny.en');
if (sttModel?.localPath) {
  await RunAnywhere.loadSTTModel(sttModel.localPath, 'whisper');
}
// Transcribe audio file
const result = await RunAnywhere.transcribeFile(audioFilePath, {
language: 'en',
});
console.log('Transcription:', result.text);
console.log('Confidence:', result.confidence);
// Download and load TTS model
await RunAnywhere.downloadModel('vits-piper-en_US-lessac-medium');
const ttsModel = await RunAnywhere.getModelInfo('vits-piper-en_US-lessac-medium');
if (ttsModel?.localPath) {
  await RunAnywhere.loadTTSModel(ttsModel.localPath, 'piper');
}
// Synthesize speech
const output = await RunAnywhere.synthesize(
'Hello from the RunAnywhere SDK.',
{ rate: 1.0, pitch: 1.0, volume: 1.0 }
);
// output.audio contains base64-encoded float32 PCM
// output.sampleRate, output.numSamples, output.duration
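To play or inspect the synthesized audio, the base64 payload has to be decoded back into float samples. A minimal sketch, assuming a Node-style `Buffer` is available and samples are little-endian; `decodePcm` is a hypothetical helper, not part of the SDK API:

```typescript
// Decode base64-encoded float32 PCM into samples and derive the clip duration.
// decodePcm is a hypothetical helper -- not part of the SDK API.
function decodePcm(
  base64Audio: string,
  sampleRate: number,
): { samples: Float32Array; duration: number } {
  const buf = Buffer.from(base64Audio, 'base64');
  const samples = new Float32Array(buf.byteLength / 4);
  for (let i = 0; i < samples.length; i++) {
    // Assumes little-endian 32-bit float sample order
    samples[i] = buf.readFloatLE(i * 4);
  }
  return { samples, duration: samples.length / sampleRate };
}

// Example: 16000 zero samples at 16 kHz decode to exactly 1 second of audio
const silence = Buffer.from(new Float32Array(16000).buffer).toString('base64');
const { samples, duration } = decodePcm(silence, 16000);
console.log(samples.length, duration); // 16000 1
```

Duration can also be read directly from `output.duration`; decoding is only needed when you want the raw samples.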
The RunAnywhere SDK follows a modular, provider-based architecture with a shared C++ core:
┌─────────────────────────────────────────────────────────────────┐
│ Your React Native App │
├─────────────────────────────────────────────────────────────────┤
│ @runanywhere/core (TypeScript API) │
│ ┌──────────────┐ ┌───────────────┐ ┌──────────────────────┐ │
│ │ RunAnywhere │ │ EventBus │ │ ModelRegistry │ │
│ │ (public API) │ │ (events, │ │ (model discovery, │ │
│ │ │ │ callbacks) │ │ download, storage) │ │
│ └──────────────┘ └───────────────┘ └──────────────────────┘ │
├────────────┬─────────────────────────────────────┬──────────────┤
│ │ │ │
│ ┌─────────▼─────────┐ ┌────────────▼────────────┐ │
│ │ @runanywhere/ │ │ @runanywhere/onnx │ │
│ │ llamacpp │ │ (STT/TTS/VAD) │ │
│ │ (LLM/GGUF) │ │ │ │
│ └─────────┬─────────┘ └────────────┬────────────┘ │
├────────────┼─────────────────────────────────────┼──────────────┤
│ │ Nitrogen/Nitro JSI │ │
│ │ (Native Bridge Layer) │ │
├────────────┼─────────────────────────────────────┼──────────────┤
│ ┌─────────▼──────────────────────────────────────▼───────────┐ │
│ │ runanywhere-commons (C++) │ │
│ │ ┌────────────────┐ ┌────────────────┐ ┌───────────────┐ │ │
│ │ │ RACommons │ │ RABackend │ │ RABackendONNX │ │ │
│ │ │ (Core Engine) │ │ LLAMACPP │ │ (Sherpa-ONNX) │ │ │
│ │ └────────────────┘ └────────────────┘ └───────────────┘ │ │
│ └────────────────────────────────────────────────────────────┘ │
└─────────────────────────────────────────────────────────────────┘
| Component | Description |
|---|---|
| RunAnywhere | Main SDK singleton providing all public methods |
| EventBus | Event subscription system for SDK events (initialization, generation, model, voice) |
| ModelRegistry | Manages model metadata, discovery, and download tracking |
| ServiceContainer | Dependency injection for internal services |
| FileSystem | Cross-platform file operations for model storage |
| DownloadService | Model download with progress, resume, and extraction |
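The EventBus above follows a subscribe/unsubscribe pattern. A minimal sketch of the idea — illustrative only, the SDK's internal implementation may differ:

```typescript
// Minimal typed event bus: subscribe() returns an unsubscribe function.
// Illustrative sketch only -- not the SDK's actual EventBus implementation.
type Listener<E> = (event: E) => void;

class SimpleEventBus<E> {
  private listeners = new Set<Listener<E>>();

  subscribe(listener: Listener<E>): () => void {
    this.listeners.add(listener);
    // Unsubscribing just removes the listener from the set
    return () => { this.listeners.delete(listener); };
  }

  emit(event: E): void {
    for (const l of this.listeners) l(event);
  }
}

// Usage
const bus = new SimpleEventBus<{ type: string }>();
const received: string[] = [];
const unsubscribe = bus.subscribe((e) => received.push(e.type));
bus.emit({ type: 'started' });
unsubscribe();
bus.emit({ type: 'completed' }); // not delivered after unsubscribe
console.log(received); // ['started']
```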
| Framework | Size | Provides |
|---|---|---|
| `RACommons.xcframework` / `librac_commons.so` | ~2MB | Core C++ commons, registries, events |
| `RABackendLLAMACPP.xcframework` / `librunanywhere_llamacpp.so` | ~15–25MB | LLM capability (GGUF models) |
| `RABackendONNX.xcframework` / `librunanywhere_onnx.so` | ~50–70MB | STT, TTS, VAD (ONNX models) |
// Development mode (default) - no API key needed
await RunAnywhere.initialize({
environment: SDKEnvironment.Development,
});
// Production mode - requires API key
await RunAnywhere.initialize({
apiKey: '<YOUR_API_KEY>',
baseURL: 'https://api.runanywhere.ai',
environment: SDKEnvironment.Production,
});
| Environment | Description |
|---|---|
| `.Development` | Verbose logging, local backend, no auth required |
| `.Staging` | Testing with real services |
| `.Production` | Minimal logging, full authentication, telemetry |
const options: GenerationOptions = {
maxTokens: 256, // Maximum tokens to generate
temperature: 0.7, // Sampling temperature (0.0–2.0)
topP: 0.95, // Top-p sampling parameter
stopSequences: ['END'], // Stop generation at these sequences
systemPrompt: 'You are a helpful assistant.',
};
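As an illustration of how these options might be normalized before use, here is a hedged sketch: the defaults mirror the values shown above, and `withDefaults` is a hypothetical helper, not an SDK function.

```typescript
// Hypothetical helper (not part of the SDK) that fills defaults and clamps
// temperature into the documented 0.0-2.0 range.
interface GenerationOptions {
  maxTokens?: number;
  temperature?: number;
  topP?: number;
  stopSequences?: string[];
  systemPrompt?: string;
}

function withDefaults(opts: GenerationOptions): Required<GenerationOptions> {
  return {
    maxTokens: opts.maxTokens ?? 256,
    // Clamp into the valid sampling range
    temperature: Math.min(2, Math.max(0, opts.temperature ?? 0.7)),
    topP: opts.topP ?? 0.95,
    stopSequences: opts.stopSequences ?? [],
    systemPrompt: opts.systemPrompt ?? '',
  };
}

console.log(withDefaults({ temperature: 3.5 }).temperature); // 2
console.log(withDefaults({}).maxTokens); // 256
```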
The SDK provides structured error handling through SDKError:
import { SDKError, SDKErrorCode, isSDKError } from '@runanywhere/core';
try {
const response = await RunAnywhere.generate('Hello!');
} catch (error) {
if (isSDKError(error)) {
switch (error.code) {
case SDKErrorCode.notInitialized:
console.log('SDK not initialized. Call RunAnywhere.initialize() first.');
break;
case SDKErrorCode.modelNotFound:
console.log('Model not found. Download it first.');
break;
case SDKErrorCode.insufficientMemory:
console.log('Not enough memory. Try a smaller model.');
break;
default:
console.log('Error:', error.message);
}
}
}
| Category | Description |
|---|---|
| `general` | General SDK errors |
| `llm` | LLM generation errors |
| `stt` | Speech-to-text errors |
| `tts` | Text-to-speech errors |
| `vad` | Voice activity detection errors |
| `voiceAgent` | Voice pipeline errors |
| `download` | Model download errors |
| `network` | Network-related errors |
| `authentication` | Auth and API key errors |
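For the `download` and `network` categories, transient failures are often worth retrying with exponential backoff. A generic sketch — `withRetry` is an illustrative helper, not part of the SDK:

```typescript
// Generic retry with exponential backoff for transient (network/download) failures.
// withRetry is an illustrative helper -- not part of the SDK.
async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Back off: 500ms, 1s, 2s, ...
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}

// Usage, e.g. wrapping a model download:
// await withRetry(() => RunAnywhere.downloadModel('smollm2-360m-q8_0'));
```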
import { LogLevel, SDKLogger } from '@runanywhere/core';
// Set minimum log level
RunAnywhere.setLogLevel(LogLevel.Debug); // debug, info, warning, error, fault
// Create a custom logger
const logger = new SDKLogger('MyApp');
logger.info('App started');
logger.debug('Debug info', { modelId: 'llama-2' });
// Subscribe to generation events
const unsubscribe = RunAnywhere.events.onGeneration((event) => {
switch (event.type) {
case 'started':
console.log('Generation started');
break;
case 'tokenGenerated':
console.log('Token:', event.token);
break;
case 'completed':
console.log('Done:', event.response.text);
break;
case 'failed':
console.error('Error:', event.error);
break;
}
});
// Subscribe to model events
RunAnywhere.events.onModel((event) => {
if (event.type === 'downloadProgress') {
console.log(`Progress: ${(event.progress * 100).toFixed(1)}%`);
}
});
// Unsubscribe when done
unsubscribe();
| Model Size | RAM Required | Use Case |
|---|---|---|
| 360M–500M (Q8) | ~500MB | Fast, lightweight chat |
| 1B–3B (Q4/Q6) | 1–2GB | Balanced quality/speed |
| 7B (Q4) | 4–5GB | High quality, slower |
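One way to make the table above actionable is to map available memory to a model tier. A sketch under the assumption that the thresholds follow the RAM column above; `recommendTier` is hypothetical, not an SDK function:

```typescript
// Hypothetical helper mapping free RAM to a model tier from the table above.
function recommendTier(freeRamBytes: number): string {
  const GB = 1024 ** 3;
  if (freeRamBytes >= 5 * GB) return '7B (Q4)';       // 4-5GB required
  if (freeRamBytes >= 2 * GB) return '1B-3B (Q4/Q6)'; // 1-2GB required
  return '360M-500M (Q8)';                            // ~500MB required
}

console.log(recommendTier(8 * 1024 ** 3)); // 7B (Q4)
console.log(recommendTier(1024 ** 3));     // 360M-500M (Q8)
```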
// Unload models when not in use
await RunAnywhere.unloadModel();
// Check storage before downloading
const storageInfo = await RunAnywhere.getStorageInfo();
if (storageInfo.freeSpace > modelSize) {
// Safe to download
}
// Clean up temporary files
await RunAnywhere.clearCache();
await RunAnywhere.cleanTempFiles();
Symptoms: Download stuck or fails with network error
Solutions:
- Check network connectivity and free storage, then retry the download
Symptoms: App crashes during model loading or inference
Solutions:
- Unload unused models with `RunAnywhere.unloadModel()`

Symptoms: Generation takes 10+ seconds per token
Solutions:
- Reduce `maxTokens` for shorter responses

Symptoms: `modelNotFound` error even though download completed
Solutions:
- Check registered models with `await RunAnywhere.getAvailableModels()`

Symptoms: "Native module not available" error
Solutions:
- Verify `pod install` was run for iOS
- Rebuild with `npx react-native run-ios` / `run-android`
- Reset the Metro cache with `npx react-native start --reset-cache`

Q: Does the SDK need an internet connection?

A: Only for initial model download. Once downloaded, all inference runs 100% on-device with no network required.
Q: How fast is inference?

A: It varies by model and device.
Q: Is my data sent to the cloud?

A: No. All inference happens on-device. Only anonymous analytics (latency, error rates) are collected in production mode, and this can be disabled.
Q: Which devices are supported?

A: iOS 15.1+ (iPhone/iPad) and Android 7.0+ (API 24+). Modern devices with 6GB+ RAM are recommended for larger models.
Q: Can I use custom models?

A: Yes, any GGUF model works with the LlamaCPP backend. ONNX models work for STT/TTS.
Q: What's the difference between `chat()` and `generate()`?

A: `chat()` is a convenience method that returns just the text. `generate()` returns full metrics (tokens, latency, etc.).
Contributions are welcome. This section explains how to set up your development environment to build the SDK from source and test your changes with the sample app.
The SDK depends on native C++ libraries from runanywhere-commons. The setup script builds these locally so you can develop and test the SDK end-to-end.
# 1. Clone the repository
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks/sdk/runanywhere-react-native
# 2. Run first-time setup (~15-20 minutes)
./scripts/build-react-native.sh --setup
# 3. Install JavaScript dependencies
yarn install
What the setup script does:
- Builds `RACommons.xcframework` and JNI libraries
- Builds `RABackendLLAMACPP` (LLM backend)
- Builds `RABackendONNX` (STT/TTS/VAD backend)
- Copies frameworks to `ios/Binaries/` and JNI libs to `android/src/main/jniLibs/`
- Creates `.testlocal` marker files (enables local library consumption)

The SDK has two modes:
| Mode | Description |
|---|---|
| Local | Uses frameworks/JNI libs from package directories (for development) |
| Remote | Downloads from GitHub releases during pod install/Gradle sync (for end users) |
When you run --setup, the script automatically enables local mode via:
- `.testlocal` marker files in `ios/` directories
- The `RA_TEST_LOCAL=1` environment variable or `runanywhere.testLocal=true` in `gradle.properties`

The recommended way to test SDK changes is with the sample app:
# 1. Ensure SDK is set up (from previous step)
# 2. Navigate to the sample app
cd ../../examples/react-native/RunAnywhereAI
# 3. Install sample app dependencies
npm install
# 4. iOS: Install pods and run
cd ios && pod install && cd ..
npx react-native run-ios
# 5. Android: Run directly
npx react-native run-android
You can open the sample app in VS Code or Cursor for development.
The sample app’s package.json uses workspace dependencies to reference the local SDK packages:
Sample App → Local RN SDK Packages → Local Frameworks/JNI libs
↑
Built by build-react-native.sh --setup
After modifying TypeScript SDK code:
# Type check all packages
yarn typecheck
# Run ESLint
yarn lint
# Build all packages
yarn build
After modifying runanywhere-commons (C++ code):
cd sdk/runanywhere-react-native
./scripts/build-react-native.sh --local --rebuild-commons
| Command | Description |
|---|---|
| `--setup` | First-time setup: downloads deps, builds all frameworks, enables local mode |
| `--local` | Use local frameworks from package directories |
| `--remote` | Use remote frameworks from GitHub releases |
| `--rebuild-commons` | Rebuild runanywhere-commons from source |
| `--ios` | Build for iOS only |
| `--android` | Build for Android only |
| `--clean` | Clean build artifacts before building |
| `--abis=ABIS` | Android ABIs to build (default: arm64-v8a) |
We use ESLint and Prettier for linting and code formatting:
# Run linter
yarn lint
# Auto-fix linting issues
yarn lint:fix
1. Create a feature branch: `git checkout -b feature/my-feature`
2. Run `yarn typecheck`
3. Run `yarn lint`

Open an issue on GitHub with:

- The SDK version (`RunAnywhere.version`)

MIT License. See LICENSE for details.
RunAnywhere.versionMIT License. See LICENSE for details.