runanywhere-sdks

RunAnywhere Flutter SDK

On-Device AI for Flutter Applications
Run LLMs, Speech-to-Text, Text-to-Speech, and Voice AI pipelines locally—privacy-first, offline-capable, production-ready.

Quick Links

Architecture Overview — How the SDK works
Quick Start — Get running in 5 minutes
API Reference — Complete public API documentation
Flutter Starter Example — Minimal starter project
FAQ — Common questions answered
Troubleshooting — Problems & solutions
Contributing — How to contribute

Features

Large Language Models (LLM)

On-device text generation with streaming support
LlamaCPP backend for GGUF models with Metal/GPU acceleration
Customizable generation parameters (temperature, max tokens, etc.)
Support for thinking/reasoning models (<think>...</think> patterns)
Token-by-token streaming for responsive UX
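For thinking/reasoning models, the raw output may wrap the model's reasoning in <think>...</think> tags. Below is a minimal sketch, in plain Dart and independent of the SDK, of separating that reasoning from the visible answer. `ThinkingSplit` and `splitThinking` are hypothetical names used only for illustration; the SDK may already expose parsed fields for this.

```dart
/// Result of separating reasoning text from the visible answer.
/// Hypothetical helper type, not part of the RunAnywhere API.
class ThinkingSplit {
  const ThinkingSplit(this.thinking, this.answer);
  final String thinking;
  final String answer;
}

/// Extracts every <think>...</think> span from [raw] and returns the
/// reasoning and the remaining answer text separately.
ThinkingSplit splitThinking(String raw) {
  // dotAll lets '.' match newlines, since reasoning usually spans lines.
  final pattern = RegExp(r'<think>(.*?)</think>', dotAll: true);
  final thinking = pattern
      .allMatches(raw)
      .map((m) => m.group(1)!.trim())
      .join('\n');
  final answer = raw.replaceAll(pattern, '').trim();
  return ThinkingSplit(thinking, answer);
}
```

The sketch only handles closed tags. During streaming, an unclosed <think> tag stays in the answer text until its closing tag arrives, so a streaming UI would need extra state.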

Speech-to-Text (STT)

Real-time streaming transcription
Batch audio transcription with Whisper models via ONNX Runtime
Multi-language support
Confidence scores and timestamps

Text-to-Speech (TTS)

Neural voice synthesis with Piper TTS
System voices fallback via flutter_tts
Customizable voice, pitch, rate, and volume
PCM audio output for flexible playback

Voice Activity Detection (VAD)

Energy-based speech detection with Silero VAD
Configurable sensitivity thresholds
Real-time audio stream processing
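To make "energy-based" detection with a sensitivity threshold concrete, here is a minimal sketch in plain Dart. It is not the SDK's implementation: `frameRms`, `isSpeech`, and the default threshold of 0.02 are assumptions chosen for illustration.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Root-mean-square energy of one PCM16 frame, normalized to 0.0-1.0.
double frameRms(Int16List frame) {
  if (frame.isEmpty) return 0.0;
  var sumSquares = 0.0;
  for (final sample in frame) {
    final normalized = sample / 32768.0;
    sumSquares += normalized * normalized;
  }
  return math.sqrt(sumSquares / frame.length);
}

/// Returns true when the frame's energy exceeds [threshold].
/// The default threshold is an illustrative value, not an SDK default.
bool isSpeech(Int16List frame, {double threshold = 0.02}) =>
    frameRms(frame) > threshold;
```

Lowering the threshold makes detection more sensitive to quiet speech but also to background noise.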

Voice Agent Pipeline

Full VAD → STT → LLM → TTS orchestration
Complete voice conversation flow
Session-based management with events

Infrastructure

Automatic model discovery and download with progress tracking
Comprehensive event system via EventBus
Structured logging with SDKLogger
Platform-optimized native binaries (XCFrameworks + JNI)

System Requirements

Component	Minimum	Recommended
Flutter	3.10.0+	3.24.0+
Dart	3.0.0+	3.5.0+
iOS	15.1+	16.0+
Android	API 24 (7.0)	API 28+
Xcode	15.0+	15.0+
RAM	2GB	4GB+ for larger models
Storage	Variable	Models: 100MB–8GB

Note: ARM64 devices are recommended for best performance. Metal GPU acceleration on iOS and NEON SIMD on Android provide significant speedups over unoptimized CPU inference.

Installation

Add Dependencies

Add the packages you need to your pubspec.yaml:

Core + LlamaCpp (LLM):

dependencies:
  runanywhere: ^0.19.13
  runanywhere_llamacpp: ^0.19.13

Core + ONNX (STT/TTS/VAD):

dependencies:
  runanywhere: ^0.19.13
  runanywhere_onnx: ^0.19.13

All Backends (LLM + STT + TTS + VAD):

dependencies:
  runanywhere: ^0.19.13
  runanywhere_llamacpp: ^0.19.13
  runanywhere_onnx: ^0.19.13

Then run:

flutter pub get

Platform Setup

iOS Setup (Required)

After adding the packages, update your iOS Podfile:

1. Update ios/Podfile:

# Set minimum iOS version to 15.1
platform :ios, '15.1'

target 'Runner' do
  # REQUIRED: Add static linkage
  use_frameworks! :linkage => :static

  flutter_install_all_ios_pods File.dirname(File.realpath(__FILE__))
end

post_install do |installer|
  installer.pods_project.targets.each do |target|
    flutter_additional_ios_build_settings(target)
    target.build_configurations.each do |config|
      config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '15.1'
      # Required for microphone permission (STT/Voice features)
      config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [
        '$(inherited)',
        'PERMISSION_MICROPHONE=1',
      ]
    end
  end
end

Important: Without use_frameworks! :linkage => :static, you will see “symbol not found” errors at runtime.

2. Update ios/Runner/Info.plist:

Add microphone permission for STT/Voice features:

<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access for speech recognition</string>

3. Run pod install:

cd ios && pod install && cd ..

Android Setup

Add microphone permission to android/app/src/main/AndroidManifest.xml:

<uses-permission android:name="android.permission.RECORD_AUDIO" />

Quick Start

1. Initialize the SDK

import 'package:flutter/material.dart';
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';

void main() async {
  WidgetsFlutterBinding.ensureInitialized();

  // 1. Initialize SDK (development mode - no API key needed)
  await RunAnywhere.initialize();

  // 2. Register backend modules
  await LlamaCpp.register();  // LLM backend (GGUF models)
  await Onnx.register();      // STT/TTS backend (Whisper, Piper)

  print('RunAnywhere SDK initialized: v${RunAnywhere.version}');

  runApp(const MyApp());
}

2. Register Models

// Register an LLM model
RunAnywhere.models.register(
  id: 'smollm2-360m-q8_0',
  name: 'SmolLM2 360M Q8_0',
  url: Uri.parse('https://huggingface.co/prithivMLmods/SmolLM2-360M-GGUF/resolve/main/SmolLM2-360M.Q8_0.gguf'),
  framework: InferenceFramework.INFERENCE_FRAMEWORK_LLAMA_CPP,
  memoryRequirement: 500000000,
);

// Register an STT model
RunAnywhere.models.register(
  id: 'sherpa-onnx-whisper-tiny.en',
  name: 'Whisper Tiny English',
  url: Uri.parse('https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz'),
  framework: InferenceFramework.INFERENCE_FRAMEWORK_SHERPA,
  modality: ModelCategory.MODEL_CATEGORY_SPEECH_RECOGNITION,
);

// Register a TTS voice
RunAnywhere.models.register(
  id: 'vits-piper-en_US-lessac-medium',
  name: 'Piper US English',
  url: Uri.parse('https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.gz'),
  framework: InferenceFramework.INFERENCE_FRAMEWORK_SHERPA,
  modality: ModelCategory.MODEL_CATEGORY_SPEECH_SYNTHESIS,
);

3. Download & Load Models

// Download with progress
final progressStream = RunAnywhere.downloads.start('smollm2-360m-q8_0');
await for (final p in progressStream) {
  print('Stage: ${p.stage}, progress: ${(p.stageProgress * 100).toStringAsFixed(1)}%');
  if (p.stage == DownloadStage.DOWNLOAD_STAGE_COMPLETED) break;
}

// Load the model
await RunAnywhere.llm.load('smollm2-360m-q8_0');
print('Model loaded: ${RunAnywhere.isLLMModelLoaded}');

4. Generate Text

// Non-streaming with full metrics
final result = await RunAnywhere.llm.generate(
  'Explain quantum computing in simple terms',
  LLMGenerationOptions(
    maxTokens: 200,
    temperature: 0.7,
  ),
);
print('Response: ${result.text}');
print('Speed: ${result.tokensPerSecond.toStringAsFixed(1)} tok/s');

5. Streaming Generation

final stream = RunAnywhere.llm.generateStream(
  'Write a short poem about AI',
  LLMGenerationOptions(maxTokens: 150),
);

// Display tokens in real time (stdout requires: import 'dart:io';)
await for (final event in stream) {
  if (event.isFinal) break;
  if (event.token.isNotEmpty) {
    stdout.write(event.token);
  }
}
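In a Flutter UI you would usually accumulate tokens into state rather than write to stdout. The pattern can be sketched in plain Dart as follows. `collectTokens` is a hypothetical helper that takes any `Stream<String>`; mapping the SDK's stream events to their token strings is left to the caller.

```dart
import 'dart:async';

/// Concatenates streamed tokens, calling [onPartial] with the text so far
/// after each non-empty token. Returns the full text when the stream ends.
Future<String> collectTokens(
  Stream<String> tokens, {
  void Function(String partial)? onPartial,
}) async {
  final buffer = StringBuffer();
  await for (final token in tokens) {
    if (token.isEmpty) continue; // skip keep-alive or empty chunks
    buffer.write(token);
    onPartial?.call(buffer.toString());
  }
  return buffer.toString();
}
```

In a widget, `onPartial` would typically call `setState` so the partial response is rendered as it grows.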

6. Speech-to-Text

// Load STT model
await RunAnywhere.stt.load('sherpa-onnx-whisper-tiny.en');

// Transcribe audio data (PCM16 at 16kHz mono)
final result = await RunAnywhere.stt.transcribe(audioBytes);
print('Text: ${result.text}');
print('Confidence: ${result.confidence}');
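Many audio recorders deliver float samples rather than PCM16 bytes. Assuming the recording is already 16 kHz mono, a conversion to little-endian PCM16 might look like the sketch below. `floatToPcm16` is a hypothetical helper, and resampling is out of scope here.

```dart
import 'dart:typed_data';

/// Converts float samples in [-1.0, 1.0] to little-endian PCM16 bytes.
Uint8List floatToPcm16(List<double> samples) {
  final data = ByteData(samples.length * 2);
  for (var i = 0; i < samples.length; i++) {
    // Clamp first so out-of-range input saturates instead of wrapping.
    final clamped = samples[i].clamp(-1.0, 1.0);
    data.setInt16(i * 2, (clamped * 32767).round(), Endian.little);
  }
  return data.buffer.asUint8List();
}
```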

7. Text-to-Speech

// Load TTS voice
await RunAnywhere.tts.loadVoice('vits-piper-en_US-lessac-medium');

// Synthesize speech
final ttsResult = await RunAnywhere.tts.synthesize(
  'Hello! Welcome to RunAnywhere.',
  TTSOptions(rate: 1.0, pitch: 1.0),
);
// ttsResult.audio is Uint8List PCM16
// ttsResult.sampleRate is typically 22050 Hz
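Raw PCM16 has no container, so many audio players cannot open it directly. One common approach is to prepend a 44-byte WAV header. The sketch below is plain Dart and assumes 16-bit samples; `pcm16ToWav` is a hypothetical helper, not an SDK function.

```dart
import 'dart:typed_data';

/// Prepends a 44-byte RIFF/WAVE header to raw PCM16 audio.
Uint8List pcm16ToWav(
  Uint8List pcm, {
  required int sampleRate,
  int channels = 1,
}) {
  const bitsPerSample = 16;
  final blockAlign = channels * bitsPerSample ~/ 8;
  final byteRate = sampleRate * blockAlign;
  final header = ByteData(44);

  // Writes a four-character ASCII chunk tag at the given offset.
  void writeTag(int offset, String tag) {
    for (var i = 0; i < tag.length; i++) {
      header.setUint8(offset + i, tag.codeUnitAt(i));
    }
  }

  writeTag(0, 'RIFF');
  header.setUint32(4, 36 + pcm.length, Endian.little); // file size minus 8
  writeTag(8, 'WAVE');
  writeTag(12, 'fmt ');
  header.setUint32(16, 16, Endian.little); // fmt chunk size
  header.setUint16(20, 1, Endian.little); // audio format 1 = PCM
  header.setUint16(22, channels, Endian.little);
  header.setUint32(24, sampleRate, Endian.little);
  header.setUint32(28, byteRate, Endian.little);
  header.setUint16(32, blockAlign, Endian.little);
  header.setUint16(34, bitsPerSample, Endian.little);
  writeTag(36, 'data');
  header.setUint32(40, pcm.length, Endian.little);

  return Uint8List.fromList([...header.buffer.asUint8List(), ...pcm]);
}
```

With the result above, a call would look like `pcm16ToWav(ttsResult.audio, sampleRate: ttsResult.sampleRate)`, and the returned bytes can be written to a .wav file or handed to an audio player package.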

8. Voice Agent Pipeline

// Ensure all components are loaded
await RunAnywhere.stt.load('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.llm.load('smollm2-360m-q8_0');
await RunAnywhere.tts.loadVoice('vits-piper-en_US-lessac-medium');

// Initialize voice pipeline with the loaded models
await RunAnywhere.voice.initializeWithLoadedModels();

// Subscribe to the voice event stream
final sub = RunAnywhere.voice.eventStream().listen((event) {
  if (event.hasUserSaid()) {
    print('User: ${event.userSaid.text}');
  } else if (event.hasAssistantToken()) {
    stdout.write(event.assistantToken.text);
  }
});

// Cancel when done
await sub.cancel();

Architecture Overview

The RunAnywhere Flutter SDK follows a modular, provider-based architecture with a C++ commons layer for cross-platform performance:

┌─────────────────────────────────────────────────────────────────┐
│                      Your Flutter Application                     │
├─────────────────────────────────────────────────────────────────┤
│                    RunAnywhere Flutter SDK                        │
│  ┌──────────────┐  ┌───────────────┐  ┌──────────────────────┐  │
│  │ Public APIs  │  │  EventBus     │  │  ModelRegistry       │  │
│  │ (generate,   │  │  (events,     │  │  (model discovery,   │  │
│  │  transcribe) │  │   lifecycle)  │  │   download)          │  │
│  └──────────────┘  └───────────────┘  └──────────────────────┘  │
├─────────────────────────────────────────────────────────────────┤
│                    Native Bridge Layer (FFI)                      │
│                  DartBridge → C++ Commons APIs                    │
├────────────┬─────────────┬──────────────────────────────────────┤
│  LlamaCpp  │    ONNX     │        Future Backends...            │
│  Backend   │   Backend   │                                       │
│  (LLM)     │ (STT/TTS)   │                                       │
└────────────┴─────────────┴──────────────────────────────────────┘

Key Components

Component	Description
RunAnywhere	Singleton entry point providing 20 capability accessors (llm, stt, tts, vad, vlm, voice, models, downloads, tools, rag, …)
EventBus	Pure `dart:async` broadcast stream for SDK events (no `rxdart` dependency)
DartBridge	FFI bridge slices to the C++ commons library (33 `dart_bridge_*.dart` files)
ModelRegistry	Model discovery, registration, and persistence via the C++ registry

Package Composition

Package	Size	Provides
`runanywhere`	~5MB	Core SDK, capability surface, registries, events
`runanywhere_llamacpp`	~15-25MB	LLM + VLM (GGUF models)
`runanywhere_onnx`	~50-70MB	STT, TTS, VAD (Sherpa/ONNX models)
`runanywhere_genie`	varies	Qualcomm Genie NPU LLM (Android/Snapdragon only)

Configuration

SDK Initialization Parameters

// Development mode (default) - no API key needed
await RunAnywhere.initialize();

// Production mode - requires API key and backend URL
await RunAnywhere.initialize(
  apiKey: '<YOUR_API_KEY>',
  baseURL: 'https://api.runanywhere.ai',
  environment: SDKEnvironment.SDK_ENVIRONMENT_PRODUCTION,
);

Environment Modes

Environment	Description
`SDKEnvironment.SDK_ENVIRONMENT_DEVELOPMENT`	Verbose logging, local-only, no auth required
`SDKEnvironment.SDK_ENVIRONMENT_STAGING`	Testing with real services
`SDKEnvironment.SDK_ENVIRONMENT_PRODUCTION`	Minimal logging, full authentication, telemetry

Generation Options

final options = LLMGenerationOptions(
  maxTokens: 256,              // Maximum tokens to generate
  temperature: 0.7,            // Sampling temperature (0.0–2.0)
  topP: 0.95,                  // Top-p sampling parameter
  stopSequences: ['END'],      // Stop generation at these sequences
  systemPrompt: 'You are a helpful assistant.',
);

Error Handling

The SDK reports errors through a single exception type, SDKException:

try {
  final result = await RunAnywhere.llm.generate(
    'Hello!',
    LLMGenerationOptions(maxTokens: 64),
  );
} on SDKException catch (error) {
  // SDKException is a proto-backed unified error type with 40+ factory constructors.
  // Inspect error.message and error.errorCode (proto enum) to branch.
  print('SDK error [${error.errorCode}]: ${error.message}');
}

Error Categories

Category	Description
`general`	General SDK errors
`llm`	LLM generation errors
`stt`	Speech-to-text errors
`tts`	Text-to-speech errors
`voice`	Voice pipeline errors
`download`	Model download errors
`validation`	Input validation errors

Logging & Observability

Subscribe to Events

// Subscribe to all SDK events
RunAnywhere.events.allEvents.listen((event) {
  print('Event: $event');
});
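For readers unfamiliar with the pattern, a broadcast event bus built only on `dart:async` can be sketched as follows. This is not the SDK's EventBus: `AppEvent`, `ModelLoadStarted`, `ModelLoadCompleted`, and `SimpleEventBus` are hypothetical names used only to show how a broadcast stream and type filtering fit together.

```dart
import 'dart:async';

/// Hypothetical base type for this sketch; the SDK defines its own events.
abstract class AppEvent {}

class ModelLoadStarted extends AppEvent {
  ModelLoadStarted(this.modelId);
  final String modelId;
}

class ModelLoadCompleted extends AppEvent {
  ModelLoadCompleted(this.modelId);
  final String modelId;
}

class SimpleEventBus {
  // A broadcast controller allows any number of listeners.
  final _controller = StreamController<AppEvent>.broadcast();

  /// Every published event.
  Stream<AppEvent> get allEvents => _controller.stream;

  /// Events narrowed to one subtype.
  Stream<T> on<T extends AppEvent>() =>
      allEvents.where((e) => e is T).cast<T>();

  void publish(AppEvent event) => _controller.add(event);

  Future<void> dispose() => _controller.close();
}
```

Broadcast streams do not buffer for late subscribers, so a listener only sees events published after it subscribes. Cancel subscriptions in `dispose()` of the owning widget to avoid leaks.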

Event Types

Event	Description
`SDKInitializationStarted`	SDK initialization began
`SDKInitializationCompleted`	SDK initialized successfully
`SDKModelEvent.loadStarted`	Model loading started
`SDKModelEvent.loadCompleted`	Model loaded successfully
`SDKModelEvent.downloadProgress`	Download progress update

Performance & Best Practices

Model Selection

Model Size	RAM Required	Use Case
360M–500M (Q8)	~500MB	Fast, lightweight chat
1B–3B (Q4/Q6)	1–2GB	Balanced quality/speed
7B (Q4)	4–5GB	High quality, slower

Memory Management

// Unload models when not in use
await RunAnywhere.llm.unload();
await RunAnywhere.stt.unload();
await RunAnywhere.tts.unloadVoice();

// Check storage before downloading
final storageInfo = await RunAnywhere.downloads.getStorageInfo();
print('Available: ${storageInfo.freeBytes} bytes');

// Delete unused models
await RunAnywhere.downloads.delete('old-model-id');

Best Practices

Prefer streaming for better perceived latency
Unload unused models to free memory
Handle errors gracefully with user-friendly messages
Test on physical devices — emulators may be slow
Use smaller models for faster iteration during development
Register models at startup before calling availableModels()

Troubleshooting

Model Download Fails

Symptoms: Download stuck or fails with network error

Solutions:

Check internet connection
Verify sufficient storage (need 2x model size for extraction)
Try on WiFi instead of cellular
Check if model URL is accessible
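The storage guidance above (2x the model size to allow for extraction) can be turned into a pre-download check. A minimal sketch in plain Dart follows. `hasRoomForModel` and `formatBytes` are hypothetical helpers; the free-space figure would come from a value such as `storageInfo.freeBytes` shown under Memory Management.

```dart
/// Returns true when [freeBytes] covers the archive plus its extracted copy.
bool hasRoomForModel({required int modelBytes, required int freeBytes}) {
  const extractionFactor = 2; // "2x model size" from the guidance above
  return freeBytes >= modelBytes * extractionFactor;
}

/// Formats a byte count with binary units for user-facing messages.
String formatBytes(int bytes) {
  const units = ['B', 'KB', 'MB', 'GB', 'TB'];
  var value = bytes.toDouble();
  var unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit++;
  }
  return '${value.toStringAsFixed(1)} ${units[unit]}';
}
```

Running the check before starting a download lets the app show a clear "not enough space" message instead of failing partway through extraction.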

Out of Memory

Symptoms: App crashes during model loading or inference

Solutions:

Use a smaller model (360M instead of 7B)
Unload unused models first
Close other memory-intensive apps
Test on a device with more RAM

iOS: Symbol Not Found

Symptoms: Runtime crash with “symbol not found” error

Solutions:

Ensure use_frameworks! :linkage => :static in Podfile
Run cd ios && pod install --repo-update
Clean and rebuild: flutter clean && flutter run

Android: Library Load Failed

Symptoms: UnsatisfiedLinkError or library load failure

Solutions:

Ensure NDK is properly installed
Check that jniLibs folder contains .so files
Rebuild native libraries with ./scripts/build/build-core-android.sh <ABI> from the repo root

Model Not Found After Download

Symptoms: modelNotFound error even though download completed

Solutions:

Call await RunAnywhere.refreshModelRegistry() to refresh the registry
Check the model path under the SDK model directory
Delete and re-download the model

FAQ

Q: Do I need an internet connection?

A: Only for initial model download. Once downloaded, all inference runs 100% on-device with no network required.

Q: How much storage do models need?

A: Varies by model:

Small LLMs (360M–1B): 200MB–1GB
Medium LLMs (3B–7B Q4): 2–5GB
STT models (Whisper): 50–250MB
TTS voices (Piper): 20–100MB

Q: Is user data sent to the cloud?

A: No. All inference happens on-device. Only anonymous analytics (latency, error rates) are collected in production mode, and this can be disabled.

Q: Which devices are supported?

A: iOS 15.1+ and Android API 24+. ARM64 devices are recommended for best performance.

Q: Can I use custom models?

A: Yes. GGUF models work with the LlamaCpp backend, provided they fit in device memory. ONNX models work for STT/TTS when packaged in the appropriate format.

Q: How do I test on iOS Simulator?

A: The SDK supports both arm64 and x86_64 simulators, but performance will be significantly slower than physical devices.

Local Development & Contributing

Contributions are welcome. This section explains how to set up your development environment to build the SDK from source and test your changes with the sample app.

Prerequisites

Flutter 3.24.0 or later
Xcode 15+ (for iOS builds)
Android Studio with NDK 27.0.12077973 (for Android builds)
CMake 3.22+

First-Time Setup (Build from Source)

The SDK depends on native C++ libraries from runanywhere-commons. The build scripts below compile these locally so you can develop and test the SDK end-to-end.

# 1. Clone the repository
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks

# 2. Build the native artifacts (from repo root)
./sdk/runanywhere-swift/scripts/build-core-xcframework.sh                # iOS XCFrameworks
./scripts/build/build-core-android.sh arm64-v8a          # Android .so files

# 3. Bootstrap the Flutter workspace
cd sdk/runanywhere-flutter
melos bootstrap

What the build scripts do:

sdk/runanywhere-swift/scripts/build-core-xcframework.sh builds RACommons, RABackendLLAMACPP, RABackendONNX, and RABackendSherpa XCFrameworks and stages them into sdk/runanywhere-flutter/packages/*/ios/Frameworks/.
scripts/build/build-core-android.sh <ABI> builds librac_commons.so + per-backend .so libraries and stages them into sdk/runanywhere-flutter/packages/*/android/src/main/jniLibs/<ABI>/.

Local vs Remote Natives

Mode	Description
Local (default)	`runanywhere.useLocalNatives=true` in `gradle.properties`; iOS podspecs vendor local `Frameworks/`
Remote	CI override `-Prunanywhere.useLocalNatives=false`; downloads from GitHub Releases

Testing with the Flutter Sample App

The recommended way to test SDK changes is with the sample app:

# 1. Ensure SDK is set up (from previous step)

# 2. Navigate to the sample app
cd ../../examples/flutter/RunAnywhereAI

# 3. Install dependencies
flutter pub get

# 4. Run on iOS
cd ios && pod install && cd ..
flutter run

# 5. Or run on Android
flutter run

You can open the sample app in Android Studio or VS Code for development.

The sample app’s pubspec.yaml uses path dependencies to reference the local SDK packages:

Sample App → Local Flutter SDK Packages → Local Frameworks/JNI libs
                                                ↑
                               Built by scripts/build/build-core-*.sh

Development Workflow

After modifying Dart SDK code:

Changes are picked up automatically when you run flutter run.

After modifying runanywhere-commons (C++ code):

# From repo root
./sdk/runanywhere-swift/scripts/build-core-xcframework.sh
./scripts/build/build-core-android.sh arm64-v8a

Build Script Reference

Script	Description
`sdk/runanywhere-swift/scripts/build-core-xcframework.sh`	iOS: builds `RACommons`/`RABackendLLAMACPP`/`RABackendONNX`/`RABackendSherpa` XCFrameworks and stages into Flutter packages’ `ios/Frameworks/`.
`scripts/build/build-core-android.sh <ABI>`	Android: builds backend `.so` files and stages into Flutter packages’ `android/src/main/jniLibs/<ABI>/`.
`sdk/runanywhere-web/scripts/build-core-wasm.sh`	(Not used by Flutter; targets the Web SDK.)
`sdk/runanywhere-flutter/scripts/package-sdk.sh`	Validate all 4 Flutter packages via `pub publish --dry-run`.

Code Style

We follow standard Dart style guidelines:

# Format code
dart format lib/ test/

# Analyze code
flutter analyze

# Fix issues automatically
dart fix --apply

Pull Request Process

Fork the repository
Create a feature branch: git checkout -b feature/my-feature
Make your changes with tests
Ensure all tests pass: flutter test
Run analyzer: flutter analyze
Commit with a descriptive message
Push and open a Pull Request

Reporting Issues

Open an issue on GitHub with:

SDK version: RunAnywhere.version
Flutter version: flutter --version
Platform and OS version
Device model
Steps to reproduce
Expected vs actual behavior
Relevant logs (with sensitive info redacted)

Support

Discord: discord.gg/N359FBbDVd
GitHub Issues: github.com/RunanywhereAI/runanywhere-sdks/issues
Email: san@runanywhere.ai
Twitter: @RunanywhereAI

License

Apache License 2.0 — See LICENSE for details.

For commercial licensing inquiries, contact san@runanywhere.ai.

Related Documentation

API Reference — Complete public API documentation
Flutter Starter Example — Minimal starter project
Swift SDK — iOS/macOS native SDK
Kotlin SDK — Android native SDK
React Native SDK — Cross-platform option

Packages on pub.dev

runanywhere — Core SDK
runanywhere_llamacpp — LLM backend
runanywhere_onnx — STT/TTS/VAD backend
