On-Device AI for Flutter Applications
Run LLMs, Speech-to-Text, Text-to-Speech, and Voice AI pipelines locally—privacy-first, offline-capable, production-ready.
| Component | Minimum | Recommended |
|---|---|---|
| Flutter | 3.10.0+ | 3.24.0+ |
| Dart | 3.0.0+ | 3.5.0+ |
| iOS | 14.0+ | 15.0+ |
| Android | API 24 (7.0) | API 28+ |
| Xcode | 14.0+ | 15.0+ |
| RAM | 2GB | 4GB+ for larger models |
| Storage | Variable | Models: 100MB–8GB |
Note: ARM64 devices are recommended for best performance. Metal GPU acceleration on iOS and NEON SIMD on Android provide significant speedups over unaccelerated (scalar) CPU inference.
Add the packages you need to your pubspec.yaml:
Core + LlamaCpp (LLM):
dependencies:
runanywhere: ^0.15.11
runanywhere_llamacpp: ^0.15.11
Core + ONNX (STT/TTS/VAD):
dependencies:
runanywhere: ^0.15.11
runanywhere_onnx: ^0.15.11
All Backends (LLM + STT + TTS + VAD):
dependencies:
runanywhere: ^0.15.11
runanywhere_llamacpp: ^0.15.11
runanywhere_onnx: ^0.15.11
Then run:
flutter pub get
After adding the packages, update your iOS Podfile:
1. Update ios/Podfile:
# Set minimum iOS version to 14.0
platform :ios, '14.0'
target 'Runner' do
# REQUIRED: Add static linkage
use_frameworks! :linkage => :static
flutter_install_all_ios_pods File.dirname(File.realpath(__FILE__))
end
post_install do |installer|
installer.pods_project.targets.each do |target|
flutter_additional_ios_build_settings(target)
target.build_configurations.each do |config|
config.build_settings['IPHONEOS_DEPLOYMENT_TARGET'] = '14.0'
# Required for microphone permission (STT/Voice features)
config.build_settings['GCC_PREPROCESSOR_DEFINITIONS'] ||= [
'$(inherited)',
'PERMISSION_MICROPHONE=1',
]
end
end
end
Important: Without use_frameworks! :linkage => :static, you will see “symbol not found” errors at runtime.
2. Update ios/Runner/Info.plist:
Add microphone permission for STT/Voice features:
<key>NSMicrophoneUsageDescription</key>
<string>This app needs microphone access for speech recognition</string>
3. Run pod install:
cd ios && pod install && cd ..
Add microphone permission to android/app/src/main/AndroidManifest.xml:
<uses-permission android:name="android.permission.RECORD_AUDIO" />
import 'package:flutter/material.dart'; // WidgetsFlutterBinding, runApp
import 'package:runanywhere/runanywhere.dart';
import 'package:runanywhere_llamacpp/runanywhere_llamacpp.dart';
import 'package:runanywhere_onnx/runanywhere_onnx.dart';
Future<void> main() async {
WidgetsFlutterBinding.ensureInitialized();
// 1. Initialize SDK (development mode - no API key needed)
await RunAnywhere.initialize();
// 2. Register backend modules
await LlamaCpp.register(); // LLM backend (GGUF models)
await Onnx.register(); // STT/TTS backend (Whisper, Piper)
print('RunAnywhere SDK initialized: v${RunAnywhere.version}');
runApp(const MyApp());
}
// Register an LLM model
LlamaCpp.addModel(
id: 'smollm2-360m-q8_0',
name: 'SmolLM2 360M Q8_0',
url: 'https://huggingface.co/prithivMLmods/SmolLM2-360M-GGUF/resolve/main/SmolLM2-360M.Q8_0.gguf',
memoryRequirement: 500000000,
);
// Register an STT model
Onnx.addModel(
id: 'sherpa-onnx-whisper-tiny.en',
name: 'Whisper Tiny English',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/sherpa-onnx-whisper-tiny.en.tar.gz',
modality: ModelCategory.speechRecognition,
);
// Register a TTS voice
Onnx.addModel(
id: 'vits-piper-en_US-lessac-medium',
name: 'Piper US English',
url: 'https://github.com/RunanywhereAI/sherpa-onnx/releases/download/runanywhere-models-v1/vits-piper-en_US-lessac-medium.tar.bz2',
modality: ModelCategory.textToSpeech,
);
// Download with progress
await for (final progress in RunAnywhere.downloadModel('smollm2-360m-q8_0')) {
print('Download: ${(progress.bytesDownloaded / progress.totalBytes * 100).toStringAsFixed(1)}%');
if (progress.state == DownloadProgressState.completed) break;
}
// Load the model
await RunAnywhere.loadModel('smollm2-360m-q8_0');
print('Model loaded: ${RunAnywhere.currentModelId}');
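The progress line above divides by progress.totalBytes directly. A minimal sketch of a guarded formatter follows, assuming the total can be reported as zero before the size is known (an assumption, not documented SDK behavior). The name formatProgress is hypothetical and not part of the SDK.

```dart
/// Formats download progress as a percentage string.
///
/// Returns '0.0%' while the total size is unknown (zero or negative), which
/// avoids printing NaN or Infinity from a division by zero.
/// Hypothetical helper: not part of the RunAnywhere SDK.
String formatProgress(int bytesDownloaded, int totalBytes) {
  if (totalBytes <= 0) return '0.0%';
  // Clamp so a slightly over-reported byte count never shows above 100%.
  final fraction = (bytesDownloaded / totalBytes).clamp(0.0, 1.0);
  return '${(fraction * 100).toStringAsFixed(1)}%';
}
```

Inside the download loop this could be called as formatProgress(progress.bytesDownloaded, progress.totalBytes).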
// Simple chat interface
final response = await RunAnywhere.chat('What is the capital of France?');
print(response); // "The capital of France is Paris."
// Full generation with metrics
final result = await RunAnywhere.generate(
'Explain quantum computing in simple terms',
options: LLMGenerationOptions(
maxTokens: 200,
temperature: 0.7,
),
);
print('Response: ${result.text}');
print('Speed: ${result.tokensPerSecond.toStringAsFixed(1)} tok/s');
print('Latency: ${result.latencyMs.toStringAsFixed(0)}ms');
final streamResult = await RunAnywhere.generateStream(
'Write a short poem about AI',
options: LLMGenerationOptions(maxTokens: 150),
);
// Display tokens in real-time (requires: import 'dart:io';)
await for (final token in streamResult.stream) {
  // print() always appends a newline and has no terminator parameter.
  stdout.write(token);
}
// Get final metrics
final metrics = await streamResult.result;
print('\nSpeed: ${metrics.tokensPerSecond.toStringAsFixed(1)} tok/s');
// Cancel if needed
// streamResult.cancel();
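In a UI you usually want the full text as well as the individual tokens. The sketch below accumulates a token stream into a single string while forwarding each token to an optional callback. It assumes only that the stream yields String tokens, as in the loop above. collectTokens is a hypothetical helper, not an SDK function.

```dart
/// Collects every token from [tokens] into one string.
///
/// [onToken] is invoked for each token as it arrives, for example to update
/// a text widget. Hypothetical helper: not part of the RunAnywhere SDK.
Future<String> collectTokens(
  Stream<String> tokens, {
  void Function(String token)? onToken,
}) async {
  final buffer = StringBuffer();
  await for (final token in tokens) {
    buffer.write(token);
    onToken?.call(token);
  }
  return buffer.toString();
}
```

Under these assumptions, the stream from the snippet above could be passed in as collectTokens(streamResult.stream).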
// Load STT model
await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en');
// Transcribe audio data (PCM16 at 16kHz mono)
final transcription = await RunAnywhere.transcribe(audioBytes);
print('Transcription: $transcription');
// With detailed result
final result = await RunAnywhere.transcribeWithResult(audioBytes);
print('Text: ${result.text}');
print('Confidence: ${result.confidence}');
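The transcription calls above expect raw PCM16 audio at 16 kHz mono. As a quick sanity check before transcribing, the sketch below computes how much audio a byte buffer holds. The 2-bytes-per-sample figure follows from PCM16. The function name and default parameters are illustrative assumptions, not SDK API.

```dart
/// Returns the playback duration of a raw PCM16 buffer of [byteLength] bytes.
///
/// Throws if the length is not a whole number of frames, which usually means
/// the buffer is truncated or is not PCM16 at all.
/// Hypothetical helper: not part of the RunAnywhere SDK.
Duration pcm16Duration(
  int byteLength, {
  int sampleRate = 16000,
  int channels = 1,
}) {
  const bytesPerSample = 2; // 16-bit samples
  final bytesPerFrame = bytesPerSample * channels;
  if (byteLength % bytesPerFrame != 0) {
    throw ArgumentError.value(
        byteLength, 'byteLength', 'not a whole number of PCM16 frames');
  }
  final frames = byteLength ~/ bytesPerFrame;
  return Duration(microseconds: frames * 1000000 ~/ sampleRate);
}
```

A duration far from what you recorded is a hint that the sample rate or channel count of the captured audio does not match the expected format.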
// Load TTS voice
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');
// Synthesize speech
final ttsResult = await RunAnywhere.synthesize(
'Hello! Welcome to RunAnywhere.',
rate: 1.0,
pitch: 1.0,
);
// ttsResult.samples contains PCM Float32 audio
// ttsResult.sampleRate is typically 22050 Hz
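Many audio players and the WAV format expect 16-bit integer samples rather than Float32. The following sketch converts between the two, assuming the Float32 samples are normalized to the range -1.0 to 1.0. That normalization is an assumption here, since the text above only states that the samples are PCM Float32. float32ToPcm16 is a hypothetical helper.

```dart
import 'dart:typed_data';

/// Converts normalized Float32 samples (assumed range -1.0..1.0) to PCM16.
///
/// Out-of-range values are clamped rather than wrapped, which avoids loud
/// clicks from integer overflow. NaN samples are not handled.
/// Hypothetical helper: not part of the RunAnywhere SDK.
Int16List float32ToPcm16(Float32List samples) {
  final out = Int16List(samples.length);
  for (var i = 0; i < samples.length; i++) {
    final clamped = samples[i].clamp(-1.0, 1.0);
    out[i] = (clamped * 32767).round();
  }
  return out;
}
```

The resulting Int16List can be viewed as bytes through its buffer (for example out.buffer.asUint8List()) when a player or file writer needs raw PCM16 data.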
// Ensure all components are loaded
if (!RunAnywhere.isVoiceAgentReady) {
await RunAnywhere.loadSTTModel('sherpa-onnx-whisper-tiny.en');
await RunAnywhere.loadModel('smollm2-360m-q8_0');
await RunAnywhere.loadTTSVoice('vits-piper-en_US-lessac-medium');
}
// Start voice session
final session = await RunAnywhere.startVoiceSession();
// Listen to session events
session.events.listen((event) {
  // Dart 3 type patterns bind the typed event directly, so no casts are needed.
  switch (event) {
    case VoiceSessionListening listening:
      print('Listening... Level: ${listening.audioLevel}');
    case VoiceSessionTurnCompleted completed:
      print('User: ${completed.transcript}');
      print('AI: ${completed.response}');
    default:
      break; // Ignore other session events.
  }
});
// Stop when done
await session.stop();
The RunAnywhere Flutter SDK follows a modular, provider-based architecture with a C++ commons layer for cross-platform performance:
┌─────────────────────────────────────────────────────────────────┐
│ Your Flutter Application │
├─────────────────────────────────────────────────────────────────┤
│ RunAnywhere Flutter SDK │
│ ┌──────────────┐ ┌───────────────┐ ┌──────────────────────┐ │
│ │ Public APIs │ │ EventBus │ │ ModelRegistry │ │
│ │ (generate, │ │ (events, │ │ (model discovery, │ │
│ │ transcribe) │ │ lifecycle) │ │ download) │ │
│ └──────────────┘ └───────────────┘ └──────────────────────┘ │
├─────────────────────────────────────────────────────────────────┤
│ Native Bridge Layer (FFI) │
│ DartBridge → C++ Commons APIs │
├────────────┬─────────────┬──────────────────────────────────────┤
│ LlamaCpp │ ONNX │ Future Backends... │
│ Backend │ Backend │ │
│ (LLM) │ (STT/TTS) │ │
└────────────┴─────────────┴──────────────────────────────────────┘
| Component | Description |
|---|---|
| RunAnywhere | Static class providing all public SDK methods |
| EventBus | Dart Stream-based event subscription for reactive UI |
| DartBridge | FFI bridge to C++ native libraries |
| ModelRegistry | Model discovery, registration, and persistence |
| Package | Size | Provides |
|---|---|---|
| runanywhere | ~5MB | Core SDK, APIs, infrastructure |
| runanywhere_llamacpp | ~15-25MB | LLM capability (GGUF models) |
| runanywhere_onnx | ~50-70MB | STT, TTS, VAD (ONNX models) |
// Development mode (default) - no API key needed
await RunAnywhere.initialize();
// Production mode - requires API key and backend URL
await RunAnywhere.initialize(
apiKey: '<YOUR_API_KEY>',
baseURL: 'https://api.runanywhere.ai',
environment: SDKEnvironment.production,
);
| Environment | Description |
|---|---|
| .development | Verbose logging, local-only, no auth required |
| .staging | Testing with real services |
| .production | Minimal logging, full authentication, telemetry |
final options = LLMGenerationOptions(
maxTokens: 256, // Maximum tokens to generate
temperature: 0.7, // Sampling temperature (0.0–2.0)
topP: 0.95, // Top-p sampling parameter
stopSequences: ['END'], // Stop generation at these sequences
systemPrompt: 'You are a helpful assistant.',
);
The SDK provides comprehensive error handling through SDKError:
try {
final response = await RunAnywhere.generate('Hello!');
} on SDKError catch (error) {
switch (error.code) {
case SDKErrorCode.notInitialized:
print('SDK not initialized. Call RunAnywhere.initialize() first.');
case SDKErrorCode.modelNotFound:
print('Model not found. Download it first.');
case SDKErrorCode.modelNotDownloaded:
print('Model not downloaded. Call downloadModel() first.');
case SDKErrorCode.componentNotReady:
print('Component not ready. Load the model first.');
default:
print('Error: ${error.message}');
}
}
| Category | Description |
|---|---|
| general | General SDK errors |
| llm | LLM generation errors |
| stt | Speech-to-text errors |
| tts | Text-to-speech errors |
| voiceAgent | Voice pipeline errors |
| download | Model download errors |
| validation | Input validation errors |
// Subscribe to all events
RunAnywhere.events.events.listen((event) {
print('Event: ${event.type}');
});
// Subscribe to specific event types
RunAnywhere.events.events
.where((e) => e is SDKModelEvent)
.listen((event) {
print('Model Event: ${event.type}');
});
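Filtering with where((e) => e is SDKModelEvent) keeps the right events, but the listener still receives them typed as the base event class. A small generic helper, sketched below in plain Dart, narrows the stream's static type as well. eventsOfType is a hypothetical name, and the sketch assumes nothing about the SDK beyond the event source being an ordinary Dart Stream.

```dart
/// Keeps only the events that are instances of [T] and exposes them as a
/// Stream<T>, so listeners can access subtype members without casting.
/// Hypothetical helper: not part of the RunAnywhere SDK.
Stream<T> eventsOfType<T>(Stream<Object?> events) =>
    events.where((event) => event is T).cast<T>();
```

With this in place, a subscription along the lines of eventsOfType<SDKModelEvent>(RunAnywhere.events.events).listen(...) would receive events already typed as SDKModelEvent.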
| Event | Description |
|---|---|
| SDKInitializationStarted | SDK initialization began |
| SDKInitializationCompleted | SDK initialized successfully |
| SDKModelEvent.loadStarted | Model loading started |
| SDKModelEvent.loadCompleted | Model loaded successfully |
| SDKModelEvent.downloadProgress | Download progress update |
| Model Size | RAM Required | Use Case |
|---|---|---|
| 360M–500M (Q8) | ~500MB | Fast, lightweight chat |
| 1B–3B (Q4/Q6) | 1–2GB | Balanced quality/speed |
| 7B (Q4) | 4–5GB | High quality, slower |
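The table above can be turned into a simple selection rule: pick the largest model tier whose RAM requirement fits what the device has available. The sketch below uses the upper bound of each range from the table. The tier names, the map, and the function are illustrative assumptions. How you measure available RAM is platform-specific and out of scope here.

```dart
/// Model size tiers, ordered from smallest to largest.
enum ModelTier { small, medium, large }

/// Upper-bound RAM needed per tier in MB, taken from the table above:
/// ~500MB, 1-2GB, and 4-5GB respectively.
const Map<ModelTier, int> tierRamMb = {
  ModelTier.small: 500,
  ModelTier.medium: 2048,
  ModelTier.large: 5120,
};

/// Returns the largest tier that fits in [availableRamMb], or null if even
/// the smallest tier does not fit.
/// Hypothetical helper: not part of the RunAnywhere SDK.
ModelTier? largestTierThatFits(int availableRamMb) {
  ModelTier? best;
  for (final tier in ModelTier.values) {
    if (tierRamMb[tier]! <= availableRamMb) best = tier;
  }
  return best;
}
```

Using the upper bound of each range is deliberately conservative. Real headroom also depends on context length and on what else the app keeps in memory.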
// Unload models when not in use
await RunAnywhere.unloadModel();
await RunAnywhere.unloadSTTModel();
await RunAnywhere.unloadTTSVoice();
// Check storage before downloading
final storageInfo = await RunAnywhere.getStorageInfo();
print('Available: ${storageInfo.deviceStorage.freeSpace} bytes');
// Delete unused models
await RunAnywhere.deleteStoredModel('old-model-id');
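Before starting a download it helps to compare the free space against the model size plus some margin. This is a minimal sketch of that check. The 20% default headroom is an arbitrary assumption, and compressed archives that must be extracted after download may need considerably more. hasRoomFor is a hypothetical helper.

```dart
/// Returns true if [freeBytes] can hold a model of [modelBytes] with a
/// safety margin. A [headroom] of 1.2 means 20% extra space is required.
/// Hypothetical helper: not part of the RunAnywhere SDK.
bool hasRoomFor({
  required int freeBytes,
  required int modelBytes,
  double headroom = 1.2,
}) {
  final requiredBytes = (modelBytes * headroom).ceil();
  return freeBytes >= requiredBytes;
}
```

The freeBytes argument could come from storageInfo.deviceStorage.freeSpace as shown above. The model size would have to come from your own model metadata.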
Symptoms: Download stuck or fails with network error
Solutions:
Symptoms: App crashes during model loading or inference
Solutions:
Symptoms: Runtime crash with “symbol not found” error
Solutions:
- Set use_frameworks! :linkage => :static in the Podfile
- Run cd ios && pod install --repo-update
- Run flutter clean && flutter run
Symptoms: UnsatisfiedLinkError or library load failure
Solutions:
- Check that the jniLibs folder contains .so files
- Run ./scripts/build-flutter.sh --setup
Symptoms: modelNotFound error even though download completed
Solutions:
- Call await RunAnywhere.refreshDiscoveredModels() to refresh the registry
A: Only for initial model download. Once downloaded, all inference runs 100% on-device with no network required.
A: Varies by model:
A: No. All inference happens on-device. Only anonymous analytics (latency, error rates) are collected in production mode, and this can be disabled.
A: iOS 14+ and Android API 24+. ARM64 devices are recommended for best performance.
A: Yes! Any GGUF model works with LlamaCpp backend. ONNX models work for STT/TTS with the appropriate format.
A: The SDK supports both arm64 and x86_64 simulators, but performance will be significantly slower than physical devices.
Contributions are welcome. This section explains how to set up your development environment to build the SDK from source and test your changes with the sample app.
The SDK depends on native C++ libraries from runanywhere-commons. The setup script builds these locally so you can develop and test the SDK end-to-end.
# 1. Clone the repository
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks/sdk/runanywhere-flutter
# 2. Run first-time setup (~10-20 minutes)
./scripts/build-flutter.sh --setup
# 3. Bootstrap Flutter packages
melos bootstrap # If melos is installed
# OR manually (subshells keep the working directory unchanged between steps):
(cd packages/runanywhere && flutter pub get)
(cd packages/runanywhere_llamacpp && flutter pub get)
(cd packages/runanywhere_onnx && flutter pub get)
What the setup script does:
- Builds RACommons.xcframework and JNI libraries
- Builds RABackendLLAMACPP (LLM backend)
- Builds RABackendONNX (STT/TTS/VAD backend)
- Copies frameworks to ios/Frameworks/ and JNI libs to android/src/main/jniLibs/
- Creates .testlocal marker files (enables local library consumption)
The SDK has two modes:
| Mode | Description |
|---|---|
| Local | Uses frameworks/JNI libs from package directories (for development) |
| Remote | Downloads from GitHub releases during pod install/Gradle sync (for end users) |
When you run --setup, the script automatically enables local mode via:
- .testlocal marker files in ios/ directories
- testLocal = true in binary_config.gradle files
The recommended way to test SDK changes is with the sample app:
# 1. Ensure SDK is set up (from previous step)
# 2. Navigate to the sample app
cd ../../examples/flutter/RunAnywhereAI
# 3. Install dependencies
flutter pub get
# 4. Run on iOS
cd ios && pod install && cd ..
flutter run
# 5. Or run on Android
flutter run
You can open the sample app in Android Studio or VS Code for development.
The sample app’s pubspec.yaml uses path dependencies to reference the local SDK packages:
Sample App → Local Flutter SDK Packages → Local Frameworks/JNI libs
↑
Built by build-flutter.sh --setup
After modifying Dart SDK code:
flutter run
After modifying runanywhere-commons (C++ code):
cd sdk/runanywhere-flutter
./scripts/build-flutter.sh --local --rebuild-commons
| Command | Description |
|---|---|
| --setup | First-time setup: downloads deps, builds all libraries, enables local mode |
| --local | Use local libraries from package directories |
| --remote | Use remote libraries from GitHub releases |
| --rebuild-commons | Rebuild runanywhere-commons from source |
| --ios | Build for iOS only |
| --android | Build for Android only |
| --clean | Clean build artifacts before building |
| --abis=ABIS | Android ABIs to build (default: arm64-v8a) |
We follow standard Dart style guidelines:
# Format code
dart format lib/ test/
# Analyze code
flutter analyze
# Fix issues automatically
dart fix --apply
- git checkout -b feature/my-feature
- flutter test
- flutter analyze
Open an issue on GitHub with:
- RunAnywhere.version
- flutter --version
Apache License 2.0. See LICENSE for details.
For commercial licensing inquiries, contact san@runanywhere.ai.