A production-grade, on-device AI SDK for iOS and macOS. The SDK enables low-latency, privacy-preserving inference for large language models, speech recognition, and voice synthesis with modular backend support.
The RunAnywhere Swift SDK enables developers to run AI models directly on Apple devices without requiring network connectivity for inference. Because inference runs locally, requests avoid network round-trips, and prompts, audio, and responses stay on the user's device.
The SDK provides a unified interface to multiple AI capabilities, including large language models (LLMs), speech-to-text (STT), text-to-speech (TTS), and voice activity detection (VAD). These capabilities are delivered through pluggable backend modules that can be included as needed.
| Platform | Minimum Version |
|---|---|
| iOS | 17.5+ |
| macOS | 14.5+ |
Swift Version: 5.9+
Xcode: 15.2+
Some optional modules have higher runtime requirements:
RunAnywhereAppleAI module: iOS 26+ / macOS 26+ at runtime
Add the RunAnywhere SDK to your project in Xcode using the package URL:
https://github.com/RunanywhereAI/runanywhere-sdks
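Choose a version rule of 1.0.0 or later (from: "1.0.0"). To declare the dependency in Package.swift instead:
// swift-tools-version:5.9
import PackageDescription
let package = Package(
    name: "YourApp",
    platforms: [.iOS("17.5"), .macOS("14.5")],
    dependencies: [
        .package(url: "https://github.com/RunanywhereAI/runanywhere-sdks", from: "1.0.0")
    ],
    targets: [
        .target(
            name: "YourApp",
            dependencies: [
                .product(name: "RunAnywhere", package: "runanywhere-sdks"),
                .product(name: "RunAnywhereLlamaCPP", package: "runanywhere-sdks"),
                .product(name: "RunAnywhereONNX", package: "runanywhere-sdks"),
            ]
        )
    ]
)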
This repository contains two Package.swift files for different use cases:
| File | Location | Purpose |
|---|---|---|
| Root Package.swift | runanywhere-sdks/Package.swift | For external SPM consumers. Downloads pre-built XCFrameworks from GitHub releases. |
| Local Package.swift | runanywhere-sdks/sdk/runanywhere-swift/Package.swift | For SDK developers. Uses local XCFrameworks from Binaries/ directory. |
For app developers: Use the root-level package via the GitHub URL (as shown above).
For SDK contributors: Set useLocalNatives = true in the root Package.swift (NOT sdk/runanywhere-swift/Package.swift, which is always-local and has no flag) after building the XCFrameworks (see Local Development & Contributing below).
import SwiftUI
import RunAnywhere
import LlamaCPPRuntime
@main
struct MyApp: App {
    init() {
        Task { @MainActor in
            // Register the LlamaCPP module for LLM support
            LlamaCPP.register()
            // Initialize the SDK
            do {
                try RunAnywhere.initialize(
                    apiKey: "<YOUR_API_KEY>",
                    baseURL: "https://api.runanywhere.ai",
                    environment: .production
                )
            } catch {
                print("SDK initialization failed: \(error)")
            }
        }
    }
    var body: some Scene {
        WindowGroup {
            ContentView()
        }
    }
}
// Build a proto-backed generate request via the v2 surface
var options = RALLMGenerationOptions.defaults()
options.maxTokens = 200
options.temperature = 0.7
let request = options.toRALLMGenerateRequest(
    prompt: "Explain quantum computing in simple terms"
)
let result = try await RunAnywhere.generate(request)
print("Response: \(result.text)")
print("Tokens generated: \(result.tokensGenerated)")
// Load an LLM through the canonical lifecycle (RAModelLoadRequest).
var loadRequest = RAModelLoadRequest()
loadRequest.modelID = "llama-3.2-1b-instruct-q4"
loadRequest.category = .language
let loadResult = await RunAnywhere.loadModel(loadRequest)
guard loadResult.success else {
    print("Load failed: \(loadResult.errorMessage)")
    return
}
// Check if a model is loaded for a given modality via the lifecycle service.
var current = RACurrentModelRequest()
current.category = .language
let snapshot = RunAnywhere.currentModel(current)
print("Loaded:", snapshot.found, "id=", snapshot.modelID)
try RunAnywhere.initialize(
    apiKey: "<YOUR_API_KEY>",
    baseURL: "https://api.runanywhere.ai",
    environment: .production
)
| Environment | Description |
|---|---|
| .development | Verbose logging, mock services, local analytics |
| .staging | Testing with real services |
| .production | Minimal logging, full authentication, telemetry |
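var options = RALLMGenerationOptions.defaults()
options.maxTokens = 100          // upper bound on generated tokens
options.temperature = 0.8        // higher values give more varied output
options.topP = 1.0               // nucleus sampling cutoff; 1.0 keeps the full distribution
options.stopSequences = ["END"]  // stop generating when this string is produced
options.streamingEnabled = false // return the complete result in one call
options.systemPrompt = "You are a helpful assistant."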
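To build intuition for temperature and topP, the sketch below uses the textbook formulation: a temperature-scaled softmax followed by nucleus (top-p) filtering. The function names are hypothetical, and the LlamaCPP backend's actual sampler may order or combine these steps differently.

```swift
import Foundation

/// Temperature-scaled softmax: a temperature below 1 sharpens the
/// distribution, and a temperature above 1 flattens it.
func softmax(_ logits: [Double], temperature: Double) -> [Double] {
    precondition(temperature > 0, "temperature must be positive")
    let scaled = logits.map { $0 / temperature }
    let maxLogit = scaled.max() ?? 0
    // Subtracting the max before exp() avoids overflow without changing the result.
    let exps = scaled.map { exp($0 - maxLogit) }
    let total = exps.reduce(0, +)
    return exps.map { $0 / total }
}

/// Nucleus (top-p) filtering: keep the smallest set of most likely token indices
/// whose cumulative probability reaches `topP`, then renormalize.
/// topP = 1.0 keeps every token.
func topPFilter(_ probabilities: [Double], topP: Double) -> [Int: Double] {
    let ranked = probabilities.enumerated().sorted { $0.element > $1.element }
    var kept: [Int: Double] = [:]
    var cumulative = 0.0
    for (index, probability) in ranked {
        kept[index] = probability
        cumulative += probability
        if cumulative >= topP { break }
    }
    let keptTotal = kept.values.reduce(0, +)
    return kept.mapValues { $0 / keptTotal }
}

let logits = [2.0, 1.0, 0.5]
print(softmax(logits, temperature: 0.5))  // more peaked than temperature 1.0
print(topPFilter(softmax(logits, temperature: 1.0), topP: 0.7))
```

Lowering temperature concentrates probability on the top token. Lowering topP below 1.0 removes the unlikely tail before sampling.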
Register modules at app startup before using their capabilities:
import RunAnywhere
import LlamaCPPRuntime
import ONNXRuntime
@MainActor
func setupSDK() {
    LlamaCPP.register() // LLM (priority: 100)
    ONNX.register()     // STT + TTS (priority: 100)
}
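The priority comments above suggest that when more than one registered module can serve a capability, the provider with the higher priority is chosen. The sketch below shows one way such priority-based routing can work. ProviderRegistry, Capability, and the tie-breaking rule (earliest registration wins) are hypothetical and are not the SDK's ServiceRegistry API.

```swift
/// Hypothetical capability identifiers for this sketch only.
enum Capability: Hashable {
    case llm, stt, tts, vad
}

/// A registered provider: its name, the capabilities it serves, and its priority.
struct Provider {
    let name: String
    let capabilities: Set<Capability>
    let priority: Int
}

/// Minimal registry: resolve(_:) returns the highest-priority provider for a
/// capability; on a tie, the provider registered first wins.
final class ProviderRegistry {
    private var providers: [Provider] = []

    func register(_ provider: Provider) {
        providers.append(provider)
    }

    func resolve(_ capability: Capability) -> Provider? {
        var best: Provider?
        for provider in providers where provider.capabilities.contains(capability) {
            // Keep the current choice unless the candidate has strictly higher priority.
            if let current = best, current.priority >= provider.priority { continue }
            best = provider
        }
        return best
    }
}

let registry = ProviderRegistry()
registry.register(Provider(name: "LlamaCPP", capabilities: [.llm], priority: 100))
registry.register(Provider(name: "ONNX", capabilities: [.stt, .tts], priority: 100))
registry.register(Provider(name: "Fallback", capabilities: [.llm], priority: 10))
print(registry.resolve(.llm)?.name ?? "none")  // prints "LlamaCPP"
```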
var options = RALLMGenerationOptions.defaults()
options.maxTokens = 150
options.streamingEnabled = true
let request = options.toRALLMGenerateRequest(
    prompt: "Write a short poem about AI"
)
let stream = try await RunAnywhere.generateStream(request)
for await event in stream {
    if event.kind == .answer {
        print(event.token, terminator: "")
    }
}
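Filtering on event.kind == .answer skips all other event kinds, for example thinking tokens, which the structured-output pipeline also strips. The self-contained sketch below reproduces that pattern with a hypothetical TokenEvent type and a mock AsyncStream, so it runs without the SDK. The real stream's element type and kinds come from the SDK.

```swift
/// Hypothetical stand-ins for the SDK's stream event; only `.answer` tokens are user-facing.
enum TokenKind {
    case thinking
    case answer
}

struct TokenEvent {
    let kind: TokenKind
    let token: String
}

/// Builds a finite AsyncStream from fixed events, standing in for generateStream(_:).
func makeMockStream(_ events: [TokenEvent]) -> AsyncStream<TokenEvent> {
    AsyncStream { continuation in
        for event in events {
            continuation.yield(event)
        }
        continuation.finish()
    }
}

/// Concatenates only the answer tokens, mirroring the `.answer` filter above.
func collectAnswer(from stream: AsyncStream<TokenEvent>) async -> String {
    var text = ""
    for await event in stream where event.kind == .answer {
        text += event.token
    }
    return text
}

let mockStream = makeMockStream([
    TokenEvent(kind: .thinking, token: "Let me think. "),
    TokenEvent(kind: .answer, token: "Circuits "),
    TokenEvent(kind: .answer, token: "hum "),
    TokenEvent(kind: .answer, token: "softly."),
])
let poem = await collectAnswer(from: mockStream)
print(poem)  // prints "Circuits hum softly."
```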
// Commons owns the full structured-output pipeline (prepare → generate →
// strip thinking tags → extract JSON → validate). Build a `RAJSONSchema`
// by populating its typed proto fields (the canonical JSON Schema text is
// produced by the read-only `jsonSchemaString` computed property).
var schema = RAJSONSchema()
schema.type = .object
schema.required = ["question", "options", "correctAnswer"]
var questionProp = RAJSONSchemaProperty()
questionProp.type = .string
schema.properties["question"] = questionProp
var optionsProp = RAJSONSchemaProperty()
optionsProp.type = .array
optionsProp.itemsSchema.type = .string
schema.properties["options"] = optionsProp
var correctAnswerProp = RAJSONSchemaProperty()
correctAnswerProp.type = .integer
schema.properties["correctAnswer"] = correctAnswerProp
let result = try await RunAnywhere.generateStructured(
    prompt: "Create a quiz question about Swift programming",
    schema: schema
)
let jsonString = String(data: result.parsedJson, encoding: .utf8) ?? ""
print("Validated JSON:", jsonString)
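The String(data:encoding:) call above treats result.parsedJson as UTF-8 JSON data. If that holds, you can also decode it directly into a Swift type whose fields mirror the schema. The sketch below uses a hypothetical QuizQuestion type and a sample payload in place of real model output. The extra index check is application logic: the schema only requires correctAnswer to be an integer, not a valid index into options.

```swift
import Foundation

/// Mirrors the RAJSONSchema above: question (string), options ([string]), correctAnswer (integer).
struct QuizQuestion: Decodable {
    let question: String
    let options: [String]
    let correctAnswer: Int

    /// The schema cannot express "index into options", so check it separately.
    var hasValidAnswerIndex: Bool {
        options.indices.contains(correctAnswer)
    }
}

// Sample payload standing in for result.parsedJson.
let parsedJson = Data("""
{"question": "Which keyword declares a constant in Swift?",
 "options": ["var", "let", "const"],
 "correctAnswer": 1}
""".utf8)

let quiz = try JSONDecoder().decode(QuizQuestion.self, from: parsedJson)
print(quiz.options[quiz.correctAnswer])  // prints "let"
```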
import RunAnywhere
import ONNXRuntime
ONNX.register()
// Load the STT model through the canonical lifecycle.
var loadRequest = RAModelLoadRequest()
loadRequest.modelID = "whisper-base-onnx"
loadRequest.category = .speechRecognition
_ = await RunAnywhere.loadModel(loadRequest)
let audioData = Data() // placeholder: replace with your audio (16 kHz, mono, Float32 samples)
let transcription = try await RunAnywhere.transcribe(audio: audioData)
print("Transcribed: \(transcription.text)")
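The transcription call expects 16 kHz, mono, Float32 audio. If your capture produces a different format, such as 48 kHz interleaved stereo, convert it first. The sketch below assumes the SDK takes raw, native-endian Float32 samples with no WAV header. The helper names are hypothetical. A production app would more likely use AVAudioConverter, which also low-pass filters before downsampling.

```swift
import Foundation

/// Averages interleaved stereo frames (L, R, L, R, ...) into mono samples.
/// A trailing unpaired sample, if any, is dropped.
func downmixToMono(_ interleaved: [Float]) -> [Float] {
    stride(from: 0, to: interleaved.count - 1, by: 2).map {
        (interleaved[$0] + interleaved[$0 + 1]) / 2
    }
}

/// Linear-interpolation resampler (no anti-aliasing filter; fine for a sketch).
func resample(_ samples: [Float], from sourceRate: Double, to targetRate: Double) -> [Float] {
    guard !samples.isEmpty, sourceRate > 0, targetRate > 0 else { return [] }
    let outputCount = Int((Double(samples.count) * targetRate / sourceRate).rounded(.down))
    return (0..<outputCount).map { i in
        let position = Double(i) * sourceRate / targetRate
        let lower = Int(position)
        let upper = min(lower + 1, samples.count - 1)
        let fraction = Float(position - Double(lower))
        return samples[lower] + (samples[upper] - samples[lower]) * fraction
    }
}

/// Packs Float32 samples into Data in the platform's native byte order.
func float32PCMData(_ samples: [Float]) -> Data {
    samples.withUnsafeBufferPointer { Data(buffer: $0) }
}

// One second of 48 kHz interleaved stereo with a constant test value.
let stereo = [Float](repeating: 0.25, count: 96_000)
let mono16k = resample(downmixToMono(stereo), from: 48_000, to: 16_000)
let pcmData = float32PCMData(mono16k)
print(mono16k.count, pcmData.count)
```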
// Load a TTS voice through the canonical lifecycle.
var loadRequest = RAModelLoadRequest()
loadRequest.modelID = "piper-en-us-amy"
loadRequest.category = .speechSynthesis
_ = await RunAnywhere.loadModel(loadRequest)
var options = RATTSOptions.defaults()
options.speakingRate = 1.0
options.pitch = 1.0
options.volume = 0.8
let output = try await RunAnywhere.synthesize(
    "Hello! Welcome to RunAnywhere.",
    options: options
)
// Once STT, LLM, and TTS models are loaded via RAModelLoadRequest, compose
// the voice agent from the lifecycle snapshots:
try await RunAnywhere.initializeVoiceAgentWithLoadedModels()
let audioData = Data() // placeholder: replace with your recorded audio
let result = try await RunAnywhere.processVoiceTurn(audioData)
print("User said:", result.transcription)
print("AI response:", result.assistantResponse)
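Conceptually, a voice turn chains the three loaded capabilities: transcribe the user's audio, generate a reply, and synthesize it as speech. The sketch below shows that data flow using hypothetical protocols and mock stages, so it runs without the SDK. It does not describe how processVoiceTurn(_:) works internally.

```swift
import Foundation

/// Hypothetical stage protocols for this sketch only.
protocol Transcriber { func transcribe(_ audio: Data) async throws -> String }
protocol Responder { func respond(to prompt: String) async throws -> String }
protocol Synthesizer { func synthesize(_ text: String) async throws -> Data }

struct VoiceTurn {
    let transcription: String
    let assistantResponse: String
    let audio: Data
}

/// Runs STT -> LLM -> TTS in order; an error from any stage propagates to the caller.
struct VoicePipeline {
    let stt: any Transcriber
    let llm: any Responder
    let tts: any Synthesizer

    func process(_ audio: Data) async throws -> VoiceTurn {
        let text = try await stt.transcribe(audio)
        let reply = try await llm.respond(to: text)
        let speech = try await tts.synthesize(reply)
        return VoiceTurn(transcription: text, assistantResponse: reply, audio: speech)
    }
}

// Mock stages so the pipeline can run anywhere.
struct FixedTranscriber: Transcriber {
    func transcribe(_ audio: Data) async throws -> String { "hello" }
}
struct UppercaseResponder: Responder {
    func respond(to prompt: String) async throws -> String { prompt.uppercased() }
}
struct BytesSynthesizer: Synthesizer {
    func synthesize(_ text: String) async throws -> Data { Data(text.utf8) }
}

let pipeline = VoicePipeline(stt: FixedTranscriber(), llm: UppercaseResponder(), tts: BytesSynthesizer())
let turn = try await pipeline.process(Data())
print("User said:", turn.transcription)       // prints "User said: hello"
print("AI response:", turn.assistantResponse) // prints "AI response: HELLO"
```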
import Combine
import RunAnywhere
class ViewModel: ObservableObject {
    private var cancellables = Set<AnyCancellable>()
    init() {
        // All SDK events, delivered on the main queue
        RunAnywhere.events.events
            .receive(on: DispatchQueue.main)
            .sink { event in
                print("Event: \(event.category)")
            }
            .store(in: &cancellables)
        // Only LLM events
        RunAnywhere.events.events(for: .llm)
            .sink { event in
                print("LLM Event: \(event.category)")
            }
            .store(in: &cancellables)
    }
}
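RunAnywhere.events.events(for: .llm) delivers only one category of events, while RunAnywhere.events.events delivers all of them. The pure-Swift sketch below avoids Combine so it runs on any platform. It illustrates that kind of category filtering with a hypothetical MiniEventBus and EventCategory; it is not the SDK's event bus.

```swift
/// Hypothetical event categories and event type for this sketch.
enum EventCategory: Hashable {
    case llm, stt, tts, model
}

struct SDKEvent {
    let category: EventCategory
    let name: String
}

/// Minimal synchronous event bus: each subscriber receives every event,
/// or only the events of one category.
final class MiniEventBus {
    private var subscribers: [(filter: EventCategory?, handler: (SDKEvent) -> Void)] = []

    /// Pass `nil` (the default) to receive all events.
    func subscribe(to category: EventCategory? = nil, _ handler: @escaping (SDKEvent) -> Void) {
        subscribers.append((filter: category, handler: handler))
    }

    func publish(_ event: SDKEvent) {
        for subscriber in subscribers where subscriber.filter == nil || subscriber.filter == event.category {
            subscriber.handler(event)
        }
    }
}

let bus = MiniEventBus()
var allEvents: [String] = []
var llmEvents: [String] = []
bus.subscribe { allEvents.append($0.name) }
bus.subscribe(to: .llm) { llmEvents.append($0.name) }
bus.publish(SDKEvent(category: .model, name: "modelLoaded"))
bus.publish(SDKEvent(category: .llm, name: "generationStarted"))
print(allEvents, llmEvents)
```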
// List registered models via the public proto-backed registry API.
let listResult = await RunAnywhere.listModels()
guard let model = listResult.models.models.first(where: { $0.id == "llama-3.2-1b-instruct-q4" }) else {
    return
}
// Download with the closure-based progress callback. Commons owns plan →
// start → progress polling → registry import; the closure receives each
// `RADownloadProgress` snapshot in real time.
let downloaded = try await RunAnywhere.downloadModel(model) { progress in
    let percent = Int(progress.overallProgress * 100)
    print("\(progress.stage): \(percent)%")
}
print("Local path: \(downloaded.localPath)")
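The progress callback can fire many times per percentage point. A small throttle that reports only when the whole-number percentage increases keeps log and UI updates manageable. The sketch below uses a hypothetical ProgressThrottle type and assumes overallProgress is a fraction in 0...1, as the multiplication by 100 above implies.

```swift
/// Reports a percentage only when its whole-number value increases.
struct ProgressThrottle {
    private var lastReported = -1

    init() {}

    /// Returns the new percentage (0...100) if it should be shown, otherwise nil.
    mutating func update(_ fraction: Double) -> Int? {
        let clamped = min(max(fraction, 0), 1)  // guard against out-of-range values
        let percent = Int(clamped * 100)
        guard percent > lastReported else { return nil }
        lastReported = percent
        return percent
    }
}

var throttle = ProgressThrottle()
let snapshots = [0.0, 0.001, 0.004, 0.011, 0.5, 0.5, 1.0]
let reported = snapshots.compactMap { throttle.update($0) }
print(reported)  // prints "[0, 1, 50, 100]"
```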
The RunAnywhere SDK follows a modular, provider-based architecture that separates core functionality from specific backend implementations:
+------------------------------------------------------------------+
| Public API |
| RunAnywhere.generate() / transcribe() / synthesize() |
+------------------------------------------------------------------+
|
+------------------------------------------------------------------+
| Capability Layer |
| LLMCapability | STTCapability | TTSCapability | ... |
+------------------------------------------------------------------+
|
+------------------------------------------------------------------+
| ServiceRegistry |
| Routes requests to registered service providers |
+------------------------------------------------------------------+
|
+--------------------+--------------------+
v v v
+------------------+ +------------------+ +------------------+
| LlamaCPP Module | | ONNX Module | | AppleAI Module |
| (LLM: GGUF) | | (STT + TTS) | | (LLM: iOS 26+) |
+------------------+ +------------------+ +------------------+
| | |
v v v
+------------------------------------------------------------------+
| Native Runtime / XCFramework |
| RunAnywhereCore (C++ with Metal acceleration) |
+------------------------------------------------------------------+
Key Components:
RunAnywhere.setLogLevel(.debug)
RunAnywhere.setLocalLoggingEnabled(true)
RunAnywhere.setDebugMode(true)
RunAnywhere.flushLogs()
| Level | Description |
|---|---|
| .debug | Detailed information for debugging |
| .info | General operational information |
| .warning | Potential issues that don’t prevent operation |
| .error | Errors that affect specific operations |
| .fault | Critical errors indicating serious problems |
The SDK automatically tracks key metrics:
All SDK errors are thrown as SDKException, which carries a typed error
code (RASDKErrorCode), a developer-facing message, and a category
identifying which subsystem produced the error.
RASDKErrorCode covers the v2 surface, including:
- .notInitialized
- .invalidAPIKey
- .modelNotLoaded
- .modelLoadFailed
- .generationFailed
- .processingFailed
- .networkError
- .cancelled
(See SDKException.swift and Generated/sdk_errors.pb.swift for the full list of codes and categories.)
do {
    var options = RALLMGenerationOptions.defaults()
    options.maxTokens = 64
    let request = options.toRALLMGenerateRequest(prompt: "Hello")
    let result = try await RunAnywhere.generate(request)
    print(result.text)
} catch let error as SDKException {
    switch error.code {
    case .notInitialized:
        print("Please call RunAnywhere.initialize() first")
    case .modelNotLoaded:
        print("Model not loaded. Call RunAnywhere.loadModel(_:) first.")
    case .generationFailed:
        print("Generation failed: \(error.message)")
    default:
        print("Error (\(error.category)): \(error.message)")
    }
} catch {
    // Catch-all so the do/catch is exhaustive (e.g., errors not thrown by the SDK)
    print("Unexpected error: \(error)")
}
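Some failures, such as .networkError, may be transient. Others, such as .notInitialized, need a code fix, and retrying will not help. A generic retry helper with exponential backoff could look like the sketch below. withRetry and DemoError are hypothetical, and which SDKException codes count as retryable is an assumption you would decide per app.

```swift
/// Retries `operation` up to `maxAttempts` times, doubling the delay after each
/// failure, but only while `isRetryable` returns true for the thrown error.
func withRetry<T>(
    maxAttempts: Int = 3,
    initialDelayNanoseconds: UInt64 = 200_000_000,
    isRetryable: (Error) -> Bool,
    _ operation: () async throws -> T
) async throws -> T {
    precondition(maxAttempts >= 1, "maxAttempts must be at least 1")
    var delay = initialDelayNanoseconds
    for attempt in 1...maxAttempts {
        do {
            return try await operation()
        } catch {
            // Rethrow immediately for permanent errors or once attempts are exhausted.
            guard attempt < maxAttempts, isRetryable(error) else { throw error }
            try await Task.sleep(nanoseconds: delay)
            delay *= 2
        }
    }
    fatalError("unreachable: the last attempt either returns or throws")
}

/// Hypothetical error type standing in for SDKException codes such as .networkError.
enum DemoError: Error {
    case transient
    case permanent
}

/// Fails twice with a transient error, then succeeds.
func flakyDemo() async throws -> (value: String, calls: Int) {
    var calls = 0
    let value: String = try await withRetry(
        initialDelayNanoseconds: 1_000_000,
        isRetryable: { ($0 as? DemoError) == DemoError.transient }
    ) {
        calls += 1
        if calls < 3 { throw DemoError.transient }
        return "ok"
    }
    return (value, calls)
}

let outcome = try await flakyDemo()
print(outcome.value, outcome.calls)  // prints "ok 3"
```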
// Unload a model through the canonical lifecycle.
var unloadRequest = RAModelUnloadRequest()
unloadRequest.category = .language
_ = await RunAnywhere.unloadModel(unloadRequest)
// Check storage before downloading
let storageInfo = await RunAnywhere.getStorageInfo()
if storageInfo.availableBytes > model.downloadSize ?? 0 {
    // Safe to download
}
// Clean up temporary files periodically
try await RunAnywhere.cleanTempFiles()
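Comparing availableBytes against the exact download size leaves no headroom for anything else the app or system writes. The sketch below checks for space with a safety margin added. hasRoomForDownload and its 20% default margin are arbitrary, illustrative choices, not SDK behavior.

```swift
/// Returns true if `availableBytes` covers `requiredBytes` plus a fractional safety margin.
func hasRoomForDownload(availableBytes: Int64, requiredBytes: Int64, margin: Double = 0.2) -> Bool {
    precondition(requiredBytes >= 0 && margin >= 0, "sizes and margin must be non-negative")
    let needed = Double(requiredBytes) * (1 + margin)
    return Double(availableBytes) >= needed
}

let oneGB: Int64 = 1_000_000_000
print(hasRoomForDownload(availableBytes: 2 * oneGB, requiredBytes: oneGB))      // true
print(hasRoomForDownload(availableBytes: 1_100_000_000, requiredBytes: oneGB))  // false: needs 1.2 GB
```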
var options = RALLMGenerationOptions.defaults()
options.streamingEnabled = true
let request = options.toRALLMGenerateRequest(prompt: prompt)
let stream = try await RunAnywhere.generateStream(request)
for await event in stream where event.kind == .answer {
    await MainActor.run { self.text += event.token }
}
No, once models are downloaded, all inference happens on-device. You only need internet for:
The SDK supports:
Model sizes vary significantly:
Currently, one LLM can be loaded at a time. STT and TTS models can be loaded alongside LLM models. Use RunAnywhere.unloadModel(RAModelUnloadRequest()) before loading a different LLM.
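One way to mirror this rule in app-side bookkeeping is to keep one slot per model category, so that loading a second language model is rejected until the first is unloaded. In the sketch below, ModelSlots, ModelCategory, and the model ID "another-llm" are hypothetical and are not SDK types or real catalog entries.

```swift
/// Hypothetical categories mirroring the RAModelLoadRequest.category values used in this README.
enum ModelCategory: Hashable {
    case language, speechRecognition, speechSynthesis
}

enum SlotError: Error, Equatable {
    case occupied(category: ModelCategory, loadedID: String)
}

/// Tracks at most one loaded model per category.
struct ModelSlots {
    private(set) var loaded: [ModelCategory: String] = [:]

    init() {}

    mutating func load(_ modelID: String, as category: ModelCategory) throws {
        if let current = loaded[category], current != modelID {
            throw SlotError.occupied(category: category, loadedID: current)
        }
        loaded[category] = modelID
    }

    mutating func unload(_ category: ModelCategory) {
        loaded[category] = nil
    }
}

var slots = ModelSlots()
try slots.load("llama-3.2-1b-instruct-q4", as: .language)
try slots.load("whisper-base-onnx", as: .speechRecognition)  // different category: allowed
var rejection: SlotError?
do {
    try slots.load("another-llm", as: .language)            // same category: rejected
} catch let error as SlotError {
    rejection = error
}
slots.unload(.language)
try slots.load("another-llm", as: .language)                 // allowed after unloading
print(slots.loaded.count)  // prints "2"
```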
Call RunAnywhere.listModels() (or RunAnywhere.queryModels(_:) / RunAnywhere.downloadedModels()) to refresh the in-memory model catalog from the registry, then call RunAnywhere.downloadModel(_:onProgress:) to fetch any new or updated entries alongside existing models. Model assignment discovery runs automatically as part of the SDK’s Phase-2 initialization.
By default, only anonymous analytics (latency, error rates) are collected. Actual prompts, responses, and audio data never leave the device.
RunAnywhere.setDebugMode(true)
RunAnywhere.events.on(.error) { ... }
Build a RALLMGenerationOptions, call toRALLMGenerateRequest(prompt:) to produce a RALLMGenerateRequest, and pass it to RunAnywhere.generate(_:) or RunAnywhere.generateStream(_:). There is no longer a separate chat() convenience: the proto-backed generate(_:) API returns a full RALLMGenerationResult with metrics.
We welcome contributions to the RunAnywhere Swift SDK. This section explains how to set up your development environment to build the SDK from source and test your changes with the sample app.
The SDK depends on native C++ libraries from runanywhere-commons. Build XCFrameworks locally so you can develop and test the SDK end-to-end.
# 1. Clone the repository
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks
# 2. Build XCFrameworks (~5-15 minutes)
./sdk/runanywhere-swift/scripts/build-core-xcframework.sh
What the build script does:
- Builds RACommons.xcframework (core infrastructure)
- Builds RABackendLLAMACPP.xcframework (LLM backend)
- Builds RABackendONNX.xcframework (STT/TTS/VAD backend)
- Places the frameworks in sdk/runanywhere-swift/Binaries/
The SDK has two modes controlled by useLocalNatives in the root Package.swift (the flag does not exist in sdk/runanywhere-swift/Package.swift, which is always-local):
| Mode | Setting | Description |
|---|---|---|
| Local | useLocalNatives = true | Uses XCFrameworks from Binaries/ (for development) |
| Remote | useLocalNatives = false | Downloads XCFrameworks from GitHub releases (for end users) |
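A manifest flag like useLocalNatives typically switches each native framework between a local path target and a remote URL target. The sketch below is hypothetical and only illustrates the pattern: the real root Package.swift differs, and the release URL and checksum are placeholders, not actual values. The framework names and the Binaries/ location come from the build-script output described above.

```swift
// swift-tools-version:5.9
// Hypothetical sketch of a useLocalNatives switch; not the repository's actual manifest.
import PackageDescription

let useLocalNatives = false  // set to true after running build-core-xcframework.sh

/// Chooses a local or remote binary target for one XCFramework.
func nativeTarget(_ name: String) -> Target {
    if useLocalNatives {
        // Local mode: frameworks produced by the build script in Binaries/.
        return .binaryTarget(name: name, path: "sdk/runanywhere-swift/Binaries/\(name).xcframework")
    }
    // Remote mode: prebuilt zip attached to a GitHub release (URL and checksum are placeholders).
    return .binaryTarget(
        name: name,
        url: "<release-asset-url>/\(name).xcframework.zip",
        checksum: "<sha256-checksum>"
    )
}

let package = Package(
    name: "NativesSketch",
    targets: ["RACommons", "RABackendLLAMACPP", "RABackendONNX"].map(nativeTarget)
)
```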
The recommended way to test SDK changes is with the sample app:
# 1. Ensure XCFrameworks are built (from previous step)
# 2. Navigate to the sample app
cd examples/ios/RunAnywhereAI
# 3. Open in Xcode
open RunAnywhereAI.xcodeproj
# 4. If Xcode shows package errors, reset caches:
# File > Packages > Reset Package Caches
# 5. Build and Run (Cmd+R)
The sample app’s Package.swift references the local SDK, which in turn uses the local frameworks from Binaries/. This creates a complete local development loop:
Sample App → Local Swift SDK → Local XCFrameworks (Binaries/)
↑
Built by sdk/runanywhere-swift/scripts/build-core-xcframework.sh
After modifying Swift SDK code:
After modifying runanywhere-commons (C++ code):
./sdk/runanywhere-swift/scripts/build-core-xcframework.sh
swift test
The project uses SwiftLint for code style enforcement:
brew install swiftlint
swiftlint
git checkout -b feature/my-feature
swift test
swiftlint
Open an issue on GitHub with:
- SDK version (RunAnywhere.version)
Copyright 2025 RunAnywhere AI. All rights reserved.
See the repository for license terms. For commercial licensing inquiries, contact san@runanywhere.ai.