runanywhere-sdks

RunAnywhere Web SDK

On-device AI for the browser. Run LLMs, Speech-to-Text, Text-to-Speech, Vision, and Voice AI locally via WebAssembly – private, offline-capable, zero server dependencies.

WebAssembly TypeScript 5.6+ Chrome 96+ Node.js 18+ License

Beta (v0.1.0) – This is an early release for testing and feedback. The API surface is stable but may change before v1.0. Not yet recommended for production deployments without thorough testing.



Features

Large Language Models (LLM)

Speech-to-Text (STT)

Text-to-Speech (TTS)

Voice Activity Detection (VAD)

Vision Language Models (VLM)

Voice Pipeline

Tool Calling and Structured Output

Embeddings

Infrastructure


System Requirements

Component Minimum Recommended
Browser Chrome 96+ / Edge 96+ Chrome 120+ / Edge 120+
WebAssembly Required Required
SharedArrayBuffer For multi-threaded WASM Requires Cross-Origin Isolation headers
WebGPU For GPU-accelerated diffusion Chrome 120+
OPFS For persistent model storage All modern browsers
RAM 2GB 4GB+ for larger models
Storage Variable Models: 40MB – 4GB depending on model

Package Structure

The Web SDK is a single npm package. Unlike native SDKs (iOS, Android, React Native, Flutter) which use separate packages per backend, the Web SDK compiles all inference backends into a single WebAssembly binary. Backend selection happens at WASM build time, not at the package level.

@runanywhere/web           -- TypeScript API + pre-built WASM (all backends)

The pre-built WASM includes llama.cpp (LLM/VLM), whisper.cpp (STT), and sherpa-onnx (TTS/VAD). Developers who need a smaller WASM binary with specific backends can build from source with selective flags.


Installation

npm install @runanywhere/web

Serve WASM Files + Cross-Origin Isolation

The package includes pre-built WASM files in node_modules/@runanywhere/web/wasm/. Configure your bundler to serve these as static assets.

Important: Your server must set Cross-Origin Isolation headers for SharedArrayBuffer and multi-threaded WASM to work. Without these headers the SDK falls back to single-threaded mode, which is significantly slower. See Cross-Origin Isolation Headers for all platforms (Nginx, Vercel, Netlify, Cloudflare, AWS, Apache).

Vite:

// vite.config.ts
export default defineConfig({
  assetsInclude: ['**/*.wasm'],
  server: {
    headers: {
      // Required for SharedArrayBuffer / multi-threaded WASM
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
});

Webpack:

// webpack.config.js
module.exports = {
  module: {
    rules: [
      { test: /\.wasm$/, type: 'asset/resource' },
    ],
  },
  devServer: {
    headers: {
      // Required for SharedArrayBuffer / multi-threaded WASM
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
};

Safari/iOS: Safari does not support credentialless COEP. Use the COI service worker pattern shown in the demo app — it intercepts responses and injects require-corp headers at runtime. See public/coi-serviceworker.js and the ensureCrossOriginIsolation() call in src/main.ts.


TypeScript Usage

@runanywhere/web ships with full TypeScript definitions. No @types/ package is needed.

import {
  RunAnywhere,
  SDKEnvironment,
  SDKError,
  SDKErrorCode,
  isSDKError,
  type SDKInitOptions,    // canonical name (or InitializeOptions alias)
  type GenerationOptions, // canonical name (or GenerateOptions alias)
  type ChatMessage,
  type ModelDescriptor,
} from '@runanywhere/web';

// Fully typed initialization
const options: SDKInitOptions = {
  environment: SDKEnvironment.Development,
};
await RunAnywhere.initialize(options);

// Typed generation options (used by backend packages: LlamaCPP, ONNX)
const genOptions: GenerationOptions = {
  systemPrompt: 'You are a helpful assistant.',
  maxTokens: 256,
  temperature: 0.7,
};

// Typed error handling
try {
  // ... any SDK call (e.g. loadModel, or backend TextGeneration.generate, etc.)
} catch (error) {
  if (isSDKError(error)) {
    switch (error.code) {
      case SDKErrorCode.NotInitialized:
        console.error('Call RunAnywhere.initialize() first.');
        break;
      case SDKErrorCode.ModelNotLoaded:
        console.error('Load a model first.');
        break;
      default:
        console.error('SDK error:', error.message);
    }
  }
}

Note: InitializeOptions is a convenience alias for SDKInitOptions; GenerateOptions for GenerationOptions. Backend-specific APIs (e.g. TextGeneration, STT, TTS) live in @runanywhere/web-llamacpp and @runanywhere/web-onnx when using the split-package layout.


Quick Start

1. Initialize the SDK

import { RunAnywhere } from '@runanywhere/web';

await RunAnywhere.initialize({ environment: 'development', debug: true });

2. Text Generation (LLM)

import { TextGeneration } from '@runanywhere/web';

// Load a GGUF model
await TextGeneration.loadModel('/models/qwen2.5-0.5b-instruct-q4_0.gguf', 'qwen2.5-0.5b');

// Generate
const result = await TextGeneration.generate('Explain quantum computing briefly.');
console.log(result.text);

// Stream tokens
for await (const token of TextGeneration.generateStream('Write a haiku about code.')) {
  process.stdout.write(token);
}

3. Speech-to-Text (STT)

import { STT } from '@runanywhere/web';

await STT.loadModel({
  modelId: 'whisper-tiny',
  type: STTModelType.Whisper,
  modelFiles: { encoder: '/models/encoder.onnx', decoder: '/models/decoder.onnx', tokens: '/models/tokens.txt' },
  sampleRate: 16000,
});

const result = await STT.transcribe(audioFloat32Array);
console.log(result.text);

4. Text-to-Speech (TTS)

import { TTS } from '@runanywhere/web';

await TTS.loadVoice({
  voiceId: 'piper-en',
  modelPath: '/models/piper-en.onnx',
  tokensPath: '/models/tokens.txt',
  dataDir: '/models/espeak-ng-data',
});

const result = await TTS.synthesize('Hello from RunAnywhere!');
// result.audioData is Float32Array, result.sampleRate is the sample rate

5. Voice Activity Detection (VAD)

import { VAD, SpeechActivity } from '@runanywhere/web';

await VAD.initialize({ modelPath: '/models/silero_vad.onnx' });

VAD.onSpeechActivity((activity) => {
  if (activity === SpeechActivity.Ended) {
    const segment = VAD.popSpeechSegment();
    if (segment) console.log(`Speech: ${segment.samples.length} samples`);
  }
});

// Feed audio chunks from microphone
VAD.processSamples(audioChunk);

6. Vision Language Model (VLM)

import { VLM, VLMImageFormat } from '@runanywhere/web';

await VLM.loadModel('/models/qwen2-vl.gguf', '/models/mmproj.gguf', 'qwen2-vl');

const result = await VLM.process(
  { format: VLMImageFormat.RGB, rgbPixels: pixelData, width: 256, height: 256 },
  'Describe this image.',
  { maxTokens: 100 },
);
console.log(result.text);

Architecture

+---------------------------------------------+
|  TypeScript API                              |
|  RunAnywhere / TextGeneration / STT / TTS   |
|  VAD / VLM / VoicePipeline / Embeddings     |
+---------------------------------------------+
|  WASMBridge + PlatformAdapter               |
|  (Emscripten addFunction / ccall / cwrap)   |
+---------------------------------------------+
|  RACommons C++ (compiled to WASM)           |
|   - Service Registry   - Event System       |
|   - Model Management   - Lifecycle          |
+---------------------------------------------+
|  Inference Backends (WASM)                  |
|   - llama.cpp  (LLM / VLM)                 |
|   - whisper.cpp (STT)                       |
|   - sherpa-onnx (TTS / VAD)                |
+---------------------------------------------+

The Web SDK compiles the same C++ core (runanywhere-commons) used by the iOS and Android SDKs to WebAssembly via Emscripten. The inference engines (llama.cpp, whisper.cpp, sherpa-onnx) are the same native code running in the browser, with identical vtable dispatch, service registry, and event system.

Key Components

Layer Component Description
Public RunAnywhere SDK lifecycle (initialize, shutdown, device capabilities)
Public TextGeneration LLM text generation and streaming
Public STT Speech-to-text transcription
Public TTS Text-to-speech synthesis
Public VAD Voice activity detection
Public VLM Vision-language model inference
Public VoicePipeline STT -> LLM -> TTS orchestration
Public ToolCalling Function calling with typed definitions
Public StructuredOutput JSON schema-guided generation
Public Embeddings Vector embedding generation
Foundation WASMBridge Emscripten module loader and C interop
Foundation SDKLogger Structured logging with configurable levels
Foundation EventBus Typed event system for SDK lifecycle events
Foundation SDKError Typed error hierarchy with error codes
Infrastructure ModelManager Model download, storage, and loading orchestration
Infrastructure OPFSStorage Persistent storage via Origin Private File System
Infrastructure AudioCapture Microphone capture with Web Audio API
Infrastructure VideoCapture Camera capture and frame extraction
Infrastructure AudioPlayback Audio playback via Web Audio API
Infrastructure VLMWorkerBridge Web Worker bridge for off-main-thread VLM inference

Project Structure

sdk/runanywhere-web/
+-- packages/
|   +-- core/                       # @runanywhere/web npm package
|       +-- src/
|       |   +-- Public/             # Public API
|       |   |   +-- RunAnywhere.ts
|       |   |   +-- Extensions/
|       |   |       +-- RunAnywhere+TextGeneration.ts
|       |   |       +-- RunAnywhere+STT.ts
|       |   |       +-- RunAnywhere+TTS.ts
|       |   |       +-- RunAnywhere+VAD.ts
|       |   |       +-- RunAnywhere+VLM.ts
|       |   |       +-- RunAnywhere+VoiceAgent.ts
|       |   |       +-- RunAnywhere+VoicePipeline.ts
|       |   |       +-- RunAnywhere+ToolCalling.ts
|       |   |       +-- RunAnywhere+StructuredOutput.ts
|       |   |       +-- RunAnywhere+Embeddings.ts
|       |   |       +-- RunAnywhere+Diffusion.ts
|       |   |       +-- RunAnywhere+ModelManagement.ts
|       |   +-- Foundation/         # Core infrastructure
|       |   |   +-- WASMBridge.ts
|       |   |   +-- PlatformAdapter.ts
|       |   |   +-- EventBus.ts
|       |   |   +-- SDKLogger.ts
|       |   |   +-- ErrorTypes.ts
|       |   |   +-- SherpaONNXBridge.ts
|       |   +-- Infrastructure/     # Browser services
|       |   |   +-- ModelManager.ts
|       |   |   +-- ModelDownloader.ts
|       |   |   +-- ModelRegistry.ts
|       |   |   +-- OPFSStorage.ts
|       |   |   +-- AudioCapture.ts
|       |   |   +-- AudioPlayback.ts
|       |   |   +-- VideoCapture.ts
|       |   |   +-- VLMWorkerBridge.ts
|       |   |   +-- DeviceCapabilities.ts
|       |   |   +-- ArchiveUtility.ts
|       |   +-- types/              # Shared type definitions
|       +-- wasm/                   # WASM build output (generated)
|       +-- dist/                   # TypeScript build output (generated)
+-- wasm/                           # Emscripten build system
|   +-- CMakeLists.txt
|   +-- src/wasm_exports.cpp
|   +-- platform/wasm_platform_shims.cpp
|   +-- scripts/
|       +-- build.sh                # Main WASM build script
|       +-- setup-emsdk.sh          # Emscripten SDK installer
|       +-- build-sherpa-onnx.sh    # Sherpa-ONNX WASM build
+-- package.json                    # Workspace root
+-- tsconfig.base.json

Building from Source

Building from source is only required if you want to modify the C++ core or build a custom WASM binary with specific backends. Pre-built WASM files are included in the npm package.

Prerequisites

Setup Emscripten

# One-time setup
./wasm/scripts/setup-emsdk.sh
source ~/emsdk/emsdk_env.sh

Build WASM

# All backends (LLM + STT + TTS/VAD) -- produces racommons.wasm (~3.6 MB)
./wasm/scripts/build.sh --all-backends

# Individual backends
./wasm/scripts/build.sh --llamacpp          # LLM only (llama.cpp)
./wasm/scripts/build.sh --whispercpp        # STT only (whisper.cpp)
./wasm/scripts/build.sh --onnx              # TTS/VAD only (sherpa-onnx)
./wasm/scripts/build.sh --llamacpp --vlm    # LLM + VLM (llama.cpp + mtmd)

# WebGPU-accelerated build
./wasm/scripts/build.sh --webgpu

# Debug build with pthreads
./wasm/scripts/build.sh --debug --pthreads --all-backends

# Clean rebuild
./wasm/scripts/build.sh --clean --all-backends

Build outputs are copied to packages/core/wasm/.

Build TypeScript

cd sdk/runanywhere-web
npm install
npm run build:ts

Output: packages/core/dist/index.js and packages/core/dist/index.d.ts.

Typecheck

cd packages/core && npx tsc --noEmit

Browser Requirements

Feature Required Fallback
WebAssembly Yes N/A
SharedArrayBuffer For pthreads (multi-threaded) Single-threaded mode
Cross-Origin Isolation For SharedArrayBuffer Single-threaded mode
WebGPU For Diffusion backend N/A (Diffusion unavailable)
OPFS For persistent model storage MEMFS (volatile, models re-downloaded each session)
Web Audio API For microphone capture / playback N/A

Use detectCapabilities() to check browser support at runtime:

import { detectCapabilities } from '@runanywhere/web';

const caps = await detectCapabilities();
console.log('Cross-Origin Isolated:', caps.isCrossOriginIsolated);
console.log('SharedArrayBuffer:', caps.hasSharedArrayBuffer);
console.log('WebGPU:', caps.hasWebGPU);
console.log('OPFS:', caps.hasOPFS);

Cross-Origin Isolation Headers

For multi-threaded WASM (pthreads), your server must set two HTTP headers on every response:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

These headers enable SharedArrayBuffer, which is required for multi-threaded WASM. Without them, crossOriginIsolated will be false and the SDK falls back to single-threaded mode.

Note: require-corp means all sub-resources (images, scripts, fonts, iframes) must either be same-origin or include a Cross-Origin-Resource-Policy: cross-origin header. Plan accordingly for CDN assets.

Configuration by Platform

Nginx ```nginx server { listen 443 ssl; server_name app.example.com; add_header Cross-Origin-Opener-Policy "same-origin" always; add_header Cross-Origin-Embedder-Policy "require-corp" always; types { application/wasm wasm; } location ~* \.wasm$ { add_header Cross-Origin-Opener-Policy "same-origin" always; add_header Cross-Origin-Embedder-Policy "require-corp" always; add_header Cache-Control "public, max-age=31536000, immutable"; } } ```
Vercel ```json { "headers": [ { "source": "/(.*)", "headers": [ { "key": "Cross-Origin-Opener-Policy", "value": "same-origin" }, { "key": "Cross-Origin-Embedder-Policy", "value": "require-corp" } ] } ] } ```
Netlify ```toml [[headers]] for = "/*" [headers.values] Cross-Origin-Opener-Policy = "same-origin" Cross-Origin-Embedder-Policy = "require-corp" ```
Cloudflare Pages Create a `_headers` file in the project root: ``` /* Cross-Origin-Opener-Policy: same-origin Cross-Origin-Embedder-Policy: require-corp ```
CloudFront (AWS) Add a **Response Headers Policy** with: - `Cross-Origin-Opener-Policy: same-origin` - `Cross-Origin-Embedder-Policy: require-corp` Or use a CloudFront Function: ```javascript function handler(event) { var response = event.response; var headers = response.headers; headers['cross-origin-opener-policy'] = { value: 'same-origin' }; headers['cross-origin-embedder-policy'] = { value: 'require-corp' }; return response; } ```
Apache (.htaccess) ```apache Header always set Cross-Origin-Opener-Policy "same-origin" Header always set Cross-Origin-Embedder-Policy "require-corp" AddType application/wasm .wasm ```
Vite (development) ```typescript export default defineConfig({ server: { headers: { 'Cross-Origin-Opener-Policy': 'same-origin', 'Cross-Origin-Embedder-Policy': 'credentialless', }, }, }); ```

Configuration

SDK Initialization

await RunAnywhere.initialize({
  environment: 'development',  // 'development' | 'staging' | 'production'
  debug: true,                 // Enable verbose logging
});

Logging

The SDK uses SDKLogger for all internal logging. Configure log level and enable/disable:

import { SDKLogger, LogLevel } from '@runanywhere/web';

SDKLogger.level = LogLevel.Debug;    // Trace | Debug | Info | Warning | Error | Fatal
SDKLogger.enabled = true;           // Toggle all SDK logging

Events

Subscribe to SDK lifecycle events:

import { EventBus } from '@runanywhere/web';

EventBus.shared.on('model.downloadProgress', (event) => {
  console.log(`Download: ${(event.data.progress * 100).toFixed(0)}%`);
});

EventBus.shared.on('model.loadCompleted', (event) => {
  console.log(`Model loaded: ${event.data.modelId}`);
});

Error Handling

The SDK uses typed errors with error codes:

import { SDKError, SDKErrorCode } from '@runanywhere/web';

try {
  await TextGeneration.generate('Hello');
} catch (err) {
  if (err instanceof SDKError) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('SDK not initialized');
        break;
      case SDKErrorCode.ModelNotLoaded:
        console.error('No model loaded');
        break;
      default:
        console.error(`SDK error [${err.code}]: ${err.message}`);
    }
  }
}

Demo App

A full-featured example application is included at examples/web/RunAnywhereAI/. It demonstrates all SDK capabilities across seven tabs: Chat, Vision, Voice, Transcribe, Speak, Storage, and Settings.

cd examples/web/RunAnywhereAI
npm install
npm run dev

The demo app runs on Vite with Cross-Origin Isolation headers pre-configured.


npm Package

@runanywhere/web

Published Exports

Export Description
RunAnywhere SDK lifecycle (initialize, shutdown, capabilities)
TextGeneration LLM text generation and streaming
STT Speech-to-text transcription
TTS Text-to-speech synthesis
VAD Voice activity detection
VLM Vision-language model inference
VoicePipeline STT -> LLM -> TTS orchestration
VoiceAgent Complete voice agent with C API pipeline
ToolCalling Function calling with typed tool definitions
StructuredOutput JSON schema-guided generation
Embeddings Vector embedding generation
Diffusion Image generation (WebGPU, scaffold)
AudioCapture Microphone capture via Web Audio API
AudioPlayback Audio playback via Web Audio API
VideoCapture Camera capture and frame extraction
ModelManager Advanced model download/storage/loading
OPFSStorage Low-level OPFS persistence
VLMWorkerBridge Web Worker bridge for VLM inference
SDKLogger Structured logging
SDKError Typed error hierarchy
EventBus SDK event system
detectCapabilities Browser feature detection

FAQ

Does this work offline?

Yes. Once models are downloaded and cached in OPFS, the SDK works entirely offline. No server, API key, or network connection is needed for inference.

Where are models stored?

Models are stored in the browser’s Origin Private File System (OPFS), a sandboxed persistent storage API. Files persist across browser sessions but are origin-scoped and not accessible via the regular file system. If OPFS quota is exceeded, the SDK falls back to an in-memory cache for the current session.

How large are the WASM files?

The core racommons.wasm is approximately 3.6 MB (all backends). The sherpa-onnx WASM (for TTS/VAD) is approximately 12 MB and is loaded separately only when needed. These are downloaded once and cached by the browser.

Is my data private?

Yes. All inference runs entirely in the browser via WebAssembly. No data is sent to any server. Audio, text, and images never leave the device.

Which browsers are supported?

Chrome 96+ and Edge 96+ are fully supported. Firefox 119+ works but lacks WebGPU. Safari 17+ has basic support but limited OPFS reliability. Mobile browsers have memory constraints that limit larger models.

Can I use a custom model?

Yes. Any GGUF-format model compatible with llama.cpp works for LLM/VLM. STT models use ONNX format via whisper.cpp or sherpa-onnx. TTS models use Piper ONNX format.


Troubleshooting

“SharedArrayBuffer is not defined”

Cause: Missing Cross-Origin Isolation headers.

Fix: Add the required headers to your server configuration. See Cross-Origin Isolation Headers. The SDK will fall back to single-threaded mode if headers are missing.

“Model failed to load”

Cause: CORS error, wrong file path, or corrupted download.

Fix: Ensure the model URL has proper CORS headers or serve from the same origin. Check the browser console for network errors. Try deleting the model from OPFS storage and re-downloading.

“Out of memory” / tab crashes

Cause: Model too large for available browser memory.

Fix: Use smaller quantized models (Q4_0 instead of Q8_0). Close other browser tabs. On mobile, models larger than 1 GB may exceed available memory.

VLM inference is slow

Cause: CLIP image encoding is computationally expensive in WASM.

Fix: Use smaller capture dimensions (256x256 is recommended). The VLM runs in a dedicated Web Worker so the UI remains responsive during inference.

OPFS storage not persisting

Cause: Browser may evict storage under memory pressure, or Incognito mode.

Fix: The SDK requests persistent storage automatically. Ensure you are not in Incognito/Private mode. Safari has known OPFS reliability issues.


Known Limitations (Beta)


Contributing

See the repository Contributing Guide for details.

# Clone and set up
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks/sdk/runanywhere-web

# Install dependencies
npm install

# Build TypeScript
npm run build:ts

# Run the demo app
cd ../../examples/web/RunAnywhereAI
npm install
npm run dev

Support


License

Apache 2.0 – see LICENSE for details.