
RunAnywhere Web SDK

On-device AI for the browser. Run LLMs, Speech-to-Text, Text-to-Speech, Vision, and Voice AI locally via WebAssembly – private, offline-capable, zero server dependencies.

Requirements: WebAssembly, TypeScript 5.6+, Chrome 96+, Node.js 18+. License: Apache 2.0.

Beta (v0.1.0) – This is an early release for testing and feedback. The API surface is mostly stable but may still change before v1.0. Not yet recommended for production deployments without thorough testing.

Current runtime status: LLM is the only fully exercised browser E2E path in the current Web artifacts. VLM downloads and loads SmolVLM2 primary GGUF plus mmproj through the shared lifecycle, but Chrome/WebGPU inference is still blocked: RunAnywhere.processImage(...) reaches CLIP encoding image slice... and times out before token decode. STT, TTS, model-backed VAD, RAG, and VoiceAgent are blocked until ONNX Runtime and Sherpa-ONNX WASM static archives are present in sdk/runanywhere-commons/third_party/*-wasm and the unified RACommons WASM artifact exports the required backend symbols.



Features

Large Language Models (LLM)

Speech-to-Text (STT)

Text-to-Speech (TTS)

Voice Activity Detection (VAD)

Vision Language Models (VLM)

Voice Pipeline

Tool Calling and Structured Output

Embeddings

Infrastructure


System Requirements

| Component | Minimum | Recommended |
| --- | --- | --- |
| Browser | Chrome 96+ / Edge 96+ | Chrome 120+ / Edge 120+ |
| WebAssembly | Required | Required |
| SharedArrayBuffer | For multi-threaded WASM | Requires Cross-Origin Isolation headers |
| WebGPU + JSPI | For GPU-accelerated llama.cpp/VLM paths | Chrome/Edge with WebAssembly.promising and WebAssembly.Suspending |
| OPFS | For persistent model storage | All modern browsers |
| RAM | 2 GB | 4 GB+ for larger models |
| Storage | Variable | Models: 40 MB – 4 GB depending on model |

Package Structure

The Web SDK is split into a small core package and backend registration packages. App code imports the Swift-shaped facade from @runanywhere/web; backend packages only install RACommons WASM modules and native plugin vtables.

| Package | Description | Includes |
| --- | --- | --- |
| @runanywhere/web | Core SDK facade, proto-derived types, browser helpers | TypeScript only |
| @runanywhere/web-llamacpp | LLM and VLM backend registration | llama.cpp RACommons WASM CPU/WebGPU artifacts |
| @runanywhere/web-onnx | STT, TTS, and VAD backend registration shell | Requires ONNX/Sherpa RACommons WASM artifacts that are not vendored yet |

Installation

# Core + all current backends
npm install @runanywhere/web @runanywhere/web-llamacpp @runanywhere/web-onnx

# LLM/VLM only
npm install @runanywhere/web @runanywhere/web-llamacpp

# Speech only
npm install @runanywhere/web @runanywhere/web-onnx

Serve WASM Files + Cross-Origin Isolation

Backend packages include their WASM files and resolve them with import.meta.url. Configure your bundler to serve .wasm assets without pre-bundling backend packages.

Important: Your server must set Cross-Origin Isolation headers for SharedArrayBuffer and multi-threaded WASM to work. Without these headers the SDK falls back to single-threaded mode, which is significantly slower. See Cross-Origin Isolation Headers for all platforms (Nginx, Vercel, Netlify, Cloudflare, AWS, Apache).

Vite:

// vite.config.ts
import { defineConfig } from 'vite';

export default defineConfig({
  assetsInclude: ['**/*.wasm'],
  server: {
    headers: {
      // Required for SharedArrayBuffer / multi-threaded WASM
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
});

Webpack:

// webpack.config.js
module.exports = {
  module: {
    rules: [
      { test: /\.wasm$/, type: 'asset/resource' },
    ],
  },
  devServer: {
    headers: {
      // Required for SharedArrayBuffer / multi-threaded WASM
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
};

Safari/iOS: Safari does not support credentialless COEP. Use the COI service worker pattern shown in the demo app, which intercepts responses and injects require-corp headers at runtime. See public/coi-serviceworker.js and the ensureCrossOriginIsolation() call in src/main.ts.


TypeScript Usage

@runanywhere/web ships with full TypeScript definitions. No @types/ package is needed.

import {
  RunAnywhere,
  SDKEnvironment,
  SDKException,
  SDKErrorCode,
  isSDKException,
  type SDKInitOptions,
  type LLMGenerationOptions,
  type ChatMessage,
} from '@runanywhere/web';

// Fully typed initialization
const options: SDKInitOptions = {
  environment: SDKEnvironment.SDK_ENVIRONMENT_DEVELOPMENT,
};
await RunAnywhere.initialize(options);

// Typed generation options (proto-generated, used by backend packages: LlamaCPP, ONNX)
const genOptions: Partial<LLMGenerationOptions> = {
  systemPrompt: 'You are a helpful assistant.',
  maxTokens: 256,
  temperature: 0.7,
};

// Typed error handling
try {
  // ... any SDK call (e.g. loadModel, or backend TextGeneration.generate, etc.)
} catch (error) {
  if (isSDKException(error)) {
    switch (error.code) {
      case SDKErrorCode.NotInitialized:
        console.error('Call RunAnywhere.initialize() first.');
        break;
      case SDKErrorCode.ModelNotLoaded:
        console.error('Load a model first.');
        break;
      default:
        console.error('SDK error:', error.message);
    }
  }
}

Note: LLMGenerationOptions is the proto-generated type for LLM generation parameters. App code uses the Swift-shaped RunAnywhere.* namespaces from @runanywhere/web; backend packages only register native WASM modules.


Quick Start

1. Initialize the SDK and Backends

import { RunAnywhere, SDKEnvironment } from '@runanywhere/web';
import { LlamaCPP } from '@runanywhere/web-llamacpp';

await RunAnywhere.initialize({
  environment: SDKEnvironment.SDK_ENVIRONMENT_DEVELOPMENT,
  debug: true,
});
await LlamaCPP.register({ acceleration: 'auto' });
// ONNX.register() is currently a shell until ONNX/Sherpa WASM archives are vendored.

2. Text Generation (LLM)

await RunAnywhere.modelRegistry.registerModel({
  id: 'qwen2.5-0.5b',
  name: 'Qwen 2.5 0.5B',
  localPath: '/models/qwen2.5-0.5b-instruct-q4_0.gguf',
});
await RunAnywhere.loadModel({ modelId: 'qwen2.5-0.5b' });

const result = await RunAnywhere.generate({
  prompt: 'Explain quantum computing briefly.',
});
console.log(result.text);

const stream = await RunAnywhere.generateStream({
  prompt: 'Write a haiku about code.',
});
let haiku = '';
for await (const token of stream.stream) {
  haiku += token; // in the browser, append to a DOM node; process.stdout is Node-only
}
console.log(haiku);
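
The streaming loop above can be wrapped in a small reusable helper. The following is a minimal sketch, assuming the token stream is an AsyncIterable<string> as the for-await loop suggests; collectTokens and fakeStream are hypothetical names, not SDK exports.

```typescript
// Assumption: the SDK's token stream is an AsyncIterable<string>.
// collectTokens is a hypothetical helper, not part of @runanywhere/web.
async function collectTokens(
  tokens: AsyncIterable<string>,
  onToken?: (token: string, textSoFar: string) => void,
): Promise<string> {
  let text = '';
  for await (const token of tokens) {
    text += token;
    onToken?.(token, text); // e.g. update a DOM node's textContent
  }
  return text;
}

// Stand-in stream so the sketch runs without a loaded model.
async function* fakeStream(): AsyncGenerator<string> {
  yield 'Hello';
  yield ', ';
  yield 'world';
}
```

In an app, passing stream.stream in place of fakeStream() and writing textSoFar into an output element would render the text incrementally.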

3. Speech-to-Text (STT)

// Blocked in current artifacts: requires ONNX/Sherpa WASM static archives.
const result = await RunAnywhere.transcribe(audioFloat32Array, { sampleRate: 16000 });
console.log(result.text);
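
The call above passes 16 kHz samples, while browser microphones commonly capture at 44.1 or 48 kHz. The sketch below shows one way to bring mono audio to the target rate with linear interpolation. resampleLinear is a hypothetical helper, not an SDK function, and it applies no anti-aliasing filter.

```typescript
// Hypothetical helper: linear-interpolation resampling of mono PCM.
// Adequate for a quick experiment; a filtered resampler is preferable
// when downsampling by a large factor.
function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate <= 0 || toRate <= 0) throw new RangeError('sample rates must be positive');
  if (fromRate === toRate || input.length === 0) return input.slice();
  const outLength = Math.max(1, Math.round((input.length * toRate) / fromRate));
  const output = new Float32Array(outLength);
  const step = fromRate / toRate; // input samples advanced per output sample
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const left = Math.min(Math.floor(pos), input.length - 1);
    const right = Math.min(left + 1, input.length - 1);
    const frac = pos - Math.floor(pos);
    output[i] = input[left] * (1 - frac) + input[right] * frac;
  }
  return output;
}
```

For example, resampleLinear(captured, 48000, 16000) yields one output sample for every three input samples.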

4. Text-to-Speech (TTS)

// Blocked in current artifacts: requires ONNX/Sherpa/Piper WASM static archives.
const result = await RunAnywhere.synthesize('Hello from RunAnywhere!');
console.log(result.sampleRate, result.audioData.length);
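
To play or download synthesized audio, the samples can be wrapped in a WAV container. This is a minimal sketch under the assumption that result.audioData holds mono Float32 samples in the range [-1, 1]; encodeWav is a hypothetical helper, not an SDK export.

```typescript
// Hypothetical helper: wraps mono Float32 PCM in a 16-bit PCM WAV container.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  writeAscii(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true); // RIFF chunk size
  writeAscii(8, 'WAVE');
  writeAscii(12, 'fmt ');
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format: PCM
  view.setUint16(22, 1, true); // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, 'data');
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    const clamped = Math.max(-1, Math.min(1, samples[i]));
    // Scale asymmetrically so -1 maps to -32768 and +1 maps to 32767.
    const scaled = clamped < 0 ? clamped * 0x8000 : clamped * 0x7fff;
    view.setInt16(44 + i * 2, Math.round(scaled), true);
  }
  return new Uint8Array(buffer);
}
```

In the browser, the returned bytes can be placed in a Blob with type audio/wav and handed to an audio element through URL.createObjectURL.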

5. Voice Activity Detection (VAD)

// Blocked in current artifacts: model-backed Silero VAD requires ONNX/Sherpa WASM.
const result = await RunAnywhere.detectVoiceActivity(audioChunk, { sampleRate: 16000 });
console.log(result.isSpeech);

6. Vision Language Model (VLM)

import { VLMImageFormat, VLMModelFamily } from '@runanywhere/web';

await RunAnywhere.modelRegistry.registerModel(smolVLM2ModelInfo);
await RunAnywhere.downloadModel({ modelId: 'smolvlm2-256m-video-instruct-q8_0' });
await RunAnywhere.loadModel({ modelId: 'smolvlm2-256m-video-instruct-q8_0' });
await RunAnywhere.visionLanguage.loadCurrentModel();

const result = await RunAnywhere.processImage(
  { format: VLMImageFormat.VLM_IMAGE_FORMAT_RAW_RGB, rawRgb: pixelData, width: 256, height: 256 },
  {
    prompt: 'Describe this image.',
    maxTokens: 100,
    temperature: 0.2,
    topP: 1,
    topK: 40,
    stopSequences: [],
    streamingEnabled: false,
    maxImageSize: 512,
    nThreads: 4,
    useGpu: true,
    modelFamily: VLMModelFamily.VLM_MODEL_FAMILY_QWEN2_VL,
    seed: 0,
    repetitionPenalty: 1,
    minP: 0,
    emitImageEmbeddings: false,
  },
);
console.log(result.text);

Current VLM validation status: Chrome/WebGPU loads SmolVLM2 and mmproj, then times out in CLIP image encoding before token decode. Treat real VLM inference as BLOCKED until that path returns text in browser E2E.
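
The pixelData passed as rawRgb above has to come from somewhere, and canvas APIs return RGBA. The sketch below drops the alpha channel, assuming rawRgb expects tightly packed 8-bit R, G, B triples in row-major order (an assumption, not confirmed here). rgbaToRgb is a hypothetical helper.

```typescript
// Hypothetical helper: converts canvas-style RGBA bytes to packed RGB bytes.
// Assumption: the VLM input wants 3 bytes per pixel with no row padding.
function rgbaToRgb(
  rgba: Uint8ClampedArray | Uint8Array,
  width: number,
  height: number,
): Uint8Array {
  const pixelCount = width * height;
  if (rgba.length !== pixelCount * 4) {
    throw new RangeError(`expected ${pixelCount * 4} RGBA bytes, got ${rgba.length}`);
  }
  const rgb = new Uint8Array(pixelCount * 3);
  for (let src = 0, dst = 0; src < rgba.length; src += 4, dst += 3) {
    rgb[dst] = rgba[src]; // red
    rgb[dst + 1] = rgba[src + 1]; // green
    rgb[dst + 2] = rgba[src + 2]; // blue; alpha at src + 3 is discarded
  }
  return rgb;
}
```

With a 2D canvas context, the input would typically be ctx.getImageData(0, 0, width, height).data.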


Architecture

+---------------------------------------------+
|  TypeScript API                             |
|  RunAnywhere facade + namespaced APIs       |
|  textGeneration / stt / tts / vad / vlm     |
+---------------------------------------------+
|  WASMBridge + PlatformAdapter               |
|  (Emscripten addFunction / ccall / cwrap)   |
+---------------------------------------------+
|  RACommons C++ (compiled to WASM)           |
|   - Service Registry   - Event System       |
|   - Model Management   - Lifecycle          |
+---------------------------------------------+
|  Inference Backends (WASM)                  |
|   - llama.cpp  (LLM / VLM)                  |
|   - whisper.cpp (STT)                       |
|   - sherpa-onnx (TTS / VAD)                 |
+---------------------------------------------+

The Web SDK compiles the same C++ core (runanywhere-commons) used by the iOS and Android SDKs to WebAssembly via Emscripten. The llama.cpp LLM/VLM path is present in the current artifacts. The ONNX/Sherpa speech path is still gated by missing Web static archives and must not be claimed as runtime-ready until those archives are linked and exports are verified.

Key Components

| Layer | Component | Description |
| --- | --- | --- |
| Public | RunAnywhere | Swift-shaped SDK lifecycle and namespace facade |
| Public | RunAnywhere.textGeneration | LLM text generation and streaming |
| Public | RunAnywhere.stt | Speech-to-text component lifecycle and transcription |
| Public | RunAnywhere.tts | Text-to-speech component lifecycle and synthesis |
| Public | RunAnywhere.vad | Voice activity detection component lifecycle and processing |
| Public | RunAnywhere.visionLanguage | Vision-language model inference |
| Public | RunAnywhere.modelRegistry | C++ model registry proto bridge |
| Public | RunAnywhere.modelLifecycle | C++ model lifecycle proto bridge |
| Public | RunAnywhere.downloads | C++ download workflow proto bridge |
| Public | RunAnywhere.storage | Browser storage helpers plus native storage analyzer bridge |
| Internal | @runanywhere/web/internal | Backend-only WASM, adapter, logging, and provider hooks |
| Browser | @runanywhere/web/browser | Audio/video capture, playback, file loading, and capability helpers |

Project Structure

sdk/runanywhere-web/
+-- packages/
|   +-- core/                       # @runanywhere/web npm package
|       +-- src/
|       |   +-- Public/             # Public API
|       |   |   +-- RunAnywhere.ts
|       |   |   +-- Extensions/
|       |   |       +-- RunAnywhere+TextGeneration.ts
|       |   |       +-- RunAnywhere+STT.ts
|       |   |       +-- RunAnywhere+TTS.ts
|       |   |       +-- RunAnywhere+VAD.ts
|       |   |       +-- RunAnywhere+VisionLanguage.ts
|       |   |       +-- RunAnywhere+VoiceAgent.ts
|       |   |       +-- RunAnywhere+ToolCalling.ts
|       |   |       +-- RunAnywhere+StructuredOutput.ts
|       |   |       +-- RunAnywhere+ModelRegistry.ts
|       |   |       +-- RunAnywhere+ModelLifecycle.ts
|       |   |       +-- RunAnywhere+Storage.ts
|       |   |       +-- RunAnywhere+PluginLoader.ts
|       |   +-- Adapters/            # Proto-byte C ABI adapters
|       |   +-- runtime/             # Emscripten module singleton + proto bridge
|       |   +-- Foundation/         # Core infrastructure
|       |   |   +-- EventBus.ts
|       |   |   +-- SDKLogger.ts
|       |   +-- Infrastructure/     # Browser services
|       |   |   +-- AudioCapture.ts
|       |   |   +-- AudioPlayback.ts
|       |   |   +-- VideoCapture.ts
|       |   |   +-- DeviceCapabilities.ts
|       |   +-- types/              # Proto re-exports + Web-only I/O types
|       +-- tests/                  # Unit and type tests, outside SDK source
|       +-- dist/                   # TypeScript build output (generated)
|   +-- llamacpp/                   # @runanywhere/web-llamacpp backend shell
|   +-- onnx/                       # @runanywhere/web-onnx backend shell
+-- wasm/                           # Emscripten build system
|   +-- CMakeLists.txt
|   +-- src/wasm_exports.cpp
|   +-- platform/wasm_platform_shims.cpp
|   +-- scripts/
|       +-- build.sh                # Main build script (unified RACommons WASM build)
|       +-- setup-emsdk.sh          # Emscripten SDK installer
+-- package.json                    # Workspace root
+-- tsconfig.base.json

Building from Source

Building from source is only required if you want to modify the C++ core or build a custom WASM binary with specific backends. Pre-built WASM files are included in the npm package.

Prerequisites

Setup Emscripten

# One-time setup
./wasm/scripts/setup-emsdk.sh
source ~/emsdk/emsdk_env.sh

Build WASM

# All backends (LLM + STT + TTS/VAD) -- produces racommons.wasm (~3.6 MB)
./wasm/scripts/build.sh --all-backends

# Individual backends
./wasm/scripts/build.sh --llamacpp          # LLM only (llama.cpp)
./wasm/scripts/build.sh --whispercpp        # STT only (whisper.cpp)
./wasm/scripts/build.sh --onnx              # Requires ONNX/Sherpa WASM static archives first
./wasm/scripts/build.sh --llamacpp --vlm    # LLM + VLM (llama.cpp + mtmd)

# WebGPU-accelerated build
./wasm/scripts/build.sh --webgpu

# Debug build with pthreads
./wasm/scripts/build.sh --debug --pthreads --all-backends

# Clean rebuild
./wasm/scripts/build.sh --clean --all-backends

Build outputs are copied to packages/core/wasm/.

Build TypeScript

cd sdk/runanywhere-web
npm install
npm run build:ts

Output: packages/core/dist/index.js and packages/core/dist/index.d.ts.

Typecheck

cd packages/core && npx tsc --noEmit

Browser Requirements

| Feature | Required | Fallback |
| --- | --- | --- |
| WebAssembly | Yes | N/A |
| SharedArrayBuffer | For pthreads (multi-threaded) | Single-threaded mode |
| Cross-Origin Isolation | For SharedArrayBuffer | Single-threaded mode |
| WebGPU | For GPU-accelerated llama.cpp LLM/VLM paths | CPU WASM artifacts |
| OPFS | For persistent model storage | MEMFS (volatile, models re-downloaded each session) |
| Web Audio API | For microphone capture / playback | N/A |

Use detectCapabilities() to check browser support at runtime:

import { detectCapabilities } from '@runanywhere/web/browser';

const caps = await detectCapabilities();
console.log('Cross-Origin Isolated:', caps.isCrossOriginIsolated);
console.log('SharedArrayBuffer:', caps.hasSharedArrayBuffer);
console.log('WebGPU:', caps.hasWebGPU);
console.log('OPFS:', caps.hasOPFS);
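
For readers curious what such checks involve, the sketch below probes the same capabilities using only standard web platform globals. It is not the SDK's implementation of detectCapabilities(); probeBrowserFeatures and FeatureReport are hypothetical names, and the env parameter exists only so the function can be exercised outside a browser.

```typescript
// Hypothetical helper, not part of @runanywhere/web.
interface FeatureReport {
  webAssembly: boolean;
  sharedArrayBuffer: boolean;
  crossOriginIsolated: boolean;
  webGPU: boolean;
  jspi: boolean;
  opfs: boolean;
}

// `env` defaults to globalThis; tests can pass a fake object instead.
function probeBrowserFeatures(env: any = globalThis): FeatureReport {
  const wasm = env.WebAssembly;
  const nav = env.navigator;
  return {
    webAssembly: typeof wasm === 'object' && wasm !== null,
    sharedArrayBuffer: typeof env.SharedArrayBuffer === 'function',
    crossOriginIsolated: env.crossOriginIsolated === true,
    webGPU: nav != null && nav.gpu != null,
    // JSPI exposes WebAssembly.promising and WebAssembly.Suspending.
    jspi: typeof wasm?.promising === 'function' && typeof wasm?.Suspending === 'function',
    opfs: typeof nav?.storage?.getDirectory === 'function',
  };
}
```

A positive webGPU flag only means the API is exposed; requesting an adapter can still fail on unsupported hardware.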

Cross-Origin Isolation Headers

For multi-threaded WASM (pthreads), your server must set two HTTP headers on every response:

Cross-Origin-Opener-Policy: same-origin
Cross-Origin-Embedder-Policy: require-corp

These headers enable SharedArrayBuffer, which is required for multi-threaded WASM. Without them, crossOriginIsolated will be false and the SDK falls back to single-threaded mode.

Note: require-corp means all sub-resources (images, scripts, fonts, iframes) must either be same-origin or include a Cross-Origin-Resource-Policy: cross-origin header. Plan accordingly for CDN assets.
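
When auditing CDN assets against this rule, a small check like the one below can flag resources that would be blocked. It is a simplified sketch: it ignores CORS-mode loads (for example, elements with a crossorigin attribute) and the same-site policy value, and isLoadableUnderRequireCorp is a hypothetical name.

```typescript
// Hypothetical audit helper; a simplification of the browser's real checks.
type HeaderMap = Record<string, string>;

function isLoadableUnderRequireCorp(
  pageOrigin: string,
  resourceUrl: string,
  headers: HeaderMap,
): boolean {
  // Relative URLs resolve against the page origin.
  const resourceOrigin = new URL(resourceUrl, pageOrigin).origin;
  if (resourceOrigin === pageOrigin) return true; // same-origin always passes
  // Header names are case-insensitive, so compare in lowercase.
  const corp = Object.entries(headers).find(
    ([name]) => name.toLowerCase() === 'cross-origin-resource-policy',
  );
  return corp !== undefined && corp[1].trim().toLowerCase() === 'cross-origin';
}
```

The headers argument would come from a HEAD request to each asset, collected by whatever tooling the project already uses.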

Configuration by Platform

Nginx

```nginx
server {
    listen 443 ssl;
    server_name app.example.com;

    add_header Cross-Origin-Opener-Policy "same-origin" always;
    add_header Cross-Origin-Embedder-Policy "require-corp" always;

    types {
        application/wasm wasm;
    }

    location ~* \.wasm$ {
        add_header Cross-Origin-Opener-Policy "same-origin" always;
        add_header Cross-Origin-Embedder-Policy "require-corp" always;
        add_header Cache-Control "public, max-age=31536000, immutable";
    }
}
```

Vercel

```json
{
  "headers": [
    {
      "source": "/(.*)",
      "headers": [
        { "key": "Cross-Origin-Opener-Policy", "value": "same-origin" },
        { "key": "Cross-Origin-Embedder-Policy", "value": "require-corp" }
      ]
    }
  ]
}
```

Netlify

```toml
[[headers]]
  for = "/*"
  [headers.values]
    Cross-Origin-Opener-Policy = "same-origin"
    Cross-Origin-Embedder-Policy = "require-corp"
```

Cloudflare Pages

Create a `_headers` file in the project root:

```
/*
  Cross-Origin-Opener-Policy: same-origin
  Cross-Origin-Embedder-Policy: require-corp
```

CloudFront (AWS)

Add a **Response Headers Policy** with:

- `Cross-Origin-Opener-Policy: same-origin`
- `Cross-Origin-Embedder-Policy: require-corp`

Or use a CloudFront Function:

```javascript
function handler(event) {
  var response = event.response;
  var headers = response.headers;
  headers['cross-origin-opener-policy'] = { value: 'same-origin' };
  headers['cross-origin-embedder-policy'] = { value: 'require-corp' };
  return response;
}
```

Apache (.htaccess)

```apache
Header always set Cross-Origin-Opener-Policy "same-origin"
Header always set Cross-Origin-Embedder-Policy "require-corp"
AddType application/wasm .wasm
```

Vite (development)

```typescript
export default defineConfig({
  server: {
    headers: {
      'Cross-Origin-Opener-Policy': 'same-origin',
      'Cross-Origin-Embedder-Policy': 'credentialless',
    },
  },
});
```

Configuration

SDK Initialization

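import { RunAnywhere, SDKEnvironment } from '@runanywhere/web';

await RunAnywhere.initialize({
  // Pick the SDKEnvironment value for development, staging, or production
  environment: SDKEnvironment.SDK_ENVIRONMENT_DEVELOPMENT,
  debug: true, // Enable verbose logging
});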

Logging

Configure logging through the public RunAnywhere.logging namespace:

import { RunAnywhere, LogLevel } from '@runanywhere/web';

RunAnywhere.logging.setLevel(LogLevel.Debug);
RunAnywhere.logging.setEnabled(true);

Events

Subscribe to SDK lifecycle events:

RunAnywhere.events.on('model.downloadProgress', (event) => {
  console.log(`Download: ${(event.progress * 100).toFixed(0)}%`);
});

RunAnywhere.events.on('model.loadCompleted', (event) => {
  console.log(`Model loaded: ${event.modelId}`);
});

Error Handling

The SDK uses typed errors with error codes:

import { RunAnywhere, SDKException, SDKErrorCode } from '@runanywhere/web';

try {
  await RunAnywhere.generate({ prompt: 'Hello' });
} catch (err) {
  if (err instanceof SDKException) {
    switch (err.code) {
      case SDKErrorCode.NotInitialized:
        console.error('SDK not initialized');
        break;
      case SDKErrorCode.ModelNotLoaded:
        console.error('No model loaded');
        break;
      default:
        console.error(`SDK error [${err.code}]: ${err.message}`);
    }
  }
}

Demo App

A full-featured example application is included at examples/web/RunAnywhereAI/. It demonstrates all SDK capabilities across seven tabs: Chat, Vision, Voice, Transcribe, Speak, Storage, and Settings.

cd examples/web/RunAnywhereAI
npm install
npm run dev

The demo app runs on Vite with Cross-Origin Isolation headers pre-configured.


npm Packages

@runanywhere/web

| Export | Description |
| --- | --- |
| RunAnywhere | SDK lifecycle and Swift-shaped namespaces (textGeneration, stt, tts, vad, voiceAgent, visionLanguage, modelRegistry, modelLifecycle, downloads, storage, pluginLoader) |
| LogLevel | Public logging level enum for RunAnywhere.logging |
| SDKException, SDKErrorCode, isSDKException | Typed error hierarchy |
| Proto-derived types/enums | SDKEnvironment, InferenceFramework, ModelCategory, VLMImageFormat, ToolDefinition, DownloadProgress, and related generated types |
| @runanywhere/web/browser | Browser helpers: AudioCapture, AudioPlayback, AudioFileLoader, VideoCapture, detectCapabilities, getDeviceInfo |
| @runanywhere/web/internal | Backend-only adapter/runtime hooks, not an application API |

@runanywhere/web-llamacpp

| Export | Description |
| --- | --- |
| LlamaCPP | Registers the llama.cpp LLM/VLM RACommons WASM backend |
| LifecycleVLMProvider | Backend provider used by RunAnywhere.processImage after RunAnywhere.loadModel |

@runanywhere/web-onnx

| Export | Description |
| --- | --- |
| ONNX | Registers the ONNX/sherpa RACommons WASM backend for RunAnywhere.stt, RunAnywhere.tts, and RunAnywhere.vad |

FAQ

Does this work offline?

Yes. Once models are downloaded and cached in OPFS, the SDK works entirely offline. No server, API key, or network connection is needed for inference.

Where are models stored?

Models are stored in the browser’s Origin Private File System (OPFS), a sandboxed persistent storage API. Files persist across browser sessions but are origin-scoped and not accessible via the regular file system. If OPFS quota is exceeded, the SDK falls back to an in-memory cache for the current session.
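
Before starting a large download, an app can check whether the origin's quota is likely to hold the model. The sketch below relies on the standard navigator.storage.estimate() API; hasRoomFor and StorageEstimator are hypothetical names, and the 10% headroom is an arbitrary safety margin rather than an SDK setting.

```typescript
// Minimal shape of navigator.storage used here, so the sketch can be faked in tests.
interface StorageEstimator {
  estimate(): Promise<{ usage?: number; quota?: number }>;
}

// Hypothetical pre-download check. Browser estimates are approximate,
// so `headroom` reserves a margin on top of the requested size.
async function hasRoomFor(
  bytes: number,
  storage: StorageEstimator,
  headroom = 0.1,
): Promise<boolean> {
  const { usage = 0, quota = 0 } = await storage.estimate();
  const available = quota - usage;
  return bytes * (1 + headroom) <= available;
}
```

In the browser this would be called as hasRoomFor(modelSizeInBytes, navigator.storage).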

How large are the WASM files?

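The Web SDK ships unified RACommons WASM artifacts in the backend packages. Backend support (LLM/VLM, STT/TTS/VAD) is selected at build time, for example with npm run build:wasm -- --llamacpp --onnx --vlm --webgpu. The all-backends build listed under Build WASM produces a racommons.wasm of roughly 3.6 MB. The browser caches the artifacts after the first download.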

Is my data private?

Yes. All inference runs entirely in the browser via WebAssembly. No data is sent to any server. Audio, text, and images never leave the device.

Which browsers are supported?

Chrome 96+ and Edge 96+ are fully supported. Firefox 119+ works but lacks WebGPU. Safari 17+ has basic support but limited OPFS reliability. Mobile browsers have memory constraints that limit larger models.

Can I use a custom model?

Yes for the current LLM/VLM path: GGUF-format models compatible with llama.cpp can work when memory and browser capabilities allow. STT/TTS/VAD model formats remain ONNX/Piper/Silero, but Web runtime support is blocked until ONNX/Sherpa WASM archives are linked.


Troubleshooting

“SharedArrayBuffer is not defined”

Cause: Missing Cross-Origin Isolation headers.

Fix: Add the required headers to your server configuration. See Cross-Origin Isolation Headers. The SDK will fall back to single-threaded mode if headers are missing.

“Model failed to load”

Cause: CORS error, wrong file path, or corrupted download.

Fix: Ensure the model URL has proper CORS headers or serve from the same origin. Check the browser console for network errors. Try deleting the model from OPFS storage and re-downloading.

“Out of memory” / tab crashes

Cause: Model too large for available browser memory.

Fix: Use smaller quantized models (Q4_0 instead of Q8_0). Close other browser tabs. On mobile, models larger than 1 GB may exceed available memory.
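
To see why the quantization choice matters, the weight storage of a model can be estimated from its parameter count. The sketch below uses the bits-per-weight figures of the llama.cpp Q4_0 and Q8_0 block formats; estimateWeightBytes is a hypothetical helper and the result is a rough estimate, not a file size guarantee.

```typescript
// Each block of 32 weights stores one 16-bit scale plus 4-bit (Q4_0) or
// 8-bit (Q8_0) values, which works out to 4.5 and 8.5 bits per weight.
const BITS_PER_WEIGHT = { Q4_0: 4.5, Q8_0: 8.5 } as const;
type Quantization = keyof typeof BITS_PER_WEIGHT;

// Hypothetical estimator: approximate weight storage in bytes. Real GGUF
// files differ because some tensors use other types, and runtime memory
// also includes the KV cache and scratch buffers.
function estimateWeightBytes(parameterCount: number, quantization: Quantization): number {
  return Math.round((parameterCount * BITS_PER_WEIGHT[quantization]) / 8);
}
```

For a model with 0.5 billion parameters this gives about 281 MB at Q4_0 and about 531 MB at Q8_0, so the Q4_0 variant needs roughly half the memory for weights.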

VLM inference times out during image encoding

Cause: Current Chrome/WebGPU validation reaches CLIP encoding image slice... after prompt preparation and does not return before the 60s E2E timeout.

Fix: Treat VLM real inference as blocked until the WebGPU CLIP image-encoding path is fixed. Use smaller capture dimensions while debugging and keep Playwright traces from RA_RUN_VLM_E2E=1 npm run test:browser -- tests/browser/vlm-generate.spec.ts --trace on.
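
While the encoding path is blocked, app code that experiments with it can guard the call with its own deadline so the UI does not wait indefinitely. The sketch below is a generic promise timeout; withTimeout is a hypothetical helper, and it only stops waiting, it does not cancel work already running in WASM.

```typescript
// Hypothetical helper: rejects if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label = 'operation'): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`${label} timed out after ${ms} ms`)), ms);
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([work, timeout]).finally(() => clearTimeout(timer));
}
```

A guarded call might look like withTimeout(RunAnywhere.processImage(image, options), 60000, 'processImage'), mirroring the 60s limit used in the E2E test.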

OPFS storage not persisting

Cause: The browser may evict storage when disk space runs low, or the page is running in Incognito/Private mode, where storage does not persist.

Fix: The SDK requests persistent storage automatically. Ensure you are not in Incognito/Private mode. Safari has known OPFS reliability issues.
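
To confirm whether the browser actually granted persistence, the standard navigator.storage.persisted() and persist() methods can be queried directly. The sketch below is a diagnostic only and does not reflect how the SDK makes its own request; ensurePersistentStorage and PersistenceManager are hypothetical names.

```typescript
// Minimal shape of navigator.storage used here, so the sketch can be faked in tests.
interface PersistenceManager {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

// Hypothetical diagnostic: reports whether storage is persistent, requesting
// persistence once if it is not. The browser may still decline the request.
async function ensurePersistentStorage(
  storage: PersistenceManager,
): Promise<'already' | 'granted' | 'denied'> {
  if (await storage.persisted()) return 'already';
  return (await storage.persist()) ? 'granted' : 'denied';
}
```

Calling ensurePersistentStorage(navigator.storage) and logging the result shows which case applies; a 'denied' result means the stored models remain eligible for eviction.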


Known Limitations (Beta)


Contributing

See the repository Contributing Guide for details.

# Clone and set up
git clone https://github.com/RunanywhereAI/runanywhere-sdks.git
cd runanywhere-sdks/sdk/runanywhere-web

# Install dependencies
npm install

# Build TypeScript
npm run build:ts

# Run the demo app
cd ../../examples/web/RunAnywhereAI
npm install
npm run dev

Support


License

Apache 2.0 – see LICENSE for details.