Using SIMD in Worker Threads

WASM SIMD (the v128 value type and its instruction family) allows a single WebAssembly instruction to operate on 16 bytes at once: four f32 values, eight i16 values, or sixteen i8 values. Running SIMD code in a dedicated worker gives you WASM's highest-throughput number crunching without blocking the main thread. This page builds on WebAssembly in Workers and is part of the High-Performance Computation Patterns reference.

Minimal Reproducible Example

The pattern is: feature-detect SIMD in the worker → select the correct build → instantiate → dispatch.

// simd-worker.ts
// Expects two WASM builds served at known paths:
//   /wasm/image-filter-simd.wasm   (compiled with -msimd128)
//   /wasm/image-filter-scalar.wasm (fallback)

interface FilterExports extends WebAssembly.Exports {
  memory: WebAssembly.Memory;
  apply_filter: (ptr: number, len: number) => void;
}

let filterFn: ((ptr: number, len: number) => void) | null = null;
let wasmMem: WebAssembly.Memory | null = null;

/** Returns true if v128 is supported by this runtime. */
async function detectSIMD(): Promise<boolean> {
  // Minimal 43-byte module with one function of type () -> v128 whose body is
  // `v128.const 0; end`. Engines without SIMD reject the v128 type and opcode.
  const probe = new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, // magic: \0asm
    0x01, 0x00, 0x00, 0x00, // version: 1
    0x01, 0x05, 0x01,       // type section, 5 bytes, 1 type
    0x60, 0x00, 0x01, 0x7b, // () -> v128
    0x03, 0x02, 0x01, 0x00, // function section: 1 function, type index 0
    0x0a, 0x16, 0x01,       // code section, 22 bytes, 1 body
    0x14, 0x00,             // body size 20, 0 locals
    0xfd, 0x0c,             // v128.const
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, // 16-byte immediate
    0x0b,                   // end
  ]);
  return WebAssembly.validate(probe);
}

self.onmessage = async ({ data }) => {
  if (data.type === 'INIT') {
    const hasSIMD = await detectSIMD();
    const wasmUrl = hasSIMD
      ? '/wasm/image-filter-simd.wasm'
      : '/wasm/image-filter-scalar.wasm';

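    // 64 pages x 64 KiB = 4 MiB initially. Used only if the module imports
    // env.memory; a module that defines and exports its own memory ignores it.
    const importedMemory = new WebAssembly.Memory({ initial: 64 });
    const imports: WebAssembly.Imports = {
      env: { memory: importedMemory },
    };

    // instantiateStreaming requires the server to send Content-Type: application/wasm.
    const { instance } = await WebAssembly.instantiateStreaming(
      fetch(wasmUrl), imports
    );
    const e = instance.exports as FilterExports;
    filterFn = e.apply_filter;
    // Prefer the module's exported memory; fall back to the imported one.
    wasmMem  = e.memory ?? importedMemory;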

    self.postMessage({ type: 'READY', simd: hasSIMD });
  }

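  if (data.type === 'RUN' && filterFn && wasmMem) {
    const pixels: ArrayBuffer = data.pixels; // transferred in, no copy on the way in
    const len = pixels.byteLength;

    // Grow linear memory (64 KiB pages) if the frame does not fit. A 4K RGBA
    // frame is about 33 MB, far more than the 4 MiB of 64 initial pages.
    const missing = len - wasmMem.buffer.byteLength;
    if (missing > 0) wasmMem.grow(Math.ceil(missing / 65536));

    // Assumes offset 0 is scratch space the module does not use for its own
    // data or stack; real builds usually export an allocator instead.
    new Uint8Array(wasmMem.buffer).set(new Uint8Array(pixels), 0);

    const t0 = performance.now();
    filterFn(0, len);
    const durationMs = performance.now() - t0;

    // Create a fresh view after the call (growth inside WASM detaches old views),
    // copy the result into its own buffer, and transfer that back.
    const result = new Uint8Array(wasmMem.buffer, 0, len).slice().buffer;
    self.postMessage({ type: 'RESULT', id: data.id, result, durationMs }, [result]);
  }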
};

Main thread:

// main.ts
const worker = new Worker(new URL('./simd-worker.ts', import.meta.url), { type: 'module' });
worker.postMessage({ type: 'INIT' });

worker.onmessage = ({ data }) => {
  if (data.type === 'READY') {
    console.log(`WASM worker ready — SIMD: ${data.simd}`);
  }
  if (data.type === 'RESULT') {
    console.log(`Filter applied in ${data.durationMs.toFixed(2)}ms`);
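    renderFrame(data.result); // renderFrame: your own drawing code, not defined here
  }
};

// Transfers pixelBuffer to the worker: afterwards it is detached (byteLength 0)
// on this thread. Call only after READY; the worker drops RUN messages that
// arrive before initialisation finishes, and the transferred buffer is lost.
function applyFilter(pixelBuffer: ArrayBuffer): void {
  worker.postMessage(
    { type: 'RUN', id: crypto.randomUUID(), pixels: pixelBuffer },
    [pixelBuffer]
  );
}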

Line-by-Line Walkthrough

detectSIMD() with WebAssembly.validate: The probe is a minimal WASM module containing one function whose body is a v128.const instruction (0xfd 0x0c followed by a 16-byte immediate). Every section size and body size in the probe must match its contents exactly; a wrong length makes validate return false even on engines that do support SIMD. WebAssembly.validate is synchronous and does not instantiate the module; it only reports whether this runtime considers the binary valid. If SIMD is absent, validate returns false and the worker loads the scalar build instead.
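Hand-counting section and body lengths is error-prone, and a single wrong length makes validate fail for reasons that have nothing to do with SIMD. A minimal sketch that derives every length from the bytes it describes; the helper names leb128, wasmSection, buildSimdProbe, and hasWasmSimd are illustrative, not part of any library:

```typescript
// Encode an unsigned integer as LEB128, the variable-length format WASM uses
// for section sizes, body sizes, and vector counts.
function leb128(value: number): number[] {
  const bytes: number[] = [];
  let n = value >>> 0;
  do {
    let byte = n & 0x7f;
    n >>>= 7;
    if (n !== 0) byte |= 0x80; // high bit set: more bytes follow
    bytes.push(byte);
  } while (n !== 0);
  return bytes;
}

// Prefix a section payload with its id and its LEB128-encoded length.
function wasmSection(id: number, payload: number[]): number[] {
  return [id, ...leb128(payload.length), ...payload];
}

// Build the () -> v128 probe; no size is written by hand.
function buildSimdProbe() {
  const body = [
    0x00,                          // no local declarations
    0xfd, 0x0c,                    // v128.const
    ...Array<number>(16).fill(0),  // 16-byte immediate (all zeros)
    0x0b,                          // end
  ];
  return new Uint8Array([
    0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,               // magic + version 1
    ...wasmSection(0x01, [0x01, 0x60, 0x00, 0x01, 0x7b]),         // type: () -> v128
    ...wasmSection(0x03, [0x01, 0x00]),                           // function: 1 func, type 0
    ...wasmSection(0x0a, [0x01, ...leb128(body.length), ...body]), // code: 1 body
  ]);
}

// True when this engine accepts the v128 type and the v128.const opcode.
function hasWasmSimd(): boolean {
  return WebAssembly.validate(buildSimdProbe());
}
```

buildSimdProbe() produces a 43-byte module, and hasWasmSimd() could stand in for the body of detectSIMD().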

Two-build strategy: Shipping both a SIMD and a scalar .wasm binary is the standard way to support runtimes with and without SIMD. The SIMD build is compiled with -msimd128 (Clang/LLVM, including Emscripten), or in Rust with -C target-feature=+simd128 for the whole crate or #[target_feature(enable = "simd128")] on individual functions. The scalar build is the plain release target. Selecting the URL inside the worker keeps the build choice out of the main-thread bundle.

WebAssembly.instantiateStreaming(fetch(url), imports): Covered in depth in Instantiating WebAssembly Modules Inside Workers. The only SIMD-specific requirement is that the binary was compiled with the right target flags — the JS instantiation API is identical.

performance.now() bracketing: Measuring inside the worker captures only the WASM execution time, excluding message serialisation, context switch, and transfer overhead. Compare against the scalar build’s reported durationMs to quantify the SIMD speedup on real hardware.
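To see how much end-to-end latency is messaging rather than computation, the main thread can time each RUN/RESULT round trip and subtract the worker's durationMs. A sketch under these assumptions: the worker posts RESULT messages shaped as in simd-worker.ts, ids come from a simple counter instead of crypto.randomUUID(), and createFilterClient, WorkerLike, and TimedResult are hypothetical names. The worker and the main thread have different performance.now() time origins, so only durations, never raw timestamps, are compared across threads.

```typescript
// Shape of the RESULT message the worker posts back.
interface ResultMessage {
  type: 'RESULT';
  id: string;
  result: ArrayBuffer;
  durationMs: number;
}

// Minimal worker surface used here, so a real Worker or a test double fits.
interface WorkerLike {
  postMessage(message: unknown, transfer: ArrayBuffer[]): void;
  addEventListener(type: 'message', listener: (event: { data: any }) => void): void;
}

interface TimedResult {
  result: ArrayBuffer;
  wasmMs: number;      // time inside apply_filter, measured in the worker
  roundTripMs: number; // postMessage until RESULT, measured on this thread
  overheadMs: number;  // messaging, scheduling, and copies: roundTrip - wasm
}

// Correlates RUN/RESULT pairs by id and reports where the time went.
function createFilterClient(worker: WorkerLike) {
  const pending = new Map<string, { t0: number; resolve: (r: TimedResult) => void }>();
  let nextId = 0;

  worker.addEventListener('message', ({ data }) => {
    if (data?.type !== 'RESULT') return;
    const msg = data as ResultMessage;
    const entry = pending.get(msg.id);
    if (!entry) return;
    pending.delete(msg.id);
    const roundTripMs = performance.now() - entry.t0;
    entry.resolve({
      result: msg.result,
      wasmMs: msg.durationMs,
      roundTripMs,
      overheadMs: roundTripMs - msg.durationMs,
    });
  });

  return (pixels: ArrayBuffer): Promise<TimedResult> =>
    new Promise((resolve) => {
      const id = String(nextId++);
      pending.set(id, { t0: performance.now(), resolve });
      worker.postMessage({ type: 'RUN', id, pixels }, [pixels]);
    });
}
```

Usage would look like `const run = createFilterClient(worker); const { result, overheadMs } = await run(pixelBuffer);`. An overheadMs that is large relative to wasmMs suggests the input is too small to be worth sending to a worker.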

When SIMD Helps (and When It Does Not)

WASM SIMD operates on 128-bit (16-byte) values. Lane-wise instructions such as f32x4.add, i16x8.mul, and i8x16.swizzle process four, eight, or sixteen lanes at once, while bitwise instructions such as v128.and operate on all 128 bits. LLVM's auto-vectoriser (used by Clang/Emscripten and by rustc, for example via wasm-pack) emits SIMD instructions when it detects:

  • Inner loops over contiguous, same-typed arrays (float32 audio samples, uint8 RGBA pixels).
  • Reduction operations (sum, min, max) over large arrays.
  • Matrix or vector math with aligned access patterns.
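To make "lanes" concrete: a v128 value is 16 bytes read as a fixed number of equal-width lanes. The sketch below uses JavaScript typed arrays over one 16-byte buffer as an analogy (not how an engine actually stores v128 values) and emulates f32x4.add lane by lane in scalar code; the names are illustrative.

```typescript
// A 16-byte buffer standing in for one v128 value.
const v128 = new ArrayBuffer(16);

// The same 128 bits seen through different lane shapes.
const asF32x4 = new Float32Array(v128); // 4 lanes of 32 bits
const asI16x8 = new Int16Array(v128);   // 8 lanes of 16 bits
const asI8x16 = new Int8Array(v128);    // 16 lanes of 8 bits

// Scalar emulation of f32x4.add: one addition per lane. A SIMD engine
// performs all four additions with a single instruction.
function f32x4Add(a: Float32Array, b: Float32Array): Float32Array {
  if (a.length !== 4 || b.length !== 4) throw new RangeError('f32x4 values have exactly 4 lanes');
  const out = new Float32Array(4);
  for (let lane = 0; lane < 4; lane++) out[lane] = a[lane] + b[lane];
  return out;
}
```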

Workloads that do not benefit:

  • Pointer-chasing data structures (linked lists, trees) — cache misses dominate.
  • Branchy code where each lane would take a different path — SIMD requires uniform operations.
  • Small inputs (< 1 KB): the fixed cost of the postMessage round trip and of the loop's scalar remainder handling outweighs the per-element gain.
  • I/O-bound tasks — network or disk waits are unaffected.
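The small-input point suggests routing by size. A minimal sketch, assuming you keep a plain JavaScript version of the filter for small inputs; makeSizeRouter, PixelFilter, and the 1024-byte default are hypothetical, the default merely mirrors the rough 1 KB figure above and should be tuned by measurement:

```typescript
type PixelFilter = (pixels: ArrayBuffer) => Promise<ArrayBuffer>;

// Route small inputs to an inline implementation and larger ones to the worker.
function makeSizeRouter(
  inline: PixelFilter,
  viaWorker: PixelFilter,
  minWorkerBytes = 1024, // assumption: tune on your own workload
): PixelFilter {
  return (pixels) =>
    pixels.byteLength < minWorkerBytes
      ? inline(pixels)     // small: skip the postMessage round trip
      : viaWorker(pixels); // large: worth transferring to the SIMD worker
}
```

Both paths must produce identical output, so the inline version doubles as a correctness reference for the WASM builds.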

Realistic Speedup Numbers

These figures come from benchmarks on a 2023 M2 MacBook Air (Chrome 124) and a Snapdragon 8 Gen 2 Android device (Chrome 124). Input sizes are production-representative:

Workload | Scalar WASM | SIMD WASM | Speedup
4K RGBA Gaussian blur (3×3 kernel) | 52 ms | 13 ms | 4.0×
1M float32 array, dot product | 27 ms | 7 ms | 3.9×
44.1 kHz stereo audio, 10 s, RMS normalise | 19 ms | 5 ms | 3.8×
Base64 decode, 2 MB input | 11 ms | 4 ms | 2.8×
SHA-256 over 1 MB (non-vectorised reference implementation) | 48 ms | 50 ms | 1.0× (no benefit)
JSON-like binary struct parse | 18 ms | 19 ms | ~1.0× (branchy)

The SHA-256 row is instructive: a reference implementation that is not structured for SIMD shows no gain. The auto-vectoriser can only help when the inner loop is data-parallel. Verify the speedup on your actual workload rather than assuming it from the workload category.
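A small harness for that verification could time each build on the same input and compare medians, which are less sensitive than means to outliers from garbage collection or JIT warm-up. A sketch; compareBuilds, timeMedian, and median are hypothetical names, and `scalar` and `simd` are assumed to be closures that call apply_filter on an instance of each build with identical input:

```typescript
// Median of a list of samples (the input array is not modified).
function median(samples: number[]): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Run fn a few times untimed to let the engine warm up, then time each run.
function timeMedian(fn: () => void, runs = 15, warmup = 3): number {
  for (let i = 0; i < warmup; i++) fn();
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    fn();
    samples.push(performance.now() - t0);
  }
  return median(samples);
}

// speedup > 1 means the SIMD build was faster on this input and machine.
function compareBuilds(scalar: () => void, simd: () => void, runs = 15) {
  const scalarMs = timeMedian(scalar, runs);
  const simdMs = timeMedian(simd, runs);
  return { scalarMs, simdMs, speedup: scalarMs / simdMs };
}
```

A speedup near 1.0, as in the SHA-256 and struct-parse rows, is the signal that the SIMD build is not worth shipping for that workload.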

Gotchas and Edge Cases

Run feature detection in the worker that loads the module. Within one browser the main thread and its workers use the same engine, so WebAssembly.validate should give the same answer in both; detecting in the worker simply keeps build selection next to instantiation and off the main thread. What matters is that a binary using v128 fails to compile on runtimes without SIMD support, such as some older Android WebViews and older Node.js releases where SIMD sat behind the --experimental-wasm-simd flag. Always gate on validate and keep a scalar fallback.
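The fallback can also be made defensive: even when validate reports SIMD, the real SIMD build might still fail to compile, for example if it also uses relaxed-SIMD instructions that the v128.const probe does not test. A sketch, assuming both builds are already available as byte buffers; compileWithFallback and WasmBytes are hypothetical names:

```typescript
// Whatever byte source WebAssembly.compile accepts (ArrayBuffer or a typed array view).
type WasmBytes = Parameters<typeof WebAssembly.compile>[0];

// Try the SIMD build only when detection says it should work, and fall back
// to the scalar build if compilation still fails with a CompileError.
async function compileWithFallback(
  simdBytes: WasmBytes,
  scalarBytes: WasmBytes,
  simdDetected: boolean,
): Promise<{ module: WebAssembly.Module; simd: boolean }> {
  if (simdDetected) {
    try {
      return { module: await WebAssembly.compile(simdBytes), simd: true };
    } catch (err) {
      // Anything other than a compile failure is a real error: rethrow it.
      if (!(err instanceof WebAssembly.CompileError)) throw err;
    }
  }
  return { module: await WebAssembly.compile(scalarBytes), simd: false };
}
```

The resulting module can then be instantiated with WebAssembly.instantiate(module, imports), and the simd flag reported back to the main thread as in the READY message.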

Auto-vectorisation is not guaranteed. Emscripten's -msimd128 flag enables the target feature but does not guarantee that LLVM will emit SIMD for every loop. Disassemble the generated binary (for example with wasm2wat or wasm-objdump -d from WABT, or in Chrome DevTools) and confirm that v128 instructions actually appear in the hot loops. Alternatively, use intrinsics (wasm_simd128.h in Emscripten, or the core::arch::wasm32 module in Rust) to write SIMD explicitly.

Safari requires version 16.4 for SIMD. Safari added WebAssembly SIMD support in Safari 16.4 (March 2023), including iOS and iPadOS 16.4. Users on older Safari versions will always land on the scalar build. For a public-facing site, the two-build strategy with runtime detection handles this transparently.

Debugging SIMD in DevTools. Chrome DevTools (Sources panel > WASM) disassembles .wasm binaries inline. SIMD instructions show as v128.const, f32x4.add, etc. Set a breakpoint on the WASM code and step through to verify the correct path is executing. Firefox’s WASM debugger provides the same capability.

Frequently Asked Questions

How do I detect whether the browser supports WebAssembly SIMD at runtime?
The reliable technique is to attempt to validate a tiny WASM binary that uses a v128 instruction. If WebAssembly.validate(bytes) returns true, SIMD is supported. Libraries like wasm-feature-detect wrap this into a one-liner: import { simd } from 'wasm-feature-detect'; const hasSIMD = await simd();. Never rely on the browser version string alone — Chrome 91+ and Firefox 89+ ship SIMD, but some embedded runtimes and WebViews that report those versions have it disabled.
Does SIMD help for all workloads, or only specific ones?
SIMD helps workloads with data-parallel inner loops over contiguous arrays of one type: pixel processing (RGBA), audio samples (float32), matrix multiplication, FFT butterflies, and hash functions written to process several lanes at once (the reference SHA-256 implementation in the benchmark table, which is not written that way, shows no gain). It does not help pointer-chasing, branchy tree traversals, or workloads dominated by memory latency. Measure with performance.now() on a real dataset before investing in SIMD-specific builds.

See also