Building a Lock-Free Ring Buffer with Atomics
A ring buffer (circular buffer) is the canonical shared-memory data structure for passing a stream of values from one thread to another with no allocation and no locks. This page walks through a minimal, complete TypeScript implementation for a Web Worker pipeline.
This technique builds directly on the SharedArrayBuffer & Atomics reference — read that page for the COOP/COEP header setup, typed-array view mechanics, and the Atomics memory-model rules that make this safe. The Web Workers Architecture & Communication overview provides the broader thread-boundary context.
A SharedArrayBuffer is only available when the page is cross-origin isolated. Your server must respond with both Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp (browsers that support it also accept credentialless for the latter). Verify with console.log(crossOriginIsolated); it must be true.
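A page or worker can check for isolation before it touches shared memory. The following is a minimal sketch, not part of the original module: canUseSharedMemory is a hypothetical helper name, and its optional scope parameter is an assumption added only so the check can be exercised outside a browser.

```typescript
// Hypothetical helper: reports whether SharedArrayBuffer is usable in this context.
// `scope` defaults to the global object; passing a stub makes the check testable.
type IsolationScope = { crossOriginIsolated?: boolean };

function canUseSharedMemory(
  scope: IsolationScope = globalThis as unknown as IsolationScope,
): boolean {
  // Without the constructor there is nothing to share.
  if (typeof SharedArrayBuffer === 'undefined') return false;
  // crossOriginIsolated is true only when the COOP and COEP headers were accepted.
  return scope.crossOriginIsolated === true;
}
```

Calling canUseSharedMemory() once at startup and falling back to postMessage when it returns false avoids a hard failure on pages served without the headers.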
Minimal Complete Example
The buffer layout uses the first two Int32 slots for head (read pointer, owned by the consumer) and tail (write pointer, owned by the producer). Data occupies slots 2 through capacity + 1.
// ring-buffer.ts: shared module imported by both main thread and workers
export const HEADER_SLOTS = 2; // slot 0 = head, slot 1 = tail
export const HEAD_SLOT = 0;
export const TAIL_SLOT = 1;

export function createRingBuffer(capacity: number): SharedArrayBuffer {
  // +2 header slots, capacity in Int32 elements
  const sab = new SharedArrayBuffer((capacity + HEADER_SLOTS) * Int32Array.BYTES_PER_ELEMENT);
  const view = new Int32Array(sab);
  Atomics.store(view, HEAD_SLOT, 0);
  Atomics.store(view, TAIL_SLOT, 0);
  return sab;
}

/** Returns true if the value was enqueued, false if the buffer is full. */
export function enqueue(view: Int32Array, value: number): boolean {
  const capacity = view.length - HEADER_SLOTS;
  const tail = Atomics.load(view, TAIL_SLOT);
  const nextTail = (tail + 1) % capacity;
  const head = Atomics.load(view, HEAD_SLOT);
  if (nextTail === head) return false; // full: one slot is always wasted
  Atomics.store(view, HEADER_SLOTS + tail, value); // write data before advancing tail
  Atomics.store(view, TAIL_SLOT, nextTail); // publish: consumer can now read this slot
  return true;
}

/** Returns the dequeued value, or null if the buffer is empty. */
export function dequeue(view: Int32Array): number | null {
  const capacity = view.length - HEADER_SLOTS;
  const head = Atomics.load(view, HEAD_SLOT);
  const tail = Atomics.load(view, TAIL_SLOT);
  if (head === tail) return null; // empty
  const value = Atomics.load(view, HEADER_SLOTS + head); // read data before advancing head
  const nextHead = (head + 1) % capacity;
  Atomics.store(view, HEAD_SLOT, nextHead); // release: producer can reuse this slot
  return value;
}
// producer-worker.ts
import { enqueue } from './ring-buffer.js';

let view: Int32Array;

self.onmessage = ({ data }) => {
  if (data.type === 'INIT') {
    view = new Int32Array(data.sab);
    // Produce 1 000 values, one per timer tick. The 1 ms delay is a request, not a
    // guarantee: browsers may clamp repeating timers to roughly 4 ms, so the real
    // rate is usually well below 1 kHz.
    let i = 0;
    const interval = setInterval(() => {
      const ok = enqueue(view, i++);
      if (!ok) console.warn('Ring buffer full, dropping value');
      if (i >= 1000) clearInterval(interval);
    }, 1);
  }
};
// consumer-worker.ts
import { dequeue } from './ring-buffer.js';

let view: Int32Array;

self.onmessage = ({ data }) => {
  if (data.type === 'INIT') {
    view = new Int32Array(data.sab);
    // Poll on every timer tick and drain everything queued since the last one.
    // Timer delays are whole milliseconds and are clamped by the browser, so
    // sub-millisecond polling is not available through setInterval.
    setInterval(() => {
      let v: number | null;
      while ((v = dequeue(view)) !== null) {
        processValue(v);
      }
    }, 1);
  }
};

function processValue(v: number): void {
  // Placeholder: real work would go here
  if (v % 100 === 0) self.postMessage({ type: 'PROGRESS', value: v });
}
// main.ts
import { createRingBuffer } from './ring-buffer.js';

// Any capacity >= 2 works. A power of two is a common choice because it allows
// the bitwise wrap-around form; 256 slots hold at most 255 values.
const CAPACITY = 256;
const sab = createRingBuffer(CAPACITY);

const producer = new Worker(new URL('./producer-worker.ts', import.meta.url), { type: 'module' });
const consumer = new Worker(new URL('./consumer-worker.ts', import.meta.url), { type: 'module' });

producer.postMessage({ type: 'INIT', sab });
consumer.postMessage({ type: 'INIT', sab });

consumer.onmessage = ({ data }) => {
  if (data.type === 'PROGRESS') console.log('Consumed up to:', data.value);
};
Line-by-Line Walkthrough
createRingBuffer(capacity)
Allocates (capacity + 2) * 4 bytes. The factor of 4 is Int32Array.BYTES_PER_ELEMENT. A new SharedArrayBuffer is already zero-filled, so the two Atomics.store calls are not strictly necessary; they make the initial state explicit, and both threads see head and tail at 0 even if the consumer starts polling before the producer has sent its first value.
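The allocation arithmetic can be checked in isolation. This is a small sketch; ringBufferByteLength, exampleSab, and exampleView are hypothetical names used only for illustration.

```typescript
const HEADER_SLOTS = 2; // slot 0 = head, slot 1 = tail

// Bytes needed for `capacity` data slots plus the two index slots.
function ringBufferByteLength(capacity: number): number {
  return (capacity + HEADER_SLOTS) * Int32Array.BYTES_PER_ELEMENT;
}

const exampleSab = new SharedArrayBuffer(ringBufferByteLength(256));
const exampleView = new Int32Array(exampleSab);

console.log(exampleSab.byteLength); // 1032 = (256 + 2) * 4
console.log(exampleView.length);    // 258 slots: 2 header + 256 data
// Freshly allocated shared memory is zero-filled, so both indices start at 0.
console.log(Atomics.load(exampleView, 0), Atomics.load(exampleView, 1)); // 0 0
```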
enqueue — ownership rule
Only the producer thread calls enqueue. It owns the tail index — it is the only writer of TAIL_SLOT. The producer reads HEAD_SLOT to check for fullness, but never writes it. This single-writer-per-index invariant is what makes the buffer lock-free: there is no contention on either index.
The ordering sequence inside enqueue is critical:
- Write data first (Atomics.store(view, HEADER_SLOTS + tail, value))
- Advance tail second (Atomics.store(view, TAIL_SLOT, nextTail))
Reversing this order would let the consumer observe the new tail value and attempt to read the slot before the data was written — a classic store-reordering race.
dequeue — mirrored ownership
Only the consumer thread calls dequeue. It owns HEAD_SLOT — sole writer. It reads TAIL_SLOT to check for emptiness but never writes it.
The ordering sequence mirrors enqueue:
- Read data first (Atomics.load(view, HEADER_SLOTS + head))
- Advance head second (Atomics.store(view, HEAD_SLOT, nextHead))
Advancing head before reading the data would allow the producer to overwrite the slot before the consumer has finished reading it.
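Before wiring up workers, the two ordering sequences can be exercised on a single thread. The sketch below restates the enqueue and dequeue logic under the hypothetical names ringPush and ringPop so that it runs on its own; roundTrip is also a hypothetical helper that plays both roles in turn. It checks FIFO order and wrap-around only and says nothing about cross-thread timing.

```typescript
const HEADER = 2; // slot 0 = head, slot 1 = tail

function ringPush(view: Int32Array, value: number): boolean {
  const capacity = view.length - HEADER;
  const tail = Atomics.load(view, 1);
  const next = (tail + 1) % capacity;
  if (next === Atomics.load(view, 0)) return false; // full
  Atomics.store(view, HEADER + tail, value); // 1. write the data
  Atomics.store(view, 1, next);              // 2. publish the new tail
  return true;
}

function ringPop(view: Int32Array): number | null {
  const capacity = view.length - HEADER;
  const head = Atomics.load(view, 0);
  if (head === Atomics.load(view, 1)) return null; // empty
  const value = Atomics.load(view, HEADER + head); // 1. read the data
  Atomics.store(view, 0, (head + 1) % capacity);   // 2. release the slot
  return value;
}

// Pushes every value, draining whenever the buffer reports full.
// Assumes capacity >= 2, since one slot always stays empty.
function roundTrip(values: number[], capacity: number): number[] {
  const view = new Int32Array(new SharedArrayBuffer((capacity + HEADER) * 4));
  const out: number[] = [];
  const drain = (): void => {
    let v: number | null;
    while ((v = ringPop(view)) !== null) out.push(v);
  };
  for (const value of values) {
    if (!ringPush(view, value)) {
      drain();               // act as the consumer to free slots
      ringPush(view, value); // succeeds now that the buffer is empty
    }
  }
  drain();
  return out;
}

console.log(roundTrip([10, 20, 30, 40, 50, 60, 70], 4));
// [10, 20, 30, 40, 50, 60, 70]
```

With a capacity of 4 the indices wrap twice during this run, so the output matching the input confirms that order survives wrap-around.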
Full/empty detection without an extra counter
The design sacrifices one slot: capacity - 1 usable entries, not capacity. When (tail + 1) % capacity === head the buffer is considered full. This avoids storing a count in shared memory — a count would require an atomic increment/decrement on every enqueue and dequeue, creating contention.
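The same two indices are enough to compute how many values are queued. The following is a sketch under the layout described above; fillLevel and usableSlots are hypothetical helper names that are not part of the module shown earlier.

```typescript
const HEADER_SLOT_COUNT = 2; // slot 0 = head, slot 1 = tail

// Number of values currently queued, derived from the two indices alone.
// The result is a snapshot: the other thread may move its index right after.
function fillLevel(view: Int32Array): number {
  const capacity = view.length - HEADER_SLOT_COUNT;
  const head = Atomics.load(view, 0);
  const tail = Atomics.load(view, 1);
  return (tail - head + capacity) % capacity;
}

// Largest value fillLevel can return: one slot always stays empty.
function usableSlots(view: Int32Array): number {
  return view.length - HEADER_SLOT_COUNT - 1;
}

const demo = new Int32Array(new SharedArrayBuffer((4 + HEADER_SLOT_COUNT) * 4)); // capacity 4
Atomics.store(demo, 0, 3); // head sits on the last data slot
Atomics.store(demo, 1, 1); // tail has already wrapped around
console.log(fillLevel(demo));   // 2 (data slots 3 and 0 hold values)
console.log(usableSlots(demo)); // 3
```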
Wrap-around
The modulo operation (index + 1) % capacity handles wrap-around. When capacity is a power of two you can substitute the faster bitwise form (index + 1) & (capacity - 1), but ordinary modulo is correct for any capacity and the JIT frequently optimises it anyway.
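Both wrap-around forms can sit behind one function that picks the mask only when it is valid. This is an illustrative sketch; isPowerOfTwo and nextIndex are hypothetical names.

```typescript
// True for 1, 2, 4, 8, ...: exactly one bit set.
function isPowerOfTwo(n: number): boolean {
  return Number.isInteger(n) && n > 0 && (n & (n - 1)) === 0;
}

// Advances an index by one slot, wrapping back to 0 at `capacity`.
function nextIndex(index: number, capacity: number): number {
  return isPowerOfTwo(capacity)
    ? (index + 1) & (capacity - 1) // mask form, valid only for powers of two
    : (index + 1) % capacity;      // general form, valid for any capacity
}

console.log(nextIndex(255, 256)); // 0 (mask form)
console.log(nextIndex(5, 6));     // 0 (modulo form)
console.log(nextIndex(3, 8));     // 4
```

In a real buffer the power-of-two check would be done once at creation time rather than on every call; it is inline here only to keep the sketch short.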
Gotchas & Edge Cases
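1. Use atomic accesses on data slots
A plain view[HEADER_SLOTS + tail] = value followed by an atomic tail store is ordered correctly by the ECMAScript memory model, but only as long as the consumer reads the slot after an Atomics.load of the tail that observed the new value. That guarantee is easy to break during refactoring, for example by reading a slot before checking the index, and a plain access that races with the other thread may observe a stale value. Using Atomics.store for the data write and Atomics.load for the data read keeps every shared access sequentially consistent, which is why this pattern uses them for both the data and the index.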
2. Capacity must be agreed upon by both threads
The ring buffer module exports HEADER_SLOTS and derives capacity from view.length - HEADER_SLOTS. Both workers compute capacity from the same shared buffer length — there is no separate capacity variable to keep in sync.
3. COOP/COEP must be present before the page loads
Setting these headers after page load (for example, from a Service Worker that only starts intercepting fetch events once the page is already running) does not retroactively cross-origin-isolate the browsing context. The headers must arrive on the navigation response for the document itself.
4. Drop strategy vs backpressure
The example above logs a warning and drops values when the buffer is full. In production you should decide between:
- Drop with loss signal: increment a separate dropped-count slot with Atomics.add.
- Block the producer: Atomics.wait on a signal slot that the consumer notifies after draining; covered in Coordinating Workers with Atomics.wait and notify.
- Dynamic back-off: slow the producer interval when the fill level exceeds a threshold.
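The first option, dropping with a loss signal, might look like the sketch below. It assumes a modified layout with a third header slot for the counter, which the module above does not have; DROPPED, HEADER_SIZE, enqueueOrCount, and takeDroppedCount are all hypothetical names.

```typescript
const HEAD = 0;
const TAIL = 1;
const DROPPED = 2;     // assumed extra header slot holding the loss counter
const HEADER_SIZE = 3; // data therefore starts at slot 3 in this variant

// Producer side: enqueue, or record the loss when the buffer is full.
function enqueueOrCount(view: Int32Array, value: number): boolean {
  const capacity = view.length - HEADER_SIZE;
  const tail = Atomics.load(view, TAIL);
  const nextTail = (tail + 1) % capacity;
  if (nextTail === Atomics.load(view, HEAD)) {
    Atomics.add(view, DROPPED, 1); // atomic increment of the dropped count
    return false;
  }
  Atomics.store(view, HEADER_SIZE + tail, value);
  Atomics.store(view, TAIL, nextTail);
  return true;
}

// Consumer side: read and reset the counter in one atomic step.
function takeDroppedCount(view: Int32Array): number {
  return Atomics.exchange(view, DROPPED, 0);
}

const lossy = new Int32Array(new SharedArrayBuffer((4 + HEADER_SIZE) * 4)); // capacity 4
for (let i = 1; i <= 5; i++) enqueueOrCount(lossy, i); // 3 fit, 2 are dropped
console.log(takeDroppedCount(lossy)); // 2
console.log(takeDroppedCount(lossy)); // 0 (counter was reset)
```

The counter has two writers (the producer adds, the consumer resets), so it needs a read-modify-write operation such as Atomics.add or Atomics.exchange rather than a load followed by a store.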
Performance Rule of Thumb
An SPSC ring buffer over SharedArrayBuffer with Int32Array can sustain roughly 50–200 million enqueue/dequeue pairs per second in V8 on a 2023 x86 CPU — limited mainly by cache-line ping-pong between cores. That is approximately 100× higher throughput than a postMessage-based channel at equivalent message rates, because there is no serialisation, no memory allocation, and no IPC system-call overhead per item.
For bursts rather than sustained streams, postMessage with a transferable ArrayBuffer is often simpler: hand off the entire buffer in one zero-copy transfer, process it, and hand it back. The ring-buffer pattern shines when you need a continuous, low-latency pipeline where both producer and consumer are always running.
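The hand-off semantics of a transferable ArrayBuffer can be seen without a worker. The sketch below uses structuredClone with a transfer list as a stand-in, on the assumption that the runtime provides it (current browsers and recent Node.js versions do); worker.postMessage(message, [buffer]) detaches the sender's buffer in the same way. handOff is a hypothetical helper name.

```typescript
// Moves the buffer's memory to a new ArrayBuffer and detaches the original.
function handOff(buffer: ArrayBuffer): ArrayBuffer {
  return structuredClone(buffer, { transfer: [buffer] });
}

const original = new ArrayBuffer(3 * Float32Array.BYTES_PER_ELEMENT);
new Float32Array(original).set([0.25, 0.5, 0.75]);

const moved = handOff(original);

console.log(original.byteLength);        // 0: the sender's buffer is detached
console.log(moved.byteLength);           // 12: the receiver owns the memory now
console.log(new Float32Array(moved)[1]); // 0.5
```

Because the sender loses access after the transfer, no synchronisation is needed, which is what makes this simpler than a ring buffer for occasional bursts.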