Measuring Structured Clone Cost with performance.now()

postMessage overhead has two distinct components: the time to serialise your payload (structured clone) and the time to actually deliver the message across the thread boundary. Conflating the two leads to wrong architectural decisions. Measuring them separately shows exactly when switching to transferable objects pays off.

This page is part of the postMessage Bottleneck Analysis section within the Debugging, Profiling & Production Optimization reference.

Minimal Reproducible Example: Clone Timing Harness

// harness.ts: runs in the browser main thread
interface BenchResult {
  payloadBytes: number;
  cloneMs: number;      // clone runs: main-to-worker latency (serialise + scheduling + delivery)
  dispatchMs: number;   // worker-to-main return leg; it carries only a few numbers, so it approximates pure dispatch cost
  roundTripMs: number;
  transferMs: number;   // transfer runs: main-to-worker latency with the buffer in the transfer list
}

// performance.now() counts from each thread's own time origin (a dedicated
// worker's origin is its creation time), so every timestamp that crosses a
// thread boundary is taken as timeOrigin + now() to share one timeline.
const now = (): number => performance.timeOrigin + performance.now();

const worker = new Worker(
  new URL('./echo.worker.ts', import.meta.url),
  { type: 'module' }
);

let pendingResolve: ((r: BenchResult) => void) | null = null;

worker.onmessage = (e: MessageEvent) => {
  const tEnd = now();                          // first statement, so nothing inflates the timing
  const { type, t0, workerReceiveTime, payloadBytes } = e.data;
  const roundTripMs = tEnd - t0;
  const oneWayMs = workerReceiveTime - t0;     // main to worker
  const dispatchMs = roundTripMs - oneWayMs;   // worker back to main

  if (type === 'echo-clone') {
    pendingResolve?.({ payloadBytes, cloneMs: oneWayMs, dispatchMs, roundTripMs, transferMs: 0 });
  } else if (type === 'echo-transfer') {
    pendingResolve?.({ payloadBytes, cloneMs: 0, dispatchMs, roundTripMs, transferMs: oneWayMs });
  }
};

export async function benchmarkCloneVsTransfer(
  sizeMb: number
): Promise<BenchResult[]> {
  const bytes = Math.round(sizeMb * 1024 * 1024);   // whole bytes, e.g. for 0.1 MB
  const results: BenchResult[] = [];

  // Warm up both paths: discard the first 3 runs of each
  for (let i = 0; i < 3; i++) {
    await runClone(bytes);
    await runTransfer(bytes);
  }

  // Measure 10 runs each strategy: clone runs first, then transfer runs
  for (let i = 0; i < 10; i++) {
    results.push(await runClone(bytes));
  }
  for (let i = 0; i < 10; i++) {
    results.push(await runTransfer(bytes));
  }
  return results;
}

function runClone(bytes: number): Promise<BenchResult> {
  return new Promise(resolve => {
    pendingResolve = resolve;
    const data = new Uint8Array(bytes);                 // zero-filled, allocated outside the timed region
    const t0 = now();
    worker.postMessage({ type: 'clone', data, t0 });    // structured clone
  });
}

function runTransfer(bytes: number): Promise<BenchResult> {
  return new Promise(resolve => {
    pendingResolve = resolve;
    const buffer = new ArrayBuffer(bytes);
    const t0 = now();
    worker.postMessage({ type: 'transfer', buffer, t0 }, [buffer]);   // ownership moves, no copy
  });
}
// echo.worker.ts
// Type-check this file with the "webworker" lib so that `self` is a DedicatedWorkerGlobalScope.
self.onmessage = (e: MessageEvent) => {
  // First statement: an absolute timestamp comparable with the main thread's t0
  const receiveTime = performance.timeOrigin + performance.now();

  if (e.data.type === 'clone') {
    // Echo only metadata; the payload itself is not sent back, so the
    // return leg is a tiny message that approximates pure dispatch cost
    self.postMessage({
      type: 'echo-clone',
      t0: e.data.t0,
      workerReceiveTime: receiveTime,
      payloadBytes: (e.data.data as Uint8Array).byteLength,
    });
  } else if (e.data.type === 'transfer') {
    const buf = e.data.buffer as ArrayBuffer;
    self.postMessage(
      {
        type: 'echo-transfer',
        t0: e.data.t0,
        workerReceiveTime: receiveTime,
        payloadBytes: buf.byteLength,
      },
      [buf]   // transfer back so main thread can reuse the buffer
    );
  }
};

Line-by-Line Walkthrough

const t0 = now();   // performance.timeOrigin + performance.now()
worker.postMessage({ type: 'clone', data, t0 });

t0 is captured immediately before postMessage. Including t0 in the payload means the worker receives the dispatch timestamp inside the same message, with no separate channel and no ordering issues. The number itself is cloned too, but at 8 bytes its cost is negligible. t0 is an absolute timestamp rather than a bare performance.now() reading because a dedicated worker's performance.now() counts from the worker's own creation time, not from the page's.

const receiveTime = performance.timeOrigin + performance.now();

This is the first statement in the worker's onmessage. Any computation before it would inflate the reported clone time; placing it first isolates message delivery from handler logic.

// In the main thread onmessage handler:
const oneWayMs = workerReceiveTime - t0;   // reported as cloneMs for clone runs

workerReceiveTime - t0 is the combined cost of structured clone serialisation, thread scheduling and message queue delivery. This is the number that matters when deciding whether a payload is too large to clone repeatedly.

worker.postMessage({ type: 'transfer', buffer, t0 }, [buffer]);

The transfer case sends buffer in the transfer list. buffer.byteLength becomes 0 in the main thread immediately after this call, because the buffer is detached rather than copied. For transfer runs the same one-way delta measures the ownership handover plus scheduling and delivery: typically 0.05–0.2 ms regardless of payload size.
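The detach behaviour can be observed without a worker, because structuredClone() accepts the same transfer list through its transfer option. A minimal sketch, assuming an environment with a global structuredClone (current browsers, Node 17+); moveBuffer is a hypothetical helper name:

```typescript
// Moves `buffer` through the structured clone algorithm with a transfer list,
// the same mechanism postMessage(msg, [buffer]) uses, but within one thread.
function moveBuffer(buffer: ArrayBuffer): ArrayBuffer {
  return structuredClone(buffer, { transfer: [buffer] });
}

const original = new ArrayBuffer(5 * 1024 * 1024); // 5 MB
const moved = moveBuffer(original);

console.log(original.byteLength); // 0: the source buffer is detached, not copied
console.log(moved.byteLength);    // 5242880: the bytes now belong to `moved`

// Moving it back (the "return trip") detaches `moved` in turn.
const back = moveBuffer(moved);
console.log(moved.byteLength, back.byteLength); // 0 5242880
```

Because nothing is copied, the cost of each move does not grow with the buffer's size, which is why the transfer column of the table below stays nearly flat.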

Concrete Numbers Table

Measured on Chrome 124, Intel Core i7-1185G7, 16 GB RAM (stable clock, 20 runs each, median reported):

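Payload | Clone time (ms) | Transfer time (ms) | Clone : Transfer ratio
--------|-----------------|--------------------|-----------------------
100 KB  | 0.08            | 0.05               | 1.6×
500 KB  | 0.35            | 0.05               | 7×
1 MB    | 0.72            | 0.06               | 12×
5 MB    | 3.6             | 0.07               | 51×
10 MB   | 7.2             | 0.08               | 90×
50 MB   | 38              | 0.11               | 345×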

Above ~500 KB, the gap between clone and transfer becomes architecturally significant. A 50 MB payload that is cloned on every animation frame (60 fps) costs 38 ms × 60 = 2 280 ms of clone work per second, which is more time than a second contains; a single 38 ms clone already overruns the 16.7 ms frame budget. The same payload transferred as an ArrayBuffer costs 0.11 ms per frame.
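The per-second arithmetic above generalises to a small check. A sketch; cloneBudget is a hypothetical helper, and the frame budget is simply 1000 ms divided by the frame rate:

```typescript
interface CloneBudget {
  cloneMsPerSecond: number; // clone work spent every second
  frameBudgetMs: number;    // time available per frame
  fitsInFrame: boolean;     // can one clone even fit inside one frame?
}

function cloneBudget(cloneMsPerMessage: number, messagesPerSecond: number, fps = 60): CloneBudget {
  const frameBudgetMs = 1000 / fps;
  return {
    cloneMsPerSecond: cloneMsPerMessage * messagesPerSecond,
    frameBudgetMs,
    fitsInFrame: cloneMsPerMessage < frameBudgetMs,
  };
}

// 50 MB cloned once per frame at 60 fps, using the 38 ms median from the table
const fiftyMb = cloneBudget(38, 60);
console.log(fiftyMb.cloneMsPerSecond, fiftyMb.fitsInFrame); // 2280 false
```

Passing the 0.11 ms transfer figure instead reports a per-second cost well under 10 ms and a payload that fits comfortably in a frame.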

Rule of thumb

Use a transferable ArrayBuffer for any payload over ~250 KB that crosses the thread boundary more than once per second. Below 250 KB, structured clone costs roughly 0.2 ms or less on the reference machine (the table puts 100 KB at 0.08 ms and 500 KB at 0.35 ms), and the simpler code is worth more than the marginal time savings.
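Read literally, the rule of thumb is a two-condition check. A sketch; chooseStrategy and the two threshold constants are hypothetical names, and the thresholds are the ones quoted above, meant to be replaced by numbers from your own harness runs:

```typescript
// Thresholds from the rule of thumb; tune them with your own measurements.
const TRANSFER_THRESHOLD_BYTES = 250 * 1024;
const MIN_MESSAGES_PER_SECOND = 1;

type Strategy = 'clone' | 'transfer';

function chooseStrategy(payloadBytes: number, messagesPerSecond: number): Strategy {
  const large = payloadBytes > TRANSFER_THRESHOLD_BYTES;
  const frequent = messagesPerSecond > MIN_MESSAGES_PER_SECOND;
  return large && frequent ? 'transfer' : 'clone';
}

console.log(chooseStrategy(100 * 1024, 60));       // clone (small payload)
console.log(chooseStrategy(5 * 1024 * 1024, 60));  // transfer (large and frequent)
console.log(chooseStrategy(5 * 1024 * 1024, 0.1)); // clone (large but rare)
```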

Isolating Clone Cost Without a Worker

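To measure structured-clone cost without worker-dispatch noise, call structuredClone() directly. It serialises and deserialises the payload synchronously on the calling thread, so the result covers both halves of the copy: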

function measureCloneMs(payload: unknown): number {
  const t0 = performance.now();
  structuredClone(payload);        // synchronous, blocks caller
  return performance.now() - t0;
}

const arr = new Uint8Array(5 * 1024 * 1024); // 5 MB
console.log(`Clone: ${measureCloneMs(arr).toFixed(2)} ms`);
// Typical output: Clone: 3.52 ms

This is useful for profiling the serialisation cost of complex nested objects (not just typed arrays), where the V8 fast path for flat typed arrays does not apply. A deeply nested object with 10 000 keys and many shared or circular references can cost 8–15 ms to clone even at just 500 KB; structural complexity matters as much as byte count.
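One reason graph shape matters: the structured clone algorithm keeps a record of every object it has already visited, so shared and circular references survive the copy instead of being duplicated or looping forever. That bookkeeping is work a flat typed array never needs. A minimal sketch showing the behaviour with structuredClone() (browsers, Node 17+); the GraphNode shape is hypothetical:

```typescript
// A small graph with a shared child (fan-out) and a cycle.
interface GraphNode {
  name: string;
  left?: GraphNode;
  right?: GraphNode;
  self?: GraphNode;
}

const child: GraphNode = { name: 'child' };
const root: GraphNode = { name: 'root', left: child, right: child };
root.self = root; // circular reference

const copy = structuredClone(root);

console.log(copy !== root);            // true: a new object graph
console.log(copy.left === copy.right); // true: the shared child is cloned once
console.log(copy.self === copy);       // true: the cycle points at the copy
```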

Gotchas & Edge Cases

1. Timer precision is reduced in contexts that are not cross-origin isolated

Browsers quantize performance.now() to 0.1 ms granularity in contexts served without both the Cross-Origin-Opener-Policy: same-origin and Cross-Origin-Embedder-Policy: require-corp headers. For payloads that clone in under 0.5 ms, a 0.1 ms step means your measurement has a ±20% error margin or worse. Add the COOP/COEP headers to your dev server to restore 5 µs precision; self.crossOriginIsolated reports true once they take effect.
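If your dev server is a plain Node process, the headers can be attached to every response with a small wrapper. A sketch using node:http; withIsolation is a hypothetical helper, the inline HTML is a placeholder for whatever your server really serves, and the port in the comment is arbitrary. Bundler dev servers usually have their own header settings, which are the better place for this if you use one.

```typescript
import { createServer, type IncomingMessage, type ServerResponse } from 'node:http';

// Headers that make documents from this server cross-origin isolated,
// which lifts the performance.now() precision restriction.
const ISOLATION_HEADERS: Record<string, string> = {
  'Cross-Origin-Opener-Policy': 'same-origin',
  'Cross-Origin-Embedder-Policy': 'require-corp',
};

type Handler = (req: IncomingMessage, res: ServerResponse) => void;

// Wraps any request handler so every response carries the isolation headers.
function withIsolation(handler: Handler): Handler {
  return (req, res) => {
    for (const [name, value] of Object.entries(ISOLATION_HEADERS)) {
      res.setHeader(name, value);
    }
    handler(req, res);
  };
}

// Placeholder page that reports whether isolation took effect.
const server = createServer(
  withIsolation((_req, res) => {
    res.setHeader('Content-Type', 'text/html');
    res.end('<script>console.log("isolated:", self.crossOriginIsolated)</script>');
  })
);
// server.listen(8080); // start it when you want to serve pages
```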

2. The first calls are usually slower (cold start)

The first two or three postMessage calls for a new payload shape can be 3–5× slower than subsequent calls. Part of this is start-up rather than cloning: the first message waits for the worker script to load, and the handler code has not yet been optimised by V8. Always warm up with 3–5 discarded runs before collecting measurements; the harness above discards the first 3 runs for exactly this reason.

3. Clone cost scales with object graph complexity, not just byte count

A 1 MB JSON object with 50 000 keys clones significantly slower than a 1 MB Float32Array. The typed-array path in V8 copies at close to memcpy speed; the object-graph path must visit every property. When profiling your specific payload, always use a representative fixture, not a synthetic typed array, unless your actual messages are typed arrays.
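A quick way to see the difference on your own machine is to time structuredClone() on both shapes. A sketch under stated assumptions: the 50 000-key fixture (about 1 MB once JSON-encoded) and the timeCloneMs helper are hypothetical, and the absolute numbers will vary by engine and hardware:

```typescript
// Median wall-clock time of structuredClone(payload) after a few warm-up calls.
function timeCloneMs(payload: unknown, runs = 10, warmup = 3): number {
  for (let i = 0; i < warmup; i++) structuredClone(payload);
  const samples: number[] = [];
  for (let i = 0; i < runs; i++) {
    const t0 = performance.now();
    structuredClone(payload);
    samples.push(performance.now() - t0);
  }
  samples.sort((a, b) => a - b);
  return samples[Math.floor(samples.length / 2)]; // upper median for even run counts
}

// Object graph: 50 000 numeric properties, roughly 1 MB once JSON-encoded
const objectFixture: Record<string, number> = {};
for (let i = 0; i < 50_000; i++) {
  objectFixture[`key_${String(i).padStart(5, '0')}`] = i * 1.5;
}

// Typed array: exactly 1 MB of float data (262 144 elements × 4 bytes)
const typedFixture = new Float32Array((1024 * 1024) / 4);

const jsonBytes = JSON.stringify(objectFixture).length;
const objectMs = timeCloneMs(objectFixture);
const typedMs = timeCloneMs(typedFixture);

console.log(`object fixture JSON size: ${jsonBytes} bytes`);
console.log(`object graph: ${objectMs.toFixed(2)} ms`);
console.log(`Float32Array: ${typedMs.toFixed(2)} ms`);
```

Swapping objectFixture for a captured real message gives the representative number the paragraph above asks for.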

4. Return-trip clone cost is often forgotten

The harness's cloneMs measures cost in one direction only. If the worker returns a result via postMessage, the return trip also incurs clone cost. For a worker that receives a 5 MB payload, processes it, and returns a 5 MB result, total clone cost on the reference machine is approximately 3.6 ms × 2 = 7.2 ms per round trip. Transfer the return buffer in the transfer list too:

// worker: transfer the result back instead of cloning it
// (outputBuffer is the ArrayBuffer holding the worker's result)
self.postMessage({ result: outputBuffer }, [outputBuffer]);
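The round-trip arithmetic can be written as a simple linear model. A sketch: estimateRoundTripCloneMs is hypothetical, and the ms-per-MB slope (about 0.72 ms/MB, read off the 1 MB row of the table) only describes flat typed arrays on the reference machine:

```typescript
// Linear clone-cost model: cost = size in MB × slope, per direction.
// A transferred response is treated as costing ~0 (the ~0.1 ms handover is ignored).
function estimateRoundTripCloneMs(
  requestBytes: number,
  responseBytes: number,
  msPerMb: number,
  transferResponse = false,
): number {
  const mb = (bytes: number) => bytes / (1024 * 1024);
  const requestMs = mb(requestBytes) * msPerMb;
  const responseMs = transferResponse ? 0 : mb(responseBytes) * msPerMb;
  return requestMs + responseMs;
}

const fiveMb = 5 * 1024 * 1024;
console.log(estimateRoundTripCloneMs(fiveMb, fiveMb, 0.72).toFixed(1));       // 7.2
console.log(estimateRoundTripCloneMs(fiveMb, fiveMb, 0.72, true).toFixed(1)); // 3.6
```

Transferring the response halves the modelled cost; transferring the request as well removes the clone term entirely.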

Building a Reusable Benchmark Report

// benchmark-runner.ts
import { benchmarkCloneVsTransfer } from './harness';

export async function printBenchmarkTable(sizeMbs: number[]): Promise<void> {
  const rows: string[] = [
    'Payload | Clone (ms) | Transfer (ms) | Ratio',
    '--------|------------|---------------|------',
  ];

  for (const mb of sizeMbs) {
    const runs = await benchmarkCloneVsTransfer(mb);
    // benchmarkCloneVsTransfer returns 10 clone runs followed by 10 transfer runs.
    // Split by position: filtering on `> 0` would drop runs that a coarse timer reports as 0 ms.
    const cloneRuns = runs.slice(0, 10);
    const transferRuns = runs.slice(10);

    const medianClone = median(cloneRuns.map(r => r.cloneMs));
    const medianTransfer = median(transferRuns.map(r => r.transferMs));
    // Avoid dividing by zero when the transfer median rounds to 0 ms
    const ratio = medianTransfer > 0 ? `${(medianClone / medianTransfer).toFixed(0)}×` : 'n/a';

    rows.push(
      `${mb} MB | ${medianClone.toFixed(2)} | ${medianTransfer.toFixed(2)} | ${ratio}`
    );
  }

  console.log(rows.join('\n'));
}

function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 !== 0
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Usage (top-level await needs an ES module context):
await printBenchmarkTable([0.1, 0.5, 1, 5, 10, 50]);

Run this in the browser DevTools console (with benchmarkCloneVsTransfer exported and accessible) or from a page that loads the runner module, to generate a table specific to your device and browser. The numbers vary significantly by hardware: mobile devices can show 3–5× higher clone latency than desktops.

When Clone Cost Appears in the Chrome Performance Tab

The structured-clone operation shows up in the Performance tab flame chart as a v8.serialize task on the main thread, followed by v8.deserialize on the receiving thread. If you see v8.serialize consuming more than 1–2 ms in a flame chart, that is a direct signal to switch to transferables for that payload.

To identify these tasks:

  1. Open DevTools → Performance tab.
  2. Click the gear icon and enable “Worker threads” to include worker timelines.
  3. Record a session that includes the postMessage calls under scrutiny.
  4. In the main thread flame chart, use Ctrl+F (or Cmd+F on Mac) to search for serialize. Every hit is a structured-clone operation.
  5. Click a v8.serialize block to see its duration and the stack frame that triggered it.
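To find your own calls among the serialize entries, you can label them with the User Timing API; performance.measure() entries appear as named bars in the Performance panel's Timings track. A sketch, where measureSync and the label are hypothetical; in the page you would wrap worker.postMessage rather than structuredClone:

```typescript
// Wraps a synchronous operation in User Timing marks and returns its duration.
function measureSync<T>(label: string, fn: () => T): { result: T; durationMs: number } {
  const start = `${label}:start`;
  const end = `${label}:end`;
  performance.mark(start);
  const result = fn();
  performance.mark(end);
  performance.measure(label, start, end);

  // Read back the most recent measure recorded under this label
  const entries = performance.getEntriesByName(label, 'measure');
  const durationMs = entries[entries.length - 1].duration;
  return { result, durationMs };
}

const { result, durationMs } = measureSync('clone-1mb', () =>
  structuredClone(new Uint8Array(1024 * 1024))
);
console.log(`clone-1mb: ${durationMs.toFixed(2)} ms for ${result.byteLength} bytes`);
```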

Tasks under 0.5 ms are generally acceptable. Tasks over 2 ms on the main thread contribute directly to frame budget overruns: at 60 fps each frame has about 16.7 ms, which leaves little room for repeated serialisation of large payloads.

Connecting Measurements to Architecture Decisions

The numbers collected by the harness feed directly into three architectural choices:

Choice 1 — Clone or transfer? Use the 250 KB threshold as the starting point: if your median clone cost consistently exceeds 0.5 ms for a given payload shape, switch to ArrayBuffer transfer. The harness gives you the exact number for your payload rather than a generic rule.

Choice 2 — Batch or stream? If each individual message is cheap to clone (< 0.2 ms) but you send thousands per second, measure the aggregate: 0.15 ms × 1 000/s = 150 ms of clone work per second, or 15% of the sending thread's time. That is likely acceptable, but worth knowing.
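If the aggregate turns out to matter, batching many small messages into one postMessage is the usual alternative to streaming them individually. Batching amortises the fixed per-message dispatch overhead; it does not reduce the per-byte clone cost. A sketch; MessageBatcher is a hypothetical class and the 16 ms default interval (about one frame) is an assumption:

```typescript
// Collects messages and posts them as one array at most once per interval.
// `post` would typically wrap worker.postMessage; here it is any callback.
class MessageBatcher<T> {
  private queue: T[] = [];
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly post: (batch: T[]) => void;
  private readonly intervalMs: number;

  constructor(post: (batch: T[]) => void, intervalMs = 16) {
    this.post = post;
    this.intervalMs = intervalMs;
  }

  push(item: T): void {
    this.queue.push(item);
    // Schedule one flush per interval, not one per message
    if (this.timer === null) {
      this.timer = setTimeout(() => this.flush(), this.intervalMs);
    }
  }

  flush(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
    if (this.queue.length === 0) return;
    const batch = this.queue;
    this.queue = [];
    this.post(batch); // one message for the whole batch
  }
}

// Usage with a worker (not run here):
//   const batcher = new MessageBatcher<Update>(batch => worker.postMessage(batch));
//   batcher.push(update);
```

Re-running the harness with batched payloads shows whether the fixed overhead or the clone itself dominates for your message sizes.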

Choice 3 — Worker pool size? Clone cost scales linearly with payload size and is paid on every message, whichever worker receives it. Running four workers in parallel does not reduce per-message clone cost: each message pays the full serialisation price, and the sending thread performs every one of those serialisations itself, inside postMessage. Pool size decisions should be based on computation time, not clone time.

Frequently Asked Questions

Why can't I just measure performance.now() immediately before and after postMessage to get clone cost?
postMessage serialises the payload synchronously before it returns, so timing around the call captures only the sender-side serialisation. Delivery across the thread boundary and deserialisation on the receiving thread happen later and are missed. The accurate approach is to record t0 before postMessage, include t0 in the message payload, then subtract t0 from the worker's timestamp on receipt, taking both timestamps as performance.timeOrigin + performance.now() so they share one timeline. That one-way delta captures clone time, thread scheduling and message delivery. To isolate clone cost alone, use structuredClone(payload) directly and measure that instead.
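The sender-side half on its own can still be useful, for example to see how long postMessage blocks the main thread. A sketch using a MessageChannel (global in browsers and Node 15+), so no worker is needed; senderSideMs is a hypothetical helper:

```typescript
// Times only the synchronous part of postMessage: the sender-side serialisation.
// Delivery and deserialisation on the receiving side happen later and are not included.
function senderSideMs(payload: unknown): number {
  const { port1, port2 } = new MessageChannel();
  const t0 = performance.now();
  port1.postMessage(payload);
  const elapsed = performance.now() - t0;
  port1.close();
  port2.close();
  return elapsed;
}

const blockedMs = senderSideMs(new Uint8Array(5 * 1024 * 1024));
console.log(`postMessage blocked the sender for ${blockedMs.toFixed(2)} ms`);
```

Compared with structuredClone() on the same payload, this number leaves out deserialisation, so it is normally the smaller of the two.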
Are performance.now() timestamps comparable across threads?
Not directly. A dedicated worker has its own time origin: performance.timeOrigin in a worker is the moment the worker was created, so for the same instant its performance.now() is smaller than the main thread's by the time that elapsed between the page's time origin and the worker's creation. Convert each reading to an absolute timestamp, performance.timeOrigin + performance.now(), before subtracting readings taken on different threads. Even then, expect around 0.1 ms of error, because browsers coarsen timers as a Spectre mitigation (to 0.1 ms in contexts that are not cross-origin isolated and 5 µs in cross-origin-isolated ones).
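The conversion is a single addition per reading. A sketch with hypothetical helper names and made-up example values, showing why the raw subtraction goes wrong:

```typescript
// A performance.now() reading is relative to its own context's timeOrigin.
// Adding the origin gives an absolute time (ms since the Unix epoch).
function toAbsolute(relativeNow: number, timeOrigin: number): number {
  return timeOrigin + relativeNow;
}

// Delta from a main-thread reading to a worker reading, each paired with
// the timeOrigin of the context that produced it.
function crossThreadDeltaMs(
  mainNow: number, mainOrigin: number,
  workerNow: number, workerOrigin: number,
): number {
  return toAbsolute(workerNow, workerOrigin) - toAbsolute(mainNow, mainOrigin);
}

// Example values: the worker was created 1 200 ms after the page's time origin.
const mainOrigin = 1_700_000_000_000;
const workerOrigin = mainOrigin + 1_200;

console.log(crossThreadDeltaMs(1_500, mainOrigin, 303.5, workerOrigin)); // 3.5
console.log(303.5 - 1_500); // -1196.5: the raw difference is meaningless
```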

See also