Hardware

Intel i7-10700KF (primary)

| Property | Value |
|---|---|
| CPU | Intel Core i7-10700KF (Comet Lake) |
| Base frequency | 3.80 GHz |
| Turbo frequency | Up to 5.10 GHz (single-core) |
| Cores / Threads | 8 cores / 16 threads (SMT enabled) |
| L1d cache | 32 KB per core, 8-way |
| L2 cache | 256 KB per core, 4-way |
| L3 cache | 16 MB shared, ring bus interconnect |
| Architecture | x86_64, Comet Lake (14 nm) |
| OS | Ubuntu (Linux 6.8) |
| Rust | 1.93.1 stable |

Apple M1 Pro (secondary)

| Property | Value |
|---|---|
| CPU | Apple M1 Pro |
| Cores | 8 (6 performance + 2 efficiency) |
| Architecture | aarch64 (ARMv8.5-A) |
| L1d cache | 128 KB per P-core, 64 KB per E-core |
| L2 cache | 12 MB P-cluster, 4 MB E-cluster |
| OS | macOS 26.3 |
| Rust | 1.92.0 stable |

Criterion Configuration

| Parameter | Value |
|---|---|
| Sample size | 100 (Criterion default) |
| Warm-up time | 3 seconds (Criterion default) |
| Measurement time | 5 seconds (Criterion default) |
| Reported statistic | Median |
| Outlier detection | Criterion built-in classification (modified Tukey's method, IQR-based fences) |

Compiler flags: --release (opt-level 3); cargo bench builds with the bench profile, which inherits these settings. No custom RUSTFLAGS, LTO, PGO, or target-cpu=native.

What Is NOT Controlled

The following variables are not controlled and can cause variance between runs and machines:

Cross-Thread Roundtrip Methodology

The roundtrip benchmark (benches/throughput.rs, function cross_thread_latency) measures the time for a message to travel from the publisher to a subscriber thread and for the subscriber to signal receipt back:

  1. Publisher writes a u64 sequence number via publish(i).
  2. Subscriber thread busy-spins on try_recv(). On receipt it stores the value into a shared AtomicU64 (seen) with Release ordering.
  3. Publisher busy-spins on seen.load(Acquire) until it equals i.
  4. Criterion measures steps 1–3.

Note: This is a roundtrip measurement. It includes one cache-line transfer for the slot data (publisher → subscriber) and one for the seen atomic (subscriber → publisher), so the reported 95 ns is approximately 2× the true one-way latency plus the AtomicU64 signal-back overhead.
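
The publish/acknowledge handshake in steps 1–3 can be sketched with the standard library alone. This is an illustration under stated assumptions, not the code in benches/throughput.rs: a single AtomicU64 named slot stands in for the ring (so publish(i) becomes a Release store and try_recv() becomes an Acquire load), the function name roundtrip is hypothetical, and std::time::Instant over the whole loop replaces Criterion's per-sample timing.

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Runs `iters` publish/ack roundtrips and returns the total elapsed time.
/// `slot` is a hypothetical stand-in for the ring, not the real channel.
fn roundtrip(iters: u64) -> Duration {
    let slot = Arc::new(AtomicU64::new(0)); // stand-in for the published slot
    let seen = Arc::new(AtomicU64::new(0)); // signal-back atomic
    let stop = Arc::new(AtomicBool::new(false));

    let subscriber = {
        let (slot, seen, stop) = (slot.clone(), seen.clone(), stop.clone());
        thread::spawn(move || {
            let mut last = 0;
            while !stop.load(Ordering::Relaxed) {
                // Step 2: busy-spin until a new sequence number appears.
                let v = slot.load(Ordering::Acquire);
                if v != last {
                    last = v;
                    // Step 2: signal receipt back to the publisher.
                    seen.store(v, Ordering::Release);
                } else {
                    spin_loop();
                }
            }
        })
    };

    let start = Instant::now();
    // Sequence numbers start at 1 because 0 is the initial slot value.
    for i in 1..=iters {
        slot.store(i, Ordering::Release); // Step 1: publish.
        while seen.load(Ordering::Acquire) != i {
            spin_loop(); // Step 3: wait for the acknowledgement.
        }
    }
    let elapsed = start.elapsed();

    stop.store(true, Ordering::Relaxed);
    subscriber.join().unwrap();
    elapsed
}

fn main() {
    let iters = 10_000;
    let elapsed = roundtrip(iters);
    println!(
        "mean roundtrip: {:.1} ns",
        elapsed.as_nanos() as f64 / iters as f64
    );
}
```

Each loop iteration moves two cache lines between cores, one for slot and one for seen, which is why the mean printed here is a roundtrip figure and not a one-way latency.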

One-Way Latency (RDTSC)

The one-way benchmark (benches/rdtsc_oneway.rs) eliminates signal-back overhead by embedding the publisher's TSC reading directly in the message payload:

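  1. Publisher calls RDTSCP immediately before publish(). RDTSCP is not a fully serializing instruction, but it waits until all earlier instructions have executed before reading the TSC. The TSC value is stored in the message payload.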
  2. Subscriber calls LFENCE; RDTSC immediately after try_recv() returns Ok.
  3. The delta (subscriber_tsc - publisher_tsc) is recorded in raw cycles.
  4. After 100,000 samples (10,000 warmup discarded), percentiles are computed and converted to nanoseconds using the known CPU base and turbo frequencies.
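
The timestamp reads and the post-processing in steps 1–4 might look roughly like the sketch below. It is not taken from benches/rdtsc_oneway.rs: the helper names (publisher_tsc, subscriber_tsc, percentile, cycles_to_ns, summarize) are hypothetical, the nearest-rank percentile definition is an assumption, the sample data is synthetic, and the 3.8 GHz and 5.1 GHz values are simply the base and turbo frequencies from the hardware table.

```rust
/// Publisher-side timestamp: RDTSCP waits for earlier instructions to finish.
#[cfg(target_arch = "x86_64")]
fn publisher_tsc() -> u64 {
    let mut aux = 0u32; // receives IA32_TSC_AUX; unused in this sketch
    unsafe { core::arch::x86_64::__rdtscp(&mut aux) }
}

/// Subscriber-side timestamp: LFENCE keeps RDTSC from executing early.
#[cfg(target_arch = "x86_64")]
fn subscriber_tsc() -> u64 {
    unsafe {
        core::arch::x86_64::_mm_lfence();
        core::arch::x86_64::_rdtsc()
    }
}

/// Nearest-rank percentile over an ascending-sorted slice; `p` in (0, 100].
fn percentile(sorted: &[u64], p: f64) -> u64 {
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    sorted[rank.clamp(1, sorted.len()) - 1]
}

/// Converts cycles to nanoseconds at `ghz` (cycles per nanosecond).
fn cycles_to_ns(cycles: u64, ghz: f64) -> f64 {
    cycles as f64 / ghz
}

/// Drops the warm-up prefix, sorts, and returns (p50, p99, p99.9) in cycles.
fn summarize(deltas: &[u64], warmup: usize) -> (u64, u64, u64) {
    let mut kept = deltas[warmup..].to_vec();
    kept.sort_unstable();
    (
        percentile(&kept, 50.0),
        percentile(&kept, 99.0),
        percentile(&kept, 99.9),
    )
}

fn main() {
    // Synthetic cycle deltas for illustration only; not measured data.
    let deltas: Vec<u64> = (0..1_000u64).map(|i| 150 + (i * 37) % 100).collect();
    let (p50, p99, p999) = summarize(&deltas, 100);
    for (name, cycles) in [("p50", p50), ("p99", p99), ("p99.9", p999)] {
        println!(
            "{name}: {cycles} cycles = {:.1} ns @ 3.8 GHz, {:.1} ns @ 5.1 GHz",
            cycles_to_ns(cycles, 3.8),
            cycles_to_ns(cycles, 5.1)
        );
    }

    // Same-core, back-to-back reads: a rough floor for the timestamp overhead.
    #[cfg(target_arch = "x86_64")]
    {
        let t0 = publisher_tsc();
        let t1 = subscriber_tsc();
        println!("back-to-back TSC delta: {} cycles", t1.wrapping_sub(t0));
    }
}
```

In the real benchmark the two reads happen on different cores, so the delta is only meaningful if the TSC is synchronized across them.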

Disruptor Comparison

Both Photon Ring and disruptor-rs benchmarks run in the same Criterion binary, compiled with identical flags, in the same cargo bench invocation:

Cross-thread Disruptor numbers are not available because the Disruptor's consumer thread is managed internally by its builder API. The roundtrip comparison uses same-thread Criterion iteration for both libraries.

How to Reproduce

Full benchmark suite (Criterion)

cargo bench --bench throughput
cargo bench --bench payload_scaling

Results are written to target/criterion/ as JSON and HTML reports.

One-way latency (RDTSC)

# x86_64 only -- uses inline RDTSCP/LFENCE+RDTSC
cargo bench --bench rdtsc_oneway

Pinned-core latency example

cargo run --release --example pinned_latency

Caveats