Hardware
Intel i7-10700KF (primary)
| Property | Value |
|---|---|
| CPU | Intel Core i7-10700KF (Comet Lake) |
| Base frequency | 3.80 GHz |
| Turbo frequency | Up to 5.10 GHz (single-core) |
| Cores / Threads | 8 cores / 16 threads (SMT enabled) |
| L1d cache | 32 KB per core, 8-way |
| L2 cache | 256 KB per core, 4-way |
| L3 cache | 16 MB shared, ring bus interconnect |
| Architecture | x86_64, Comet Lake (14 nm) |
| OS | Ubuntu (Linux 6.8) |
| Rust | 1.93.1 stable |
Apple M1 Pro (secondary)
| Property | Value |
|---|---|
| CPU | Apple M1 Pro |
| Cores | 8 (6 performance + 2 efficiency) |
| Architecture | aarch64 (ARMv8.5-A) |
| L1d cache | 128 KB per P-core, 64 KB per E-core |
| L2 cache | 12 MB P-cluster, 4 MB E-cluster |
| OS | macOS 26.3 |
| Rust | 1.92.0 stable |
Criterion Configuration
| Parameter | Value |
|---|---|
| Sample size | 100 (Criterion default) |
| Warm-up time | 3 seconds (Criterion default) |
| Measurement time | 5 seconds (Criterion default) |
| Reported statistic | Median |
| Outlier detection | Criterion built-in classification (modified Tukey's fences, IQR-based) |
Compiler flags: optimized build (`opt-level = 3`; `cargo bench` uses the `bench` profile, which inherits from `release`). No custom `RUSTFLAGS`, and no LTO, PGO, or `target-cpu=native`.
What Is NOT Controlled
The following variables are not controlled and can cause variance between runs and machines:
- CPU frequency governor. Left at OS default. Turbo boost is not disabled.
- SMT (Hyper-Threading). Enabled on the Intel i7-10700KF. Cross-thread benchmarks may land on sibling hyperthreads, which share L1 and L2, or on separate physical cores, which communicate through the shared L3. The two placements produce substantially different latencies.
- Core isolation. No `isolcpus`, `nohz_full`, or `rcu_nocbs` kernel parameters are set.
- Core pinning. Criterion benchmarks do not pin threads. The `rdtsc_oneway` bench and the `pinned_latency` example do use core pinning where noted.
- Background load. Benchmarks run on a developer workstation, not a dedicated bare-metal machine.
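SMT placement and the frequency governor are left uncontrolled, so recording them with each run can help explain run-to-run variance. The following is a minimal sketch. It assumes the Linux sysfs paths shown, and the helper names (`parse_cpu_list`, `smt_siblings`, `governor`) are hypothetical, not part of the benchmark suite:

```rust
use std::fs;

/// Parse a Linux cpulist string such as "0,8" or "0-3,8" into CPU indices.
/// Returns None if any entry is not a valid number or range.
fn parse_cpu_list(s: &str) -> Option<Vec<usize>> {
    let mut cpus = Vec::new();
    for part in s.trim().split(',').filter(|p| !p.is_empty()) {
        if let Some((lo, hi)) = part.split_once('-') {
            // Range entry, e.g. "0-3" expands to 0, 1, 2, 3.
            cpus.extend(lo.parse::<usize>().ok()?..=hi.parse::<usize>().ok()?);
        } else {
            cpus.push(part.parse().ok()?);
        }
    }
    Some(cpus)
}

/// CPUs sharing a physical core with `cpu` (its SMT siblings, including itself).
/// Assumes the sysfs topology directory is present (Linux only).
fn smt_siblings(cpu: usize) -> Option<Vec<usize>> {
    let path = format!("/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list");
    fs::read_to_string(path).ok().and_then(|s| parse_cpu_list(&s))
}

/// Current cpufreq governor for `cpu`, if the cpufreq driver exposes one.
fn governor(cpu: usize) -> Option<String> {
    let path = format!("/sys/devices/system/cpu/cpu{cpu}/cpufreq/scaling_governor");
    fs::read_to_string(path).ok().map(|s| s.trim().to_string())
}
```

On a typical 8-core/16-thread Linux layout, `smt_siblings(0)` might return `Some([0, 8])`, meaning CPUs 0 and 8 share a physical core. Pinning the two benchmark threads to CPUs from different sibling sets keeps them on separate physical cores.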
Cross-Thread Roundtrip Methodology
The roundtrip benchmark (`benches/throughput.rs`, function `cross_thread_latency`)
measures the time for a message to travel from the publisher to a subscriber thread and
for the subscriber to signal receipt back:
1. Publisher writes a `u64` sequence number via `publish(i)`.
2. Subscriber thread busy-spins on `try_recv()`. On receipt it stores the value into a shared `AtomicU64` (`seen`) with `Release` ordering.
3. Publisher busy-spins on `seen.load(Acquire)` until it equals `i`.
4. Criterion measures steps 1–3.
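This is a roundtrip: one cross-core transfer for the message (publisher → subscriber) and one for the `seen` atomic (subscriber → publisher). The reported 95 ns is therefore approximately 2x the true one-way
latency, plus the `AtomicU64` signal-back overhead.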
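The handshake in steps 1–3 can be sketched with the standard library alone. This is a hypothetical stand-in, not the benchmark itself. A single `AtomicU64` slot replaces the Photon Ring channel, so it illustrates the publish/acknowledge pattern rather than the library's own `publish`/`try_recv` path. The `roundtrip` function name is also illustrative:

```rust
use std::hint::spin_loop;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::{Duration, Instant};

/// Average publisher -> subscriber -> publisher roundtrip over `iters` messages.
/// Hypothetical stand-in: one AtomicU64 `slot` replaces the Photon Ring channel.
fn roundtrip(iters: u32) -> Duration {
    assert!(iters > 0, "need at least one message");
    let slot = Arc::new(AtomicU64::new(0)); // last published sequence number
    let seen = Arc::new(AtomicU64::new(0)); // subscriber's acknowledgement

    let (rx, ack) = (Arc::clone(&slot), Arc::clone(&seen));
    let last_seq = u64::from(iters);
    let subscriber = thread::spawn(move || {
        let mut last = 0;
        while last < last_seq {
            // Step 2: busy-spin until a new value appears, then acknowledge it.
            let v = rx.load(Ordering::Acquire);
            if v != last {
                last = v;
                ack.store(v, Ordering::Release);
            } else {
                spin_loop();
            }
        }
    });

    let start = Instant::now();
    for i in 1..=last_seq {
        slot.store(i, Ordering::Release); // step 1: stand-in for publish(i)
        while seen.load(Ordering::Acquire) != i {
            spin_loop(); // step 3: wait until the subscriber has seen i
        }
    }
    let elapsed = start.elapsed(); // total time for steps 1-3 over all messages
    subscriber.join().expect("subscriber thread panicked");
    elapsed / iters
}
```

Because the publisher only publishes `i + 1` after `seen == i`, each iteration covers both directions of the exchange. Unlike the Criterion setup, this sketch reports a mean rather than a median and does not pin threads.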
One-Way Latency (RDTSC)
The one-way benchmark (benches/rdtsc_oneway.rs) eliminates signal-back overhead
by embedding the publisher's TSC reading directly in the message payload:
1. Publisher calls `RDTSCP` (which waits for all prior instructions to execute before reading the TSC) immediately before `publish()`. The TSC value is stored in the message payload.
2. Subscriber calls `LFENCE; RDTSC` immediately after `try_recv()` returns `Ok`.
3. The delta `(subscriber_tsc - publisher_tsc)` is recorded in raw cycles.
4. After 100,000 samples (10,000 warmup discarded), percentiles are computed and converted to nanoseconds using both the base (3.80 GHz) and turbo (5.10 GHz) frequencies.
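The timestamp reads and the percentile step above can be sketched as follows. This is a minimal std-only sketch. The timestamp helpers assume x86_64 and are gated with `#[cfg]`. The function names and the nearest-rank percentile method are illustrative assumptions, not necessarily what `benches/rdtsc_oneway.rs` does. `summarize` divides by a frequency in GHz (cycles per nanosecond), so calling it with 3.8 and with 5.1 gives the base- and turbo-frequency conversions. On CPUs with an invariant TSC, the counter typically ticks at a constant nominal rate regardless of turbo, so the base-frequency figure is usually the closer wall-clock estimate.

```rust
/// Publisher side: RDTSCP waits for all earlier instructions to execute
/// before reading the TSC (it is not a fully serializing instruction).
#[cfg(target_arch = "x86_64")]
fn publisher_tsc() -> u64 {
    let mut aux = 0u32; // receives IA32_TSC_AUX; not needed here
    unsafe { std::arch::x86_64::__rdtscp(&mut aux) }
}

/// Subscriber side: LFENCE keeps RDTSC from running ahead of earlier instructions.
#[cfg(target_arch = "x86_64")]
fn subscriber_tsc() -> u64 {
    unsafe {
        std::arch::x86_64::_mm_lfence();
        std::arch::x86_64::_rdtsc()
    }
}

/// Drop `warmup` samples, then report nearest-rank percentiles in per-mille
/// (500 = p50, 990 = p99, 999 = p99.9), converted from cycles to ns at `ghz`.
fn summarize(deltas: &[u64], warmup: usize, ghz: f64) -> Vec<(u32, f64)> {
    assert!(deltas.len() > warmup, "no samples left after warm-up");
    let mut cycles = deltas[warmup..].to_vec();
    cycles.sort_unstable();
    let n = cycles.len();
    [500u32, 990, 999]
        .iter()
        .map(|&per_mille| {
            // Nearest rank = ceil(per_mille * n / 1000), 1-based, never above n.
            let rank = (per_mille as usize * n).div_ceil(1000).max(1);
            (per_mille, cycles[rank - 1] as f64 / ghz) // cycles / (cycles per ns)
        })
        .collect()
}
```

In a one-way harness, the subscriber would compute `subscriber_tsc() - payload_tsc` for each received message and push it into `deltas`. This relies on the TSCs being synchronized across cores, which is normally the case on a single-socket machine.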
Disruptor Comparison
Both Photon Ring and disruptor-rs benchmarks run in the same Criterion binary,
compiled with identical flags, in the same cargo bench invocation:
- Same ring size: 4096 slots.
- Same wait strategy: `BusySpin` (the lowest-latency strategy in both libraries).
- Publish-only: the Disruptor ring has a single `BusySpin` consumer attached (required by its API). The consumer stores received values into a `Relaxed` atomic, which the benchmark ignores.
How to Reproduce
Full benchmark suite (Criterion)
```bash
cargo bench --bench throughput
cargo bench --bench payload_scaling
```
Results are written to target/criterion/ as JSON and HTML reports.
One-way latency (RDTSC)
```bash
# x86_64 only -- uses inline RDTSCP/LFENCE+RDTSC
cargo bench --bench rdtsc_oneway
```
Pinned-core latency example
```bash
cargo run --release --example pinned_latency
```
Caveats
- Self-benchmarks. All benchmarks are authored and run by the Photon Ring maintainers. They have not been independently verified by a third party.
- Hardware-dependent. Numbers are specific to the tested hardware. Different CPUs, cache hierarchies, and interconnects will produce different results.
- Disruptor comparison is against the Rust port. The disruptor crate (v4.0.0) is a Rust reimplementation of the LMAX Disruptor pattern. A direct comparison against the Java original on matched hardware has not been performed.
- Median vs. tail latency. The README reports median (p50). Tail latency (p99, p999) is higher and more variable. The `rdtsc_oneway` benchmark reports full percentile distributions.
- Single-socket only. All benchmarks run on single-socket machines. Cross-socket (NUMA) latency would be significantly higher for both libraries.