Photon Ring — Seqlock-Stamped Inter-Thread Messaging for Rust

Overview

How Photon Ring achieves near-hardware latency

Photon Ring is a zero-allocation pub/sub crate for Rust built around pre-allocated ring buffers with per-slot seqlock stamps. It targets the part of concurrent systems where queueing overhead dominates: market data, telemetry fanout, staged pipelines, and other hot-path broadcast workloads where every subscriber must observe every message.

The central insight is stamp-in-slot co-location: by embedding the seqlock sequence stamp directly alongside the payload in one #[repr(C, align(64))] struct, both ownership metadata and data reside within a single 64-byte cache line for payloads up to 56 bytes. Consumers validate and read in one L3 snoop instead of two, cutting coherence traffic in half.

Slot layout

                    64 bytes (one cache line)
+-----------------------------------------------------------+
|  stamp: AtomicU64  |  value: T                            |
|  (seqlock)         |  (Pod — all bit patterns valid)      |
+-----------------------------------------------------------+
For T <= 56 bytes: stamp and value share one cache line.
Larger T spills to additional lines (still correct, slightly slower).
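
The 56-byte threshold follows from the layout arithmetic, which can be checked with a small sketch. The `Slot<T>` type below is a hypothetical stand-in that mirrors the diagram; it is not the crate's actual definition, and the field names are assumptions.

```rust
use std::mem::{align_of, size_of};
use std::sync::atomic::AtomicU64;

/// Hypothetical slot mirroring the diagram above (not the crate's real type).
#[allow(dead_code)]
#[repr(C, align(64))]
struct Slot<T> {
    stamp: AtomicU64, // 8 bytes: seqlock stamp
    value: T,         // payload, laid out directly after the stamp
}

fn main() {
    // 8-byte stamp + 56-byte payload fills exactly one 64-byte cache line.
    assert_eq!(size_of::<Slot<[u8; 56]>>(), 64);
    assert_eq!(align_of::<Slot<[u8; 56]>>(), 64);
    // One extra payload byte spills the slot into a second line.
    assert_eq!(size_of::<Slot<[u8; 57]>>(), 128);
    // A small payload still occupies a full line because of the alignment.
    assert_eq!(size_of::<Slot<u64>>(), 64);
    println!("layout checks passed");
}
```

Because `align(64)` rounds the size up to a multiple of 64, neighbouring slots never share a cache line.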

Write protocol:                Read protocol:
  stamp = seq*2 + 1 (odd)       s1 = stamp.load(Acquire)
  fence(Release)                 if odd  → spin
  memcpy(slot.value, data)       if s1 < expected → Empty
  stamp = seq*2 + 2 (even)       if s1 > expected → Lagged
  cursor = seq (Release)         value = memcpy(slot)
                                 s2 = stamp.load(Acquire)
                                 if s1 == s2 → return value
                                 else → retry
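
The two protocols above can be sketched with standard-library atomics. This is a minimal illustration under stated assumptions, not the crate's implementation: `SeqSlot` and `ReadError` are hypothetical names, the payload is a single `AtomicU64` (a stand-in for `T: Pod` that keeps the sketch free of data races), the shared cursor update is omitted, and the second stamp check uses an acquire fence followed by a relaxed load.

```rust
use std::hint;
use std::sync::atomic::{fence, AtomicU64, Ordering};

#[derive(Debug, PartialEq)]
enum ReadError {
    Empty,  // the expected sequence has not been published yet
    Lagged, // the slot already holds a newer sequence
}

/// Hypothetical single-slot seqlock (illustrative only).
#[repr(C, align(64))]
struct SeqSlot {
    stamp: AtomicU64,
    value: AtomicU64,
}

impl SeqSlot {
    fn new() -> Self {
        SeqSlot { stamp: AtomicU64::new(0), value: AtomicU64::new(0) }
    }

    /// Writer side. Assumes exactly one writer at a time.
    fn write(&self, seq: u64, data: u64) {
        self.stamp.store(seq * 2 + 1, Ordering::Relaxed); // odd: write in progress
        fence(Ordering::Release); // keep the payload store after the odd stamp
        self.value.store(data, Ordering::Relaxed);
        self.stamp.store(seq * 2 + 2, Ordering::Release); // even: published
    }

    /// Reader side: returns the value only if the stamp was stable around the copy.
    fn try_read(&self, seq: u64) -> Result<u64, ReadError> {
        let expected = seq * 2 + 2;
        loop {
            let s1 = self.stamp.load(Ordering::Acquire);
            if s1 & 1 == 1 {
                hint::spin_loop(); // writer is mid-update
                continue;
            }
            if s1 < expected {
                return Err(ReadError::Empty);
            }
            if s1 > expected {
                return Err(ReadError::Lagged);
            }
            let v = self.value.load(Ordering::Relaxed); // speculative copy
            fence(Ordering::Acquire); // order the copy before the re-check
            let s2 = self.stamp.load(Ordering::Relaxed);
            if s1 == s2 {
                return Ok(v);
            }
            // Stamp changed during the copy: discard the value and retry.
        }
    }
}

fn main() {
    let slot = SeqSlot::new();
    assert_eq!(slot.try_read(0), Err(ReadError::Empty));
    slot.write(0, 42);
    assert_eq!(slot.try_read(0), Ok(42));
    slot.write(4, 7); // the slot is reused by a later sequence
    assert_eq!(slot.try_read(0), Err(ReadError::Lagged));
    assert_eq!(slot.try_read(4), Ok(7));
}
```

A torn copy is only ever discarded, never interpreted, which is why the payload must tolerate arbitrary bit patterns.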

⚡

Near-hardware latency

48 ns p50 one-way latency on Intel i7-10700KF — within 20% of the bare L3 snoop floor, leaving almost no software overhead.

📡

True broadcast

Every subscriber sees every message. Fanout to 10 independent subscribers costs 17 ns total (Intel) — 1.7 ns per subscriber. SubscriberGroup batches N logical consumers into a single seqlock read, cutting that to 0.2 ns each.

🧰

Zero allocation on the hot path

The ring is pre-allocated at construction. publish and try_recv never touch the allocator — no GC pauses, no malloc jitter.

🧪

Pod payload safety

Requires T: Pod (every bit pattern valid), making speculative torn reads safe to discard. Compile-time proof, not a runtime check.

⚙

SPMC and MPMC

Publisher is single-producer via &mut self (no CAS on write). MpPublisher adds a lock-free multi-producer path. Named-topic Photon<T> and heterogeneous TypedBus included.

🌐

no_std + alloc

Works on bare-metal, WASM, and embedded targets that provide alloc and 64-bit atomics. The pipeline topology builder, hugepages, and CPU affinity are available on supported desktop/server platforms.

Benchmarks

Criterion (100 samples, --release, no custom RUSTFLAGS) on two machines. Numbers are medians unless stated.

Hardware

Intel i7-10700KF — Primary

CPU	Intel Core i7-10700KF
Microarch	Comet Lake (14 nm)
Cores / Threads	8C / 16T (SMT on)
Base / Turbo	3.80 GHz / 5.10 GHz
L1d / L2 / L3	32 KB / 256 KB / 16 MB
OS	Linux 6.8 (Ubuntu)
Rust	1.93.1 stable

Apple M1 Pro — Secondary

CPU	Apple M1 Pro
Architecture	aarch64 (ARMv8.5-A)
Cores	8 (6P + 2E)
L1d (P-core)	128 KB
L2	12 MB (P-cluster)
OS	macOS 26.3
Rust	1.92.0 stable

Core operations

Compared against disruptor v4.0.0 (BusySpin wait strategy, 4096-slot ring, same binary, same Criterion invocation). Columns marked (A) are the Intel i7-10700KF; columns marked (B) are the Apple M1 Pro.

Operation	Photon Ring (A)	Photon Ring (B)	disruptor-rs (A)	disruptor-rs (B)
Publish only	2.8 ns	2.4 ns	30.6 ns	15.3 ns
Cross-thread roundtrip	95 ns	130 ns	138 ns	186 ns
Same-thread roundtrip (1 sub)	2.7 ns	8.8 ns	—	—
Fanout (10 subscribers)	17.0 ns	27.7 ns	—	—
SubscriberGroup read	2.6 ns	8.8 ns	—	—
MPMC (1 pub + 1 sub)	12.1 ns	10.6 ns	—	—
Empty poll	0.85 ns	1.1 ns	—	—
Batch publish 64 + drain	158 ns	282 ns	—	—
Struct roundtrip (24-byte Pod)	4.8 ns	9.3 ns	—	—
One-way latency p50 (RDTSC)	48 ns	—	—	—
Sustained throughput	~300M msg/s	~88M msg/s	—	—

disruptor-rs comparison note: Both libraries use the BusySpin wait strategy and 4096-slot rings. The Disruptor benchmarks run in the same Criterion binary, compiled with identical flags. Same-thread Disruptor numbers are not available because its consumer thread is managed internally by the builder API. See benchmark methodology for full details.

Charts (not reproduced here):

Cross-thread roundtrip latency distribution: 100,000,000 samples per library, Intel i7-10700KF, no core pinning.
Publish latency comparison: publish-only cost in nanoseconds, same Criterion run.
Cross-thread roundtrip: publisher → subscriber → signal-back, two machines.
One-way latency percentiles (RDTSC): p50, p90, p99, p99.9 on Intel i7-10700KF (x86_64).
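
For readers reproducing the percentile figures from their own samples, here is a minimal sketch of a nearest-rank percentile. The `percentile` helper is hypothetical and the sample data is synthetic; the benchmark suite may compute percentiles differently. Ranks are given in per-mille (999 means p99.9) so the arithmetic stays in integers.

```rust
/// Nearest-rank percentile over an ascending-sorted slice.
/// `permille` is the percentile in tenths of a percent: 500 = p50, 999 = p99.9.
fn percentile(sorted: &[u64], permille: u64) -> u64 {
    assert!(!sorted.is_empty(), "need at least one sample");
    assert!((1..=1000).contains(&permille), "permille must be in 1..=1000");
    let n = sorted.len() as u64;
    // rank = ceil(permille * n / 1000), computed without floating point
    let rank = (permille * n + 999) / 1000;
    sorted[(rank - 1) as usize]
}

fn main() {
    // Synthetic latencies in nanoseconds, purely illustrative (not measured data).
    let mut samples: Vec<u64> = (1..=1000).collect();
    samples.sort_unstable();
    for (label, permille) in [("p50", 500), ("p90", 900), ("p99", 990), ("p99.9", 999)] {
        println!("{label}: {} ns", percentile(&samples, permille));
    }
}
```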

Throughput

Sustained message rate, single publisher, single subscriber, BusySpin.

Machine	Throughput	Notes
Intel i7-10700KF	~300M msg/s	BusySpin, 4096 slots, u64 payload
Apple M1 Pro	~88M msg/s	BusySpin, 4096 slots, u64 payload

Payload scaling

Photon Ring outperforms disruptor-rs at every payload size tested (8 B – 4 KiB). See full payload scaling analysis.

Reproducibility: Numbers use T: Pod payloads and no custom RUSTFLAGS. CPU governor, Turbo Boost, SMT, and core pinning are not controlled in the Criterion suite. Run cargo bench on your own hardware and treat published figures as indicative snapshots.

Comparison

How Photon Ring fits alongside the broader Rust concurrency ecosystem

Feature matrix

Feature	Photon Ring	disruptor-rs v4	crossbeam-channel	bus
Delivery model	Broadcast	Broadcast	Point-to-point queue	Broadcast
Publish cost	2.8 ns / 12.1 ns (MPMC)	30.6 ns	—	—
Cross-thread roundtrip	95 ns	138 ns	—	—
Sustained throughput	~300M msg/s	—	—	—
Topology / pipeline builder	Yes	Yes	No	No
Batch publish & drain	Yes	Yes	Iterator only	No
Named-topic bus	Yes	No	No	No
Heterogeneous-type bus	Yes (TypedBus)	No	No	No
Backpressure	Optional	Default	Default	Default
no_std compatible	Yes	No	No	No
Multi-producer (MPMC)	Yes	Yes	Yes	No
CPU affinity helpers	Yes	No	No	No
Hugepages / mlock	Linux only	No	No	No
Lossy drop on overflow	Yes (lossy default)	Blocks	Blocks or drops	Blocks

Design constraints

Constraint	Rationale
T: Pod	Every bit pattern must be valid. Torn reads from speculative seqlock copies are safe to reject without UB.
Power-of-two capacity	Indexing uses `seq & mask` instead of `%`, avoiding division on the hot path.
Single producer by default	`&mut self` enforces one writer at the type level. No CAS on the write path.
Lossy overflow by default	Publisher never blocks. Slow subscribers detect drops via `TryRecvError::Lagged`.
64-bit atomics required	The seqlock stamp is a `u64`. Platforms without atomic 64-bit operations are not supported.
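
The power-of-two and lossy-overflow constraints can be illustrated with a single-threaded model. This is a sketch under stated assumptions, not the crate's code: `Ring` and `RecvError` are hypothetical names, the `skipped` field is an assumed detail, and all atomics are left out so that only the indexing and lag detection remain.

```rust
#[derive(Debug, PartialEq)]
enum RecvError {
    Empty,
    Lagged { skipped: u64 }, // `skipped` is an assumption made for this sketch
}

/// Hypothetical single-threaded model of a lossy broadcast ring.
struct Ring {
    slots: Vec<u64>,
    mask: u64,
    next_seq: u64, // sequence number the next publish will use
}

impl Ring {
    fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two(), "capacity must be a power of two");
        Ring { slots: vec![0; capacity], mask: capacity as u64 - 1, next_seq: 0 }
    }

    /// Never blocks: once the ring is full, the oldest slot is overwritten.
    fn publish(&mut self, value: u64) {
        let idx = (self.next_seq & self.mask) as usize; // same result as `% capacity`
        self.slots[idx] = value;
        self.next_seq += 1;
    }

    /// Each subscriber owns its cursor, so readers never touch shared state.
    fn try_recv(&self, cursor: &mut u64) -> Result<u64, RecvError> {
        if *cursor >= self.next_seq {
            return Err(RecvError::Empty);
        }
        // Oldest sequence still present in the ring.
        let oldest = self.next_seq.saturating_sub(self.slots.len() as u64);
        if *cursor < oldest {
            let skipped = oldest - *cursor;
            *cursor = oldest; // jump forward to the oldest surviving message
            return Err(RecvError::Lagged { skipped });
        }
        let value = self.slots[(*cursor & self.mask) as usize];
        *cursor += 1;
        Ok(value)
    }
}

fn main() {
    let mut ring = Ring::new(4);
    for i in 0..6u64 {
        ring.publish(100 + i); // 6 messages into 4 slots: the first two are lost
    }
    let mut slow = 0u64;
    println!("{:?}", ring.try_recv(&mut slow)); // Err(Lagged { skipped: 2 })
    println!("{:?}", ring.try_recv(&mut slow)); // Ok(102)
}
```

The publisher never consults subscriber cursors, which is what lets it stay wait-free; the cost is that a slow reader learns about the loss only when it next polls.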

When to choose crossbeam-channel instead: If each message should be consumed by exactly one receiver (point-to-point ownership transfer), use crossbeam-channel. Photon Ring is optimised for broadcast: every subscriber sees the same stream with independent cursors and no contention.

API Overview

Channels, buses, pipelines, and wait strategies — all composable. See docs.rs for the full reference.

SPMC channel

Channel basics (Rust)

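// Assumes all three constructors are exported at the crate root,
// like `channel` in the quick start
use photon_ring::{channel, channel_bounded, channel_mpmc};

// Single producer, multiple consumers: the fastest path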
let (mut pub_, subs) = channel::<u64>(1024);
let mut sub = subs.subscribe();

pub_.publish(42);
assert_eq!(sub.try_recv(), Ok(42));

// Bounded backpressure (publisher blocks instead of overwriting)
let (mut pub_, subs) = channel_bounded::<u64>(1024, 512);

// Multiple producers
let (mp_pub, subs) = channel_mpmc::<u64>(1024);
let mp_pub2 = mp_pub.clone();  // MpPublisher: Clone + Send + Sync

Named-topic bus

Photon<T> bus (Rust)

use photon_ring::Photon;

let bus = Photon::<u64>::new(1024);

let mut prices = bus.publisher("prices");
let mut trades = bus.publisher("trades");

let mut sub = bus.subscribe("prices");

prices.publish(100);
assert_eq!(sub.try_recv(), Ok(100));

Pipeline topology

Multi-stage pipeline builder (Rust)

let (input, pipeline) = Pipeline::builder()
    .capacity(4096)
    .input::<u64>()
    .then(|x| x * 2)       // stage 1: dedicated thread
    .then(|x| x + 1)       // stage 2: dedicated thread
    .build();

input.publish(21);

// Fan-out: diamond topology
let (input, _pipeline) = Pipeline::builder()
    .capacity(1024)
    .input::<u64>()
    .fan_out(|x| x * 2, |x| x + 100)   // two parallel branches
    .build();

Wait strategies

Blocking vs. spinning (Rust)

use photon_ring::WaitStrategy;

// Absolute lowest wakeup latency
sub.recv_with(WaitStrategy::BusySpin);

// Cooperative spinning (yields CPU between spins)
sub.recv_with(WaitStrategy::YieldSpin);

// Exponential backoff (good for mixed loads)
sub.recv_with(WaitStrategy::BackoffSpin);

// Automatically tunes based on observed latency
sub.recv_with(WaitStrategy::Adaptive);
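
To make the trade-off between the first three strategies concrete, here is a hedged sketch of what such wait loops might look like using only the standard library. The `Wait` enum and `wait_for` function are hypothetical and are not the crate's `WaitStrategy` implementation; the backoff schedule (doubling spin counts, then yielding) is an assumption chosen for illustration.

```rust
use std::{hint, thread};

/// Hypothetical strategies, loosely modelled on the names above.
#[derive(Clone, Copy)]
enum Wait {
    BusySpin,  // never gives up the core: lowest wakeup latency, 100% CPU
    YieldSpin, // hands the core back to the scheduler between polls
    Backoff,   // spins in growing bursts, then falls back to yielding
}

/// Polls `poll` until it produces a value, idling according to `strategy`.
fn wait_for<T>(mut poll: impl FnMut() -> Option<T>, strategy: Wait) -> T {
    let mut round: u32 = 0;
    loop {
        if let Some(v) = poll() {
            return v;
        }
        match strategy {
            Wait::BusySpin => hint::spin_loop(),
            Wait::YieldSpin => thread::yield_now(),
            Wait::Backoff => {
                if round < 6 {
                    // 1, 2, 4, ... 32 spin hints before the next poll
                    for _ in 0..(1u32 << round) {
                        hint::spin_loop();
                    }
                    round += 1;
                } else {
                    thread::yield_now();
                }
            }
        }
    }
}

fn main() {
    // Stand-in for a subscriber poll: empty three times, then a value arrives.
    let mut polls = 0;
    let v = wait_for(
        || {
            polls += 1;
            if polls > 3 { Some(7u64) } else { None }
        },
        Wait::Backoff,
    );
    println!("received {v} after {polls} polls");
}
```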

Platform support

Platform	Core ring	Affinity	Topology	Hugepages
x86_64 Linux	Yes	Yes	Yes	Yes
x86_64 macOS / Windows	Yes	Yes	Yes	No
aarch64 Linux	Yes	Yes	Yes	Yes
aarch64 macOS (Apple Silicon)	Yes	Yes	Yes	No
wasm32	Yes	No	No	No
FreeBSD / NetBSD / Android	Yes	Yes	Yes	No
32-bit ARM (Cortex-M)	No	No	No	No

Get Started

From zero to a working channel in under a minute

1. Add to Cargo.toml

Cargo.toml

[dependencies]
photon-ring = "2"

# Optional features
# photon-ring = { version = "2", features = ["derive", "hugepages"] }

2. Quick start

src/main.rs

use photon_ring::{channel, Photon};

fn main() {
    // SPMC: one publisher, multiple independent subscribers
    let (mut pub_, subs) = channel::<u64>(1024);
    let mut sub_a = subs.subscribe();
    let mut sub_b = subs.subscribe();

    pub_.publish(42);

    // Both subscribers see the same message
    assert_eq!(sub_a.try_recv(), Ok(42));
    assert_eq!(sub_b.try_recv(), Ok(42));

    // Named-topic bus
    let bus = Photon::<u64>::new(1024);
    let mut p = bus.publisher("prices");
    let mut s = bus.subscribe("prices");
    p.publish(100);
    assert_eq!(s.try_recv(), Ok(100));
}

3. Cross-thread usage

Cross-thread example (Rust)

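use photon_ring::channel;
use std::thread;

const COUNT: u64 = 1_000_000;

let (mut pub_, subs) = channel::<u64>(4096);
let mut sub = subs.subscribe();

let consumer = thread::spawn(move || {
    loop {
        match sub.try_recv() {
            // The final value doubles as the shutdown signal
            Ok(v) if v == COUNT - 1 => break,
            Ok(_v) => { /* process _v */ }
            // Empty (nothing new yet) or Lagged (overwritten by the lossy
            // ring): neither means the stream has ended, so keep polling.
            // Assumes a Lagged result moves the cursor past the gap.
            Err(_) => std::hint::spin_loop(),
        }
    }
});

for i in 0..COUNT {
    pub_.publish(i);
}

consumer.join().unwrap();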

Resources

docs.rs API reference — full public API with examples
GitHub examples — market_data, pipeline, backpressure, diamond, pinned_latency
Benchmark methodology — how to reproduce the numbers
Payload scaling analysis — latency vs payload size, 8 B – 4 KiB
Technical report — cache coherence theory, seqlock design, formal analysis