
Photon Ring

Seqlock-stamped inter-thread messaging for Rust. Zero-allocation broadcast channels at near-hardware latency.

48 ns: p50 one-way latency
2.8 ns: publish cost (Intel)
300M msg/s: sustained
0 allocations: on the hot path

Overview

How Photon Ring achieves near-hardware latency

Photon Ring is a zero-allocation pub/sub crate for Rust built around pre-allocated ring buffers with per-slot seqlock stamps. It targets the part of concurrent systems where queueing overhead dominates: market data, telemetry fanout, staged pipelines, and other hot-path broadcast workloads where every subscriber must observe every message.

The central insight is stamp-in-slot co-location: by embedding the seqlock sequence stamp directly alongside the payload in one #[repr(C, align(64))] struct, both ownership metadata and data reside within a single 64-byte cache line for payloads up to 56 bytes. Consumers validate and read in one L3 snoop instead of two, cutting coherence traffic in half.

Slot layout

                    64 bytes (one cache line)
+-----------------------------------------------------------+
|  stamp: AtomicU64  |  value: T                            |
|  (seqlock)         |  (Pod — all bit patterns valid)      |
+-----------------------------------------------------------+
For T <= 56 bytes: stamp and value share one cache line.
Larger T spills to additional lines (still correct, slightly slower).

Write protocol:                Read protocol:
  stamp = seq*2 + 1 (odd)       s1 = stamp.load(Acquire)
  fence(Release)                 if odd  → spin
  memcpy(slot.value, data)       if s1 < expected → Empty
  stamp = seq*2 + 2 (even)       if s1 > expected → Lagged
  cursor = seq (Release)         value = memcpy(slot)
                                 s2 = stamp.load(Acquire)
                                 if s1 == s2 → return value
                                 else → retry
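
The write and read protocols above can be sketched in safe Rust using only the standard library. This is a minimal illustration, not the crate's implementation: the names `Slot`, `ReadResult`, `write`, and `try_read` are hypothetical, and the payload is fixed at seven `AtomicU64` words (56 bytes) so that the speculative copy is free of data races without `unsafe`. The crate itself stores a generic `T: Pod`.

```rust
use std::sync::atomic::{fence, AtomicU64, Ordering};

/// One ring slot: an 8-byte stamp plus a 56-byte payload in one cache line.
/// Hypothetical sketch type; the payload is seven atomic words so every
/// access stays race-free in safe Rust.
#[repr(C, align(64))]
pub struct Slot {
    stamp: AtomicU64,
    words: [AtomicU64; 7],
}

#[derive(Debug, PartialEq)]
pub enum ReadResult {
    Value([u64; 7]),
    Empty,  // sequence not published yet
    Lagged, // slot already overwritten by a later sequence
}

impl Slot {
    pub fn new() -> Self {
        Slot {
            stamp: AtomicU64::new(0),
            words: std::array::from_fn(|_| AtomicU64::new(0)),
        }
    }

    /// Writer side. The caller must guarantee there is only one writer.
    pub fn write(&self, seq: u64, data: &[u64; 7]) {
        // Odd stamp: write in progress.
        self.stamp.store(seq * 2 + 1, Ordering::Relaxed);
        fence(Ordering::Release);
        for (word, value) in self.words.iter().zip(data) {
            word.store(*value, Ordering::Relaxed);
        }
        // Even stamp: payload for `seq` is complete.
        self.stamp.store(seq * 2 + 2, Ordering::Release);
    }

    /// Reader side: validate the stamp, copy, then validate again.
    pub fn try_read(&self, seq: u64) -> ReadResult {
        let expected = seq * 2 + 2;
        loop {
            let s1 = self.stamp.load(Ordering::Acquire);
            if s1 & 1 == 1 {
                std::hint::spin_loop(); // writer is mid-copy
                continue;
            }
            if s1 < expected {
                return ReadResult::Empty;
            }
            if s1 > expected {
                return ReadResult::Lagged;
            }
            let mut out = [0u64; 7];
            for (dst, word) in out.iter_mut().zip(&self.words) {
                *dst = word.load(Ordering::Relaxed);
            }
            // Keep the payload loads ordered before the second stamp load.
            fence(Ordering::Acquire);
            let s2 = self.stamp.load(Ordering::Relaxed);
            if s1 == s2 {
                return ReadResult::Value(out);
            }
            // Stamp changed during the copy: discard it and retry.
        }
    }
}
```

With `#[repr(C, align(64))]`, the 8-byte stamp and the 56-byte payload add up to exactly 64 bytes, which is the co-location property the layout diagram describes.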

Near-hardware latency
48 ns p50 one-way latency on Intel i7-10700KF, within 20% of the bare L3 snoop floor, leaving almost no software overhead.

True broadcast
Every subscriber sees every message. Fanout to 10 independent subscribers costs 17 ns total on Intel (1.7 ns per subscriber). SubscriberGroup batches N logical consumers into a single seqlock read, cutting that to 0.2 ns each.

Zero allocation on the hot path
The ring is pre-allocated at construction. publish and try_recv never touch the allocator: no GC pauses, no malloc jitter.

Pod payload safety
Requires T: Pod (every bit pattern valid), making speculative torn reads safe to discard. Compile-time proof, not a runtime check.

SPMC and MPMC
Publisher is single-producer via &mut self (no CAS on write). MpPublisher adds a lock-free multi-producer path. Named-topic Photon<T> and heterogeneous TypedBus included.

no_std + alloc
Works on bare metal, WASM, and embedded targets with alloc. Pipeline topology builder, hugepages, and CPU affinity are available on supported desktop/server platforms.
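
The broadcast and lossy-overflow behaviour described above can be modelled with a single-threaded toy ring. This is a sketch under stated assumptions, not the crate's code: `ToyRing`, `ToySub`, `TryRecv`, and the `skipped` field are hypothetical names, and the resume-from-oldest policy after a lag is an assumption made for the illustration. It only shows how independent cursors let every subscriber see every message, and how a stamp comparison reveals that a slow subscriber was overtaken.

```rust
#[derive(Debug, PartialEq)]
pub enum TryRecv {
    Value(u64),
    Empty,
    Lagged { skipped: u64 },
}

/// Toy lossy ring: each slot holds (stamp, value), stamp = seq * 2 + 2.
pub struct ToyRing {
    slots: Vec<(u64, u64)>,
    mask: u64,
    next_seq: u64,
}

impl ToyRing {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity.is_power_of_two());
        ToyRing {
            slots: vec![(0, 0); capacity],
            mask: capacity as u64 - 1,
            next_seq: 0,
        }
    }

    /// `&mut self` means only one publisher can exist at a time.
    /// Never blocks: old slots are simply overwritten.
    pub fn publish(&mut self, value: u64) {
        let seq = self.next_seq;
        self.slots[(seq & self.mask) as usize] = (seq * 2 + 2, value);
        self.next_seq += 1;
    }
}

/// Each subscriber owns its cursor, so subscribers never contend.
pub struct ToySub {
    cursor: u64,
}

impl ToySub {
    pub fn new() -> Self {
        ToySub { cursor: 0 }
    }

    pub fn try_recv(&mut self, ring: &ToyRing) -> TryRecv {
        let (stamp, value) = ring.slots[(self.cursor & ring.mask) as usize];
        let expected = self.cursor * 2 + 2;
        if stamp < expected {
            TryRecv::Empty
        } else if stamp > expected {
            // The slot was reused: jump to the oldest message still held.
            let oldest = ring.next_seq - ring.slots.len() as u64;
            let skipped = oldest - self.cursor;
            self.cursor = oldest;
            TryRecv::Lagged { skipped }
        } else {
            self.cursor += 1;
            TryRecv::Value(value)
        }
    }
}
```

Publishing two values and polling from two `ToySub` instances yields both values on each; publishing more than the capacity before a subscriber polls yields `Lagged` once, after which that subscriber continues from the oldest retained message.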

Benchmarks

Criterion (100 samples, --release, no custom RUSTFLAGS) on two machines. Numbers are medians unless stated.

Hardware

Intel i7-10700KF (primary)
CPU: Intel Core i7-10700KF
Microarch: Comet Lake (14 nm)
Cores / Threads: 8C / 16T (SMT on)
Base / Turbo: 3.80 GHz / 5.10 GHz
L1d / L2 / L3: 32 KB / 256 KB / 16 MB
OS: Linux 6.8 (Ubuntu)
Rust: 1.93.1 stable

Apple M1 Pro (secondary)
CPU: Apple M1 Pro
Architecture: aarch64 (ARMv8.5-A)
Cores: 8 (6P + 2E)
L1d (P-core): 128 KB
L2: 12 MB (P-cluster)
OS: macOS 26.3
Rust: 1.92.0 stable

Core operations

Compared against disruptor v4.0.0 (BusySpin wait, 4096-slot ring, same binary, same Criterion invocation).

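A = Intel i7-10700KF, B = Apple M1 Pro. "-" = not reported.

| Operation | Photon Ring (A) | Photon Ring (B) | disruptor-rs (A) | disruptor-rs (B) |
|---|---|---|---|---|
| Publish only | 2.8 ns | 2.4 ns | 30.6 ns | 15.3 ns |
| Cross-thread roundtrip | 95 ns | 130 ns | 138 ns | 186 ns |
| Same-thread roundtrip (1 sub) | 2.7 ns | 8.8 ns | - | - |
| Fanout (10 subscribers) | 17.0 ns | 27.7 ns | - | - |
| SubscriberGroup read | 2.6 ns | 8.8 ns | - | - |
| MPMC (1 pub + 1 sub) | 12.1 ns | 10.6 ns | - | - |
| Empty poll | 0.85 ns | 1.1 ns | - | - |
| Batch publish 64 + drain | 158 ns | 282 ns | - | - |
| Struct roundtrip (24-byte Pod) | 4.8 ns | 9.3 ns | - | - |
| One-way latency p50 (RDTSC) | 48 ns | - | - | - |
| Sustained throughput | ~300M msg/s | ~88M msg/s | - | - |
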
disruptor-rs comparison note: Both libraries use BusySpin wait strategy and 4096-slot rings. The Disruptor benchmarks run in the same Criterion binary, compiled with identical flags. Cross-thread Disruptor numbers are not available because its consumer thread is managed internally by the builder API. See benchmark methodology for full details.

[Chart: Cross-thread roundtrip latency distribution. 100,000,000 samples per library, Intel i7-10700KF, no core pinning.]
[Chart: Publish latency comparison. Publish-only cost in nanoseconds, same Criterion run.]
[Chart: Cross-thread roundtrip. Publisher → subscriber → signal-back, two machines.]
[Chart: One-way latency percentiles (RDTSC). p50, p90, p99, p99.9 on Intel i7-10700KF (x86_64).]
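
For readers reproducing the percentile figures from their own raw latency samples, the following sketch computes p50, p90, p99, and p99.9. The nearest-rank definition is an assumption (the source does not say which percentile method its benchmark harness uses), and `percentile` and `summarize` are hypothetical helper names. Percentiles are passed in per-mille (500 = p50, 999 = p99.9) to keep the arithmetic in integers.

```rust
/// Nearest-rank percentile over an already sorted slice.
/// `permille` is the percentile times ten: 500 = p50, 999 = p99.9.
pub fn percentile(sorted: &[u64], permille: u64) -> Option<u64> {
    if sorted.is_empty() || permille == 0 || permille > 1000 {
        return None;
    }
    let n = sorted.len() as u64;
    // rank = ceil(permille / 1000 * n), always in 1..=n
    let rank = (permille * n + 999) / 1000;
    Some(sorted[(rank - 1) as usize])
}

/// Returns [p50, p90, p99, p99.9] for a set of latency samples (in ns).
pub fn summarize(mut samples: Vec<u64>) -> Option<[u64; 4]> {
    samples.sort_unstable();
    Some([
        percentile(&samples, 500)?,
        percentile(&samples, 900)?,
        percentile(&samples, 990)?,
        percentile(&samples, 999)?,
    ])
}
```

Tail percentiles such as p99.9 only become meaningful with large sample counts, which is consistent with the 100,000,000 samples quoted for the distribution chart.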

Throughput

Sustained message rate, single publisher, single subscriber, BusySpin.

| Machine | Throughput | Notes |
|---|---|---|
| Intel i7-10700KF | ~300M msg/s | BusySpin, 4096 slots, u64 payload |
| Apple M1 Pro | ~88M msg/s | BusySpin, 4096 slots, u64 payload |

Payload scaling

Photon Ring outperforms disruptor-rs at every payload size tested (8 B – 4 KiB). See full payload scaling analysis.

[Chart: Payload scaling, Photon Ring vs disruptor-rs across 8 B to 4 KiB payloads.]

Reproducibility: Numbers use T: Pod payloads and no custom RUSTFLAGS. CPU governor, Turbo Boost, SMT, and core pinning are not controlled in the Criterion suite. Run cargo bench on your own hardware and treat published figures as indicative snapshots.

Comparison

How Photon Ring fits alongside the broader Rust concurrency ecosystem

Feature matrix

| Feature | Photon Ring | disruptor-rs v4 | crossbeam-channel | bus |
|---|---|---|---|---|
| Delivery model | Broadcast | Broadcast | Point-to-point queue | Broadcast |
| Publish cost | 2.8 ns / 12.1 ns (MPMC) | 30.6 ns | - | - |
| Cross-thread roundtrip | 95 ns | 138 ns | - | - |
| Sustained throughput | ~300M msg/s | - | - | - |
| Topology / pipeline builder | Yes | Yes | No | No |
| Batch publish & drain | Yes | Yes | Iterator only | No |
| Named-topic bus | Yes | No | No | No |
| Heterogeneous-type bus | Yes (TypedBus) | No | No | No |
| Backpressure | Optional | Default | Default | Default |
| no_std compatible | Yes | No | No | No |
| Multi-producer (MPMC) | Yes | Yes | Yes | No |
| CPU affinity helpers | Yes | No | No | No |
| Hugepages / mlock | Linux only | No | No | No |
| Lazy drop on overflow | Yes (lossy default) | Blocks | Blocks or drops | Blocks |

Design constraints

| Constraint | Rationale |
|---|---|
| T: Pod | Every bit pattern must be valid. Torn reads from speculative seqlock copies are safe to reject without UB. |
| Power-of-two capacity | Indexing uses seq & mask instead of %, avoiding division on the hot path. |
| Single producer by default | &mut self enforces one writer at the type level. No CAS on the write path. |
| Lossy overflow by default | Publisher never blocks. Slow subscribers detect drops via TryRecvError::Lagged. |
| 64-bit atomics required | The seqlock stamp is a u64. Platforms without atomic 64-bit operations are not supported. |

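
The power-of-two constraint can be illustrated with a small standalone sketch. `RingIndex` is a hypothetical helper, not a type from the crate. It shows why `seq & mask` is a valid replacement for `seq % capacity` only when the capacity is a power of two.

```rust
/// Maps an ever-increasing sequence number to a slot index.
pub struct RingIndex {
    mask: u64,
}

impl RingIndex {
    /// Returns None unless `capacity` is a power of two, because
    /// `seq & (capacity - 1)` equals `seq % capacity` only in that case.
    pub fn new(capacity: usize) -> Option<Self> {
        if capacity.is_power_of_two() {
            Some(RingIndex { mask: capacity as u64 - 1 })
        } else {
            None
        }
    }

    /// A single AND instruction instead of an integer division.
    pub fn slot(&self, seq: u64) -> usize {
        (seq & self.mask) as usize
    }
}
```

For a capacity of 1024 the mask is 1023, so sequence 1025 lands in slot 1, the same result as 1025 % 1024.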
When to choose crossbeam-channel instead: If each message should be consumed by exactly one receiver (point-to-point ownership transfer), use crossbeam-channel. Photon Ring is optimised for broadcast: every subscriber sees the same stream with independent cursors and no contention.

API Overview

Channels, buses, pipelines, and wait strategies — all composable. See docs.rs for the full reference.

SPMC channel

Channel basics (Rust)
// Single producer, multiple consumers — the fastest path
let (mut pub_, subs) = channel::<u64>(1024);
let mut sub = subs.subscribe();

pub_.publish(42);
assert_eq!(sub.try_recv(), Ok(42));

// Bounded backpressure (publisher blocks instead of overwriting)
let (mut pub_, subs) = channel_bounded::<u64>(1024, 512);

// Multiple producers
let (mp_pub, subs) = channel_mpmc::<u64>(1024);
let mp_pub2 = mp_pub.clone();  // MpPublisher: Clone + Send + Sync
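
The source does not describe how the multi-producer path assigns sequence numbers. One common lock-free approach, shown here purely as an assumption and not as the crate's mechanism, is for cloned producer handles to claim a unique sequence with an atomic fetch-and-add and then fill the claimed slot. `SeqClaimer` is a hypothetical name.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Shared sequence counter; clones hand out disjoint sequence numbers.
#[derive(Clone)]
pub struct SeqClaimer {
    next: Arc<AtomicU64>,
}

impl SeqClaimer {
    pub fn new() -> Self {
        SeqClaimer { next: Arc::new(AtomicU64::new(0)) }
    }

    /// Claims the next sequence number. Each call returns a distinct
    /// value even when many threads call it concurrently.
    pub fn claim(&self) -> u64 {
        self.next.fetch_add(1, Ordering::Relaxed)
    }
}
```

In such a design each producer would still publish its claimed slot with the per-slot stamp protocol, so a subscriber waiting on sequence n simply sees that slot's stamp turn even once the owning producer finishes. The extra atomic read-modify-write is consistent with MPMC publish costing more than the single-producer path in the benchmark table.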

Named-topic bus

Photon<T> bus (Rust)
let bus = Photon::<u64>::new(1024);

let mut prices = bus.publisher("prices");
let mut trades = bus.publisher("trades");

let mut sub = bus.subscribe("prices");

prices.publish(100);
assert_eq!(sub.try_recv(), Ok(100));

Pipeline topology

Multi-stage pipeline builder (Rust)
let (input, pipeline) = Pipeline::builder()
    .capacity(4096)
    .input::<u64>()
    .then(|x| x * 2)       // stage 1: dedicated thread
    .then(|x| x + 1)       // stage 2: dedicated thread
    .build();

input.publish(21);

// Fan-out: diamond topology
let (input, _pipeline) = Pipeline::builder()
    .capacity(1024)
    .input::<u64>()
    .fan_out(|x| x * 2, |x| x + 100)   // two parallel branches
    .build();
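
To make the stage-per-thread idea concrete without relying on the crate, here is a standalone sketch that chains two stages over `std::sync::mpsc` channels. It is a stand-in, not how the Pipeline builder is implemented: `spawn_stage` is a hypothetical helper, and mpsc is a point-to-point queue rather than a seqlock ring.

```rust
use std::sync::mpsc;
use std::thread;

/// Runs `f` on a dedicated thread, reading from `rx` and forwarding
/// each result to the returned receiver.
pub fn spawn_stage<I, O, F>(rx: mpsc::Receiver<I>, f: F) -> mpsc::Receiver<O>
where
    I: Send + 'static,
    O: Send + 'static,
    F: Fn(I) -> O + Send + 'static,
{
    let (tx, out) = mpsc::channel();
    thread::spawn(move || {
        // The loop ends when the upstream sender is dropped.
        for item in rx {
            if tx.send(f(item)).is_err() {
                break; // downstream receiver is gone
            }
        }
    });
    out
}
```

Chaining `spawn_stage(spawn_stage(rx, |x| x * 2), |x| x + 1)` and sending 21 into the input produces 43 at the output, which is the value the two-stage example above would compute for the same input.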

Wait strategies

Blocking vs. spinning (Rust)
use photon_ring::WaitStrategy;

// Absolute lowest wakeup latency
sub.recv_with(WaitStrategy::BusySpin);

// Cooperative spinning (yields CPU between spins)
sub.recv_with(WaitStrategy::YieldSpin);

// Exponential backoff (good for mixed loads)
sub.recv_with(WaitStrategy::BackoffSpin);

// Automatically tunes based on observed latency
sub.recv_with(WaitStrategy::Adaptive);
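
The trade-off between these strategies is easier to see in a generic polling loop. The sketch below is standard-library only; `ToyWait` and `wait_for` are hypothetical names, the adaptive strategy is left out, and the backoff cap of 1024 spins is an arbitrary choice for the illustration. It does not reproduce the crate's `WaitStrategy` internals.

```rust
use std::hint;
use std::thread;

#[derive(Clone, Copy)]
pub enum ToyWait {
    /// Spin without yielding: lowest wakeup latency, burns a core.
    BusySpin,
    /// Yield to the OS scheduler between polls.
    YieldSpin,
    /// Spin for an exponentially growing number of iterations.
    Backoff,
}

/// Calls `poll` until it returns Some, idling between attempts
/// according to `strategy`.
pub fn wait_for<T>(strategy: ToyWait, mut poll: impl FnMut() -> Option<T>) -> T {
    let mut spins: u32 = 1;
    loop {
        if let Some(value) = poll() {
            return value;
        }
        match strategy {
            ToyWait::BusySpin => hint::spin_loop(),
            ToyWait::YieldSpin => thread::yield_now(),
            ToyWait::Backoff => {
                for _ in 0..spins {
                    hint::spin_loop();
                }
                spins = (spins * 2).min(1024);
            }
        }
    }
}
```

With a channel, `poll` would wrap a non-blocking receive and map the empty case to `None`.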

Platform support

| Platform | Core ring | Affinity | Topology | Hugepages |
|---|---|---|---|---|
| x86_64 Linux | Yes | Yes | Yes | Yes |
| x86_64 macOS / Windows | Yes | Yes | Yes | No |
| aarch64 Linux | Yes | Yes | Yes | Yes |
| aarch64 macOS (Apple Silicon) | Yes | Yes | Yes | No |
| wasm32 | Yes | No | No | No |
| FreeBSD / NetBSD / Android | Yes | Yes | Yes | No |
| 32-bit ARM (Cortex-M) | No | No | No | No |

Get Started

From zero to a working channel in under a minute

1. Add to Cargo.toml

Cargo.toml (TOML)
[dependencies]
photon-ring = "2"

# Optional features
# photon-ring = { version = "2", features = ["derive", "hugepages"] }

2. Quick start

src/main.rs (Rust)
use photon_ring::{channel, Photon};

fn main() {
    // SPMC: one publisher, multiple independent subscribers
    let (mut pub_, subs) = channel::<u64>(1024);
    let mut sub_a = subs.subscribe();
    let mut sub_b = subs.subscribe();

    pub_.publish(42);

    // Both subscribers see the same message
    assert_eq!(sub_a.try_recv(), Ok(42));
    assert_eq!(sub_b.try_recv(), Ok(42));

    // Named-topic bus
    let bus = Photon::<u64>::new(1024);
    let mut p = bus.publisher("prices");
    let mut s = bus.subscribe("prices");
    p.publish(100);
    assert_eq!(s.try_recv(), Ok(100));
}

3. Cross-thread usage

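Cross-thread example (Rust)
use photon_ring::channel;
use std::thread;

// Last value published; the consumer uses it as its stop signal.
const LAST: u64 = 999_999;

let (mut pub_, subs) = channel::<u64>(4096);
let mut sub = subs.subscribe();

let consumer = thread::spawn(move || {
    loop {
        match sub.try_recv() {
            // The final message is never overwritten, so it is a safe stop signal.
            Ok(v) if v == LAST => break,
            Ok(_v) => { /* process _v */ }
            // Nothing published yet, or this subscriber lagged: keep polling.
            // Breaking here would end the consumer before the first publish.
            // Assumes a lagged subscriber resumes from the oldest retained message.
            Err(_) => std::hint::spin_loop(),
        }
    }
});

// The publisher never blocks; a slow consumer may miss intermediate values.
for i in 0..=LAST {
    pub_.publish(i);
}

consumer.join().unwrap();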

Resources