Bulk Backfill

Two tools make large pulls civilized: per-day request splitting, with the requests run concurrently up to your tier's limit, and chunk-streaming decode, which keeps memory flat no matter the response size.
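
As a rough picture of what per-day splitting produces, here is a standalone sketch that turns a date range into one window per calendar day. `split_into_days` is a hypothetical helper written for illustration. It is not the SDK's `split_date_range`, whose return shape and handling of non-trading days may differ.

```python
from datetime import datetime, timedelta


def split_into_days(start: str, end: str) -> list[tuple[str, str]]:
    """Return one (start, end) window per calendar day, inclusive on both ends.

    Hypothetical stand-in for illustration; it does not skip weekends or holidays.
    """
    first = datetime.strptime(start, "%Y%m%d").date()
    last = datetime.strptime(end, "%Y%m%d").date()
    if last < first:
        raise ValueError("end date precedes start date")
    windows = []
    day = first
    while day <= last:
        stamp = day.strftime("%Y%m%d")
        # A single-day window starts and ends on the same date.
        windows.append((stamp, stamp))
        day += timedelta(days=1)
    return windows


windows = split_into_days("20250101", "20250331")
print(len(windows), windows[0], windows[-1])
# 90 ('20250101', '20250101') ('20250331', '20250331')
```

Each window becomes one independent request, which is what lets the requests run side by side.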

Python — concurrent days, streamed decode

python
import asyncio

from thetadatadx import Config, Credentials, ThetaDataDxClient, split_date_range

tdx = ThetaDataDxClient(Credentials.from_file("creds.txt"), Config.production())

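def on_chunk(chunk):
    # chunk: list of TradeTick. Write it to your store; the SDK frees it afterwards.
    # `store` is a placeholder for your own sink (database, file writer); define it before running.
    store.append(chunk)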

async def pull(start, end):
    builder = tdx.stock_history_trade_builder("AAPL", start).end_date(end)
    await builder.stream_async(on_chunk)

windows = split_date_range("20250101", "20250331")

async def main():
    await asyncio.gather(*(pull(s, e) for s, e in windows))

asyncio.run(main())

Every endpoint has a <endpoint>_builder(...) factory whose .stream(...) / .stream_async(...) terminals hand each decoded chunk to your callback and free it before fetching the next.
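
To see why the callback pattern keeps memory flat, here is a self-contained sketch with no SDK involved. `fake_chunks`, `stream`, and `CsvSink` are hypothetical names invented for this example: a generator stands in for the decoder, and the sink writes each chunk out and keeps only counters.

```python
import csv
import io
from typing import Callable, Iterable, Iterator

Row = tuple[int, float]


def fake_chunks(total_rows: int, chunk_size: int) -> Iterator[list[Row]]:
    """Hypothetical chunk source: yields lists of (sequence, price) rows."""
    for start in range(0, total_rows, chunk_size):
        stop = min(start + chunk_size, total_rows)
        yield [(i, 100.0 + i * 0.01) for i in range(start, stop)]


def stream(chunks: Iterable[list[Row]], on_chunk: Callable[[list[Row]], None]) -> None:
    """Hand each chunk to the callback, then drop it before the next one is produced."""
    for chunk in chunks:
        on_chunk(chunk)
        del chunk  # release our reference so the chunk can be collected


class CsvSink:
    """Writes rows out as they arrive and keeps only counters, never the rows."""

    def __init__(self, out) -> None:
        self.writer = csv.writer(out)
        self.rows_written = 0
        self.largest_chunk = 0

    def __call__(self, chunk: list[Row]) -> None:
        self.writer.writerows(chunk)
        self.rows_written += len(chunk)
        self.largest_chunk = max(self.largest_chunk, len(chunk))


buffer = io.StringIO()  # in-memory only for the demo; a real sink would be a file or database
sink = CsvSink(buffer)
stream(fake_chunks(total_rows=10_000, chunk_size=1_000), sink)
print(sink.rows_written, sink.largest_chunk)
# 10000 1000
```

The point of the design is that the callback should not hold on to chunks: anything it keeps a reference to cannot be freed, so the peak working set is one chunk only if the sink persists and forgets.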

Rust — the same shape

rust
let days = ["20250303", "20250304", "20250305"];
for day in days {
    tdx.stock_history_trade("AAPL", day)
        .stream(|chunk| {
            // chunk: &[TradeTick]. Persist it; the chunk is dropped afterwards.
            // write_parquet is a placeholder for your own writer.
            write_parquet(chunk);
        })
        .await?;
}

Run several days concurrently with futures::future::join_all — the SDK's tier semaphore paces them.
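
The pacing idea can be sketched in Python with `asyncio.Semaphore`. This is only an illustration of how a semaphore caps in-flight requests: the SDK's semaphore is internal, and the `limit=4` value, `fetch_day`, and `run_all` are assumptions made up for the example.

```python
import asyncio


async def fetch_day(day: str, gate: asyncio.Semaphore, stats: dict) -> str:
    """Pretend to fetch one day, recording how many fetches overlap."""
    async with gate:  # waits here once `limit` fetches are already running
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # stand-in for the network request
        stats["in_flight"] -= 1
    return day


async def run_all(days: list[str], limit: int) -> tuple[list[str], int]:
    gate = asyncio.Semaphore(limit)
    stats = {"in_flight": 0, "peak": 0}
    # All tasks are launched at once; the semaphore decides how many proceed.
    done = await asyncio.gather(*(fetch_day(d, gate, stats) for d in days))
    return list(done), stats["peak"]


days = [f"202503{d:02d}" for d in range(3, 13)]  # ten hypothetical dates
done, peak = asyncio.run(run_all(days, limit=4))
print(len(done), peak)
# 10 4
```

Launching every day at once is safe in this shape because the gate, not the caller, bounds how many requests are in flight.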

When to stop looping

A per-symbol loop over the whole market is the wrong tool past a handful of symbols — that's what flat files are for: one request returns every contract for a date.
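
A back-of-the-envelope count shows where the loop stops making sense. The sketch assumes one request per symbol per day for the loop and one flat-file request per date; the symbol counts and the 60-day span are hypothetical figures chosen for illustration.

```python
def per_symbol_requests(symbols: int, days: int) -> int:
    """Requests needed when every symbol is pulled day by day."""
    return symbols * days


def flat_file_requests(days: int) -> int:
    """Requests needed when one flat file covers a whole date."""
    return days


DAYS = 60  # hypothetical backfill span
for symbols in (5, 500, 5_000):
    print(symbols, per_symbol_requests(symbols, DAYS), flat_file_requests(DAYS))
# 5 300 60
# 500 30000 60
# 5000 300000 60
```

The loop's cost grows with the number of symbols, while the flat-file cost depends only on the number of dates.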

Released under the Apache-2.0 License.