High-Frequency Data

Overview

High-frequency data captures market activity at sub-second intervals. It is essential for market microstructure research, HFT strategy development, and execution quality analysis.

Difficulty: Advanced

Data Types

1. Tick Data

Each trade is recorded individually:
- Timestamp (nanosecond precision)
- Price
- Volume
- Trade direction (buy/sell, i.e. the aggressor side; must be inferred when the feed does not report it)
- Exchange

Example:
2024-01-15 09:30:00.123456789, 150.25, 100, BUY, ARCA
2024-01-15 09:30:00.123456790, 150.26, 50, SELL, ARCA
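
Python's `datetime` only resolves microseconds, so nanosecond timestamps like the ones above are usually kept as integers. The following is a minimal sketch of parsing the two sample rows. The `Trade` tuple and the helper names are hypothetical, and treating the timestamps as UTC is an assumption, since the sample rows do not state a timezone.

```python
from datetime import datetime, timezone
from decimal import Decimal
from typing import NamedTuple


class Trade(NamedTuple):
    ts_ns: int       # nanoseconds since the Unix epoch
    price: Decimal   # Decimal avoids binary floating-point rounding on prices
    volume: int
    side: str
    exchange: str


def parse_timestamp_ns(text: str) -> int:
    """Convert 'YYYY-MM-DD HH:MM:SS.fffffffff' to integer nanoseconds."""
    whole, _, frac = text.partition(".")
    # Assumption: the timestamp is in UTC.
    dt = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    nanos = int(frac.ljust(9, "0")[:9]) if frac else 0
    return int(dt.timestamp()) * 1_000_000_000 + nanos


def parse_trade(line: str) -> Trade:
    ts, price, volume, side, exchange = [field.strip() for field in line.split(",")]
    return Trade(parse_timestamp_ns(ts), Decimal(price), int(volume), side, exchange)


rows = [
    "2024-01-15 09:30:00.123456789, 150.25, 100, BUY, ARCA",
    "2024-01-15 09:30:00.123456790, 150.26, 50, SELL, ARCA",
]
trades = [parse_trade(row) for row in rows]
gap_ns = trades[1].ts_ns - trades[0].ts_ns  # time between the two trades
print(gap_ns)
```

With integer timestamps the one-nanosecond difference between the two trades survives, which would be lost if the values were rounded to microseconds.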

2. Quote Data (NBBO)

National Best Bid and Offer updates:
- Timestamp
- Best bid price, bid size
- Best ask price, ask size
- Exchange(s) posting the best bid and the best ask

Example:
2024-01-15 09:30:00.123456789, 150.24×500, 150.26×300, ARCA/NASDAQ
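
The quote above is enough to derive the usual top-of-book measures. As a short sketch, assuming the `price×size` notation of the example (the `parse_level` helper is hypothetical):

```python
from decimal import Decimal


def parse_level(text: str) -> tuple[Decimal, int]:
    """Split 'price×size' into (price, size)."""
    price, size = text.split("×")
    return Decimal(price), int(size)


bid_px, bid_sz = parse_level("150.24×500")
ask_px, ask_sz = parse_level("150.26×300")

spread = ask_px - bid_px                     # quoted spread in dollars
mid = (bid_px + ask_px) / 2                  # midpoint price
spread_bps = spread / mid * Decimal(10_000)  # spread relative to mid, in basis points

print(spread, mid, round(spread_bps, 2))
```

`Decimal` is used so the spread comes out as exactly 0.02 instead of a value affected by binary floating-point rounding.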

3. Order Book Data (L2/L3)

Level 2: Aggregated size at each bid and ask price level (market by price)
Level 3: Full order book with every individual order and its order ID (market by order)

L2 Example:
Bid: 150.24×500, 150.23×1000, 150.22×800, ...
Ask: 150.26×300, 150.27×600, 150.28×400, ...

L3 Example:
Bid: Order#12345@150.24×100, Order#12346@150.24×200, ...
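
The two levels are related: summing L3 order sizes per price gives the L2 view. A minimal sketch of that aggregation follows. The first two orders come from the L3 example above, the third order and the `aggregate_to_l2` name are made up for illustration.

```python
from collections import defaultdict
from decimal import Decimal

# Resting bid orders as (order_id, price, size).
l3_bids = [
    (12345, Decimal("150.24"), 100),
    (12346, Decimal("150.24"), 200),
    (12347, Decimal("150.23"), 1000),  # hypothetical extra order
]


def aggregate_to_l2(orders, descending):
    """Sum order sizes per price level and sort the levels by price."""
    totals = defaultdict(int)
    for _order_id, price, size in orders:
        totals[price] += size
    return sorted(totals.items(), key=lambda level: level[0], reverse=descending)


# Bids are sorted from the highest price down; asks would use descending=False.
l2_bids = aggregate_to_l2(l3_bids, descending=True)
print(l2_bids)
```

The reverse direction is not possible: an L2 snapshot does not reveal how many orders make up a level or where each one sits in the queue.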

4. Message Data

Every order book event:
- Order submission
- Order cancellation
- Order modification
- Trade execution

Message rates are bursty: peaks of 500,000+ messages per second can occur for active stocks
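
A message feed is consumed by replaying events in order against an in-memory book. The sketch below handles the four event types listed above. The message layout (dicts with `type`, `order_id`, `side`, `price`, `size`) is an assumption, not any exchange's wire format, and prices are integer cents to avoid floating-point comparisons.

```python
from dataclasses import dataclass


@dataclass
class Order:
    side: str    # "B" for bid, "S" for ask
    price: int   # price in cents
    size: int


class OrderBook:
    def __init__(self):
        self.orders = {}  # order_id -> Order

    def apply(self, msg: dict) -> None:
        kind, oid = msg["type"], msg["order_id"]
        if kind == "submit":
            self.orders[oid] = Order(msg["side"], msg["price"], msg["size"])
        elif kind == "cancel":
            self.orders.pop(oid, None)
        elif kind == "modify":
            order = self.orders[oid]
            order.price, order.size = msg["price"], msg["size"]
        elif kind == "execute":
            order = self.orders[oid]
            order.size -= msg["size"]
            if order.size <= 0:  # fully filled orders leave the book
                del self.orders[oid]
        else:
            raise ValueError(f"unknown message type: {kind}")

    def best_bid(self):
        bids = [o.price for o in self.orders.values() if o.side == "B"]
        return max(bids) if bids else None

    def best_ask(self):
        asks = [o.price for o in self.orders.values() if o.side == "S"]
        return min(asks) if asks else None


# Hypothetical message sequence.
messages = [
    {"type": "submit", "order_id": 1, "side": "B", "price": 15024, "size": 100},
    {"type": "submit", "order_id": 2, "side": "B", "price": 15023, "size": 200},
    {"type": "submit", "order_id": 3, "side": "S", "price": 15026, "size": 300},
    {"type": "submit", "order_id": 4, "side": "S", "price": 15027, "size": 50},
    {"type": "execute", "order_id": 1, "size": 100},
    {"type": "modify", "order_id": 2, "price": 15025, "size": 150},
    {"type": "cancel", "order_id": 3},
]

book = OrderBook()
for message in messages:
    book.apply(message)
print(book.best_bid(), book.best_ask())
```

Scanning all orders for the best price keeps the sketch short. A production book would keep price levels in a sorted structure so the top of book is available without a full scan.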

Data Volume

| Data Type | Messages/Day | Storage/Day | Format |
|---|---|---|---|
| Tick (SPY) | ~1M | ~50 MB | CSV/Parquet |
| Quote (SPY) | ~10M | ~500 MB | Binary |
| L2 Book (SPY) | ~50M | ~2 GB | Binary |
| L3 Book (SPY) | ~500M | ~20 GB | Binary |
| Full Market (all stocks) | ~50B | ~2 TB | Proprietary |
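
The table's figures imply a per-message size and an average message rate, which are useful for capacity planning. The sketch below does that arithmetic, assuming decimal units (1 MB = 10^6 bytes) and a 6.5-hour regular trading session. Neither assumption is stated in the table.

```python
SESSION_SECONDS = int(6.5 * 3600)  # assumed regular session: 23,400 seconds

# (messages per day, bytes per day), taken from the table above
volumes = {
    "Tick (SPY)": (1_000_000, 50_000_000),
    "Quote (SPY)": (10_000_000, 500_000_000),
    "L2 Book (SPY)": (50_000_000, 2_000_000_000),
    "L3 Book (SPY)": (500_000_000, 20_000_000_000),
    "Full Market": (50_000_000_000, 2_000_000_000_000),
}

stats = {}
for name, (messages, size_bytes) in volumes.items():
    stats[name] = {
        "bytes_per_msg": size_bytes / messages,
        "avg_msgs_per_sec": messages / SESSION_SECONDS,
    }

for name, row in stats.items():
    print(f"{name:15s} {row['bytes_per_msg']:5.0f} B/msg "
          f"{row['avg_msgs_per_sec']:12,.0f} msg/s")
```

These are session averages. Activity clusters around the open, the close, and news events, so peak rates are well above them.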

Data Storage

| Format | Read Speed | Write Speed | Compression | Use Case |
|---|---|---|---|---|
| CSV | Slow | Slow | Good | Small datasets |
| Parquet | Fast | Fast | Excellent | Analysis |
| HDF5 | Fast | Medium | Good | Large datasets |
| KDB+/q | Very Fast | Very Fast | Built-in | HFT production |
| Arrow | Very Fast | Very Fast | Good | Inter-process |
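
One reason text formats such as CSV are slow and large is that every field has to be written and parsed as characters. As a rough illustration, the sketch below packs one trade into a fixed-width binary record using Python's `struct` module and compares its size with the equivalent CSV line. The record layout is hypothetical and does not correspond to any of the formats in the table.

```python
import struct

# Hypothetical fixed-width layout (little-endian, no padding):
#   q  = timestamp, nanoseconds since the Unix epoch (int64)
#   i  = price in 1/10,000 dollar units (int32, caps the price near $214,000)
#   I  = volume in shares (uint32)
#   c  = side, b"B" or b"S" (1 byte)
#   4s = exchange code, NUL-padded (4 bytes)
TRADE = struct.Struct("<qiIc4s")


def pack_trade(ts_ns, price, volume, side, exchange):
    return TRADE.pack(ts_ns, round(price * 10_000), volume,
                      side[:1].encode("ascii"), exchange.encode("ascii"))


def unpack_trade(buf):
    ts_ns, px, volume, side, exch = TRADE.unpack(buf)
    return (ts_ns, px / 10_000, volume,
            side.decode("ascii"), exch.rstrip(b"\x00").decode("ascii"))


csv_line = "2024-01-15 09:30:00.123456789, 150.25, 100, BUY, ARCA\n"
record = pack_trade(1_705_311_000_123_456_789, 150.25, 100, "BUY", "ARCA")

print(len(csv_line.encode("ascii")), "bytes as CSV")
print(len(record), "bytes as a binary record")
print(unpack_trade(record))
```

Fixed-width records also allow seeking straight to the n-th record, which a variable-length text line does not.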

Checklist

  • [ ] Timestamp precision adequate (nanosecond for HFT)
  • [ ] Timezone handling consistent (UTC)
  • [ ] Clock synchronization (NTP/PTP)
  • [ ] Data gaps identified and handled
  • [ ] Survivorship bias avoided
  • [ ] Corporate actions adjusted
  • [ ] Storage format optimized for access patterns
  • [ ] Backfill process for missing data
  • [ ] Data quality checks automated
  • [ ] Latency between market event and recording measured
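
Several of these checks are simple to automate once timestamps are integers. The sketch below covers three of them: gap detection, out-of-order timestamps, and event-to-recording latency. The function names, the 1 ms gap threshold, and the sample values are all hypothetical.

```python
def find_gaps(ts_ns, max_gap_ns):
    """Return (previous, next) pairs more than max_gap_ns apart."""
    return [(a, b) for a, b in zip(ts_ns, ts_ns[1:]) if b - a > max_gap_ns]


def find_out_of_order(ts_ns):
    """Return indexes whose timestamp is earlier than the preceding one."""
    return [i for i in range(1, len(ts_ns)) if ts_ns[i] < ts_ns[i - 1]]


def recording_latency_ns(event_ns, received_ns):
    """Per-message delay between the exchange timestamp and local receipt."""
    return [received - event for event, received in zip(event_ns, received_ns)]


# Hypothetical exchange timestamps and local receive timestamps (nanoseconds).
event_ns = [1_000, 2_000, 3_000, 2_500, 1_000_003_000]
received_ns = [1_200, 2_150, 3_400, 2_700, 1_000_003_100]

gaps = find_gaps(event_ns, max_gap_ns=1_000_000)  # flag gaps longer than 1 ms
out_of_order = find_out_of_order(event_ns)
latencies = recording_latency_ns(event_ns, received_ns)

print("gaps:", gaps)
print("out-of-order indexes:", out_of_order)
print("max latency (ns):", max(latencies))
```

The latency figure is only meaningful if the local clock is synchronized to the exchange clock, which is what the NTP/PTP item in the checklist is for.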
