High-Frequency Data
Overview
High-frequency data records market activity event by event, with sub-second (typically microsecond or nanosecond) timestamps. It is essential for market microstructure research, HFT strategy development, and execution quality analysis.
Difficulty: Advanced
Data Types
1. Tick Data
Each trade is recorded individually:
- Timestamp (nanosecond precision)
- Price
- Volume
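- Trade direction (buy/sell aggressor side; often not reported, e.g. in consolidated trade feeds, and then inferred from quotes, for example with the Lee-Ready algorithm)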
- Exchange
Example:
2024-01-15 09:30:00.123456789, 150.25, 100, BUY, ARCA
2024-01-15 09:30:00.123456790, 150.26, 50, SELL, ARCA
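Nanosecond timestamps exceed what Python's `datetime` can store (microseconds), so a loader has to keep them as integers. The sketch below parses the example rows into a hypothetical `Tick` record, with prices held as `Decimal` rather than float. It treats the timestamp strings as UTC; if the source records exchange-local time (09:30 suggests US/Eastern), convert before this step. `Tick`, `parse_ts_ns`, and `parse_tick` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from decimal import Decimal

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

@dataclass(frozen=True)
class Tick:
    ts_ns: int        # integer nanoseconds since the Unix epoch
    price: Decimal    # exact decimal price, avoids binary float rounding
    size: int
    side: str         # as reported, e.g. "BUY" / "SELL"
    exchange: str

def parse_ts_ns(text: str) -> int:
    """Parse 'YYYY-MM-DD HH:MM:SS.fffffffff' (assumed UTC) into nanoseconds.

    datetime stores at most microseconds, so the fractional digits are
    converted to an integer separately; digits past the 9th are truncated.
    """
    whole, _, frac = text.partition(".")
    base = datetime.strptime(whole, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    seconds = (base - EPOCH) // timedelta(seconds=1)  # exact integer seconds
    frac_ns = int(frac.ljust(9, "0")[:9]) if frac else 0
    return seconds * 1_000_000_000 + frac_ns

def parse_tick(line: str) -> Tick:
    ts, price, size, side, exchange = (field.strip() for field in line.split(","))
    return Tick(parse_ts_ns(ts), Decimal(price), int(size), side, exchange)

rows = [
    "2024-01-15 09:30:00.123456789, 150.25, 100, BUY, ARCA",
    "2024-01-15 09:30:00.123456790, 150.26, 50, SELL, ARCA",
]
ticks = [parse_tick(r) for r in rows]
print(ticks[1].ts_ns - ticks[0].ts_ns)  # 1 (the trades are 1 ns apart)
```

Routing the fraction through `datetime` or a float would silently drop the last three digits and make the two trades look simultaneous.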
2. Quote Data (NBBO)
National Best Bid and Offer updates:
- Timestamp
- Best bid price, bid size
- Best ask price, ask size
- Exchange(s) posting the best bid and best ask
Example:
2024-01-15 09:30:00.123456789, 150.24×500, 150.26×300, ARCA/NASDAQ
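Quotes are often combined with trades, for example to compute the midpoint and spread or to infer trade direction when the feed does not report it. The sketch below parses the `price×size` fields from the example and applies a simplified quote rule with a tick-test fallback, in the spirit of the Lee-Ready algorithm: trades above the midpoint count as buyer-initiated, below as seller-initiated, and at the midpoint the previous trade price decides. The full tick test looks further back on unchanged prices; here that case simply returns `None`. All function names are hypothetical.

```python
from decimal import Decimal
from typing import Optional

def parse_quote_side(field: str) -> tuple[Decimal, int]:
    """Split a 'price×size' field such as '150.24×500'."""
    price, size = field.strip().split("×")
    return Decimal(price), int(size)

def midpoint(bid: Decimal, ask: Decimal) -> Decimal:
    return (bid + ask) / 2

def classify_trade(price: Decimal, bid: Decimal, ask: Decimal,
                   prev_price: Optional[Decimal]) -> Optional[str]:
    """Quote rule, falling back to a one-step tick test at the midpoint."""
    mid = midpoint(bid, ask)
    if price > mid:
        return "BUY"
    if price < mid:
        return "SELL"
    # At the midpoint: uptick -> BUY, downtick -> SELL, otherwise unknown.
    if prev_price is None or price == prev_price:
        return None
    return "BUY" if price > prev_price else "SELL"

bid, bid_size = parse_quote_side("150.24×500")
ask, ask_size = parse_quote_side("150.26×300")
print(midpoint(bid, ask), ask - bid)  # 150.25 0.02
print(classify_trade(Decimal("150.25"), bid, ask, prev_price=Decimal("150.24")))  # BUY
```

`Decimal` keeps the spread exact; binary floats cannot represent prices like 150.24 exactly, so spread comparisons against the tick size can misfire.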
3. Order Book Data (L2/L3)
Level 2 (market by price): aggregated bid and ask size at each price level
Level 3 (market by order): full order book with each individual order and its order ID
L2 Example:
Bid: 150.24×500, 150.23×1000, 150.22×800, ...
Ask: 150.26×300, 150.27×600, 150.28×400, ...
L3 Example:
Bid: Order#12345@150.24×100, Order#12346@150.24×200, ...
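An L2 view can be derived from L3 data by summing order sizes at each price. The sketch below does this for a small hypothetical snapshot: orders 12345 and 12346 come from the example above, while the remaining order IDs and sizes are made up for illustration. `aggregate_l2` is not a library function.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical L3 snapshot: (order_id, side, price, size)
l3_orders = [
    (12345, "bid", Decimal("150.24"), 100),
    (12346, "bid", Decimal("150.24"), 200),
    (12347, "bid", Decimal("150.23"), 1000),
    (22001, "ask", Decimal("150.26"), 300),
]

def aggregate_l2(orders, depth=5):
    """Collapse individual orders into total size per price level.

    Bids are sorted best (highest) first, asks best (lowest) first,
    and each side is truncated to `depth` levels.
    """
    levels = {"bid": defaultdict(int), "ask": defaultdict(int)}
    for _order_id, side, price, size in orders:
        levels[side][price] += size
    bids = sorted(levels["bid"].items(), reverse=True)[:depth]
    asks = sorted(levels["ask"].items())[:depth]
    return bids, asks

bids, asks = aggregate_l2(l3_orders)
print(bids)  # [(Decimal('150.24'), 300), (Decimal('150.23'), 1000)]
```

The reverse direction is not possible: once sizes are summed, the L2 view no longer says how many orders sit at a level or in what queue order.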
4. Message Data
Every order book event:
- Order submission
- Order cancellation
- Order modification
- Trade execution
Message rates are bursty: peaks for the most active stocks can reach 500,000+ messages per second, while full-session averages are far lower.
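Message data is typically consumed by replaying each event against an in-memory book. The sketch below keeps a dictionary of live orders keyed by order ID and handles the four event types listed above. The message fields and the "modify replaces price and size in place" semantics are assumptions; real venues differ (a price change often loses queue priority or is sent as cancel/replace). `best()` scans all orders, which is fine for illustration but too slow for production, where sorted price levels are used.

```python
from dataclasses import dataclass
from decimal import Decimal

@dataclass
class Order:
    side: str        # "bid" or "ask"
    price: Decimal
    size: int

class OrderBook:
    """Minimal order-level book driven by messages (hypothetical format)."""

    def __init__(self) -> None:
        self.orders: dict[int, Order] = {}

    def apply(self, msg: dict) -> None:
        kind, oid = msg["type"], msg["order_id"]
        if kind == "add":
            self.orders[oid] = Order(msg["side"], msg["price"], msg["size"])
        elif kind == "cancel":
            self.orders.pop(oid, None)
        elif kind == "modify":
            # Assumed semantics: overwrite price and size in place.
            order = self.orders[oid]
            order.price, order.size = msg["price"], msg["size"]
        elif kind == "execute":
            # Partial fills reduce size; a full fill removes the order.
            order = self.orders[oid]
            order.size -= msg["size"]
            if order.size <= 0:
                del self.orders[oid]
        else:
            raise ValueError(f"unknown message type: {kind}")

    def best(self, side: str):
        """Highest bid or lowest ask, or None if that side is empty."""
        prices = [o.price for o in self.orders.values() if o.side == side]
        if not prices:
            return None
        return max(prices) if side == "bid" else min(prices)

book = OrderBook()
for m in [
    {"type": "add", "order_id": 1, "side": "bid", "price": Decimal("150.24"), "size": 100},
    {"type": "add", "order_id": 2, "side": "ask", "price": Decimal("150.26"), "size": 300},
    {"type": "add", "order_id": 3, "side": "bid", "price": Decimal("150.25"), "size": 50},
    {"type": "execute", "order_id": 3, "size": 50},
    {"type": "modify", "order_id": 2, "price": Decimal("150.27"), "size": 200},
]:
    book.apply(m)
print(book.best("bid"), book.best("ask"))  # 150.24 150.27
```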
Data Volume
| Data Type | Messages/Day | Storage/Day | Format |
|---|---|---|---|
| Tick (SPY) | ~1M | ~50MB | CSV/Parquet |
| Quote (SPY) | ~10M | ~500MB | Binary |
| L2 Book (SPY) | ~50M | ~2GB | Binary |
| L3 Book (SPY) | ~500M | ~20GB | Binary |
| Full Market (All stocks) | ~50B | ~2TB | Proprietary |
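The table's approximate figures imply an average record size and an average message rate, which is useful for sizing storage and network capacity. The sketch below computes both, assuming 1 GB = 10^9 bytes and that every message falls inside the 6.5-hour regular US session (09:30 to 16:00); both assumptions are not from the table itself.

```python
# Assumptions: 1 GB = 1e9 bytes; all messages fall in the regular session.
SESSION_SECONDS = 6.5 * 60 * 60  # 23,400 s

def implied_stats(messages_per_day: float, bytes_per_day: float):
    """Return (average bytes per message, average messages per second)."""
    return bytes_per_day / messages_per_day, messages_per_day / SESSION_SECONDS

# Approximate (messages/day, bytes/day) pairs copied from the table above.
TABLE = {
    "Tick (SPY)": (1e6, 50e6),
    "Quote (SPY)": (10e6, 500e6),
    "L2 Book (SPY)": (50e6, 2e9),
    "L3 Book (SPY)": (500e6, 20e9),
    "Full Market": (50e9, 2e12),
}

for name, (msgs, nbytes) in TABLE.items():
    size, rate = implied_stats(msgs, nbytes)
    print(f"{name:<14} {size:4.0f} B/msg {rate:>12,.0f} msg/s")
```

For the L3 row this works out to about 40 bytes per message and roughly 21,000 messages per second on average; actual traffic is bursty, so peak rates sit well above these averages.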
Data Storage
Recommended Formats
| Format | Read Speed | Write Speed | Compression | Use Case |
|---|---|---|---|---|
| CSV | Slow | Slow | None (compress externally, e.g. gzip) | Small datasets |
| Parquet | Fast | Fast | Excellent | Analysis |
| HDF5 | Fast | Medium | Good | Large datasets |
| kdb+/q | Very Fast | Very Fast | Built-in | HFT production |
| Arrow | Very Fast | Very Fast | Optional (LZ4/ZSTD) | Inter-process / in-memory exchange |
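Part of the gap between CSV and the binary formats above comes from fixed-width encoding: numbers are stored as machine integers instead of text, so records are smaller and need no parsing. The sketch below packs one tick into a hypothetical 25-byte little-endian layout with Python's `struct` module. It is a simplified stand-in to show the idea, not how Parquet, HDF5, or kdb+ lay out data (those are columnar and add encoding and compression on top). The 4-implied-decimal price scale is an assumption.

```python
import struct
from decimal import Decimal

# Hypothetical fixed-width layout ('<' = little-endian, no padding):
#   q   int64   timestamp, ns since epoch
#   q   int64   price in units of 1/10,000 (4 implied decimals)
#   I   uint32  size
#   c   char    side: b"B" or b"S"
#   4s  bytes   exchange code, null-padded to 4 bytes
TICK_FORMAT = struct.Struct("<qqIc4s")
PRICE_SCALE = 10_000

def encode_tick(ts_ns: int, price: Decimal, size: int, side: str, exchange: str) -> bytes:
    return TICK_FORMAT.pack(
        ts_ns,
        int(price * PRICE_SCALE),  # assumes at most 4 decimal places
        size,
        side[:1].encode("ascii"),  # "BUY" -> b"B"
        exchange.encode("ascii"),
    )

def decode_tick(buf: bytes):
    ts_ns, price_int, size, side, exchange = TICK_FORMAT.unpack(buf)
    return (ts_ns, Decimal(price_int) / PRICE_SCALE, size,
            side.decode("ascii"), exchange.rstrip(b"\0").decode("ascii"))

csv_row = "2024-01-15 09:30:00.123456789, 150.25, 100, BUY, ARCA\n"
record = encode_tick(1705311000123456789, Decimal("150.25"), 100, "BUY", "ARCA")
print(len(csv_row.encode()), len(record))  # text row vs. 25-byte binary record
```

The binary record is less than half the size of the text row and can be decoded with one `unpack` call, while the CSV row needs tokenizing and number parsing for every field.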
Checklist
- [ ] Timestamp precision adequate (nanosecond for HFT)
- [ ] Timezone handling consistent (UTC)
- [ ] Clock synchronization (NTP/PTP)
- [ ] Data gaps identified and handled
- [ ] Survivorship bias avoided
- [ ] Corporate actions adjusted
- [ ] Storage format optimized for access patterns
- [ ] Backfill process for missing data
- [ ] Data quality checks automated
- [ ] Latency between market event and recording measured
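Several checklist items (gap detection, automated quality checks) reduce to simple scans over a time-ordered stream. The sketch below flags out-of-order timestamps, crossed quotes (bid above ask), and gaps between consecutive quotes longer than a threshold. The `Quote` type, `check_quotes` function, and the 1-second default threshold are hypothetical placeholders; sensible thresholds depend on the instrument and time of day, and some pipelines also flag locked quotes (bid equal to ask).

```python
from decimal import Decimal
from typing import NamedTuple

class Quote(NamedTuple):
    ts_ns: int
    bid: Decimal
    ask: Decimal

def check_quotes(quotes, max_gap_ns=1_000_000_000):
    """Return (index, problem) pairs for a quote stream expected in time order."""
    issues = []
    for i, q in enumerate(quotes):
        if q.bid > q.ask:
            issues.append((i, "crossed"))
        if i == 0:
            continue
        dt = q.ts_ns - quotes[i - 1].ts_ns
        if dt < 0:
            issues.append((i, "out_of_order"))
        elif dt > max_gap_ns:
            issues.append((i, "gap"))
    return issues

t0 = 1_705_311_000_000_000_000
quotes = [
    Quote(t0, Decimal("150.24"), Decimal("150.26")),
    Quote(t0 + 500, Decimal("150.27"), Decimal("150.26")),            # crossed
    Quote(t0 + 400, Decimal("150.24"), Decimal("150.26")),            # out of order
    Quote(t0 + 5_000_000_000, Decimal("150.24"), Decimal("150.26")),  # 5 s gap
]
print(check_quotes(quotes))  # [(1, 'crossed'), (2, 'out_of_order'), (3, 'gap')]
```

Returning a list of issues rather than raising lets a pipeline log, count, and threshold problems per file before deciding whether to backfill or reject the data.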