Backtesting Engine
Difficulty: expert
Overview
A backtesting engine simulates trading strategies on historical data to evaluate their performance before live deployment.
Engine Architecture
```text
┌───────────────────────────────────────────────────┐
│                    BACKTESTER                     │
├────────────┬────────────┬────────────┬────────────┤
│ Data       │ Strategy   │ Portfolio  │ Analytics  │
│ Loader     │ Engine     │ Manager    │ Engine     │
│            │            │            │            │
│ ┌────────┐ │ ┌────────┐ │ ┌────────┐ │ ┌────────┐ │
│ │ CSV    │ │ │ Signal │ │ │ Cash   │ │ │ P&L    │ │
│ │ API    │ │ │ Gen    │ │ │ Pos    │ │ │ Metrics│ │
│ │ DB     │ │ │        │ │ │        │ │ │        │ │
│ └────────┘ │ └────────┘ │ └────────┘ │ └────────┘ │
└────────────┴────────────┴────────────┴────────────┘
```
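The Data Loader's core job is to hand bars to the rest of the engine strictly in time order, one at a time, so no downstream component can peek ahead. A minimal sketch using only the standard library, assuming a CSV with hypothetical `timestamp,symbol,close` columns and made-up prices (`iter_bars` is an illustrative helper, not part of any library):

```python
import csv
import io
from datetime import datetime


def iter_bars(csv_text):
    """Yield bars one at a time in chronological order.

    Assumes hypothetical columns: timestamp (ISO 8601 date), symbol, close.
    Sorting up front guarantees consumers only ever see the current bar
    after all earlier ones, even if the file is out of order.
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    for row in rows:
        row['timestamp'] = datetime.fromisoformat(row['timestamp'])
        row['close'] = float(row['close'])
    rows.sort(key=lambda r: r['timestamp'])
    for row in rows:
        yield row


# Made-up data, deliberately stored out of order
sample = """timestamp,symbol,close
2024-01-03,XYZ,101.0
2024-01-02,XYZ,100.0
2024-01-04,XYZ,102.5
"""

for bar in iter_bars(sample):
    print(bar['timestamp'].date(), bar['symbol'], bar['close'])
```

Loading everything into memory keeps the sketch short; an API or database source would play the same role as long as it emits bars in timestamp order.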
Core Components
Event-Driven Backtester
```python
class Event:
    """Base event class."""
    pass


class MarketEvent(Event):
    """New market data received."""
    def __init__(self, timestamp, data):
        self.timestamp = timestamp
        self.data = data


class SignalEvent(Event):
    """Trading signal generated."""
    def __init__(self, symbol, direction, strength, timestamp):
        self.symbol = symbol
        self.direction = direction  # 1=buy, -1=sell, 0=exit
        self.strength = strength
        self.timestamp = timestamp


class OrderEvent(Event):
    """Order to be executed."""
    def __init__(self, symbol, order_type, side, quantity, price=None):
        self.symbol = symbol
        self.order_type = order_type
        self.side = side            # 'BUY' or 'SELL'
        self.quantity = quantity
        self.price = price          # None for market orders


class FillEvent(Event):
    """Order execution confirmation."""
    def __init__(self, symbol, side, quantity, price, commission, timestamp):
        self.symbol = symbol
        self.side = side            # 'BUY' or 'SELL'
        self.quantity = quantity
        self.price = price
        self.commission = commission
        self.timestamp = timestamp
```
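The classes above are only messages; something has to route them. A minimal sketch of the dispatch loop, assuming a single FIFO queue (`collections.deque`) that is drained after every bar, a hypothetical "follow the last price change" strategy, and orders that fill at the *next* bar's close rather than the bar that generated them. The events are re-declared as simplified dataclasses so the snippet runs on its own:

```python
from collections import deque
from dataclasses import dataclass


# Simplified stand-ins for the event classes above (hypothetical fields)
@dataclass
class MarketEvent:
    timestamp: int
    close: float


@dataclass
class SignalEvent:
    timestamp: int
    direction: int  # 1=buy, -1=sell


@dataclass
class OrderEvent:
    timestamp: int
    side: str
    quantity: int


@dataclass
class FillEvent:
    timestamp: int
    side: str
    quantity: int
    price: float


def run(closes, quantity=10):
    """Feed bars one by one; each bar's events are fully processed before the next bar."""
    events = deque()
    pending = []      # orders waiting for the next bar
    fills = []
    last_close = None
    for t, close in enumerate(closes):
        events.append(MarketEvent(t, close))
        while events:
            event = events.popleft()
            if isinstance(event, MarketEvent):
                # Orders placed on earlier bars execute at this bar's price
                for order in pending:
                    events.append(FillEvent(event.timestamp, order.side, order.quantity, event.close))
                pending.clear()
                # Hypothetical strategy: follow the most recent price change
                if last_close is not None and event.close != last_close:
                    events.append(SignalEvent(event.timestamp, 1 if event.close > last_close else -1))
                last_close = event.close
            elif isinstance(event, SignalEvent):
                side = 'BUY' if event.direction > 0 else 'SELL'
                events.append(OrderEvent(event.timestamp, side, quantity))
            elif isinstance(event, OrderEvent):
                pending.append(event)
            elif isinstance(event, FillEvent):
                fills.append(event)
    return fills


for fill in run([100, 101, 99, 99]):
    print(fill)
```

With closes `[100, 101, 99, 99]`, the up-tick on bar 1 produces a BUY that fills on bar 2 at 99, and the down-tick on bar 2 produces a SELL that fills on bar 3; the one-bar delay is exactly the sequencing a vectorized backtest tends to skip.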
Portfolio Manager
```python
class Portfolio:
    """Track positions, cash, and P&L."""

    def __init__(self, initial_capital):
        self.initial_capital = initial_capital
        self.cash = initial_capital
        self.positions = {}      # symbol -> signed quantity
        self.trades = []         # every FillEvent received
        self.equity_curve = []   # snapshot of cash and positions after each fill

    def update_fill(self, fill):
        """Update cash and positions from a FillEvent."""
        notional = fill.quantity * fill.price
        if fill.side == 'BUY':
            # Buying costs the notional plus commission
            self.cash -= notional + fill.commission
            self.positions[fill.symbol] = self.positions.get(fill.symbol, 0) + fill.quantity
        else:
            # Selling brings in the notional minus commission
            self.cash += notional - fill.commission
            self.positions[fill.symbol] = self.positions.get(fill.symbol, 0) - fill.quantity
        self.trades.append(fill)
        self.equity_curve.append({
            'timestamp': fill.timestamp,
            'cash': self.cash,
            'positions': dict(self.positions),
        })
```
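Fills alone don't produce an equity curve: between trades, open positions still move with the market. A minimal sketch of the mark-to-market step that turns cash plus positions into one equity number per bar, which is the kind of series the metrics below consume; `mark_to_market` is a hypothetical helper, shorts are represented as negative quantities, and the numbers are made up:

```python
def mark_to_market(cash, positions, prices):
    """Total equity = cash + value of every open position at current prices.

    positions: symbol -> signed quantity (negative for shorts)
    prices:    symbol -> latest price (must cover every held symbol)
    """
    holdings = sum(qty * prices[symbol] for symbol, qty in positions.items())
    return cash + holdings


# Worked example: start with 10,000 cash, buy 50 shares at 100
# paying a 1.00 commission, then the price moves to 104.
cash = 10_000 - (50 * 100 + 1.0)      # 4999.0 left in cash
positions = {'XYZ': 50}
equity = mark_to_market(cash, positions, {'XYZ': 104.0})
print(equity)  # 10199.0
```

Calling this once per bar (not only on fills) and appending the result gives a daily equity series suitable for return, Sharpe, and drawdown calculations.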
Performance Metrics
```python
import numpy as np
import pandas as pd


class PerformanceAnalyzer:
    """Calculate strategy performance metrics."""

    @staticmethod
    def calculate_metrics(equity_curve, risk_free_rate=0.0):
        """Calculate performance metrics from per-bar equity values.

        Assumes one equity value per trading day (252 per year) and an
        annual risk_free_rate.
        """
        equity = pd.Series(equity_curve, dtype=float)
        returns = equity.pct_change().dropna()

        total_return = equity.iloc[-1] / equity.iloc[0] - 1
        n_years = len(returns) / 252
        cagr = (1 + total_return) ** (1 / n_years) - 1
        volatility = returns.std() * np.sqrt(252)
        sharpe = (returns.mean() - risk_free_rate / 252) / returns.std() * np.sqrt(252)

        # Drawdown: fractional distance below the running peak
        peak = equity.cummax()
        drawdown = (equity - peak) / peak
        max_drawdown = drawdown.min()

        gains = returns[returns > 0].sum()
        losses = abs(returns[returns < 0].sum())

        return {
            'total_return': total_return,
            'cagr': cagr,
            'sharpe_ratio': sharpe,
            'max_drawdown': max_drawdown,
            'calmar_ratio': cagr / abs(max_drawdown) if max_drawdown != 0 else 0,
            'volatility': volatility,
            # Share of positive-return periods (not a per-trade win rate)
            'win_rate': (returns > 0).mean(),
            # Period gains / period losses; infinite if there were no losing periods
            'profit_factor': gains / losses if losses > 0 else np.inf,
        }
```
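The drawdown lines are easy to get subtly wrong, so a worked example helps. On a made-up five-bar equity series, the running peak (high-water mark) and drawdown come out as follows; NumPy's `maximum.accumulate` plays the role of pandas' `cummax`:

```python
import numpy as np

equity = np.array([100.0, 120.0, 90.0, 130.0, 104.0])

peak = np.maximum.accumulate(equity)   # running high-water mark
drawdown = (equity - peak) / peak      # fraction below the peak at each bar
max_drawdown = drawdown.min()

print(peak.tolist())       # [100.0, 120.0, 120.0, 130.0, 130.0]
print(drawdown.tolist())   # [0.0, 0.0, -0.25, 0.0, -0.2]
print(max_drawdown)        # -0.25
```

The deepest point is measured against the peak that preceded it (120 to 90), not against the starting capital, which is why a strategy can show a large drawdown while still being up overall.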
Common Pitfalls
| Pitfall | Problem | Solution |
|---|---|---|
| Look-ahead bias | Using data not available at decision time | Strict chronological, point-in-time data access |
| Survivorship bias | Universe contains only stocks that still exist today | Point-in-time universe that includes delisted names |
| Overfitting | Curve-fitting parameters to history | Out-of-sample and walk-forward testing |
| Ignoring costs | No commissions, spread, or slippage | Model all trading costs |
| Ignoring liquidity | Assumed order size can't actually be executed | Cap fills at a fraction of traded volume |
| Data snooping | Testing many strategies and keeping the best | Adjust for multiple testing |
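The cost and liquidity rows translate directly into a fill model. A minimal sketch, assuming market orders, a fixed participation cap on bar volume, flat slippage in basis points, and a per-share commission; `simulate_fill` and every default value are hypothetical placeholders to calibrate against your own broker and market:

```python
def simulate_fill(side, quantity, price, bar_volume,
                  max_participation=0.1, slippage_bps=5.0, commission_per_share=0.005):
    """Return (filled_quantity, fill_price, commission) for a market order.

    Illustrative assumptions, not calibrated values:
    - liquidity: fills are capped at max_participation * bar_volume
    - slippage: the price moves against us by slippage_bps
    - commission: charged per filled share
    """
    filled = min(quantity, int(max_participation * bar_volume))
    slip = price * slippage_bps / 10_000
    fill_price = price + slip if side == 'BUY' else price - slip
    commission = filled * commission_per_share
    return filled, fill_price, commission


# A 5,000-share buy in a bar that traded 20,000 shares: only 10% of volume fills
filled, fill_price, commission = simulate_fill('BUY', 5_000, 50.0, bar_volume=20_000)
print(filled, round(fill_price, 4), round(commission, 2))
```

What happens to the unfilled remainder (carry to the next bar or cancel) is a further modeling choice this sketch leaves out.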
Walk-Forward Validation
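```python
def walk_forward_backtest(strategy_class, data, train_window=252, test_window=63):
    """Walk-forward backtest: fit on a rolling window, then test on the
    next block of unseen bars.

    Rolls forward by test_window so out-of-sample blocks never overlap and
    each bar's return is counted once. Assumes strategy_class().fit(...)
    returns the fitted strategy and backtest(...) returns per-bar returns.
    """
    all_returns = []
    train_end = train_window
    while train_end + test_window <= len(data):
        # Train on the most recent train_window bars
        train_data = data.iloc[train_end - train_window:train_end]
        strategy = strategy_class().fit(train_data)
        # Test on the following test_window bars only
        test_data = data.iloc[train_end:train_end + test_window]
        all_returns.extend(strategy.backtest(test_data))
        # Retrain after each test block
        train_end += test_window
    return all_returns
```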
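It helps to inspect the schedule itself before running any model. A minimal sketch that only generates index ranges, assuming integer bar positions and that each test block also serves as the retraining step, so out-of-sample segments never overlap (`walk_forward_splits` is a hypothetical helper):

```python
def walk_forward_splits(n_bars, train_window=252, test_window=63):
    """Yield (train_range, test_range) index pairs for a rolling walk-forward.

    Test ranges are contiguous and non-overlapping, so every out-of-sample
    bar is scored exactly once; any trailing partial block is dropped.
    """
    train_end = train_window
    while train_end + test_window <= n_bars:
        yield (range(train_end - train_window, train_end),
               range(train_end, train_end + test_window))
        train_end += test_window


for train, test in walk_forward_splits(n_bars=500):
    print(f"train [{train.start}, {train.stop})  test [{test.start}, {test.stop})")
# train [0, 252)  test [252, 315)
# train [63, 315)  test [315, 378)
# train [126, 378)  test [378, 441)
```

The last 59 bars are not scored because they cannot fill a full test block. scikit-learn's `TimeSeriesSplit` provides a similar splitter (expanding by default; `max_train_size` makes it rolling) if you prefer a library implementation.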
Practical Guidelines
- Start Simple — Basic backtest before complex
- Include Costs — Commissions, slippage, spread
- Out-of-Sample — Always test on unseen data
- Realistic Assumptions — Don't assume perfect execution
- Multiple Scenarios — Test across market conditions
- Sensitivity Analysis — Vary parameters
- Don't Trust One Backtest — Always validate live
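Sensitivity analysis can be as simple as sweeping one parameter and looking at the shape of the results. A minimal sketch, assuming a hypothetical long/flat momentum rule, a synthetic zero-drift random walk (so any apparent edge is noise), and annualized Sharpe as the score:

```python
import numpy as np


def momentum_sharpe(prices, lookback, periods_per_year=252):
    """Annualized Sharpe of a hypothetical long/flat momentum rule.

    The position over bar t -> t+1 is decided with data up to bar t:
    long if prices[t] > prices[t - lookback], otherwise flat.
    """
    returns = np.diff(prices) / prices[:-1]          # returns[t]: bar t -> t+1
    signal = np.zeros(len(returns))
    signal[lookback:] = prices[lookback:-1] > prices[:-lookback - 1]
    strat = signal * returns
    std = strat.std()
    if std == 0:
        return 0.0
    return float(strat.mean() / std * np.sqrt(periods_per_year))


rng = np.random.default_rng(0)
prices = 100 * np.cumprod(1 + rng.normal(0.0, 0.01, 1000))   # synthetic, no true edge

results = {lb: momentum_sharpe(prices, lb) for lb in range(5, 105, 10)}
for lb, sharpe in results.items():
    print(f"lookback={lb:3d}  sharpe={sharpe:5.2f}")
```

A robust effect tends to show a plateau of similar scores across neighboring parameter values; a single standout value surrounded by poor neighbors is more likely noise.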
Q&A
Why is event-driven backtesting better than vectorized?
Vectorized backtests typically assume you can trade at bar-close prices and rebalance the whole portfolio instantly. An event-driven backtester walks through time tick by tick (or bar by bar) and respects the actual sequence of decisions: order placement, fills, partial fills, cancellations. It is slower to write and run, but far closer to live behavior and harder to fool yourself with.
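For contrast, a vectorized backtest computes every bar at once with array operations. A minimal pandas sketch of a hypothetical moving-average crossover that makes the bar-close assumption explicit: the position decided on bar t is held from t's close to t+1's close, with no costs, slippage, or partial fills:

```python
import numpy as np
import pandas as pd


def vectorized_ma_crossover(close, fast=10, slow=30):
    """Per-bar returns of a hypothetical long/flat MA crossover.

    The signal on bar t uses closes up to t; shift(1) makes it earn the
    return from t to t+1, i.e. it assumes a fill at bar t's close.
    """
    fast_ma = close.rolling(fast).mean()
    slow_ma = close.rolling(slow).mean()
    position = (fast_ma > slow_ma).astype(float)   # 1.0 long, 0.0 flat
    returns = close.pct_change()
    return (position.shift(1) * returns).fillna(0.0)


rng = np.random.default_rng(1)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0.0, 0.01, 500)))   # synthetic prices
strat = vectorized_ma_crossover(close)
print(f"total return: {(1 + strat).prod() - 1:.2%}")
```

Dropping the `shift(1)` would let each bar trade on its own close-to-close return, a one-character look-ahead bug that an event-driven loop makes much harder to write.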
What's the single most common backtesting mistake?
Look-ahead bias: using information that wouldn't have been available at decision time. Examples: backfilled fundamentals, restated earnings, today's index membership applied to past dates (instead of point-in-time membership), and after-hours news attributed to that day's bar. Even subtle versions (e.g., scaling features with the full-sample mean) can inflate backtest Sharpe well beyond what the strategy achieves live.
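The "full-sample scaling" version is easy to reproduce. A minimal sketch on synthetic data comparing a full-sample z-score with a point-in-time (expanding-window) one; appending hypothetical future bars changes the leaky values for past dates, which is exactly the information a live system would not have had:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
x = pd.Series(rng.normal(0.0, 1.0, 300).cumsum())   # trending, non-stationary feature


def zscore_full_sample(s):
    # Leaky: every bar is scaled with the mean/std of the WHOLE series
    return (s - s.mean()) / s.std()


def zscore_point_in_time(s, min_periods=20):
    # Point-in-time: bar t only uses statistics from bars 0..t
    expanding = s.expanding(min_periods=min_periods)
    return (s - expanding.mean()) / expanding.std()


# Append 100 hypothetical future bars and recompute both versions
future = pd.Series(rng.normal(5.0, 1.0, 100))
x_more = pd.concat([x, future], ignore_index=True)

leaky_before, leaky_after = zscore_full_sample(x), zscore_full_sample(x_more).iloc[:300]
pit_before, pit_after = zscore_point_in_time(x), zscore_point_in_time(x_more).iloc[:300]

print(np.allclose(leaky_before.to_numpy(), leaky_after.to_numpy()))              # False: history rewritten
print(np.allclose(pit_before.dropna().to_numpy(), pit_after.dropna().to_numpy()))  # True: history unchanged
```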
What Sharpe ratio should I trust in a backtest?
On daily data with realistic costs, a backtest Sharpe of 3+ is almost always overfit. A common rule of thumb is that live Sharpe comes in at roughly half the backtest figure, or worse. If your backtest shows a Sharpe above 2 on out-of-sample data after costs, be more skeptical, not more excited.
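Much of that skepticism comes from selection: test enough ideas and the best backtest looks good by chance alone. A minimal simulation, assuming random-sign daily positions on a synthetic market with no exploitable structure (all parameters hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n_days, n_strategies = 252 * 2, 200

# Daily returns of a synthetic market with no exploitable structure
market = rng.normal(0.0, 0.01, n_days)

# Each "strategy" is a random long/short position series: zero true edge
positions = rng.choice([-1.0, 1.0], size=(n_strategies, n_days))
strat_returns = positions * market

sharpes = strat_returns.mean(axis=1) / strat_returns.std(axis=1) * np.sqrt(252)
print(f"median Sharpe: {np.median(sharpes):.2f}")
print(f"best Sharpe:   {sharpes.max():.2f}")
```

The median sits near zero, as it should, while the best of the 200 can look attractive in isolation; reporting only that winner is the data-snooping row of the pitfalls table in numeric form.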
How long should the out-of-sample window be?
Long enough to span at least one regime change and ideally one drawdown. For daily strategies, 2+ years out-of-sample after a 5+ year in-sample window. Walk-forward (rolling re-fit) is better than a single in/out split because it tests the model's ability to adapt.
Should I include transaction costs in early prototyping?
Yes — even just a rough fixed-bps-per-trade estimate. Strategies that "work" without costs almost always fail at real costs. Including costs early kills bad ideas before you over-invest in them.
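A rough fixed-bps cost takes only a few lines. A minimal sketch, assuming positions expressed as fractions of capital and one flat cost per unit of turnover that lumps together commission, spread, and slippage (`net_returns` and the 10 bps default are hypothetical placeholders):

```python
import numpy as np


def net_returns(positions, asset_returns, cost_bps=10.0):
    """Subtract a flat cost of cost_bps per unit of turnover.

    positions[t] is the position held over period t (1.0 long, -1.0 short,
    0.0 flat); turnover is |positions[t] - positions[t-1]|, starting from 0.
    """
    positions = np.asarray(positions, dtype=float)
    asset_returns = np.asarray(asset_returns, dtype=float)
    gross = positions * asset_returns
    turnover = np.abs(np.diff(positions, prepend=0.0))
    return gross - turnover * cost_bps / 10_000


positions = [1, 1, -1, 0]             # enter long, hold, flip short, flatten
asset_returns = [0.01, 0.0, -0.02, 0.01]
print(net_returns(positions, asset_returns).round(4))
```

A flip from long to short counts as two units of turnover, so high-turnover strategies see the largest gap between gross and net returns.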
Next Steps
- Data Pipelines — Data infrastructure
- Paper Trading — Live simulation
- System Design — Full system architecture