Backtesting Engine

Difficulty: expert

Overview

A backtesting engine simulates trading strategies on historical data to evaluate their performance before live deployment.

Engine Architecture

┌───────────────────────────────────────────────┐
│              BACKTESTER                       │
├──────────┬──────────┬──────────┬──────────────┤
│  Data    │ Strategy │ Portfolio│  Analytics   │
│  Loader  │ Engine   │ Manager  │  Engine      │
│          │          │          │              │
│ ┌──────┐ │ ┌──────┐ │ ┌──────┐ │ ┌──────────┐ │
│ │CSV   │ │ │Signal│ │ │Cash  │ │ │P&L       │ │
│ │API   │ │ │Gen   │ │ │Pos   │ │ │Metrics   │ │
│ │DB    │ │ │      │ │ │      │ │ │          │ │
│ └──────┘ │ └──────┘ │ └──────┘ │ └──────────┘ │
└──────────┴──────────┴──────────┴──────────────┘

Core Components

Event-Driven Backtester

class Event:
    """Base event class."""
    pass

class MarketEvent(Event):
    """New market data received."""
    def __init__(self, timestamp, data):
        self.timestamp = timestamp
        self.data = data

class SignalEvent(Event):
    """Trading signal generated."""
    def __init__(self, symbol, direction, strength, timestamp):
        self.symbol = symbol
        self.direction = direction  # 1=buy, -1=sell, 0=exit
        self.strength = strength
        self.timestamp = timestamp

class OrderEvent(Event):
    """Order to be executed."""
    def __init__(self, symbol, order_type, side, quantity, price=None):
        self.symbol = symbol
        self.order_type = order_type
        self.side = side
        self.quantity = quantity
        self.price = price

class FillEvent(Event):
    """Order execution confirmation."""
    def __init__(self, symbol, side, quantity, price, commission, timestamp):
        self.symbol = symbol
        self.side = side
        self.quantity = quantity
        self.price = price
        self.commission = commission
        self.timestamp = timestamp
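
The sketch below shows one way these events could flow through a queue: a market event produces a signal, the signal becomes an order, and the order becomes a fill. It is a minimal illustration, not a full engine. The `run_event_loop` function, the threshold rule, and the dataclass versions of the events are assumptions made for the example. Orders are filled at the current bar's price for brevity; a more realistic engine would usually fill on the next bar.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class MarketEvent:
    timestamp: int
    data: dict  # symbol -> price for this bar


@dataclass
class SignalEvent:
    symbol: str
    direction: int  # 1=buy, -1=sell
    timestamp: int


@dataclass
class OrderEvent:
    symbol: str
    side: str
    quantity: int


@dataclass
class FillEvent:
    symbol: str
    side: str
    quantity: int
    price: float
    commission: float
    timestamp: int


def run_event_loop(bars, threshold=100.0, quantity=10, commission=1.0):
    """Process bars one at a time and return the resulting fills."""
    queue = deque()
    fills = []
    for timestamp, prices in enumerate(bars):
        queue.append(MarketEvent(timestamp, prices))
        # Drain every event triggered by this bar before moving to the next one
        while queue:
            event = queue.popleft()
            if isinstance(event, MarketEvent):
                # Toy rule (hypothetical): buy whenever price is below the threshold
                for symbol, price in event.data.items():
                    if price < threshold:
                        queue.append(SignalEvent(symbol, 1, event.timestamp))
            elif isinstance(event, SignalEvent):
                side = 'BUY' if event.direction > 0 else 'SELL'
                queue.append(OrderEvent(event.symbol, side, quantity))
            elif isinstance(event, OrderEvent):
                # Simplification: fill at the current bar's price
                queue.append(FillEvent(event.symbol, event.side, event.quantity,
                                       prices[event.symbol], commission, timestamp))
            elif isinstance(event, FillEvent):
                fills.append(event)
    return fills


bars = [{'AAPL': 101.0}, {'AAPL': 99.0}, {'AAPL': 98.0}]
fills = run_event_loop(bars)
```

Because each bar's events are fully processed before the next bar is read, the strategy can never see data from a later timestamp.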

Portfolio Manager

class Portfolio:
    """Track positions, cash, and P&L."""

    def __init__(self, initial_capital):
        self.initial_capital = initial_capital
        self.cash = initial_capital
        self.positions = {}
        self.trades = []
        self.equity_curve = []

    def update_fill(self, fill):
        """Update cash and positions from a fill; commission always reduces cash."""
        notional = fill.quantity * fill.price
        if fill.side == 'BUY':
            # Pay for the shares plus commission
            self.cash -= notional + fill.commission
            self.positions[fill.symbol] = self.positions.get(fill.symbol, 0) + fill.quantity
        else:
            # Receive the sale proceeds net of commission
            self.cash += notional - fill.commission
            self.positions[fill.symbol] = self.positions.get(fill.symbol, 0) - fill.quantity

        self.trades.append(fill)
        self.equity_curve.append({
            'timestamp': fill.timestamp,
            'cash': self.cash,
            'positions': dict(self.positions)
        })
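
The equity curve above records cash and share counts, while performance metrics typically need a single equity number per timestamp. One possible bridge is a mark-to-market helper like the sketch below. The name `mark_to_market` and the `prices` mapping are assumptions for illustration.

```python
def mark_to_market(cash, positions, prices):
    """Total equity = cash + sum of (quantity * latest price) per symbol."""
    return cash + sum(qty * prices[symbol] for symbol, qty in positions.items())


equity = mark_to_market(cash=9_000.0, positions={'AAPL': 10}, prices={'AAPL': 105.0})
```

Short positions carry a negative quantity, so they reduce equity as prices rise.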

Performance Metrics

import numpy as np
import pandas as pd

class PerformanceAnalyzer:
    """Calculate strategy performance metrics."""

    @staticmethod
    def calculate_metrics(equity_curve, risk_free_rate=0.0):
        """Calculate metrics from a sequence of daily portfolio equity values."""
        equity = pd.Series(equity_curve, dtype=float)
        returns = equity.pct_change().dropna()

        total_return = equity.iloc[-1] / equity.iloc[0] - 1
        # N equity points span N - 1 daily return periods
        n_years = len(returns) / 252
        cagr = (1 + total_return) ** (1 / n_years) - 1

        # Annualized Sharpe; risk_free_rate is an annual rate
        daily_std = returns.std()
        excess = returns.mean() - risk_free_rate / 252
        sharpe = excess / daily_std * np.sqrt(252) if daily_std > 0 else 0.0

        # Drawdown: fractional decline from the running peak
        peak = equity.cummax()
        drawdown = (equity - peak) / peak
        max_drawdown = drawdown.min()

        gross_profit = returns[returns > 0].sum()
        gross_loss = abs(returns[returns < 0].sum())

        return {
            'total_return': total_return,
            'cagr': cagr,
            'sharpe_ratio': sharpe,
            'max_drawdown': max_drawdown,
            'calmar_ratio': cagr / abs(max_drawdown) if max_drawdown != 0 else 0,
            'volatility': daily_std * np.sqrt(252),
            # Share of positive days, not share of winning trades
            'win_rate': (returns > 0).mean(),
            'profit_factor': gross_profit / gross_loss if gross_loss > 0 else float('inf'),
        }
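
As a small worked example of the drawdown step, the snippet below applies the same running-peak calculation to a made-up five-point equity series. The numbers are illustrative only.

```python
import pandas as pd

equity = pd.Series([100.0, 110.0, 99.0, 104.5, 121.0])

peak = equity.cummax()              # running maximum: 100, 110, 110, 110, 121
drawdown = (equity - peak) / peak   # fractional decline from the running peak
max_drawdown = drawdown.min()       # deepest decline, reached at 99 against a peak of 110
```

The deepest point is the drop from 110 to 99, a 10% drawdown, even though the series ends at a new high.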

Common Pitfalls

Pitfall             Problem                      Solution
Look-ahead bias     Using future data            Strict chronological ordering
Survivorship bias   Only surviving stocks        Use point-in-time data
Overfitting         Curve-fitting to history     Out-of-sample testing
Ignoring costs      No commissions/slippage      Include all costs
Ignoring liquidity  Can't execute assumed size   Volume constraints
Data snooping       Testing many strategies      Adjust for multiple testing
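
The look-ahead row can be made concrete with a small pandas sketch. The prices and the "long after an up day" rule are made up for illustration. The point is that a signal computed from day t's close can only earn the return of day t+1, which is what `shift(1)` enforces.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0, 104.0])
returns = prices.pct_change()

# Signal known only at the close of day t: long if that day's return was positive
signal = (returns > 0).astype(int)

# Biased: pairs the day-t signal with the same day-t return it was computed from
biased_pnl = (signal * returns).sum()

# Correct: the day-t signal is applied to the day-(t+1) return
unbiased_pnl = (signal.shift(1) * returns).sum()
```

On this toy series the biased version captures every up day by construction, while the shifted version loses money. That gap is the size of the illusion.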

Walk-Forward Validation

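def walk_forward_backtest(strategy_class, data, train_window=252, test_window=63, retrain_every=None):
    """Walk-forward backtest: fit on a rolling window, then test on the rows that follow it.

    Assumes strategy_class().fit(df) returns the fitted strategy and
    strategy.backtest(df) returns an iterable of per-period returns.
    """
    # Default to stepping by the test length so out-of-sample segments are contiguous
    if retrain_every is None:
        retrain_every = test_window
    if retrain_every < test_window:
        # A smaller step makes test windows overlap and counts the same returns twice
        raise ValueError("retrain_every must be >= test_window")

    all_returns = []
    train_end = train_window

    while train_end + test_window <= len(data):
        # Train on the most recent train_window rows
        train_data = data.iloc[train_end - train_window:train_end]
        strategy = strategy_class().fit(train_data)

        # Test on the rows immediately after the training window
        test_data = data.iloc[train_end:train_end + test_window]
        returns = strategy.backtest(test_data)
        all_returns.extend(returns)

        train_end += retrain_every

    return all_returns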
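
To see which rows each fold uses, the window arithmetic can be isolated in a small generator. This is a sketch under the assumption that the step equals the test length, so every out-of-sample row is tested exactly once. The name `walk_forward_windows` is hypothetical.

```python
def walk_forward_windows(n_obs, train_window=252, test_window=63):
    """Yield (train_start, train_end, test_end) index triples for each fold."""
    train_end = train_window
    while train_end + test_window <= n_obs:
        yield train_end - train_window, train_end, train_end + test_window
        # Step by the test length so consecutive test windows never overlap
        train_end += test_window


windows = list(walk_forward_windows(n_obs=504))
```

With 504 rows (about two trading years) this yields four folds, and each fold's test window starts where the previous one ended.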

Practical Guidelines

  1. Start Simple: Get a basic backtest working before adding complexity
  2. Include Costs: Commissions, slippage, spread
  3. Out-of-Sample: Always test on unseen data
  4. Realistic Assumptions: Don't assume perfect execution
  5. Multiple Scenarios: Test across market conditions
  6. Sensitivity Analysis: Vary parameters and check that results hold
  7. Don't Trust One Backtest: Always validate with live trading

Q&A

Why is event-driven backtesting better than vectorized?

Vectorized backtests assume you can act on bar-close prices and rebalance the whole portfolio instantly. An event-driven backtest walks through time tick-by-tick (or bar-by-bar) and respects the actual sequence of decisions: order placement, fills, partial fills, cancellations. It is slower to write, but far closer to live behavior and much harder to fool yourself with.

What's the single most common backtesting mistake?

Look-ahead bias: using information that wouldn't have been available at decision time. Examples: backfilled fundamentals, restated earnings, today's index membership applied to past dates instead of point-in-time membership, after-hours news in your training data. Even subtle versions (e.g., scaling features with the full-sample mean) can inflate backtest Sharpe well beyond what the strategy achieves live.
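
The feature-scaling case is easy to miss, so here is a minimal pandas sketch with made-up values. It contrasts a full-sample z-score with one built only from earlier rows. Using an expanding window shifted by one row is one possible point-in-time approach, chosen here for illustration.

```python
import pandas as pd

feature = pd.Series([1.0, 2.0, 3.0, 4.0, 10.0])

# Leaky: every row is scaled with a mean and std that include future rows
leaky = (feature - feature.mean()) / feature.std()

# Point-in-time: row t is scaled using only rows 0..t-1
past_mean = feature.expanding().mean().shift(1)
past_std = feature.expanding().std().shift(1)
point_in_time = (feature - past_mean) / past_std
```

The first two rows of `point_in_time` are NaN because there is not yet enough history to estimate a standard deviation. The final jump to 10 looks far more extreme when judged only against the past than it does in the leaky version, which already "knows" about it.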

What Sharpe ratio should I trust in a backtest?

A backtest Sharpe of 3+ on daily data with realistic costs is almost always overfit. Live Sharpe is often half the backtest Sharpe, at best. If your backtest shows a Sharpe above 2 on out-of-sample data after costs, be more skeptical, not more excited.
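
One reason for this skepticism is selection: if enough variants are tried, the best one looks good by chance alone. The simulation below is a sketch with arbitrary parameters (200 strategies, one year of daily returns, 1% daily volatility). It generates strategies with no edge at all and reports the best annualized Sharpe among them.

```python
import numpy as np

rng = np.random.default_rng(0)
n_strategies, n_days = 200, 252

# Zero-edge strategies: daily returns are pure noise with mean 0
daily = rng.normal(loc=0.0, scale=0.01, size=(n_strategies, n_days))

# Annualized Sharpe ratio of each strategy
sharpes = daily.mean(axis=1) / daily.std(axis=1, ddof=1) * np.sqrt(252)

best = sharpes.max()
print(f"Best Sharpe among {n_strategies} random strategies: {best:.2f}")
```

The best of the batch typically shows a Sharpe that would look attractive in isolation, which is why the number of strategies tested matters as much as the headline result.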

How long should the out-of-sample window be?

Long enough to span at least one regime change and ideally one drawdown. For daily strategies, aim for 2+ years out-of-sample after a 5+ year in-sample window. Walk-forward (rolling re-fit) is better than a single in/out split because it tests the model's ability to adapt.

Should I include transaction costs in early prototyping?

Yes, even if it is just a rough fixed-bps-per-trade estimate. Strategies that "work" without costs almost always fail at real costs. Including costs early kills bad ideas before you over-invest in them.
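
A rough fixed-bps cost model can be as short as the sketch below. It assumes positions are expressed as portfolio weights and charges a flat cost per unit of turnover. The function name `net_returns` and the 5 bps default are illustrative assumptions, not calibrated values.

```python
import pandas as pd


def net_returns(gross_returns, positions, cost_bps=5.0):
    """Subtract a fixed cost per unit of turnover from gross returns."""
    # Turnover is the absolute change in position; the first row trades from flat
    turnover = positions.diff().abs().fillna(positions.abs())
    return gross_returns - turnover * cost_bps / 10_000


positions = pd.Series([0.0, 1.0, 1.0, -1.0])    # flat, long, long, short
gross = pd.Series([0.0, 0.01, 0.005, 0.002])
net = net_returns(gross, positions)
```

Flipping from long to short counts as two units of turnover, so high-turnover strategies are penalized in proportion to how often they trade.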

Next Steps