Probability Theory for Trading¶
Difficulty: beginner
Fundamentals¶
Basic Definitions¶
Sample Space (Ω) — Set of all possible outcomes
Event (E) — Subset of sample space
Probability (P) — Measure of likelihood
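$$
0 \le P(E) \le 1, \qquad P(\Omega) = 1, \qquad P(\emptyset) = 0
$$
where:
- P(E): probability of event E
- Ω: sample space (all outcomes)
- ∅: empty set (no outcomes)
What it does: states the basic bounds on probability. Every probability lies in [0, 1], the entire sample space has probability 1, and the impossible event has probability 0. (Kolmogorov's axioms proper are non-negativity, P(Ω) = 1, and countable additivity for disjoint events; the upper bound and P(∅) = 0 follow from them.)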
Probability Rules¶
Addition Rule:
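$$
P(A \cup B) = P(A) + P(B) - P(A \cap B)
$$
If A and B are mutually exclusive, P(A ∩ B) = 0 and the rule reduces to:
$$
P(A \cup B) = P(A) + P(B)
$$
where:
- A ∪ B: union (A or B or both)
- A ∩ B: intersection (both A and B)
- mutually exclusive: A and B cannot occur together
What it does: computes the probability that at least one of the two events occurs. Subtracting the overlap prevents double-counting outcomes shared by both events.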
Multiplication Rule:
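$$
P(A \cap B) = P(A)\,P(B \mid A)
$$
If A and B are independent, P(B|A) = P(B), so:
$$
P(A \cap B) = P(A)\,P(B)
$$
where:
- P(B|A): probability of B given that A occurred
- independence: A's occurrence carries no information about B
What it does: gives the probability that two events both occur. Independence collapses the conditional into the marginal. Many asset-return models assume independence for tractability and thereby discard dependence that is present in the data, such as volatility clustering.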
Bayes' Theorem:
$$
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
$$
where:
- P(A): prior belief about A
- P(B|A): likelihood of observing B if A is true
- P(B): total probability of B (normalizing constant)
- P(A|B): posterior, the updated belief after seeing B
What it does: the mechanical rule for updating beliefs with new evidence. One of the most useful equations in quantitative finance for sequential decision-making.
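A minimal sketch of a single Bayesian update. The scenario and all numbers are hypothetical assumptions for illustration: A is "the market is in a trending regime" with prior 0.30, and B is a breakout signal that fires 60% of the time in trends and 20% of the time otherwise. The function name `bayes_posterior` is not from the original text.

```python
def bayes_posterior(prior: float, likelihood: float, likelihood_if_not: float) -> float:
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # Law of total probability gives the normalizing constant P(B).
    evidence = likelihood * prior + likelihood_if_not * (1.0 - prior)
    if evidence == 0.0:
        raise ValueError("P(B) is zero, so the posterior is undefined")
    return likelihood * prior / evidence


# Hypothetical inputs: P(trend) = 0.30, P(signal|trend) = 0.60, P(signal|no trend) = 0.20
posterior = bayes_posterior(prior=0.30, likelihood=0.60, likelihood_if_not=0.20)
print(f"P(trend | breakout signal) = {posterior:.4f}")
```

For sequential updating, the posterior from one observation can be passed in as the prior for the next, provided the observations are conditionally independent given A.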
Conditional Probability¶
Definition¶
$$
P(A \mid B) = \frac{P(A \cap B)}{P(B)}
$$
where:
- P(A|B): conditional probability of A given B
- P(A ∩ B): joint probability of both A and B
- P(B): marginal probability of B (must be > 0)
What it does: restricts the sample space to outcomes consistent with B, then asks how much of that restricted space is also in A. This is the backbone of Bayesian inference.
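As a rough illustration of the definition, the sketch below estimates a conditional probability from counts. The observations are made-up data: each tuple records whether a day opened with a gap up (event B) and whether it closed higher (event A).

```python
# Hypothetical daily observations: (gap_up, closed_higher)
days = [
    (True, True), (True, False), (True, True), (False, True),
    (False, False), (True, True), (False, False), (False, True),
]

n_total = len(days)
n_b = sum(1 for gap_up, _ in days if gap_up)                        # days where B occurred
n_a_and_b = sum(1 for gap_up, higher in days if gap_up and higher)  # days where A and B both occurred

p_b = n_b / n_total              # marginal P(B)
p_a_and_b = n_a_and_b / n_total  # joint P(A ∩ B)
p_a_given_b = p_a_and_b / p_b    # conditional P(A|B)

print(f"P(B) = {p_b}, P(A and B) = {p_a_and_b}, P(A|B) = {p_a_given_b}")
```

With so few observations the estimate is very noisy; the point is only the mechanics of restricting attention to the days where B occurred.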
Random Variables¶
Discrete Random Variables¶
Probability Mass Function (PMF):
$$
p(x) = P(X = x), \qquad p(x) \ge 0, \qquad \sum_x p(x) = 1
$$
where:
- X: discrete random variable
- x: specific value
- p(x): probability that X takes that exact value
What it does: assigns a probability to every possible outcome of a discrete random variable, with the probabilities summing to 1 over all possible values. It is the discrete analogue of a probability density function.
Binomial Distribution:
P(X = k) = C(n,k) × pᵏ × (1-p)ⁿ⁻ᵏ
n = number of trials
k = number of successes
p = probability of success
Mean: np
Variance: np(1-p)
where:
- C(n,k): binomial coefficient (n choose k) = n!/(k!(n−k)!)
- pᵏ: probability of k specific successes
- (1−p)ⁿ⁻ᵏ: probability of n−k specific failures
What it does: gives the probability of exactly k wins in n independent trades, each with win probability p. The natural distribution for "in N trades, how often will I win X times?"
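The binomial formula above can be evaluated directly with the standard library. This is a small sketch, assuming a hypothetical strategy with 20 independent trades and a 55% win probability; the helper names are illustrative only.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k): probability of exactly k wins in n independent trades."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k: int, n: int, p: float) -> float:
    """P(X <= k): sum of the PMF from 0 wins up to k wins."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


n, p = 20, 0.55  # hypothetical trade count and win probability
p_exactly_11 = binomial_pmf(11, n, p)
p_at_most_8 = prob_at_most(8, n, p)

print(f"P(exactly 11 wins in {n}) = {p_exactly_11:.4f}")
print(f"P(8 or fewer wins in {n}) = {p_at_most_8:.4f}")
```

The cumulative version is often the more useful one in practice, for example to ask how likely a losing streak of a given depth is under an assumed win rate.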
Continuous Random Variables¶
Probability Density Function (PDF):
$$
P(a \le X \le b) = \int_a^b f(x)\,dx, \qquad f(x) \ge 0, \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1
$$
where:
- f(x): density at x (not a probability; densities can exceed 1)
- ∫ₐᵇ f(x)dx: area under the density between a and b
- the two constraints are non-negativity and total area = 1
What it does: continuous analogue of the PMF. Probability is the integral of the density over a range, never the value at a single point. Used for any continuous return, price, or volatility model.
Cumulative Distribution Function (CDF):
$$
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt
$$
where:
- F(x): cumulative distribution function
- P(X ≤ x): probability that X is at or below x
- the integral runs from −∞ to x, accumulating density
What it does: maps any value x to the probability of being at or below it. The inverse CDF gives quantiles, which is the function behind VaR, percentile stops, and confidence-interval cut-offs.
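To make the CDF and inverse-CDF relationship concrete, here is a sketch using `statistics.NormalDist` from the Python standard library. It assumes daily returns are normal with a hypothetical mean of 0.05% and standard deviation of 1.2%; real returns tend to have fatter tails, so a normal-based VaR is likely to understate tail risk.

```python
from statistics import NormalDist

# Assumption: daily returns ~ Normal(mean 0.0005, std 0.012). Illustrative numbers only.
daily = NormalDist(mu=0.0005, sigma=0.012)

# CDF: probability that the daily return is at or below -2%.
p_loss_worse_than_2pct = daily.cdf(-0.02)

# Inverse CDF: the 5% quantile of the return distribution.
q05 = daily.inv_cdf(0.05)

# One-day 95% VaR is conventionally quoted as a positive loss figure.
var_95 = -q05

print(f"P(return <= -2%) = {p_loss_worse_than_2pct:.4f}")
print(f"5% quantile      = {q05:.5f}")
print(f"95% one-day VaR  = {var_95:.5f}")
```

Applying `cdf` to the output of `inv_cdf` returns the original probability, which is a quick sanity check that the two functions are inverses.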
Common Distributions in Trading¶
Expected Value¶
Definition¶
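$$
E[X] = \sum_i x_i\,p(x_i) \quad \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx \quad \text{(continuous)}
$$
where:
- E[X]: expected value (mean) of random variable X
- xᵢ: discrete outcome
- p(xᵢ): probability of that outcome
- f(x): probability density function (continuous case)
What it does: computes the probability-weighted average of all possible values. It is the most widely used single summary of a distribution and the starting point for most decision-under-uncertainty frameworks.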
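A short sketch of the discrete case, applied to a two-outcome trade. The numbers are hypothetical: a 45% chance of winning 150 and a 55% chance of losing 100, with the loss entered as a negative value.

```python
def expected_value(outcomes: list[tuple[float, float]]) -> float:
    """Probability-weighted average of (value, probability) pairs."""
    total_p = sum(p for _, p in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError(f"probabilities sum to {total_p}, expected 1")
    return sum(x * p for x, p in outcomes)


# Hypothetical trade: win +150 with probability 0.45, lose 100 with probability 0.55.
trade = [(150.0, 0.45), (-100.0, 0.55)]
ev = expected_value(trade)

print(f"Expected value per trade = {ev:.2f}")
```

Note that a win rate below 50% can still give a positive expected value when the average win is large enough relative to the average loss.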
Variance and Moments¶
Variance¶
$$
\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - \mu^2
$$
where:
- Var(X): variance
- μ = E[X]: mean
- the second form (E[X²] − μ²) is the computational shortcut that avoids the centring step
What it does: measures the expected squared deviation from the mean. Its square root is the standard deviation, which is in the same units as X.
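The two forms of the variance can be checked against each other numerically. This sketch uses a made-up list of five returns and treats them as equally likely outcomes, so it divides by n; estimating variance from a sample would normally divide by n − 1 instead.

```python
from math import sqrt

# Hypothetical per-trade returns, treated as equally likely outcomes.
returns = [0.02, -0.01, 0.03, -0.02, 0.01]
n = len(returns)

mu = sum(returns) / n

# Definition: expected squared deviation from the mean.
var_centred = sum((r - mu) ** 2 for r in returns) / n

# Shortcut: E[X^2] - mu^2. Same value, but it can lose precision
# when the mean is large relative to the spread.
var_shortcut = sum(r * r for r in returns) / n - mu**2

std = sqrt(var_centred)

print(f"mean = {mu:.4f}, variance = {var_centred:.6f}, std = {std:.4f}")
```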
Higher Moments¶
| Moment | Formula | Meaning |
|---|---|---|
| 1st | E[X] | Location (mean) |
| 2nd | E[(X-μ)²] | Spread (variance) |
| 3rd | E[(X-μ)³]/σ³ | Asymmetry (skewness) |
| 4th | E[(X-μ)⁴]/σ⁴ | Tail weight (kurtosis) |
Portfolio Variance¶
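$$
\mathrm{Var}(R_p) = \sum_i \sum_j w_i w_j\,\mathrm{Cov}(R_i, R_j)
$$
For two assets:
$$
\mathrm{Var}(R_p) = w_1^2\sigma_1^2 + w_2^2\sigma_2^2 + 2 w_1 w_2 \rho_{12}\sigma_1\sigma_2
$$
where:
- R_p: portfolio return
- wᵢ: weight on asset i
- Cov(Rᵢ,Rⱼ): covariance between assets i and j
- σᵢ: standard deviation of asset i
- ρ₁₂: correlation between assets 1 and 2
What it does: generalizes single-asset variance to a portfolio. The cross term, twice the weighted covariance, is the mechanism of diversification: for long-only weights and ρ₁₂ < 1, portfolio standard deviation is below the weighted average of the individual standard deviations.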
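The sketch below evaluates both forms of the portfolio variance on a hypothetical 60/40 portfolio (volatilities of 20% and 10%, correlation 0.3) and compares the result with the weighted average of the individual volatilities. All inputs and function names are illustrative assumptions.

```python
from math import sqrt


def two_asset_variance(w1: float, w2: float, s1: float, s2: float, rho: float) -> float:
    """Two-asset portfolio variance from weights, volatilities and correlation."""
    return w1**2 * s1**2 + w2**2 * s2**2 + 2 * w1 * w2 * rho * s1 * s2


def portfolio_variance(weights: list[float], cov: list[list[float]]) -> float:
    """General form: double sum of w_i * w_j * Cov(R_i, R_j), i.e. w' Sigma w."""
    n = len(weights)
    return sum(weights[i] * weights[j] * cov[i][j] for i in range(n) for j in range(n))


# Hypothetical inputs.
w1, w2 = 0.6, 0.4
s1, s2 = 0.20, 0.10
rho = 0.3

var_p = two_asset_variance(w1, w2, s1, s2, rho)

# The same portfolio expressed through its covariance matrix.
cov = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
var_p_general = portfolio_variance([w1, w2], cov)

vol_p = sqrt(var_p)
weighted_avg_vol = w1 * s1 + w2 * s2

print(f"portfolio vol = {vol_p:.4f}, weighted-average vol = {weighted_avg_vol:.4f}")
```

The gap between the two printed figures is the diversification benefit; it shrinks to zero as the correlation approaches 1.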
Law of Large Numbers¶
Statement¶
As sample size increases, sample mean converges to population mean:
$$
\bar{x}_n \xrightarrow{\text{a.s.}} \mu \quad \text{as } n \to \infty
$$
where:
- x̄ₙ: mean of n samples
- μ: true population mean
- "almost surely" (a.s.): convergence happens with probability 1
What it does: guarantees that a long enough track record reveals true performance, provided the trades are i.i.d. draws with a finite mean, but says nothing about how long is "enough." In practice, you need many trades before the observed mean is a reliable estimate of the true mean.
Trading Implication¶
Your observed win rate will converge to your true win rate as trade count increases. Short-term results are noisy.
Minimum Sample Size¶
n ≥ (z × σ / E)²
z = z-score for confidence level (1.96 for 95%)
σ = standard deviation
E = margin of error
For win rate estimation:
n ≥ z² × p(1-p) / E²
where:
- n: minimum required sample size
- z: standard-normal quantile for the chosen confidence level (1.96 for 95%, 2.576 for 99%)
- σ: estimated standard deviation
- E: half-width of the desired confidence interval
- p: estimated proportion (use 0.5 for the most conservative bound)
What it does: computes the minimum number of trades you need to estimate true win rate or mean return within an error band E at the confidence level implied by z. The standard answer to "is my track record long enough yet?"
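Both sample-size formulas above translate directly into code. This is a sketch under the same normal-approximation assumptions as the formulas; the function names and the example inputs (a 5-percentage-point margin on win rate, a 2% return standard deviation with a 0.5% margin) are hypothetical.

```python
from math import ceil
from statistics import NormalDist


def z_for_confidence(confidence: float) -> float:
    """Two-sided standard-normal quantile, e.g. about 1.96 for 0.95."""
    return NormalDist().inv_cdf(0.5 + confidence / 2.0)


def min_trades_for_win_rate(margin: float, p: float = 0.5, confidence: float = 0.95) -> int:
    """n >= z^2 * p(1-p) / E^2, rounded up to a whole trade."""
    z = z_for_confidence(confidence)
    return ceil(z * z * p * (1.0 - p) / margin**2)


def min_trades_for_mean(sigma: float, margin: float, confidence: float = 0.95) -> int:
    """n >= (z * sigma / E)^2, rounded up to a whole trade."""
    z = z_for_confidence(confidence)
    return ceil((z * sigma / margin) ** 2)


n_win_rate = min_trades_for_win_rate(margin=0.05)           # win rate within +/- 5 points
n_mean = min_trades_for_mean(sigma=0.02, margin=0.005)      # mean return within +/- 0.5%

print(f"trades needed for win rate: {n_win_rate}")
print(f"trades needed for mean return: {n_mean}")
```

Halving the margin of error quadruples the required number of trades, because E enters the formula squared.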
Central Limit Theorem¶
Statement¶
For i.i.d. random variables with finite variance, the centred and scaled sample mean converges in distribution to a normal distribution:
$$
\sqrt{n}\,(\bar{x}_n - \mu) \xrightarrow{d} N(0, \sigma^2)
$$
where:
- x̄ₙ: sample mean of n i.i.d. observations
- μ: true mean
- σ²: true variance
- N(0, σ²): normal distribution with mean 0 and variance σ²
- "→ in distribution" (d): the CDF converges as n → ∞
What it does: says the scaled, centred sample mean has approximately a normal distribution for large n, regardless of the original distribution's shape (as long as the variance is finite). This is the reason confidence intervals work for non-normal returns.
Trading Application¶
Even if individual trade returns are not normal, the average return over many trades approaches normality. This enables:
- Confidence intervals for strategy performance
- Statistical significance testing
- VaR calculations
Caveats in Finance¶
CLT assumptions often violated in markets:
- Returns not i.i.d. (volatility clustering)
- Infinite variance possible (power laws)
- Structural breaks (regime changes)
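A small simulation can illustrate the theorem in the setting where its assumptions do hold. The sketch draws i.i.d. samples from an exponential distribution (strongly skewed, finite variance, chosen purely for illustration) and checks that the sample means behave as the normal approximation predicts. It does not address the caveats listed above, which concern data that violate these assumptions.

```python
import random
from math import sqrt
from statistics import mean, pstdev

rng = random.Random(42)  # fixed seed so the sketch is reproducible

n = 50            # observations per sample
n_samples = 2000  # number of independent samples

# Exponential(1) draws: true mean 1, true standard deviation 1, right-skewed.
sample_means = [
    mean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(n_samples)
]

predicted_se = 1.0 / sqrt(n)        # sigma / sqrt(n) from the CLT
observed_se = pstdev(sample_means)  # spread of the simulated sample means

# Share of sample means inside the normal-theory 95% band around the true mean.
half_width = 1.96 * predicted_se
coverage = sum(abs(m - 1.0) <= half_width for m in sample_means) / n_samples

print(f"predicted SE = {predicted_se:.4f}, observed SE = {observed_se:.4f}")
print(f"share within 95% band = {coverage:.3f}")
```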
Markov Chains¶
Definition¶
Stochastic process where future depends only on current state:
$$
P(X_{t+1} \mid X_t, X_{t-1}, \ldots, X_0) = P(X_{t+1} \mid X_t)
$$
where:
- X_t: state at time t
- the left side conditions on the entire history; the right side conditions only on the current state
What it does: formalizes "memorylessness": given today's state, tomorrow is independent of how you got here. Foundation for regime models, hidden Markov models, and discrete dynamic-programming policies.
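As an illustration of the Markov property in a regime setting, the sketch below propagates a state distribution through a two-state transition matrix. The states and transition probabilities are hypothetical, not estimates from data.

```python
states = ["bull", "bear"]

# Hypothetical transition matrix: transition[i][j] = P(next state = j | current state = i).
transition = [
    [0.9, 0.1],  # from bull
    [0.3, 0.7],  # from bear
]


def step(dist: list[float], matrix: list[list[float]]) -> list[float]:
    """Advance a distribution over states by one period."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


# Start in the bull state with certainty and look two periods ahead.
today = [1.0, 0.0]
two_step = step(step(today, transition), transition)

# Repeated stepping approaches the long-run (stationary) distribution.
stationary = today
for _ in range(200):
    stationary = step(stationary, transition)

print(dict(zip(states, two_step)))
print(dict(zip(states, stationary)))
```

Only the current distribution is needed at each step; no earlier history enters the calculation, which is exactly what the Markov property states.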
Monte Carlo Methods¶
Applications¶
| Application | Purpose |
|---|---|
| Strategy testing | Assess robustness to randomness |
| Risk analysis | Estimate tail events |
| Portfolio planning | Range of possible outcomes |
| Position sizing | Optimize Kelly criterion |
| Options pricing | Price complex derivatives |
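As one possible sketch of the strategy-testing and risk-analysis rows above, the code below resamples a list of historical trade returns with replacement to build many alternative equity paths, then summarizes final equity and maximum drawdown. The trade returns are made-up numbers, and resampling with replacement assumes trades are independent and identically distributed, which real trade sequences may not be.

```python
import random


def max_drawdown(equity: list[float]) -> float:
    """Largest peak-to-trough decline, as a fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def simulate_paths(trade_returns, n_paths, n_trades, seed=0):
    """Bootstrap equity paths; returns a list of (final_equity, max_drawdown)."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_paths):
        equity = [1.0]
        for _ in range(n_trades):
            r = rng.choice(trade_returns)  # resample one trade with replacement
            equity.append(equity[-1] * (1.0 + r))
        results.append((equity[-1], max_drawdown(equity)))
    return results


# Hypothetical per-trade returns.
trade_returns = [0.02, -0.01, 0.015, -0.012, 0.03, -0.02, 0.005, -0.008]

results = simulate_paths(trade_returns, n_paths=1000, n_trades=100, seed=1)

finals = sorted(final for final, _ in results)
drawdowns = sorted(dd for _, dd in results)

print(f"5th percentile final equity: {finals[int(0.05 * len(finals))]:.3f}")
print(f"median final equity:         {finals[len(finals) // 2]:.3f}")
print(f"95th percentile drawdown:    {drawdowns[int(0.95 * len(drawdowns))]:.3f}")
```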
Random Walk Theory¶
Definition¶
Price changes are independent and identically distributed:
$$
P_t = P_{t-1} + \varepsilon_t, \qquad \varepsilon_t \ \text{i.i.d.}, \qquad E[\varepsilon_t] = 0
$$
where:
- P_t: price at time t
- ε_t: random innovation
- i.i.d.: independent and identically distributed
- the zero-mean innovation means no systematic drift in this simplest form
What it does: the textbook efficient-market model, in which tomorrow's price is today's plus pure noise. It is the benchmark every strategy is implicitly trying to beat. Real markets exhibit some predictability (momentum, mean reversion, volatility clustering), but the random-walk null is hard to reject decisively.
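A simulated random walk shows what the null model looks like in data. The sketch below assumes Gaussian innovations with a standard deviation of 0.5 and a starting price of 100, both arbitrary choices, and computes lag-1 autocorrelation for price levels and for price changes.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is reproducible

n_steps = 5000
innovations = [rng.gauss(0.0, 0.5) for _ in range(n_steps)]  # zero-mean i.i.d. noise

# P_t = P_{t-1} + eps_t, starting from an arbitrary price of 100.
prices = [100.0]
for eps in innovations:
    prices.append(prices[-1] + eps)


def lag1_autocorr(xs: list[float]) -> float:
    """Sample lag-1 autocorrelation."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


changes = [b - a for a, b in zip(prices, prices[1:])]

rho_levels = lag1_autocorr(prices)
rho_changes = lag1_autocorr(changes)

print(f"lag-1 autocorrelation of price levels:  {rho_levels:.3f}")
print(f"lag-1 autocorrelation of price changes: {rho_changes:.3f}")
```

Price levels are highly autocorrelated even though the changes are not, which is one reason apparent trends in a price chart are weak evidence of predictability on their own.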
Implications¶
If prices followed a pure random walk, it would imply:
- Past prices don't predict future prices
- Technical analysis is ineffective: patterns are illusory
- Active management is futile: no one can consistently beat the market
Evidence Against Pure Random Walk¶
Markets exhibit:
- Momentum (short to medium term)
- Mean reversion (long term)
- Volatility clustering
- Fat tails
- Calendar anomalies
Reality: Markets are not perfectly efficient, but are difficult to beat consistently.
Information Theory¶
Entropy¶
$$
H(X) = -\sum_x p(x)\,\log_2 p(x)
$$
where:
- H(X): Shannon entropy of X (bits)
- p(x): probability of outcome x
- log₂: log base 2 (bits); natural log gives nats; log10 gives hartleys
What it does: measures average information content in bits. It is maximal for a uniform distribution (no predictability) and zero for a deterministic one. Used in decision-tree splits, feature selection, and quantifying "how predictable is this market state?"
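The entropy formula is short enough to compute by hand. This sketch compares a uniform, a skewed, and a deterministic distribution over hypothetical market states; outcomes with zero probability are skipped, following the usual convention that 0 × log 0 = 0.

```python
from math import log2


def entropy_bits(probs: list[float]) -> float:
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return sum(-p * log2(p) for p in probs if p > 0)


uniform = [0.25, 0.25, 0.25, 0.25]   # four equally likely states
skewed = [0.70, 0.10, 0.10, 0.10]    # one dominant state
deterministic = [1.0, 0.0, 0.0, 0.0]  # outcome known in advance

h_uniform = entropy_bits(uniform)
h_skewed = entropy_bits(skewed)
h_deterministic = entropy_bits(deterministic)

print(f"uniform: {h_uniform:.3f} bits")
print(f"skewed: {h_skewed:.3f} bits")
print(f"deterministic: {h_deterministic:.3f} bits")
```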
Mutual Information¶
I(X;Y) = Σ Σ p(x,y) × log(p(x,y) / (p(x)p(y)))
Measures information shared between X and Y
Better than correlation for non-linear relationships
where:
- I(X;Y): mutual information between X and Y
- p(x,y): joint probability
- p(x), p(y): marginal probabilities
- I(X;Y) is zero when X and Y are independent (joint = product of marginals)
What it does: measures any statistical dependence, including nonlinear ones that correlation misses. Essential for screening features where the relationship isn't linear (e.g., volatility regimes, threshold effects).
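The sketch below computes mutual information from a joint probability table, using log base 2 so the result is in bits (the formula above leaves the base unspecified). The third example is a constructed case where Y = X² with X uniform on {−1, 0, 1}: the linear correlation is exactly zero, yet Y is fully determined by X.

```python
from math import log2


def mutual_information_bits(joint: list[list[float]]) -> float:
    """I(X;Y) in bits from a joint table where joint[i][j] = p(x_i, y_j)."""
    px = [sum(row) for row in joint]        # marginal of X (row sums)
    py = [sum(col) for col in zip(*joint)]  # marginal of Y (column sums)
    return sum(
        pxy * log2(pxy / (px[i] * py[j]))
        for i, row in enumerate(joint)
        for j, pxy in enumerate(row)
        if pxy > 0  # zero cells contribute nothing
    )


# Independent variables: joint equals the product of the marginals.
independent = [[0.25, 0.25], [0.25, 0.25]]

# Perfect dependence between two binary variables.
dependent = [[0.5, 0.0], [0.0, 0.5]]

# Y = X^2 with X uniform on {-1, 0, 1}; columns are Y = 0 and Y = 1.
third = 1.0 / 3.0
nonlinear = [[0.0, third], [third, 0.0], [0.0, third]]

mi_independent = mutual_information_bits(independent)
mi_dependent = mutual_information_bits(dependent)
mi_nonlinear = mutual_information_bits(nonlinear)

print(f"independent: {mi_independent:.4f} bits")
print(f"dependent:   {mi_dependent:.4f} bits")
print(f"nonlinear:   {mi_nonlinear:.4f} bits")
```

Estimating the joint table from continuous market data requires binning or another density estimate, and the result is sensitive to that choice; the sketch sidesteps this by starting from known probabilities.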
Key Formulas Reference¶
Bayes: P(A|B) = P(B|A) × P(A) / P(B)
Expected Value: E[X] = Σ xᵢ × p(xᵢ)
Variance: Var(X) = E[X²] - (E[X])²
Portfolio Variance: w'Σw
Binomial: P(X=k) = C(n,k)pᵏ(1-p)ⁿ⁻ᵏ
EV per trade: WR × AvgWin + (1-WR) × AvgLoss (AvgLoss is signed, i.e. a negative number)
Next Steps¶
- Time Series Analysis — Modeling sequential financial data
- Regression Models — Predictive modeling
- Statistics Basics — Foundational statistics