The Kelly Criterion is the most cited position sizing formula in quantitative trading. It is also the most misapplied. Traders who understand the formula — who can write it out, who know where it comes from, who have read Thorp — routinely use it in a way that mathematically guarantees they are risking more than they should. The error is not in the formula. The error is in what they plug into it.
This article explains what Kelly actually requires, why backtests cannot provide it, what the academic research says about the consequences of getting it wrong, and how to use Kelly correctly given the specific kind of uncertainty that comes with a backtested trading strategy.
What Kelly Actually Says
John Larry Kelly Jr. published the criterion in 1956 while working at Bell Labs, in a paper titled “A New Interpretation of Information Rate.” The original context was information theory — specifically, how to optimally size bets when receiving imperfect signals. The formula for a binary outcome bet is:
f* = (bp – q) / b
Where f* is the fraction of capital to risk, b is the net odds received on the bet (how much you win per unit risked), p is the probability of winning, and q is the probability of losing (1 – p). For a trading strategy with a win rate of 60% and an average win equal to the average loss (b = 1), Kelly says: f* = (1 × 0.6 – 0.4) / 1 = 0.20. Risk 20% of your account on each trade.
The alternative form used by traders is:
f* = W – (1 – W) / R
Where W is the win rate and R is the ratio of average win to average loss. With W = p and R = b, the two formulas are algebraically identical: the same result in trader's notation. The question is not whether the formula is right. It is. The question is whether the inputs are right, and for the vast majority of backtested Expert Advisor (EA) strategies, they are not.
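A minimal Python sketch of the two forms above, to confirm that they agree. The function names kelly_binary and kelly_trader are illustrative choices for this sketch, not part of any library.

```python
def kelly_binary(p: float, b: float) -> float:
    """Kelly fraction f* = (b*p - q) / b for a binary bet, with q = 1 - p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if b <= 0.0:
        raise ValueError("b (net odds) must be positive")
    q = 1.0 - p
    return (b * p - q) / b


def kelly_trader(win_rate: float, payoff_ratio: float) -> float:
    """Trader's form f* = W - (1 - W) / R."""
    if not 0.0 <= win_rate <= 1.0:
        raise ValueError("win_rate must be between 0 and 1")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    return win_rate - (1.0 - win_rate) / payoff_ratio


# The worked example from the text: 60% win rate, average win equal to average loss.
print(round(kelly_binary(0.60, 1.0), 4))   # 0.2
print(round(kelly_trader(0.60, 1.0), 4))   # 0.2
```

A negative result from either function means the inputs imply no positive edge, in which case the sensible position size is zero.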
What Kelly Requires That Backtests Cannot Provide
The Kelly formula requires the true probability of winning and the true expected payoff ratio. Not the observed probability from a historical sample. Not the backtested win rate from a specific data window. The true underlying probability — the probability that would be realized over an effectively infinite sequence of future trades drawn from the same distribution as the past.
This is a fundamental distinction that the formula’s notation obscures. When you write p = 0.60, you are asserting a fact about the future: that 60% of trades will win going forward. Your backtest tells you that 60% of trades won during the specific historical period you tested. These are not the same thing.
The gap between observed backtest statistics and true underlying parameters arises from three sources that are well documented in the quantitative trading literature.
First, sampling error: a 200-trade backtest with a 60% observed win rate has a standard error of sqrt(0.6 × 0.4 / 200), about 3.5 percentage points, which gives a 95% confidence interval of roughly 53% to 67%. The true win rate could be anywhere in that range.
Second, overfitting bias: any strategy that was optimized on historical data has parameters that are more favorable to that data than to future data. The observed win rate in the backtested period is systematically higher than the expected win rate out of sample. Wiecki et al. (2016) demonstrated empirically, using 888 algorithms from the Quantopian platform, that in-sample Sharpe ratio explained less than 2.5% of variance in out-of-sample performance. That finding implies the backtest inputs to Kelly are unreliable by default.
Third, regime nonstationarity: market conditions change. A strategy’s win rate in a trending regime differs from its win rate in a ranging regime. The backtest win rate is the average across whatever mix of regimes the historical period happened to contain.
MacLean, Thorp, and Ziemba established in their definitive 2011 work “The Kelly Capital Growth Investment Criterion” the quantitative consequence of this input error: a 10% overestimation of expected returns causes approximately 50% overbetting when using full Kelly. Errors in the win rate are amplified even more, because the Kelly fraction depends on the edge rather than on the win rate itself. If your backtest says your win rate is 60% and your true forward win rate is 54% (a 10% relative overestimation), then with an average win equal to the average loss (b = 1) the backtest implies f* = 0.20 while the true optimum is (1 × 0.54 – 0.46) / 1 = 0.08. Full Kelly on the backtest number would have you bet 2.5 times the mathematically optimal fraction. You are not just betting too much. You are betting dramatically too much while believing you are betting exactly the right amount.
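To make the sampling-error point concrete, the sketch below computes a normal-approximation (Wald) confidence interval for an observed win rate and the Kelly fraction implied at each end of it. The function names are illustrative, and the even payoff ratio (R = 1) is an assumption carried over from the earlier worked example.

```python
import math


def win_rate_interval(wins: int, trades: int, z: float = 1.96) -> tuple:
    """Normal-approximation confidence interval for a win rate (z = 1.96 for 95%)."""
    if trades <= 0 or not 0 <= wins <= trades:
        raise ValueError("need 0 <= wins <= trades and trades > 0")
    p_hat = wins / trades
    se = math.sqrt(p_hat * (1.0 - p_hat) / trades)  # standard error of a proportion
    return max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


def kelly_trader(win_rate: float, payoff_ratio: float) -> float:
    """Trader's form f* = W - (1 - W) / R."""
    return win_rate - (1.0 - win_rate) / payoff_ratio


# 120 wins out of 200 trades: the 60% observed win rate from the text.
low, high = win_rate_interval(120, 200)
print(f"95% interval for win rate: {low:.3f} to {high:.3f}")
for label, w in [("low", low), ("observed", 0.60), ("high", high)]:
    print(f"Kelly at {label} win rate: {kelly_trader(w, 1.0):.3f}")
```

With these inputs the interval runs from roughly 53% to 67%, and the implied full Kelly fraction ranges from about 6% to about 34% of capital. The backtest's point estimate of 20% sits in the middle of a very wide band.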
What Overbetting Actually Does to Your Account
The Kelly criterion is derived by maximizing the expected logarithm of wealth, not the expected value of wealth. This distinction is mathematically critical. Maximizing expected value leads to betting your entire bankroll whenever the expected value is positive, which produces maximum expected gain per trade but also near-certain ruin over enough trades, because a single loss wipes out the account. Kelly’s logarithmic objective avoids this by weighting losses more heavily than equal-sized gains: losing 50% requires a 100% gain to recover, not a 50% gain. The log utility function captures this asymmetry.
The consequence of overbetting is that you move to the right of the Kelly fraction on the growth rate curve. The growth rate curve has a specific shape: it rises from zero at f = 0, reaches its maximum at f = f*, and then declines, crossing zero again at approximately f = 2f*. The crossing is exact in the continuous-return approximation and very close to 2f* for the small edges typical of trading strategies. Beyond that point, the expected growth rate becomes negative, meaning a strategy that is genuinely profitable at the correct Kelly fraction will produce expected losses at more than double Kelly.
This is not a theoretical curiosity. If your true Kelly fraction is 10% and you are betting 20% because your backtest overestimated your win rate, you are at 2f* — the breakeven point on the growth rate curve. Any further overestimation pushes you into negative expected growth territory with a genuinely profitable strategy. The strategy has edge. Your position sizing destroys it.
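The shape of the growth rate curve can be checked numerically. The sketch below evaluates the expected log growth of a binary bet and uses bisection to find where it returns to zero. The inputs (55% win rate, even payoff, so a true Kelly fraction of 10%) are assumptions chosen to match the example above, and the function names are illustrative.

```python
import math


def growth_rate(f: float, p: float, b: float) -> float:
    """Expected log growth per bet: p*ln(1 + b*f) + (1 - p)*ln(1 - f)."""
    if not 0.0 <= f < 1.0:
        raise ValueError("f must be in [0, 1)")
    return p * math.log1p(b * f) + (1.0 - p) * math.log1p(-f)


def zero_crossing(p: float, b: float, tol: float = 1e-10) -> float:
    """Bisection search for the fraction above f* where growth returns to zero."""
    f_star = (b * p - (1.0 - p)) / b
    if f_star <= 0.0:
        raise ValueError("no positive edge, so there is no Kelly fraction")
    lo, hi = f_star, 1.0 - 1e-12  # growth is positive at f*, very negative near 1
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if growth_rate(mid, p, b) > 0.0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


p, b = 0.55, 1.0
f_star = (b * p - (1.0 - p)) / b  # 0.10 for these inputs
f_zero = zero_crossing(p, b)
print(f"f* = {f_star:.3f}, growth crosses zero near f = {f_zero:.4f}")
print(f"growth at f*  : {growth_rate(f_star, p, b):+.5f}")
print(f"growth at 2f* : {growth_rate(2 * f_star, p, b):+.5f}")
```

For this binary case the crossing lands just under 0.20, so betting 20% with a true Kelly fraction of 10% already gives a growth rate that is marginally below zero.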
Rotando and Thorp (1992) analyzed practical betting scenarios and found that half-Kelly practitioners had a 90% probability of avoiding ruin over long periods under estimation error conditions, compared to approximately 50% for full Kelly practitioners. The asymmetry between overbetting and underbetting is severe and consistent: overbetting by a given factor is worse than underbetting by the same factor. Reducing from full Kelly to half Kelly costs roughly 25% of the theoretical long-run growth rate. Doubling to twice Kelly destroys growth entirely and introduces meaningful ruin probability. The risk-reward of underbetting is dramatically superior to the risk-reward of overbetting.
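The cost of halving versus doubling the Kelly fraction can be tabulated directly. The sketch below reports growth at several multiples of full Kelly as a share of the full Kelly growth rate, for an assumed binary bet with a 55% win rate and even payoff. These inputs and the function names are illustrative assumptions.

```python
import math


def growth_rate(f: float, p: float, b: float) -> float:
    """Expected log growth per bet for a binary outcome."""
    return p * math.log1p(b * f) + (1.0 - p) * math.log1p(-f)


def relative_growth(multiples, p: float, b: float) -> dict:
    """Growth at each multiple of full Kelly, as a share of full Kelly growth."""
    f_star = (b * p - (1.0 - p)) / b
    if f_star <= 0.0:
        raise ValueError("no positive edge")
    g_star = growth_rate(f_star, p, b)
    return {m: growth_rate(m * f_star, p, b) / g_star for m in multiples}


table = relative_growth([0.25, 0.5, 1.0, 2.0], p=0.55, b=1.0)
for multiple, share in table.items():
    print(f"{multiple:>4.2f} x Kelly -> {share:6.1%} of full Kelly growth")
```

Under these assumptions, half Kelly keeps about three quarters of the growth rate, while twice Kelly leaves essentially none.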
The Specific Problem with EA Backtests
EA backtests have a structural problem that makes Kelly input estimation particularly unreliable. The backtest win rate is almost always measured on completed trades at closing prices. The actual forward-looking win rate includes all the trades the strategy would have taken in conditions that did not appear in the backtest window — different volatility regimes, different spread environments, different liquidity conditions, and different correlations between instruments.
For a EURUSD H1 strategy backtested from 2019 to 2024, the historical period contains the COVID volatility spike of 2020, the low-volatility compression of 2021, and the Fed tightening cycle of 2022. These are structurally different regimes with different statistical properties. A strategy with a 62% win rate in 2020–2021 may have a 48% win rate in 2023–2024 conditions. The aggregate backtest win rate of, say, 57% is an average across these different regimes, and the regime the strategy encounters next may not resemble the average.
The optimization process makes this worse. Most EA developers run at least some parameter optimization before settling on a final configuration. Every optimization run selects for parameters that happened to work on the historical data. The selected parameters are biased toward the historical period by construction. The Deflated Sharpe Ratio framework of Bailey and López de Prado (2014) quantifies how much the reported Sharpe ratio, and by extension the win rate and profit factor that feed into Kelly, is inflated by the number of parameter combinations tested. A strategy tested across 50 configurations requires a substantially higher observed win rate before the result is statistically distinguishable from favorable random parameter selection.
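The selection effect can be illustrated with a small simulation. This is not the Deflated Sharpe Ratio calculation itself, only a sketch of the bias it corrects for: 50 hypothetical configurations with no edge at all (true win rate 50%) are each run for 200 simulated trades, and the best one is picked. The function name, seed, and parameters are assumptions for illustration.

```python
import random


def best_of_n_win_rate(n_configs: int, n_trades: int,
                       true_win_rate: float, seed: int) -> tuple:
    """Simulate n_configs strategies with the same true win rate.

    Returns (best observed win rate, average observed win rate).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    observed = []
    for _ in range(n_configs):
        wins = sum(1 for _ in range(n_trades) if rng.random() < true_win_rate)
        observed.append(wins / n_trades)
    return max(observed), sum(observed) / len(observed)


best, average = best_of_n_win_rate(n_configs=50, n_trades=200,
                                   true_win_rate=0.50, seed=7)
print(f"average observed win rate: {average:.3f}")
print(f"best observed win rate   : {best:.3f}")
# Full Kelly with even payoff (2W - 1), computed on a configuration with zero true edge
print(f"Kelly implied by the best configuration: {2 * best - 1:.3f}")
```

The average stays near 50%, but the best configuration shows an apparent edge that exists only because it was selected. Any Kelly fraction computed from it is overbetting by construction, since the true optimal fraction is zero.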
The practical implication is direct. When you take the win rate from your EA backtest and plug it into the Kelly formula, you are using a number that is biased upward by sampling error, regime selection, and optimization. The full Kelly fraction you calculate is almost certainly higher than the true Kelly fraction for your strategy. You are systematically overbetting before you take a single live trade.
The Correct Approach: Fractional Kelly with Confidence Discounting
Ed Thorp, who used Kelly to beat blackjack and later ran Princeton Newport Partners to 19.1% annual returns over 19 years, did not use full Kelly in practice. His documented approach was to bet a fraction of full Kelly, typically between 25% and 50%, with the fraction calibrated to his confidence in the edge estimate. This is not a departure from Kelly theory. It is Kelly theory correctly applied to the reality of uncertain inputs.
The formal justification comes from the relationship between parameter uncertainty and optimal fraction. If you have complete certainty about your edge, full Kelly is optimal. As uncertainty increases, the optimal fraction decreases. The mathematics of this relationship, developed rigorously in the shrinkage estimation literature (Rising and Wyner 2012, Han et al.), shows that the optimal Kelly fraction under parameter uncertainty is approximately f* multiplied by a shrinkage factor that reflects the ratio of signal to noise in the edge estimate. For a typical EA backtest with 200 to 500 trades, this shrinkage factor is substantially below 1.0.
In practice, a workable framework is to scale the Kelly fraction by your confidence in the edge estimate, expressed as a number between 0 and 1. This confidence estimate should reflect the statistical significance of the backtest result, the number of optimization trials run, the length of the track record, and whether the strategy has been validated out of sample. A strategy with 1,000 trades, no optimization, a statistically significant profit factor, and a confirmed out-of-sample period might warrant a Kelly fraction of 0.5f* to 0.75f*. A strategy with 150 trades, multiple optimization iterations, and no out-of-sample validation might warrant 0.1f* to 0.2f*.
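The scaling described above reduces to a one-line calculation. The sketch below is a minimal version; the function name discounted_kelly and the example inputs (55% win rate, payoff ratio 1.5) are assumptions for illustration, and choosing the confidence value remains a judgment call along the lines just described.

```python
def discounted_kelly(win_rate: float, payoff_ratio: float, confidence: float) -> float:
    """Full Kelly (W - (1 - W) / R) scaled by a confidence factor in [0, 1].

    Returns 0.0 when the inputs imply no positive edge.
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if payoff_ratio <= 0.0:
        raise ValueError("payoff_ratio must be positive")
    full_kelly = win_rate - (1.0 - win_rate) / payoff_ratio
    if full_kelly <= 0.0:
        return 0.0
    return confidence * full_kelly


# Same backtest statistics, two different levels of validation.
weak = discounted_kelly(0.55, 1.5, confidence=0.15)    # few trades, heavily optimized
strong = discounted_kelly(0.55, 1.5, confidence=0.50)  # long record, out-of-sample checked
print(f"weakly validated  : risk {weak:.2%} of capital per trade")
print(f"strongly validated: risk {strong:.2%} of capital per trade")
```

Here full Kelly is 25% of capital, so the two cases come out at 3.75% and 12.5% respectively.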
The practical ranges used by professional quant practitioners, as documented in the Kelly criterion literature, converge on 10% to 25% of full Kelly for most real-world applications. Quarter Kelly is the most widely cited starting point. Half Kelly is appropriate when the edge estimate is unusually well-validated. Full Kelly is almost never appropriate in live trading because the certainty it requires about future win rates is never available from backtests.
Calculating the Full Kelly Fraction Correctly First
Before discounting, the Kelly fraction itself needs to be calculated from the right inputs. For a trading strategy, the continuous-return version of Kelly is more appropriate than the binary win-loss version, because trading returns are not binary — they follow a distribution. The continuous Kelly fraction is:
f* = mean return / variance of returns
Where mean return and variance are both computed from the per-trade return distribution. This version uses the dispersion of the full trade series, so outlier wins and losses that the binary formula averages away affect the result. A strategy where 10% of trades produce 80% of the profits should not be sized the same way as a strategy with uniform trade outcomes, even if both have the same aggregate win rate and average win-loss ratio. Note that mean over variance is itself a second-order approximation of the log-growth objective: it is most reliable when individual trade returns are small relative to capital, and it reflects skewness and kurtosis only through their effect on the variance.
The binary formula (f* = W – (1 – W) / R) is an approximation that works reasonably well when wins cluster around one typical size and losses around another. When trade outcomes are widely dispersed, which is common in EA trading, particularly for strategies that run tight stops and wide targets or that have occasional very large wins, the continuous formula uses more of the information in the trade log and should be preferred.
Computing the continuous Kelly fraction requires the full trade return series, not just the aggregate win rate and average win-loss ratio. This is information that is available from your backtest trade log. If your backtesting platform reports only aggregate statistics, not individual trade returns, you cannot accurately calculate Kelly — and any calculation you make from aggregate statistics alone will carry additional estimation error on top of the estimation error already present in the backtest statistics themselves.
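A minimal sketch of the continuous calculation from a trade log, using only the Python standard library. It assumes each trade result is expressed as a multiple of the amount risked on that trade (an R-multiple), so the output reads as the fraction of capital to risk per trade. The sample trades and the function name are made up for illustration.

```python
from statistics import mean, variance


def continuous_kelly(trade_returns) -> float:
    """Continuous Kelly fraction: mean return divided by variance of returns.

    Returns 0.0 when the mean return is not positive.
    """
    if len(trade_returns) < 2:
        raise ValueError("need at least two trades to estimate a variance")
    mu = mean(trade_returns)
    var = variance(trade_returns)  # sample variance (n - 1 denominator)
    if var == 0.0:
        raise ValueError("variance is zero; the formula is undefined")
    return max(0.0, mu / var)


# Hypothetical trade log in R-multiples: -1.0 is a full stop-out, 2.0 is a 2R win.
trades = [2.0, -1.0, 1.5, -1.0, 3.0, -1.0, -1.0, 2.5]
f_full = continuous_kelly(trades)
print(f"continuous full Kelly: risk {f_full:.3f} of capital per trade")
```

For this sample the result is a little under 0.20. Eight trades is far too few for a usable estimate, which is the point made throughout: the figure is a ceiling to be discounted, not a size to trade.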
Vince’s Optimal f: Why It Is More Dangerous Than Kelly
Ralph Vince introduced Optimal f in “Portfolio Management Formulas” (1990) as an alternative position sizing methodology. Optimal f is the fraction that maximizes the geometric mean of the per-trade wealth multipliers, and therefore terminal wealth, over the historical trade sequence. This is effectively the same objective as Kelly, computed empirically from the historical trade distribution rather than analytically from win rate and payoff ratio.
The problem with Optimal f is that it is full Kelly applied to the historical sample without any adjustment for uncertainty. It finds the fraction that would have maximized wealth on the specific historical trades that occurred. This is precisely the overfitting problem applied to position sizing: the optimal fraction for the past data is not the optimal fraction for the future. Optimal f applied to a backtest is mathematically equivalent to running full Kelly on the most favorable possible interpretation of your historical edge — which is systematically too large by exactly the margin that the historical data overestimates your true edge.
Traders who use Optimal f from backtests frequently discover in live trading that the drawdowns are substantially larger than the historical simulation predicted. This is not bad luck. It is the consequence of sizing positions at full Kelly on an edge estimate that was biased upward by the historical sample. The solution is the same as for Kelly: apply a confidence discount before using the optimal fraction, scaled to the actual reliability of the edge estimate.
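The sketch below shows an Optimal f style calculation followed by a confidence discount. It follows Vince's convention of scaling each trade by the largest historical loss and searches a grid of fractions for the one with the highest average log growth. The grid search, the sample trades, and the 0.25 discount are assumptions for illustration, not Vince's published procedure in full.

```python
import math


def optimal_f(trades, steps: int = 1000) -> float:
    """Grid search for the fraction that maximizes historical geometric growth.

    Each trade is scaled by the largest loss, so f is the share of capital
    lost if the worst historical trade recurs.
    """
    worst = min(trades)
    if worst >= 0.0:
        raise ValueError("need at least one losing trade to scale by")
    best_f, best_growth = 0.0, 0.0
    for i in range(1, steps):  # f stays strictly below 1, so every log is defined
        f = i / steps
        growth = sum(math.log(1.0 + f * t / abs(worst)) for t in trades) / len(trades)
        if growth > best_growth:
            best_f, best_growth = f, growth
    return best_f


# Hypothetical trade results (any consistent unit, for example account currency).
trades = [2.0, -1.0, 1.5, -1.0, 3.0, -1.0, -1.0, 2.5]
f_hist = optimal_f(trades)
confidence = 0.25  # assumed discount for an unvalidated backtest
print(f"optimal f on the historical sample: {f_hist:.3f}")
print(f"after confidence discount         : {confidence * f_hist:.3f}")
```

The first number is the fraction that would have been best on exactly these trades. Only the second is a candidate for live sizing.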
Connecting Edge Confidence to the Kelly Fraction
The framework that emerges from taking the literature seriously is one where the Kelly fraction is not a fixed calculation from backtest statistics — it is a dynamic parameter that reflects the current state of evidence for the strategy’s edge. A strategy that has been running live for 18 months with results consistent with the backtest warrants a higher fraction than the same strategy at launch. A strategy whose live win rate is tracking 5 percentage points below the backtest warrants a lower fraction than the backtest calculation would suggest.
The specific adjustment depends on what kind of validation the strategy has undergone. Out-of-sample walk-forward testing reduces but does not eliminate estimation uncertainty — the out-of-sample period is itself a finite sample. Monte Carlo simulation of the trade sequence reveals whether the observed performance is statistically distinguishable from lucky sequencing. Temporal stability analysis shows whether the win rate is consistent across market regimes or concentrated in specific historical periods. Each validation layer reduces the uncertainty in the edge estimate and therefore justifies a higher Kelly fraction — but no amount of backtesting eliminates the need for a confidence discount entirely, because live market conditions always contain a component of novelty that the historical data did not.
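One simple way to see how much estimation uncertainty remains is to resample the trade log. The sketch below uses a bootstrap (resampling trades with replacement), which is one particular Monte Carlo variant and not necessarily the one a given validation tool uses. It reports a 5th to 95th percentile band for the continuous Kelly estimate. The trade log, in multiples of the amount risked, is made up, and the function name is illustrative.

```python
import random
from statistics import mean, variance


def bootstrap_kelly_band(trade_returns, n_resamples: int = 2000, seed: int = 0) -> tuple:
    """Bootstrap the mean/variance Kelly estimate; return its 5th and 95th percentiles."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    estimates = []
    for _ in range(n_resamples):
        sample = rng.choices(trade_returns, k=len(trade_returns))
        var = variance(sample)
        if var > 0.0:  # skip degenerate resamples where every trade is identical
            estimates.append(mean(sample) / var)
    estimates.sort()
    low = estimates[int(0.05 * len(estimates))]
    high = estimates[int(0.95 * len(estimates))]
    return low, high


# Hypothetical 20-trade log in R-multiples.
trades = [2.0, -1.0, 1.5, -1.0, 3.0, -1.0, -1.0, 2.5, -1.0, 1.0,
          -1.0, 2.0, -1.0, -1.0, 1.5, -1.0, 4.0, -1.0, 0.5, -1.0]
point = mean(trades) / variance(trades)
low, high = bootstrap_kelly_band(trades)
print(f"point estimate of full Kelly: {point:.3f}")
print(f"bootstrap 5%-95% band       : {low:.3f} to {high:.3f}")
```

A wide band is a direct signal that the point estimate deserves a heavy discount. Sizing off the lower end of the band is one conservative option, offered here as an assumption of this sketch and not as a rule from the literature cited above.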
This is the correct way to think about Kelly: not as a formula that tells you how much to bet once you have calculated your win rate, but as a framework that requires you to also quantify how confident you are in that win rate before determining your actual position size. The formula gives you the ceiling — the maximum fraction that would be optimal if your edge estimate were perfect. Your actual fraction should be somewhere below that ceiling, with the distance determined by the quality of the evidence behind the estimate.
Edge Matrix’s composite validation score, which combines 18 statistical tests across Monte Carlo robustness, temporal stability, edge decay, martingale detection, and distributional properties, is designed to provide exactly this kind of calibrated confidence estimate for a backtest result. A strategy scoring 82/100 has passed a significantly more rigorous set of statistical tests than one scoring 44/100. The practical implication for position sizing is direct: the higher the validation score, the closer to full Kelly your position sizing can reasonably be. The lower the score, the further below full Kelly you should operate — not as a matter of caution, but as a matter of correctly applying the Kelly framework to the actual uncertainty in your edge estimate.
Kelly’s formula is not wrong. It is the answer to a specific question: given a known edge, what fraction maximizes long-run growth? Your backtest gives you an estimated edge, not a known one. Treating an estimated edge as a known one and applying full Kelly is not using Kelly correctly. It is using Kelly’s answer to the wrong question — and the consequence, as MacLean, Thorp, and Ziemba showed, is overbetting that is not marginal but potentially catastrophic even for genuinely profitable strategies.