| Metric | Value |
|---|---|
| Total Return | +468.4% |
| Annual CAGR | 33.9% |
| Max Drawdown | 5.2% |
| Calmar Ratio | 6.5× |
| Win Rate | 62.9% |
| Expectancy | 0.306R |
| Reward:Risk | 1.08:1 |
| T-Statistic | 8.3 |
This system demonstrates a statistically confirmed positive expectancy across 5.96 years of backtest data encompassing 1,083 closed positions on MULTI MIXED. The strategy achieves 1.08:1 reward-to-risk, operating 14.7 percentage points above its mathematical breakeven threshold of 48.1%. Annualised CAGR of 33.9% relative to 5.2% maximum drawdown yields a Calmar ratio of 6.5×, significantly exceeding the professional benchmark range of 3–5×. Monte Carlo validation across 2,000 block-bootstrap simulations confirms structural consistency under adverse trade sequencing. 18 of 19 validation tests pass. 1 area of note: holding time asymmetry (losers held 1.9× longer than winners).
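The breakeven threshold and expectancy quoted above follow from the reward-to-risk ratio and win rate. The sketch below assumes every loss costs exactly 1R and every win pays the stated reward-to-risk multiple. The function names are illustrative and are not part of the report's tooling.

```python
def breakeven_win_rate(reward_to_risk: float) -> float:
    """Win rate at which expectancy is zero for a given reward:risk ratio."""
    return 1.0 / (1.0 + reward_to_risk)


def expectancy_r(win_rate: float, reward_to_risk: float) -> float:
    """Expected R per trade, assuming each loss is 1R and each win is reward_to_risk R."""
    return win_rate * reward_to_risk - (1.0 - win_rate)


# Headline figures taken from the summary (rounded as published).
REWARD_TO_RISK = 1.08
WIN_RATE = 0.629

breakeven = breakeven_win_rate(REWARD_TO_RISK)
margin_pp = (WIN_RATE - breakeven) * 100.0
edge_r = expectancy_r(WIN_RATE, REWARD_TO_RISK)

print(f"breakeven win rate: {breakeven:.1%}")
print(f"margin over breakeven: {margin_pp:.1f} pp")
print(f"expectancy: {edge_r:.3f}R")
```

Because the inputs are the rounded headline values, the results land close to, but not exactly on, the published 14.7 percentage points and 0.306R.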
⚠ Holding Time Discipline (-5%)
Section I
Analytical Findings & Observations
F.1
Statistical Significance Strength
The strongest dimension is Stat Significance (100/100). T-statistic of 8.3 exceeds the 99% two-tailed significance threshold of 2.576 (p < 0.001, reported as 0 after rounding). The probability of results arising by chance is below 0.1%. The edge is statistically real given this 1056-trade sample. This test applies a Welch t-test on the profit distribution and requires the mean return to be significantly different from zero.
F.2
Holding Time Asymmetry Finding
Losers are held 1.88× as long as winners on average (winners: 11h, losers: 20.6h); the median ratio is 2.79×. Discipline tier: PROBLEMATIC. At the current 0.306R expectancy, this loss-aversion pattern lengthens drawdowns and extends underwater periods.
F.3
MC Drawdown Envelope Observation
Block-bootstrap Monte Carlo (2,000 simulations, block size 15, AC lag-1: 0.069) produces a 95th-percentile maximum drawdown of 14.3%, approximately 2.7× the historical 5.2%. P50: 8.4%, P99: 18.0%. The historical sequence sits at the 5th percentile of the simulated distribution, meaning roughly 95% of resampled paths produced a deeper drawdown than the one observed. The historical 5.2% figure therefore lies toward the favourable end of plausible trade orderings and should not be treated as a worst case. Risk management sizing against the MC P95 envelope rather than historical DD is advisable for live deployment.
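A block-bootstrap drawdown envelope of this kind can be sketched as follows. This is a minimal illustration, assuming per-trade returns expressed as fractions of equity, compounded equity, and blocks drawn uniformly with replacement. The report does not disclose its exact resampling scheme, and the sample trades are synthetic placeholders, not the report's data.

```python
import math
import random


def max_drawdown(returns):
    """Largest peak-to-trough decline of the compounded equity curve, as a fraction."""
    equity = peak = 1.0
    worst = 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, 1.0 - equity / peak)
    return worst


def block_bootstrap_drawdowns(returns, n_sims=2000, block_size=15, seed=0):
    """Resample contiguous blocks of trades and return the sorted max drawdowns."""
    n = len(returns)
    if n < block_size:
        raise ValueError("need at least block_size returns")
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(n_sims):
        path = []
        while len(path) < n:
            start = rng.randrange(n - block_size + 1)
            path.extend(returns[start:start + block_size])
        drawdowns.append(max_drawdown(path[:n]))
    return sorted(drawdowns)


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    rank = math.ceil(pct / 100.0 * len(sorted_values))
    return sorted_values[min(len(sorted_values), max(1, rank)) - 1]


# Synthetic placeholder trades (assumption): 1% risk, 1.08R wins, 62.9% win rate.
demo_rng = random.Random(42)
sample = [0.0108 if demo_rng.random() < 0.629 else -0.01 for _ in range(500)]

envelope = block_bootstrap_drawdowns(sample)
print(f"P50: {percentile(envelope, 50):.1%}  P95: {percentile(envelope, 95):.1%}")
```

Sampling whole blocks rather than single trades keeps short runs of wins and losses intact, which is why the block variant tends to give wider drawdown envelopes than a plain shuffle.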
F.4
Execution Sensitivity Observation
Under 10% execution degradation (wider spreads, adverse fills), expectancy retains 0.66× of its backtest level. At 0.306R base expectancy, the strategy remains profitable under this stress test. Forward testing under broker-accurate spread conditions is standard practice before capital deployment.
Development Considerations
Areas for Further Development
Exit Discipline
Holding Time scored 40/100. Losing trades are held 1.88× longer than winners on average (winners: 11h, losers: 20.6h). This loss-aversion pattern extends drawdown duration and underwater periods. The primary fix: add a time-based stop — if a trade has not reached take-profit within 16.5h (1.5× average winner duration), close at market. This preserves the win structure while preventing losers from accumulating excess holding cost.
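The time-based stop described above reduces to a single comparison. The sketch below is illustrative only: the function name and the way open trades are represented are assumptions, and a live implementation would run inside the trading platform, not in standalone Python.

```python
from datetime import datetime, timedelta

AVG_WINNER_HOURS = 11.0    # average winner duration from the report
TIME_STOP_MULTIPLE = 1.5   # multiple suggested in the report
TIME_STOP = timedelta(hours=AVG_WINNER_HOURS * TIME_STOP_MULTIPLE)  # 16.5h


def should_time_stop(opened_at: datetime, now: datetime, hit_take_profit: bool) -> bool:
    """True when a trade has been open for 16.5h or more without reaching take-profit."""
    return (not hit_take_profit) and (now - opened_at) >= TIME_STOP


opened = datetime(2024, 1, 1, 9, 0)  # hypothetical entry time
print(should_time_stop(opened, opened + timedelta(hours=12), False))
print(should_time_stop(opened, opened + timedelta(hours=17), False))
```

Tying the threshold to the average winner duration means it should be recomputed whenever the winner holding-time profile changes.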
Monte Carlo Robustness
MC Robustness scored 74/100. Block-bootstrap CV of 0.104 indicates moderate sequence dependency. MC P95 DD of 14.3% vs historical 5.2% (2.7× expansion). Position sizing should be calibrated against the MC P95 envelope, not the historical DD. At 1% risk per trade, the P95 scenario implies up to 14.0% account drawdown — ensure capital allocation accounts for this rather than the 5.2% historical figure.
Drawdown Endurance
DD Endurance scored 75/100. The strategy spends 60% of calendar time below its high-water mark, so equity is in drawdown for roughly three fifths of the backtest. Longest single episode: 220d 19h 0m, longest recovery: 148d 6h 46m. Recovery speed is strong (median penance 1.17× vs the theoretical 3.0× IID expectation), so the issue is frequency of drawdown entry, not recovery speed. Consider whether the entry filter can be tightened to reduce the number of small losing episodes: these, not the major DD events, accumulate the underwater time.
Section II
Validation Test Results
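| Test | Score |
|---|---|
| Temporal Stability | 92 |
| Statistical Significance | 100 |
| Drawdown Analysis | 94 |
| Capital Efficiency | 87 |
| Edge Temporal Decay | 94 |
| Edge Consistency | 92 |
| Concentration Risk | 96 |
| Ulcer Index | 100 |
| Sample Adequacy | 92 |
| Return Autocorrelation | 91 |
| MC DD Stability | 81 |
| Consecutive Loss | 88 |
| Cliff Ratio | 97 |
| MC Robustness | 74 |
| DD Endurance | 75 |
| Execution Cost Sensitivity | 86 |
| Holding Time | 40 |
| Edge Quality | 76 |
| Expected Shortfall | 82 |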
Temporal Stability
92
EXCELLENT — All 10 periods profitable
All 10 of 10 equal calendar periods generated positive returns across the backtest horizon. No losing period detected. Return consistency CV of 0.82 confirms profitability is spread evenly, not concentrated in a single regime window. This score measures temporal robustness — a strategy that only profits in one or two periods may be regime-dependent rather than exhibiting a repeatable edge.
Statistical Significance
100
Highly significant edge (t=8.30, 99% confidence)
T-statistic of 8.3 exceeds the 99% two-tailed significance threshold of 2.576 (p < 0.001, reported as 0 after rounding). The probability of results arising by chance is below 0.1%. The edge is statistically real given this 1056-trade sample. This test applies a Welch t-test on the profit distribution and requires the mean return to be significantly different from zero.
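Testing whether a mean return differs from zero comes down to dividing the sample mean by its standard error. The sketch below uses the plain one-sample form as an assumption; the report names a Welch t-test and does not publish the exact computation. The sample values are hypothetical R-multiples, not trades from this backtest.

```python
import math
import statistics


def one_sample_t(values, mu0=0.0):
    """t = (mean - mu0) / (sample standard deviation / sqrt(n))."""
    n = len(values)
    mean = statistics.fmean(values)
    stderr = statistics.stdev(values) / math.sqrt(n)
    return (mean - mu0) / stderr


# Hypothetical per-trade results in R-multiples (placeholder data).
sample_r = [1.1, -1.0, 0.9, 1.2, -1.0, 1.0, 0.8, -1.0, 1.1, 1.0]
t_stat = one_sample_t(sample_r)
print(f"t = {t_stat:.2f}")
```

The 2.576 threshold quoted in the report is the large-sample critical value; it is reasonable for about a thousand trades but too lenient for a ten-trade sample like the placeholder above.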
Drawdown Analysis
94
MINIMAL drawdown (5.2% max, 1.7% avg episode)
Maximum drawdown of 5.2% with an average episode depth of 1.7%. The median recovery speed is 3.1 days per 1% of drawdown. 96 drawdown episodes were detected. No single episode dominates the overall drawdown profile, indicating consistent rather than event-driven risk. This test scores three components: max DD depth (50%), average episode depth (30%), and recovery quality in days per 1% of DD (20%).
Capital Efficiency
87
EXCELLENT — 33.9% annual, Calmar 6.5
Compound annual growth rate of 33.9% against 5.2% maximum drawdown. Calmar ratio of 6.5× significantly exceeds the professional benchmark of 3–5×. CAGR is computed using true compound growth, (end equity / start equity)^(1/5.96) − 1 with the exponent expressed in years, not simple annualisation. Capital efficiency rewards strategies that generate high risk-adjusted returns relative to their worst historical loss.
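The compound growth calculation can be checked against the headline figures. This is a minimal sketch using the published total return, backtest length, and maximum drawdown; the function name is illustrative.

```python
def cagr(total_return: float, years: float) -> float:
    """Compound annual growth rate from a total return expressed as a fraction."""
    return (1.0 + total_return) ** (1.0 / years) - 1.0


TOTAL_RETURN = 4.684   # +468.4% from the summary
YEARS = 5.96           # backtest length from the summary
MAX_DRAWDOWN = 0.052   # 5.2% from the summary

growth = cagr(TOTAL_RETURN, YEARS)
calmar = growth / MAX_DRAWDOWN

print(f"CAGR: {growth:.2%}")
print(f"Calmar: {calmar:.2f}")
```

Simple annualisation (468.4% divided by 5.96) would give a much larger figure, which is why the distinction drawn in the report matters.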
Edge Temporal Decay
94
STABLE — Edge is consistent with no meaningful decay
Rolling expectancy regression slope is positive (normalised +0.41), indicating the edge has strengthened over the backtest horizon. Second-half expectancy exceeds first-half by 21% (ratio 1.21). Profit factor across four quartiles (1.778, 1.773, 1.6, 2.124) trends mildly upward (normalised slope +0.14). This test detects whether a strategy's edge is eroding over time — a critical check for curve-fitted systems that perform well historically but deteriorate as market conditions evolve.
Edge Consistency
92
EXCELLENT — Edge performs consistently across all conditions
Win rate variance across weekdays falls within acceptable bounds. No structurally unprofitable weekday detected. Profit factor log-variance of 0.2283 and day-of-week variance of 13.7006 indicate edge quality does not fluctuate meaningfully by session day. This test checks whether the strategy's edge is consistent across all trading sessions or is heavily dependent on specific days or conditions.
Concentration Risk
96
EXCELLENT — Well distributed
Top 10% of winning trades account for 18.8% of total profit — well within the 30% ideal-diversification threshold. The largest single winner represents 0.6% of total profit, confirming no individual trade disproportionately sustains the overall result. Profit distribution is scored on two components: top-decile share (80%) and single largest winner share (20%). A well-distributed profit profile indicates genuine repeatable edge rather than lottery-dependent returns.
Ulcer Index
100
Excellent drawdown profile (UI: 1.4%)
Ulcer Index of 1.4% represents minimal cumulative drawdown pain. Max DD: 5.2%, avg DD: 0.91%, time underwater: 51.8%. Unlike maximum drawdown which captures a single worst point, the Ulcer Index integrates both depth and duration of all underwater periods — a UI below 5% indicates drawdowns are shallow, brief, and recover quickly.
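The depth-and-duration weighting described above is what the root-mean-square of percentage drawdowns captures. The sketch below applies the commonly used Ulcer Index definition over the whole equity series; whether the report uses a rolling lookback window is not stated, so treat the full-series choice as an assumption.

```python
import math


def ulcer_index(equity):
    """Root-mean-square of percentage drawdowns from the running equity peak."""
    peak = equity[0]
    squared = []
    for value in equity:
        peak = max(peak, value)
        drawdown_pct = 100.0 * (peak - value) / peak
        squared.append(drawdown_pct ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical equity curve (placeholder data).
curve = [100.0, 102.0, 101.0, 99.0, 103.0, 104.0, 102.5, 105.0]
print(f"Ulcer Index: {ulcer_index(curve):.2f}%")
```

Squaring the drawdowns before averaging penalises deep episodes more than shallow ones, while averaging over every observation penalises long ones.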
Sample Adequacy
92
GOOD — 1083 trades over 6.0y - solid validation
1083 trades over 6.0 years exceeds the academic minimum of 125 trades. MinTRL (minimum track record length) statistic: 51. Confidence factor applied to all other tests: 1. Sample adequacy is the foundational test — a backtest with insufficient trades cannot produce statistically valid conclusions regardless of how impressive the individual metrics appear.
Return Autocorrelation
91
Returns are independent (AC: 0.058)
Lag-1 autocorrelation of 0.058 (lag-2: -0.027) — no meaningful serial dependence. Returns are effectively independent. No martingale signature or hidden clustering pattern detected. Significant autocorrelation can indicate position-sizing escalation or regime-dependent behaviour that inflates backtest results.
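Serial dependence at a given lag can be estimated with the standard sample autocorrelation. This is a minimal sketch; the report does not state which estimator it uses, and the sample values are placeholders.

```python
def autocorrelation(values, lag=1):
    """Sample autocorrelation: lagged covariance divided by total variance."""
    n = len(values)
    mean = sum(values) / n
    denominator = sum((v - mean) ** 2 for v in values)
    numerator = sum(
        (values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag)
    )
    return numerator / denominator


# Hypothetical per-trade returns in R-multiples (placeholder data).
trade_r = [1.0, -1.0, 1.1, 0.9, -1.0, 1.0, -1.0, 1.2, 1.0, -1.0]
print(f"lag-1: {autocorrelation(trade_r, 1):.3f}")
print(f"lag-2: {autocorrelation(trade_r, 2):.3f}")
```

Values near zero, like the 0.058 and -0.027 reported here, suggest that one trade's outcome carries little information about the next.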
MC DD Stability
81
GOOD — Drawdown well controlled across sequences
Under 1,000 permutation shuffles of the exact trade sequence, the 95th-percentile maximum drawdown reaches 10.3%, about 2.0× the 5.2% historical figure (10.3 / 5.2 ≈ 1.98). 99th percentile: 12.0%. Because the ratio stays just below 2.0×, the test concludes that the strategy does not rely on a particularly favourable trade ordering. This test measures whether the backtest drawdown is structurally representative or a statistical artefact of a lucky sequence of trades.
Consecutive Loss
88
MINIMAL — Statistically normal streak behavior
Maximum consecutive losing streak of 7 trades against a statistically expected maximum of 7.4 (ratio 0.95×). Loss clustering ratio of 1.05 — losses are not grouping more frequently than random distribution predicts. Worst streak required approximately 15 average wins to fully recover (damage ratio 7.2×). This test checks four dimensions: observed vs expected streak length (30%), loss clustering (25%), worst streak damage (25%), and recovery speed (20%).
Cliff Ratio
97
EXCELLENT — Healthy risk profile
95th-percentile loss of $301 is 1.93× the average win of $156.15, a healthy ratio indicating tail losses are not catastrophically larger than typical wins. Average loss: $145. Single largest loss ($361.58) is 1.2× the P95 level, so it is not a structural outlier. This test uses the 95th-percentile loss rather than the single largest loss as the primary metric, making the score more robust to one-off broker anomalies while still flagging structural outliers separately.
MC Robustness
74
FAIR — Sequence-independent results confirmed
Block-bootstrap Monte Carlo (2,000 simulations, block size 15 preserving serial structure, AC lag-1: 0.069) produces a survival rate of 100.0% across all simulations. Coefficient of variation: 0.104. MC DD envelope — P50: 8.4%, P95: 14.3%. No position-scaling pattern detected — the strategy applies approximately uniform lot sizing regardless of recent outcomes. Block bootstrap preserves the serial correlation structure of returns (unlike naive IID resampling), producing more realistic stress scenarios.
DD Endurance
75
FAIR — MANAGEABLE (1.2x penance, 61% underwater)
Median penance ratio of 1.17× substantially outperforms the theoretical IID expectation of 3.0× (Bailey & López de Prado, 2014). A ratio below 1.0 would mean recovery consistently takes less time than the drawdown formation period, a strong signal of genuine edge; the observed 1.17× sits slightly above that level, so recoveries take a little longer than formation but far less than the IID benchmark. Time spent underwater: 60.5%. Longest DD episode: 220d 19h 0m (10.2% of backtest). Longest recovery: 148d 6h 46m. 96 episodes detected. Scored on four components: penance ratio (35%), longest DD as % of backtest (25%), % time underwater (25%), and recovery consistency CV (15%).
Execution Cost Sensitivity
86
GOOD — Edge moderately affected by degradation
Under a 10% uniform execution degradation scenario (wins reduced 10%, losses increased 10%), per-trade expectancy retains 0.66× of its backtest level. Original expectancy: $43.25 → degraded: $28.43 (34.3% impact). Strategy remains profitable under this stress test. At 0.306R base expectancy, the strategy retains meaningful cushion against real-world execution costs.
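The degradation scenario can be approximated from the average win, average loss, and win rate published elsewhere in this report. The sketch below assumes a two-outcome model in which every win equals the average win and every loss equals the average loss. The report's own figures come from the full trade list, so small differences are expected.

```python
def expectancy(win_rate: float, avg_win: float, avg_loss: float) -> float:
    """Per-trade expectancy in account currency."""
    return win_rate * avg_win - (1.0 - win_rate) * avg_loss


def degraded_expectancy(win_rate, avg_win, avg_loss, degradation=0.10):
    """Expectancy after shrinking wins and inflating losses by the same fraction."""
    return expectancy(
        win_rate, avg_win * (1.0 - degradation), avg_loss * (1.0 + degradation)
    )


WIN_RATE = 0.629   # from the summary
AVG_WIN = 156.15   # from the Cliff Ratio test
AVG_LOSS = 145.0   # from the Cliff Ratio test

base = expectancy(WIN_RATE, AVG_WIN, AVG_LOSS)
stressed = degraded_expectancy(WIN_RATE, AVG_WIN, AVG_LOSS)
retention = stressed / base

print(f"base: ${base:.2f}  stressed: ${stressed:.2f}  retention: {retention:.2f}x")
```

A 10% hit on both sides removes roughly a third of the expectancy because the edge is the small difference between two much larger numbers, so proportional costs are amplified.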
Holding Time
40
BELOW BENCHMARK — Losers held 1.9x longer than winners
Losers are held 1.88× as long as winners on average (winners: 11h, losers: 20.6h); the median ratio is 2.79×. Discipline tier: PROBLEMATIC. At the current 0.306R expectancy, this loss-aversion pattern lengthens drawdowns and extends underwater periods.
Edge Quality
76
GOOD — Solid edge detected
Expectancy of 0.306R per trade reflects a genuine but not exceptional edge. Win rate of 62.9% operates 14.7 percentage points above the mathematical breakeven of 48.1%. Largest win is 4.14× the average win — some concentration in large outlier wins. Edge quality is scored on four dimensions: expectancy (35%), repeatability (30%), win rate margin (15%), and execution decay (20%).
Expected Shortfall
82
GOOD — Moderate tail risk present (ES ratio: 2.1x avg loss)
CVaR (95%) measures 2.1× the average loss — within acceptable range. The worst 53 trades (5% of sample) average $304.02 against a $145 mean loss. CVaR 99%: 2.21× average loss. Tail risk level: MODERATE. This elevation is partially structural: with a 1.08× RR ratio, the absolute average loss is modest, making tail events appear proportionally larger in ratio terms. Active monitoring of worst-case trade magnitude under live conditions is advisable.
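Expected shortfall at the 95% level is the average of the worst 5% of trades. The sketch below follows the report's convention of taking 5% of the whole sample, not 5% of losing trades only. The tail-count rounding is an assumption, and the trade list is a hypothetical placeholder.

```python
def cvar(pnls, level=0.95):
    """Mean loss magnitude over the worst (1 - level) share of all trades."""
    ordered = sorted(pnls)  # most negative results first
    tail = max(1, round(len(ordered) * (1.0 - level)))
    return -sum(ordered[:tail]) / tail


def average_loss(pnls):
    """Mean magnitude of losing trades only."""
    losses = [-p for p in pnls if p < 0]
    return sum(losses) / len(losses)


# Hypothetical trade results in account currency (placeholder data).
trades = [-300.0, -200.0] + [-10.0] * 18 + [20.0] * 20

tail_loss = cvar(trades)
print(f"CVaR 95%: ${tail_loss:.2f}")
print(f"ES ratio: {tail_loss / average_loss(trades):.2f}x average loss")
```

With the report's own figures, the same ratio is $304.02 / $145 ≈ 2.1×.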
Section III
Portfolio Composition
This report evaluates a combined portfolio of the following constituent backtests. All validation metrics above are computed on the combined, chronologically-merged trade stream.
| # | Strategy | Symbol | Timeframe | Trades |
|---|---|---|---|---|
| 1 | Apex EA 7.0 MT5 | GBPUSD | H1 | 235 |
| 2 | Apex EA 7.0 MT5 | USDCAD | H1 | 368 |
| 3 | Apex EA 7.0 MT5 | USDCHF | H1 | 203 |
| 4 | Apex EA 7.0 MT5 | AUDUSD | H1 | 250 |
| | Total · 4 strategies | | | 1,056 |