Edge Matrix Quantitative Validation Report — Spectrum Trend Pro EA

Total Return: +309.3%
Annual CAGR: 51.1%
Max Drawdown: 22.8%
Calmar Ratio: 2.2×
Win Rate: 65.9%
Expectancy: 0.235R
Reward-to-Risk: 0.87:1
T-Statistic: 3.37

This system demonstrates a statistically confirmed positive expectancy across 3.42 years of backtest data encompassing 320 closed positions on XAUUSD H4. The strategy achieves a 0.87:1 reward-to-risk ratio with a 65.9% win rate, 12.5 percentage points above its mathematical breakeven win rate of 53.4%. Annualised CAGR of 51.1% relative to 22.8% maximum drawdown yields a Calmar ratio of 2.2×, below the professional benchmark range of 3–5×. Monte Carlo validation across 2,000 block-bootstrap simulations confirms structural consistency under adverse trade sequencing. 18 of 19 validation tests pass; 1 test (DD Endurance, 59/100) falls below the 70-point threshold and warrants review.
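The breakeven threshold and expectancy in the summary follow directly from the win rate and the average win and loss. The sketch below reproduces them, assuming R is defined as the average loss (the report does not state its R unit) and using the rounded averages reported under Cliff Ratio ($179.75 average win, $206.06 average loss). The function name `edge_summary` is illustrative.

```python
def edge_summary(win_rate, avg_win, avg_loss):
    """Reward:risk, breakeven win rate, expectancy in R, and margin over breakeven.

    Assumes 1R equals the average loss (an assumption, not the report's stated definition).
    """
    rr = avg_win / avg_loss                       # reward-to-risk ratio
    breakeven = 1.0 / (1.0 + rr)                  # win rate at which expectancy is zero
    expectancy_r = win_rate * rr - (1.0 - win_rate)
    margin_pp = (win_rate - breakeven) * 100.0    # margin in percentage points
    return rr, breakeven, expectancy_r, margin_pp

rr, be, exp_r, margin = edge_summary(0.659, 179.75, 206.06)
print(f"RR {rr:.2f}:1, breakeven {be:.1%}, expectancy {exp_r:.3f}R, margin {margin:.1f} pp")
```

With these rounded inputs it prints a 0.87:1 ratio, a 53.4% breakeven, about 0.234R and a 12.5-point margin; the small gap to the headline 0.235R comes from input rounding.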

⚠ SHORT TEST: 3.4 years of data against a 5-year benchmark for Forex H4

Section I

Analytical Findings & Observations

F.1

Edge Temporal Decay Strength

Temporal Decay is among the strongest dimensions, scoring 100/100. The rolling expectancy regression slope is positive (normalised +1.7), indicating the edge has strengthened over the backtest horizon. Second-half expectancy exceeds first-half by 56% (ratio 1.56). Profit factor across four quartiles (1.342, 1.648, 1.434, 2.526) trends upward overall, though not monotonically (normalised slope +0.58). Win rate trends upward across quartiles (normalised slope +0.13). This test detects whether a strategy's edge is eroding over time, a critical check for curve-fitted systems that perform well historically but deteriorate as market conditions evolve.
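The quartile trend can be checked with an ordinary least-squares slope. This is a sketch: the report does not define its normalisation, so `normalised` below uses a hypothetical choice (fitted change across all four quartiles divided by the mean profit factor), and quartiles are assumed to be equal trade counts in chronological order. All function names are illustrative.

```python
def profit_factor(pnl):
    """Gross profit divided by gross loss."""
    gross_win = sum(p for p in pnl if p > 0)
    gross_loss = -sum(p for p in pnl if p < 0)
    return gross_win / gross_loss if gross_loss > 0 else float("inf")

def quartile_profit_factors(pnl):
    """Split chronologically ordered trades into four equal-count blocks (an assumption)."""
    n = len(pnl)
    bounds = [round(i * n / 4) for i in range(5)]
    return [profit_factor(pnl[bounds[i]:bounds[i + 1]]) for i in range(4)]

def ols_slope(ys):
    """Least-squares slope of ys against their index 0, 1, 2, ..."""
    xs = range(len(ys))
    mx = (len(ys) - 1) / 2
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

pfs = [1.342, 1.648, 1.434, 2.526]   # quartile profit factors reported above
slope = ols_slope(pfs)
# Hypothetical normalisation: fitted change over the whole span relative to the mean PF.
normalised = slope * (len(pfs) - 1) / (sum(pfs) / len(pfs))
print(round(slope, 3), round(normalised, 2))
```

On the reported quartile values this hypothetical normalisation gives about 0.58, but that agreement alone does not confirm it is the report's definition.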

F.2

Tail Risk Elevation Finding

CVaR (95%) measures 1.12× the average loss, within the acceptable range. The worst 16 trades (5% of the 320-trade sample) average $230.99 against a $206.06 mean loss. CVaR (99%): 1.15× the average loss. Tail risk level: MINIMAL. What elevation there is remains partly structural: both ratios are measured against the average loss, so they reflect how tightly losses cluster around that average as much as the size of the tail itself. Active monitoring of worst-case trade magnitude under live conditions is advisable.
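A minimal sketch of the CVaR ratio, assuming (as "5% of sample" suggests) that the tail is taken from all trades rather than from losing trades only, and that the ratio divides the tail average by the mean losing trade. `cvar_ratio` and `pnl` (a list of per-trade profits in account currency) are hypothetical names.

```python
def cvar_ratio(pnl, alpha=0.95):
    """Mean of the worst (1 - alpha) share of all trades, divided by the mean loss."""
    n = len(pnl)
    k = max(1, round(n * (1 - alpha)))     # 16 trades for n = 320 at 95%
    worst = sorted(pnl)[:k]                # most negative trades first
    losses = [-p for p in pnl if p < 0]
    mean_loss = sum(losses) / len(losses)
    return (-sum(worst) / k) / mean_loss

# Usage: cvar_ratio(pnl, 0.95) and cvar_ratio(pnl, 0.99) for the two levels quoted above.
```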

F.3

MC Drawdown Envelope Observation

Block-bootstrap Monte Carlo (2,000 simulations, block size 6, AC lag-1: -0.048) produces a 95th-percentile maximum drawdown of 36.0%, approximately 1.6× the historical 22.8%. P50: 17.2%, P99: 48.8%. The historical drawdown sits at the 75th percentile of the simulated distribution, confirming results were not predicated on an unusually favourable trade ordering. For live deployment, sizing risk against the MC P95 envelope rather than the historical DD is advisable.
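The block-bootstrap envelope can be sketched as below. Assumptions: `pnl` is a hypothetical list of per-trade profits in account currency applied to a fixed-lot equity curve from a hypothetical starting balance, blocks are drawn with overlapping (moving-block) starts, and percentiles use the nearest-rank rule. The report does not specify these details, and the helper names are illustrative.

```python
import math
import random

def max_drawdown_pct(pnl, start_equity):
    """Largest peak-to-trough decline of a fixed-lot equity curve, as a fraction of the peak."""
    equity = peak = start_equity
    worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def block_bootstrap_drawdowns(pnl, start_equity, n_sims=2000, block=6, seed=1):
    """Moving-block bootstrap: resample contiguous blocks of trades to keep short-range ordering."""
    n = len(pnl)
    if n < block:
        raise ValueError("need at least one full block of trades")
    rng = random.Random(seed)
    sims = []
    for _ in range(n_sims):
        path = []
        while len(path) < n:
            start = rng.randrange(n - block + 1)   # any contiguous window may be drawn
            path.extend(pnl[start:start + block])
        sims.append(max_drawdown_pct(path[:n], start_equity))
    return sorted(sims)

def percentile(sorted_vals, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    k = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[k - 1]

def percentile_rank(sorted_vals, value):
    """Share of simulations whose drawdown is at or below the given value."""
    return sum(v <= value for v in sorted_vals) / len(sorted_vals)
```

Calling `percentile(sims, 95)` gives the P95 envelope, and `percentile_rank(sims, historical_dd)` locates the historical drawdown within the simulated distribution (0.75 would correspond to the 75th percentile quoted above).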

F.4

Execution Sensitivity Observation

Under 10% execution degradation (wider spreads, adverse fills), expectancy retains 0.61× of its backtest level, falling from 0.235R to roughly 0.14R. The strategy therefore remains profitable under this stress test. Forward testing under broker-accurate spread conditions is standard practice before capital deployment.

Development Considerations

Areas for Further Development

Drawdown Endurance

DD Endurance scored 59/100. The strategy spends 87% of calendar time below its high-water mark, so equity is in drawdown for close to nine-tenths of the backtest. Longest single episode: 334d 22h 1m; longest recovery: 165d 23h 51m. Recovery speed is strong (median penance 1.44× vs the theoretical 3.0× IID expectation), so the issue is how often the strategy enters drawdown, not how quickly it recovers. Consider whether the entry filter can be tightened to reduce the number of small losing episodes, which account for much of the accumulated underwater time.

Edge Quality Improvement

Edge Quality scored 74/100. Expectancy of 0.235R is positive but thin: the win rate sits 12.5 percentage points above breakeven (53.4%), which leaves limited cushion against execution costs. Repeatability scores 100/100, indicating wins are consistent rather than lottery-driven. The primary lever is tightening entry criteria to filter lower-quality setups, which would reduce trade count but could improve expectancy per trade without structural changes to the strategy.

Drawdown Profile Refinement

Drawdown Analysis scored 76/100. Max DD 22.8% with 32 episodes averaging 4.7% depth and 2.8 days per 1% recovery. The lever here depends on where the DD comes from. If the max DD was driven by one bad episode, check its date and identify the market event; a session filter or news blackout may remove it. If DD depth is consistent across episodes, base risk per trade is the lever: reducing position size by 20% lowers drawdown depth but also cuts returns, so the net effect on Calmar depends on compounding and on where in the equity curve the maximum drawdown falls, and should be confirmed by re-running the backtest at the reduced size.
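Whether a 20% size cut improves Calmar can be tested directly by scaling every trade and recomputing the metrics. This sketch assumes fixed-lot sizing (consistent with the uniform lot sizing noted under MC Robustness), so every trade's P&L scales linearly; `scaled_profile`, the trade list and the starting balance are hypothetical.

```python
def max_drawdown_pct(pnl, start_equity):
    """Largest peak-to-trough decline of a fixed-lot equity curve, as a fraction of the peak."""
    equity = peak = start_equity
    worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def scaled_profile(pnl, start_equity, years, scale):
    """Total return, CAGR, max drawdown and Calmar after multiplying every trade's P&L by `scale`."""
    scaled = [p * scale for p in pnl]
    total_return = sum(scaled) / start_equity
    cagr = (1 + total_return) ** (1 / years) - 1
    dd = max_drawdown_pct(scaled, start_equity)
    calmar = cagr / dd if dd > 0 else float("inf")
    return total_return, cagr, dd, calmar

# Hypothetical one-year trade list on a hypothetical 2,000 starting balance.
trades = [300, -200, -200, 250, 400, -150, 300, -250, 350]
full = scaled_profile(trades, 2000.0, 1.0, 1.0)
reduced = scaled_profile(trades, 2000.0, 1.0, 0.8)
print(f"Calmar at 1.0x: {full[3]:.2f}, at 0.8x: {reduced[3]:.2f}")
```

On this one-year toy list the Calmar ratio falls slightly, from 2.30 to 2.24: total return shrinks by exactly 20%, while drawdown depth shrinks by less because it is measured against a scaled peak that is itself lower. Over multi-year horizons compounding softens the CAGR reduction, so the outcome depends on the actual trade history.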

Section II

Validation Test Results

[Chart: score per validation test]

Temporal Stability 73

FAIR — 7/8 profitable, 1 losing

7 of 8 equal calendar periods generated positive returns; 1 period was non-profitable. Return consistency CV of 1.04 indicates high variation between periods — returns are concentrated rather than evenly distributed. This score measures temporal robustness — a strategy that only profits in one or two periods may be regime-dependent rather than exhibiting a repeatable edge.
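Per-period consistency can be sketched as follows, assuming trade close times in days since the start of the test, equal-width calendar periods, and a CV defined as population standard deviation over mean period P&L; the report does not state which standard deviation it uses. `period_returns` is a hypothetical helper.

```python
import statistics

def period_returns(times, pnl, t_start, t_end, n_periods=8):
    """Sum trade P&L into equal calendar periods; return totals, profitable count and CV."""
    width = (t_end - t_start) / n_periods
    totals = [0.0] * n_periods
    for t, p in zip(times, pnl):
        i = min(int((t - t_start) / width), n_periods - 1)  # a close exactly at t_end joins the last period
        totals[i] += p
    profitable = sum(1 for x in totals if x > 0)
    mean = statistics.mean(totals)
    # CV is only meaningful for a positive mean; report infinity otherwise.
    cv = statistics.pstdev(totals) / mean if mean > 0 else float("inf")
    return totals, profitable, cv
```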

Statistical Significance 94

Highly significant edge (t=3.37, 99% confidence)

The t-statistic of 3.37 exceeds the 99% two-tailed critical value of 2.576 (p = 0.0011). If the true mean per-trade profit were zero, a result at least this extreme would occur in only about 0.11% of samples of this size. The edge is statistically significant given this 320-trade sample. This test applies a Welch t-test to the profit distribution and requires the mean return to be significantly different from zero.
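The significance check can be sketched in plain Python. The report names a Welch t-test; because the null hypothesis here concerns a single sample with zero mean, this sketch assumes the one-sample form, where t is the mean divided by its standard error. The p-value uses a normal approximation, which is reasonable at 319 degrees of freedom; `scipy.stats.ttest_1samp(pnl, 0.0)` gives the exact t-based value if SciPy is available. Function names are illustrative.

```python
import math
import statistics

def one_sample_t(pnl):
    """t-statistic for H0: mean per-trade profit = 0."""
    n = len(pnl)
    se = statistics.stdev(pnl) / math.sqrt(n)   # sample standard deviation (n - 1 denominator)
    return statistics.mean(pnl) / se

def two_tailed_p_normal(t):
    """Two-tailed p-value under a standard normal approximation: 2 * (1 - Phi(|t|))."""
    return math.erfc(abs(t) / math.sqrt(2))
```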

Drawdown Analysis 76

GOOD — Some sequence dependency detected; drawdown 22.8% max, 4.7% avg episode

Maximum drawdown of 22.8% with an average episode depth of 4.7%. The median recovery speed is 2.8 days per 1% of drawdown. 32 drawdown episodes were detected. No single episode dominates the overall drawdown profile, indicating consistent rather than event-driven risk. This test scores three components: max DD depth (50%), average episode depth (30%), and recovery quality in days per 1% of DD (20%).
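Episode extraction and recovery speed can be sketched as below. Assumptions: an equity series sampled at `times` (in days), an episode that starts at the last peak and ends when equity regains that peak, depth measured relative to the peak, and recovery speed taken as trough-to-new-high days per 1% of depth. Helper names are hypothetical.

```python
import statistics

def drawdown_episodes(times, equity):
    """Peak-to-recovery episodes with start, trough, end (None if unrecovered) and depth."""
    episodes, current = [], None
    peak_value, peak_time = equity[0], times[0]
    for t, v in zip(times, equity):
        if v >= peak_value:
            if current is not None:           # equity regained the peak: close the episode
                current["end"] = t
                episodes.append(current)
                current = None
            peak_value, peak_time = v, t
        else:
            depth = (peak_value - v) / peak_value
            if current is None:
                current = {"start": peak_time, "trough": t, "depth": depth, "end": None}
            elif depth > current["depth"]:
                current["trough"], current["depth"] = t, depth
    if current is not None:
        episodes.append(current)              # still underwater at the end of the data
    return episodes

def median_days_per_pct(episodes):
    """Median recovery days per 1% of depth, over recovered episodes only."""
    rates = [(e["end"] - e["trough"]) / (e["depth"] * 100)
             for e in episodes if e["end"] is not None]
    return statistics.median(rates)

# Average episode depth: statistics.mean(e["depth"] for e in episodes)
```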

Capital Efficiency 90

EXCELLENT — 51.1% annual, Calmar 2.2

Compound annual growth rate of 51.1% against 22.8% maximum drawdown. The Calmar ratio of 2.2× falls below the professional benchmark of 3–5×. CAGR is computed as true compound growth, (end equity / start equity)^(1/3.42) - 1 with time in years, not simple annualisation. Capital efficiency rewards strategies that generate high risk-adjusted returns relative to their worst historical loss.
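The CAGR and Calmar definitions above, as a small sketch that takes the headline total return and test length as inputs (function names are illustrative):

```python
def cagr(start_equity, end_equity, years):
    """Compound annual growth rate: (end / start) ** (1 / years) - 1."""
    return (end_equity / start_equity) ** (1.0 / years) - 1.0

def calmar(annual_growth, max_drawdown):
    """Calmar ratio: CAGR divided by maximum drawdown, both as fractions."""
    return annual_growth / max_drawdown

growth = cagr(1.0, 1.0 + 3.093, 3.42)   # +309.3% total return over 3.42 years
print(f"CAGR {growth:.1%}, Calmar {calmar(growth, 0.228):.2f}")
```

With these rounded inputs it prints about 51.0% and 2.24, within rounding of the reported 51.1% and 2.2×.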

Edge Temporal Decay 100

STABLE — Edge is consistent with no meaningful decay

Detailed in Finding F.1: the rolling expectancy slope is positive (normalised +1.7), second-half expectancy exceeds first-half by 56%, and both quartile profit factor and win rate trend upward.

Edge Consistency 100

EXCELLENT — Edge performs consistently across all conditions

Win rate variance across weekdays falls within acceptable bounds, and no structurally unprofitable weekday was detected. A profit factor log-variance of 0.1548 and day-of-week variance of 7.8877 indicate edge quality does not fluctuate meaningfully by trading day. This test checks whether the strategy's edge is consistent across all trading sessions or heavily dependent on specific days or conditions.
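A sketch of the weekday breakdown, assuming trades are grouped by entry date, "log-variance" means the population variance of ln(profit factor) across weekdays, and days without both wins and losses are excluded from that variance. None of these choices is confirmed by the report, and `weekday_breakdown` is a hypothetical name.

```python
import math
import statistics
from collections import defaultdict
from datetime import date

def weekday_breakdown(entry_dates, pnl):
    """Win rate and profit factor per weekday (0 = Monday), plus variance of log profit factor."""
    by_day = defaultdict(list)
    for d, p in zip(entry_dates, pnl):
        by_day[d.weekday()].append(p)
    per_day = {}
    for day, trades in sorted(by_day.items()):
        gross_win = sum(p for p in trades if p > 0)
        gross_loss = -sum(p for p in trades if p < 0)
        win_rate = sum(1 for p in trades if p > 0) / len(trades)
        pf = gross_win / gross_loss if gross_loss > 0 else math.inf
        per_day[day] = (win_rate, pf)
    # Days with no losses (infinite PF) or no wins (PF = 0) have no finite log and are skipped.
    logs = [math.log(pf) for _, pf in per_day.values() if 0 < pf < math.inf]
    log_pf_var = statistics.pvariance(logs) if len(logs) > 1 else 0.0
    return per_day, log_pf_var

per_day, log_pf_var = weekday_breakdown(
    [date(2024, 1, 1), date(2024, 1, 1), date(2024, 1, 2), date(2024, 1, 2)],  # Mon, Mon, Tue, Tue
    [10.0, -5.0, 10.0, -10.0],
)
```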

Concentration Risk 87

EXCELLENT — Well distributed

The top 10% of winning trades account for 28.4% of total profit, just inside the 30% ideal-diversification threshold. The largest single winner represents 1.5% of total profit, confirming no individual trade disproportionately sustains the overall result. Profit distribution is scored on two components: top-decile share (80%) and single largest winner share (20%). A well-distributed profit profile indicates a genuine, repeatable edge rather than lottery-dependent returns.
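The two concentration components can be computed as below. This sketch assumes the denominator is gross profit from winning trades (the paragraph says "total profit" without defining it) and rounds the top-decile count to the nearest whole trade; `concentration` is a hypothetical helper.

```python
def concentration(pnl, top_share=0.10):
    """Share of gross profit from the top decile of winners, and from the single largest winner."""
    wins = sorted((p for p in pnl if p > 0), reverse=True)
    gross_profit = sum(wins)
    k = max(1, round(len(wins) * top_share))
    return sum(wins[:k]) / gross_profit, wins[0] / gross_profit
```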

Ulcer Index 82

GOOD — Acceptable drawdown persistence (UI: 5.7%)

An Ulcer Index of 5.7% indicates moderate cumulative drawdown pain, slightly above the 5% level associated with shallow, brief drawdowns. Max DD: 22.8%, avg DD: 3.66%, time underwater: 71.9%. Unlike maximum drawdown, which captures a single worst point, the Ulcer Index integrates both the depth and the duration of all underwater periods; a UI below 5% indicates drawdowns are shallow, brief, and recover quickly.
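The Ulcer Index is the root-mean-square of percentage drawdowns from the running peak. A minimal sketch over a whole equity series follows; the companion `time_underwater` counts observations below the peak rather than calendar time, an assumption the report does not confirm.

```python
import math

def ulcer_index(equity):
    """Root-mean-square percentage drawdown from the running peak."""
    peak = equity[0]
    squares = []
    for v in equity:
        peak = max(peak, v)
        squares.append((100.0 * (v - peak) / peak) ** 2)
    return math.sqrt(sum(squares) / len(squares))

def time_underwater(equity):
    """Fraction of observations below the running peak (by count, not calendar time)."""
    peak, below = equity[0], 0
    for v in equity:
        peak = max(peak, v)
        if v < peak:
            below += 1
    return below / len(equity)
```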

Sample Adequacy 88

GOOD — 320 trades over 3.4y - solid validation

320 trades over 3.4 years exceeds the academic minimum of 125 trades. MinTRL (minimum track record length) statistic: 60. Confidence factor applied to all other tests: 1. Sample adequacy is the foundational test — a backtest with insufficient trades cannot produce statistically valid conclusions regardless of how impressive the individual metrics appear.
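The minimum track record length comes from Bailey and López de Prado. A sketch of its usual form, assuming a per-trade Sharpe ratio recovered from the t-statistic (for a one-sample t-test, t = SR·√n) and normal returns (skewness 0, kurtosis 3). The report does not state the skewness, kurtosis or confidence level behind its figure of 60, so this sketch is not expected to reproduce it; the function name is illustrative.

```python
from statistics import NormalDist

def min_track_record_length(sr, skew, kurtosis, confidence=0.95, sr_benchmark=0.0):
    """Minimum track record length in observations (here, trades).

    sr is the per-observation Sharpe ratio; kurtosis is non-excess (3 for a normal distribution).
    """
    z = NormalDist().inv_cdf(confidence)
    adj = 1 - skew * sr + (kurtosis - 1) / 4 * sr ** 2
    return 1 + adj * (z / (sr - sr_benchmark)) ** 2

per_trade_sr = 3.37 / 320 ** 0.5
print(round(min_track_record_length(per_trade_sr, 0.0, 3.0), 1))
```

Under the normal-returns assumption at 95% confidence this prints 78.6 trades.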

Return Autocorrelation 100

Returns are independent (AC: -0.048)

Lag-1 autocorrelation of -0.048 (lag-2: -0.039) — no meaningful serial dependence. Returns are effectively independent. No martingale signature or hidden clustering pattern detected. Significant autocorrelation can indicate position-sizing escalation or regime-dependent behaviour that inflates backtest results.
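Lag-k autocorrelation and the usual white-noise band can be sketched as follows. The ±1.96/√n band is a standard large-sample approximation, not necessarily the threshold the report applies; function names are illustrative.

```python
import math

def autocorrelation(x, lag=1):
    """Sample autocorrelation at the given lag, using the full-sample mean and variance."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom

def white_noise_band(n, z=1.96):
    """Approximate 95% band for the autocorrelation of an independent series: +/- z / sqrt(n)."""
    return z / math.sqrt(n)
```

For 320 trades the band is about ±0.11, so both reported lags (-0.048 and -0.039) sit well inside it.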

MC DD Stability 93

EXCELLENT — Highly stable under randomization

Under 1,000 permutation shuffles of the exact trade sequence, the 95th-percentile maximum drawdown reaches 31.3% — a 1.4× expansion from the 22.8% historical figure. 99th percentile: 35.1%. A ratio below 2.0× confirms the strategy does not rely on a particularly favourable trade ordering. This test measures whether the backtest drawdown is structurally representative or a statistical artefact of a lucky sequence of trades.
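The permutation test differs from the block bootstrap: it reorders the exact same trades, so every path ends at the same final equity and only the ordering, and therefore the drawdown, changes. A sketch under hypothetical fixed-lot assumptions (per-trade P&L in account currency, a hypothetical starting balance, nearest-rank percentiles; names are illustrative):

```python
import math
import random

def max_drawdown_pct(pnl, start_equity):
    """Largest peak-to-trough decline of a fixed-lot equity curve, as a fraction of the peak."""
    equity = peak = start_equity
    worst = 0.0
    for p in pnl:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst

def shuffled_drawdowns(pnl, start_equity, n_shuffles=1000, seed=7):
    """Permutation Monte Carlo: shuffle the trade order and record each path's max drawdown."""
    rng = random.Random(seed)
    trades = list(pnl)
    results = []
    for _ in range(n_shuffles):
        rng.shuffle(trades)
        results.append(max_drawdown_pct(trades, start_equity))
    return sorted(results)

def percentile(sorted_vals, q):
    """Nearest-rank percentile for an integer q between 1 and 100."""
    k = max(1, math.ceil(q * len(sorted_vals) / 100))
    return sorted_vals[k - 1]

# Expansion ratio as quoted above: percentile(sims, 95) / historical_max_dd
```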

Consecutive Loss 94

MINIMAL — Statistically normal streak behavior

Maximum consecutive losing streak of 5 trades against a statistically expected maximum of 5.4 (ratio 0.93×). Loss clustering ratio of 0.87: losses are not grouping more frequently than a random distribution predicts. The worst streak's cumulative loss equals about 6 average wins (damage ratio 6×); recovering it at the average per-trade expectancy takes roughly 23 trades. This test checks four dimensions: observed vs expected streak length (30%), loss clustering (25%), worst streak damage (25%), and recovery speed (20%).
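The streak figures can be sketched as below. `expected_max_streak` uses the simple approximation ln(n)/ln(1/q), with q the loss rate, which gives the reported 5.4 for 320 trades at a 34.1% loss rate; more refined approximations exist. The damage figures assume the streak's cumulative loss is divided by the average win and by the average per-trade expectancy. All names are hypothetical.

```python
import math

def longest_losing_streak(pnl):
    """Length and cumulative loss of the longest run of consecutive losing trades."""
    best_len, best_loss = 0, 0.0
    run_len, run_loss = 0, 0.0
    for p in pnl:
        if p < 0:
            run_len += 1
            run_loss -= p
            # Among equally long streaks, keep the costliest one.
            if run_len > best_len or (run_len == best_len and run_loss > best_loss):
                best_len, best_loss = run_len, run_loss
        else:
            run_len, run_loss = 0, 0.0
    return best_len, best_loss

def expected_max_streak(n_trades, loss_rate):
    """Simple approximation ln(n) / ln(1 / q) for the longest losing run in n independent trades."""
    return math.log(n_trades) / math.log(1.0 / loss_rate)

def streak_damage(streak_loss, avg_win, expectancy):
    """Streak loss expressed in average wins and in trades at the average per-trade expectancy."""
    return streak_loss / avg_win, streak_loss / expectancy

print(round(expected_max_streak(320, 1 - 0.659), 1))  # 5.4
```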

Cliff Ratio 100

EXCELLENT — Healthy risk profile

The 95th-percentile loss of $231.99 is 1.29× the average win of $179.75, a healthy ratio indicating tail losses are not catastrophically larger than typical wins. Average loss: $206.06. The single largest loss ($240.54) is 1.04× the P95 level, so there is no structural outlier. This test uses the 95th-percentile loss rather than the single largest loss as the primary metric, making the score more robust to one-off broker anomalies while still flagging structural outliers separately.
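A sketch of the two cliff ratios, assuming the 95th percentile is taken over losing trades only with the nearest-rank rule (the report does not say); `cliff_metrics` is a hypothetical helper.

```python
import math

def cliff_metrics(pnl, q=95):
    """P95 loss relative to the average win, and the largest loss relative to the P95 loss."""
    losses = sorted(-p for p in pnl if p < 0)        # loss magnitudes, ascending
    wins = [p for p in pnl if p > 0]
    k = max(1, math.ceil(q * len(losses) / 100))     # nearest-rank percentile index
    p95_loss = losses[k - 1]
    avg_win = sum(wins) / len(wins)
    return p95_loss / avg_win, losses[-1] / p95_loss
```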

MC Robustness 77

GOOD — Sequence-independent results confirmed

Block-bootstrap Monte Carlo (2,000 simulations, block size 6 preserving serial structure, AC lag-1: -0.048) produces a survival rate of 100.0% across all simulations. Coefficient of variation: 0.204. MC DD envelope: P50 17.2%, P95 36.0%. No position-scaling pattern was detected; the strategy applies approximately uniform lot sizing regardless of recent outcomes. Unlike naive IID resampling, block bootstrap preserves the serial correlation of returns within each block, producing more realistic stress scenarios.

DD Endurance 59

WEAK — DEMANDING (1.4x penance, 87% underwater)

The median penance ratio of 1.44× is well below the theoretical IID expectation of 3.0× (Bailey & López de Prado, 2014). Penance compares recovery time with drawdown formation time: a ratio below 1.0 would mean recovery is faster than formation, while 1.44× means recovery takes roughly 1.4 times as long as the drawdown took to form. That is still less than half the IID benchmark, a strong signal of genuine edge. Time spent underwater: 86.8%. Longest DD episode: 334d 22h 1m (26.8% of backtest). Longest recovery: 165d 23h 51m. 32 episodes detected. Scored on four components: penance ratio (35%), longest DD as % of backtest (25%), % time underwater (25%), and recovery consistency CV (15%).
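Penance and time underwater can be sketched from a list of drawdown episodes. Assumptions: each episode is a hypothetical (start, trough, end) tuple in days with the test running from day 0 to `total_days`, penance is recovery time (trough to new high) divided by formation time (peak to trough), and unrecovered episodes count as underwater until the end of the test but are excluded from the penance median.

```python
import statistics

def penance_and_underwater(episodes, total_days):
    """Median penance ratio and share of calendar time spent underwater.

    episodes: (start, trough, end) in days; end is None if the episode never recovered.
    """
    ratios = [(end - trough) / (trough - start)
              for start, trough, end in episodes
              if end is not None and trough > start]
    underwater = sum((end if end is not None else total_days) - start
                     for start, _, end in episodes)
    return statistics.median(ratios), underwater / total_days
```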

Execution Cost Sensitivity 81

GOOD — Edge sensitive to execution costs

Under a 10% uniform execution degradation scenario (wins reduced 10%, losses increased 10%), per-trade expectancy retains 0.61× of its backtest level. Original expectancy: $48.33 → degraded: $29.46 (39.0% impact). The strategy remains profitable under this stress test, but the cushion is limited: 0.235R of base expectancy falls to roughly 0.14R.
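The stress scenario is simple arithmetic on the win rate and average trade sizes. This sketch uses the rounded averages from the Cliff Ratio test, so its outputs differ slightly from the reported $48.33 and $29.46; `degraded_expectancy` is a hypothetical name.

```python
def degraded_expectancy(win_rate, avg_win, avg_loss, haircut=0.10):
    """Per-trade expectancy before and after shrinking wins and enlarging losses by `haircut`."""
    base = win_rate * avg_win - (1 - win_rate) * avg_loss
    stressed = win_rate * avg_win * (1 - haircut) - (1 - win_rate) * avg_loss * (1 + haircut)
    return base, stressed, stressed / base

base, stressed, kept = degraded_expectancy(0.659, 179.75, 206.06)
print(f"${base:.2f} -> ${stressed:.2f} ({kept:.2f}x retained)")
```

It prints $48.19 -> $29.32 (0.61x retained). Equivalently, the stressed expectancy equals the base expectancy minus 10% of the average absolute P&L per trade.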

Holding Time 93

GOOD — Winners held 1.3x longer than losers

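Winners are held 1.3× longer than losers on average (winners: 48.9h, losers: 37.6h). On the mean this is a positive pattern: the strategy lets profitable trades run while cutting losses relatively quickly. Discipline tier: GOOD. The median ratio of 0.84×, however, suggests the typical winner closes sooner than the typical loser, with the mean lifted by a minority of long-held winners. At the current 0.235R expectancy, this holding-time pattern does not materially impact performance.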
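Mean and median ratios can point in opposite directions when a few trades are held much longer than the rest. A small sketch with hypothetical durations, assuming the report's median ratio is the ratio of median holding times (`holding_ratios` is an illustrative name):

```python
import statistics

def holding_ratios(win_hours, loss_hours):
    """Mean and median holding-time ratios, winners relative to losers."""
    mean_ratio = statistics.mean(win_hours) / statistics.mean(loss_hours)
    median_ratio = statistics.median(win_hours) / statistics.median(loss_hours)
    return mean_ratio, median_ratio

# Hypothetical durations: one long-held winner lifts the mean ratio above 1,
# while the typical (median) winner still closes sooner than the typical loser.
print(holding_ratios([10, 10, 10, 150], [20, 20, 20, 20]))  # (2.25, 0.5)
```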

Edge Quality 74

FAIR — Solid edge detected

Expectancy of 0.235R per trade reflects a genuine but not exceptional edge. Win rate of 65.9% operates 12.5 percentage points above the mathematical breakeven of 53.4%. Largest win is 3.12× the average win — some concentration in large outlier wins. Edge quality is scored on four dimensions: expectancy (35%), repeatability (30%), win rate margin (15%), and execution decay (20%).

Expected Shortfall 100

Well-controlled tail risk (ES ratio: 1.1x)