A generational
mindset for
adaptive markets.
Old Growth Harbor is a proprietary quantitative research firm. We design reinforcement-learning systems for US equities and ETFs, and trade them against our own capital — patiently, transparently, and with discipline.
Research disciplined
by long horizons
and short feedback loops.
Adaptive reinforcement learning. SAC and PPO variants trained on a fixed 30-name large-cap universe, evaluated across regime-disjoint out-of-sample windows.
Adaptive RL,
regime-aware.
Quantsys-RL — a config-driven ingest → train → walk-forward → paper-trade pipeline. Shared policy-inference code between backtest and live execution.
One stack,
research to live.
Production asyncio daemon against Interactive Brokers. Pre-flight integrity checks, idempotent order submission, EOD reconciliation, hard-breach kill switch.
Stewardship,
operationalised.
Featured paper.
April 2026.
A convolutional encoder beats an MLP in a bear window — and trails it in a benign one. A causal regime router stitches both into a single policy.
We trained and compared four SAC/PPO architectural variants on a fixed 30-name large-cap universe across two regime-disjoint out-of-sample windows. The convolutional-encoder variant outperformed a baseline MLP by +0.275 Sharpe in the 2022 bear window (p = 0.073) but trailed by −0.096 in a benign 2024 window — a regime-specific advantage invisible under standard single-window evaluation.
A regime-adaptive routing policy that combines both models, using a causal labeler built on the SPY 200-day moving average and VIX, achieved a stitched out-of-sample Sharpe of 0.862. The finding argues that the choice between architectures is not a global question but a local one, conditional on the prevailing market regime.
The training environment is a Gymnasium trading harness with a differential-Sharpe reward augmented by drawdown and turnover penalties. MLflow tracking, SHA-256 manifest snapshot identifiers, and shared policy-inference code between backtest and live execution provide a reproducible bridge from research to production.
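As a rough illustration of the reward described above, here is a minimal sketch of a differential-Sharpe reward in the Moody and Saffell form, with drawdown and turnover penalties subtracted. The class name, the adaptation rate `eta`, and both penalty weights are assumptions chosen for illustration, not the values used in Quantsys-RL.

```python
from dataclasses import dataclass


@dataclass
class DifferentialSharpeReward:
    """Differential Sharpe ratio with two illustrative penalty terms."""

    eta: float = 0.01             # EMA adaptation rate (assumed value)
    drawdown_coef: float = 0.5    # hypothetical drawdown penalty weight
    turnover_coef: float = 0.001  # hypothetical turnover penalty weight
    a: float = 0.0                # running EMA of returns
    b: float = 0.0                # running EMA of squared returns
    equity: float = 1.0           # running equity curve, starts at 1.0
    peak: float = 1.0             # running high-water mark

    def step(self, ret: float, turnover: float) -> float:
        """Return the reward for one period's return and turnover."""
        delta_a = ret - self.a
        delta_b = ret * ret - self.b
        variance = self.b - self.a * self.a
        # The ratio is undefined until the variance estimate is positive.
        if variance > 1e-12:
            dsr = (self.b * delta_a - 0.5 * self.a * delta_b) / variance ** 1.5
        else:
            dsr = 0.0
        # Update the moving estimates only after computing the reward term.
        self.a += self.eta * delta_a
        self.b += self.eta * delta_b

        # Drawdown is measured from the running peak of the equity curve.
        self.equity *= 1.0 + ret
        self.peak = max(self.peak, self.equity)
        drawdown = 1.0 - self.equity / self.peak

        return dsr - self.drawdown_coef * drawdown - self.turnover_coef * turnover
```

A Gymnasium environment would typically call `step` once per bar with the realised portfolio return and the turnover implied by the action.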
A 20-day live paper-trading proving program against Interactive Brokers is currently underway as a third, independent out-of-sample window.
Regime-disjoint evaluation. Two historical OOS windows, plus a live proving period.
Regime labels are produced causally from the SPY 200-day moving average and a VIX threshold. No future information enters the labeler at decision time.
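A causal labeler of this kind can be sketched in a few lines. The version below is an assumption-laden illustration: the function name, the VIX threshold of 25, the rule that both conditions must hold for a "bear" label, and the use of the prior day's close are all hypothetical choices, not the firm's published specification.

```python
from statistics import fmean


def label_regimes(spy_close, vix_close, ma_window=200, vix_threshold=25.0):
    """Label each day using only data available through the prior close."""
    labels = []
    for t in range(len(spy_close)):
        if t < ma_window:
            labels.append(None)  # not enough history to form the average
            continue
        # Window covers days t - ma_window .. t - 1, so day t itself is excluded.
        moving_average = fmean(spy_close[t - ma_window:t])
        below_trend = spy_close[t - 1] < moving_average
        stressed = vix_close[t - 1] > vix_threshold
        # Assumed rule: "bear" requires both signals; anything else is "benign".
        labels.append("bear" if (below_trend and stressed) else "benign")
    return labels
```

Because the label for day `t` reads nothing at or after index `t`, changing a later observation cannot alter an earlier label, which is the property the no-leakage claim relies on.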
Out-of-sample Sharpe, by regime.
Table 01 · annualised

| Architecture | OOS encoder | 2022 · Bear | 2024 · Benign | Δ vs. MLP | p |
|---|---|---|---|---|---|
| SAC · MLP (Baseline · feed-forward encoder) | MLP | 0.612 | 1.043 | — | — |
| SAC · Conv (Convolutional temporal encoder) | Conv1D | +0.275 | −0.096 | regime-specific | 0.073 |
| SAC · Regime router (Causal SPY 200d MA + VIX labeler) | Conv / MLP | — | — | stitched OOS | — |
Stitched out-of-sample performance of the regime-adaptive routing policy.
The result is not a winner.
It is a structural claim about evaluation. A model declared superior in one out-of-sample window can be inferior in another. Without regime-disjoint evaluation, that reversal is invisible.
The remedy is composition.
A causal labeler — no future leakage — routes between the two specialists. The stitched policy carries the strengths of each into the window where they apply, and forfeits neither.
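The routing step can be pictured as a per-day selection between the two specialists' return streams, followed by a Sharpe calculation on the stitched series. The sketch below assumes string labels `"bear"` and `"benign"`, sends bear days to the convolutional model and everything else to the MLP, and annualises with 252 periods and a zero risk-free rate. All of these are illustrative assumptions.

```python
import math
from statistics import fmean, stdev


def stitch_returns(labels, conv_returns, mlp_returns):
    """Pick each day's return from the specialist assigned to that day's regime."""
    stitched = []
    for label, conv_ret, mlp_ret in zip(labels, conv_returns, mlp_returns):
        # Assumed routing: Conv on "bear" days, MLP otherwise.
        stitched.append(conv_ret if label == "bear" else mlp_ret)
    return stitched


def annualised_sharpe(daily_returns, periods_per_year=252):
    """Annualised Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    return fmean(daily_returns) / stdev(daily_returns) * math.sqrt(periods_per_year)
```

For example, `annualised_sharpe(stitch_returns(labels, conv, mlp))` yields a single figure for the combined policy, provided the labels were produced without look-ahead.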
Quantsys-RL.
The platform underneath.
- 01 / Ingest
Config-driven, snapshotted.
Every run is keyed by a SHA-256 manifest snapshot identifier. Data, code, and policy artefacts are addressable and reproducible from any subsequent date.
- 02 / Train
Gymnasium trading environment.
Differential-Sharpe reward augmented by drawdown and turnover penalties. MLflow tracking across every architectural variant, seed, and regime window.
- 03 / Walk-forward
Regime-disjoint windows.
Out-of-sample evaluation is performed across windows chosen for regime structure, not chronology alone. Causal labels prevent leakage of future state into the router.
- 04 / Paper trade
Production daemon, IBKR.
Asyncio daemon with pre-flight integrity checks, pre-trade validation (price collars, notional and delta caps), idempotent order submission, EOD reconciliation, hard-breach kill switch. Validated via chaos drills.
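To make the pre-trade validation and idempotent submission ideas concrete, here is a minimal sketch using an in-memory stand-in for the broker. Every name here (`Order`, `OrderGateway`, `idempotency_key`), the 5% collar, and the notional cap are hypothetical; this is not the Interactive Brokers API and it omits the delta caps mentioned above.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    trade_date: str    # e.g. "2026-04-01"
    symbol: str
    side: str          # "BUY" or "SELL"
    quantity: int
    limit_price: float


def idempotency_key(order: Order) -> str:
    """Derive a deterministic key so a retried submission maps to the same order."""
    payload = (
        f"{order.trade_date}|{order.symbol}|{order.side}"
        f"|{order.quantity}|{order.limit_price:.4f}"
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def validate(order: Order, reference_price: float,
             collar_pct: float = 0.05, max_notional: float = 100_000.0) -> None:
    """Raise ValueError if the order breaches the price collar or notional cap."""
    if abs(order.limit_price - reference_price) > collar_pct * reference_price:
        raise ValueError("limit price outside collar")
    if order.quantity * order.limit_price > max_notional:
        raise ValueError("notional cap exceeded")


class OrderGateway:
    """In-memory stand-in for a broker connection (hypothetical)."""

    def __init__(self):
        self.submitted = {}

    def submit(self, order: Order, reference_price: float) -> str:
        validate(order, reference_price)
        key = idempotency_key(order)
        # A repeat of an already-recorded order is a no-op, not a second order.
        self.submitted.setdefault(key, order)
        return key
```

In a live daemon the ledger of submitted keys would presumably be persisted, so that a restart after a crash cannot resend an order that already reached the broker.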
For institutional
partners and
research correspondents.
Say hello.
We’d love to hear from you.
- Office
- Seattle, Washington
- Discipline
- Quantitative research · Proprietary trading
- Capital
- In-house only. No outside investors.