Est. 2024 · Seattle, Washington

A generational
mindset for
adaptive markets.

Old Growth Harbor is a proprietary quantitative research firm. We design reinforcement-learning systems for US equities and ETFs, and trade them against our own capital — patiently, transparently, and with discipline.

Focus: RL portfolio research
Universe: US large-cap equities & ETFs
Capital: Proprietary, in-house

§ 01 · Practice

Research disciplined
by long horizons
and short feedback loops.

01 / Method

Adaptive reinforcement learning. SAC and PPO variants trained on a fixed 30-name large-cap universe, evaluated across regime-disjoint out-of-sample windows.

Adaptive RL,
regime-aware.

02 / Platform

Quantsys-RL — a config-driven ingest → train → walk-forward → paper-trade pipeline. Shared policy-inference code between backtest and live execution.

One stack,
research to live.

03 / Execution

Production asyncio daemon against Interactive Brokers. Pre-flight integrity checks, idempotent order submission, EOD reconciliation, hard-breach kill switch.

Stewardship,
operationalised.

§ 02 · Research

Featured paper.
April 2026.

Title: Regime-Specific Architectural Advantages in RL Portfolio Optimization
Author: Old Growth Harbor LLC · Research
Published: April 2026
Universe: 30 US large-cap equities, two regime-disjoint OOS windows

A convolutional encoder beats an MLP in a bear window — and trails it in a benign one. A causal regime router stitches both into a single policy.

We trained and compared four SAC/PPO architectural variants on a fixed 30-name large-cap universe across two regime-disjoint out-of-sample windows. The convolutional-encoder variant outperformed a baseline MLP by +0.275 Sharpe in the 2022 bear window (0.887 vs. 0.612, p = 0.073) but trailed it by 0.096 in a benign 2024 window (0.947 vs. 1.043), a regime-specific advantage that is invisible under standard single-window evaluation.
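To make the comparison above concrete, here is a minimal sketch of how an annualised Sharpe ratio and a one-sided p-value for a Sharpe gap could be computed from two daily return series. It is illustrative only: the 252-day annualisation factor, the zero risk-free rate, and the plain paired bootstrap are assumptions, and the bootstrap is a simpler stand-in for the regime-conditional resampling behind the reported p-value. The function names are hypothetical.

```python
import math
import random
from statistics import mean, stdev

TRADING_DAYS = 252  # assumed annualisation factor


def annualised_sharpe(returns, periods=TRADING_DAYS):
    """Annualised Sharpe of per-period returns (risk-free rate assumed to be 0)."""
    sd = stdev(returns)
    return 0.0 if sd == 0 else mean(returns) / sd * math.sqrt(periods)


def sharpe_gap_pvalue(a, b, n_boot=2000, seed=0):
    """One-sided paired-bootstrap p-value for Sharpe(a) > Sharpe(b).

    Days are resampled with replacement and each day's (a, b) pair stays
    together. The result is the share of resamples in which a's Sharpe does
    not exceed b's. This ignores autocorrelation, so it is only a rough guide.
    """
    rng = random.Random(seed)
    n = len(a)
    not_better = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        gap = annualised_sharpe([a[i] for i in idx]) - annualised_sharpe([b[i] for i in idx])
        if gap <= 0:
            not_better += 1
    return not_better / n_boot
```

A small p-value from this sketch says the gap rarely vanishes under resampling within one window; it says nothing about whether the gap carries over to a different regime.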

A regime-adaptive routing policy combining both models via a causal SPY 200-day moving-average and VIX labeler achieved a stitched out-of-sample Sharpe of 0.862. The finding argues that the choice between architectures is not a global question, but a local one — conditional on the prevailing market regime.

The training environment is a Gymnasium trading harness with a differential-Sharpe reward augmented by drawdown and turnover penalties. MLflow tracking, SHA-256 manifest snapshot identifiers, and shared policy-inference code between backtest and live execution provide a reproducible bridge from research to production.
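The reward described above can be sketched as follows. This assumes the Moody and Saffell form of the differential Sharpe ratio, computed from exponential moving averages of returns and squared returns; which variant the environment actually uses is not stated. The adaptation rate `eta` and the two penalty coefficients are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DifferentialSharpeReward:
    eta: float = 0.01       # EMA adaptation rate (assumed)
    dd_coef: float = 0.1    # drawdown penalty weight (assumed)
    to_coef: float = 0.001  # turnover penalty weight (assumed)
    a: float = 0.0          # EMA of returns
    b: float = 0.0          # EMA of squared returns
    equity: float = 1.0     # running equity, starting at 1
    peak: float = 1.0       # running equity high-water mark

    def step(self, ret: float, turnover: float) -> float:
        """Return the penalised reward for one period's portfolio return."""
        d_a = ret - self.a
        d_b = ret * ret - self.b
        var = self.b - self.a * self.a
        dsr = 0.0
        if var > 1e-12:  # undefined until the EMAs have some variance
            dsr = (self.b * d_a - 0.5 * self.a * d_b) / var ** 1.5
        # Update the moving averages only after the reward uses the old values.
        self.a += self.eta * d_a
        self.b += self.eta * d_b
        # Track drawdown from the high-water mark.
        self.equity *= 1.0 + ret
        self.peak = max(self.peak, self.equity)
        drawdown = 1.0 - self.equity / self.peak
        return dsr - self.dd_coef * drawdown - self.to_coef * turnover
```

In a Gymnasium environment, an object like this would be reset alongside the episode and called once per `step`, with `turnover` taken as the sum of absolute weight changes.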

A 20-day live paper-trading proving program against Interactive Brokers is currently underway as a third, independent out-of-sample window.

Fig. 02

Regime-disjoint evaluation. Two historical OOS windows, plus a live proving period.

Window A · Bear · 2022: OOS, drawdown regime (SPY < 200d MA)

Window B · Benign · 2024: OOS, trend regime (low VIX)

Live · 2026: 20-day paper-trading window, in progress

Timeline axis: 2020 to 2026

Regime labels are produced causally from a SPY 200-day moving-average and VIX threshold. No future information enters the labeler at decision time.
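A causal labeler of this kind might look like the sketch below. The 200-day window comes from the text; the VIX threshold of 25, the rule that either signal alone marks a bear regime, and the function name are assumptions made for illustration. The label for day t reads only closes up to day t-1, so no same-day or future data enters the decision.

```python
from statistics import mean

MA_WINDOW = 200        # from the text: SPY 200-day moving average
VIX_THRESHOLD = 25.0   # hypothetical; the text names no value


def causal_regime_labels(spy_close, vix_close,
                         ma_window=MA_WINDOW, vix_threshold=VIX_THRESHOLD):
    """Label each day "bear" or "benign" using data through the prior close.

    Days without a full moving-average history are labelled None.
    """
    labels = []
    for t in range(len(spy_close)):
        if t < ma_window:  # need ma_window closes strictly before day t
            labels.append(None)
            continue
        window = spy_close[t - ma_window:t]  # days t-ma_window .. t-1
        below_ma = spy_close[t - 1] < mean(window)
        high_vix = vix_close[t - 1] > vix_threshold
        # Assumed combination rule: either signal flags a bear regime.
        labels.append("bear" if (below_ma or high_vix) else "benign")
    return labels
```

A quick leakage check is to change the final close in the input and confirm that no label moves.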

Out-of-sample Sharpe, by regime.

Table 01 · annualised

Architecture	OOS encoder	2022 · Bear	2024 · Benign	Δ vs. MLP	p
SAC · MLP Baseline · feed-forward encoder	MLP	0.612	1.043	—	—
SAC · Conv Convolutional temporal encoder	Conv1D	0.887 (+0.275)	0.947 (−0.096)	regime-specific	0.073
SAC · Regime router Causal SPY 200d MA + VIX labeler	Conv / MLP	—	—	stitched OOS	—

Fig. 03

Stitched out-of-sample performance of the regime-adaptive routing policy.

Window A · 2022 bear: Conv encoder wins by +0.275 Sharpe.

MLP 0.612 · Conv 0.887

p = 0.073 under regime-conditional resampling · suggestive, though above the conventional 0.05 threshold.

Window B · 2024 benign: MLP recovers a small lead of 0.096.

MLP 1.043 · Conv 0.947

A reminder that the choice of encoder is not a global property of the model.

Stitched · routing policy: a causal regime router chooses between them.

0.862

Stitched OOS Sharpe across both regime windows under the SPY-200d / VIX labeler.

The result is not a winner.

It is a structural claim about evaluation. A model declared superior under a single-window OOS regime can be inferior under another. Without regime-disjoint evaluation, this is invisible.

The remedy is composition.

A causal labeler — no future leakage — routes between the two specialists. The stitched policy carries the strengths of each into the window where they apply, and forfeits neither.
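The composition step can be sketched as a thin dispatcher over the two specialists. The class name, the `"bear"` and `"benign"` label strings, and the MLP fallback for unlabelled days are assumptions; each policy is modelled as a plain callable from an observation to target weights.

```python
from typing import Callable, Optional, Sequence

# Hypothetical policy signature: observation in, target weights out.
Policy = Callable[[Sequence[float]], Sequence[float]]


class RegimeRouter:
    """Dispatch each decision to the specialist for the current regime."""

    def __init__(self, conv_policy: Policy, mlp_policy: Policy):
        self._by_regime = {"bear": conv_policy, "benign": mlp_policy}
        self._fallback = mlp_policy  # assumed default when no label is available

    def act(self, regime: Optional[str], observation: Sequence[float]):
        policy = self._by_regime.get(regime, self._fallback)
        return policy(observation)
```

Because the router only selects between already-trained policies, it adds no trainable parameters; its behaviour depends entirely on the label it is given.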

Quantsys-RL.
The platform underneath.

01 / Ingest

Config-driven, snapshotted.

Every run is keyed by a SHA-256 manifest snapshot identifier. Data, code, and policy artefacts are addressable and reproducible from any subsequent date.
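One way such an identifier could be derived is to hash a canonical serialisation of the manifest. The manifest layout and the function name here are hypothetical; only the use of SHA-256 comes from the text.

```python
import hashlib
import json


def snapshot_id(manifest: dict) -> str:
    """Return a SHA-256 hex digest of a manifest, independent of key order.

    The manifest is assumed to be JSON-serialisable, for example a mapping
    of file paths to content digests plus the run configuration.
    """
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Sorting keys and fixing separators means two manifests with the same content always map to the same identifier, while any change to data, code, or config yields a different one.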

02 / Train

Gymnasium trading environment.

Differential-Sharpe reward augmented by drawdown and turnover penalties. MLflow tracking across every architectural variant, seed, and regime window.

03 / Walk-forward

Regime-disjoint windows.

Out-of-sample evaluation is performed across windows chosen for regime structure, not chronology alone. Causal labels prevent leakage of future state into the router.

04 / Paper trade

Production daemon, IBKR.

Asyncio daemon with pre-flight integrity checks, pre-trade validation (price collars, notional and delta caps), idempotent order submission, EOD reconciliation, hard-breach kill switch. Validated via chaos drills.
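Two of the safeguards listed above, pre-trade validation and idempotent submission, can be sketched without reference to any broker API. Everything named here is hypothetical: the `Order` fields, the 5% collar, the notional cap, and the `send` callback that stands in for the real broker call. Delta caps and persistence of the sent-order record across restarts are left out for brevity.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    trade_date: str
    symbol: str
    side: str          # "BUY" or "SELL"
    quantity: int
    limit_price: float


def order_key(order: Order) -> str:
    """Deterministic key: the same intended order always maps to the same key."""
    raw = (f"{order.trade_date}|{order.symbol}|{order.side}|"
           f"{order.quantity}|{order.limit_price:.4f}")
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:16]


def validate(order: Order, reference_price: float,
             collar: float = 0.05, max_notional: float = 100_000.0) -> list:
    """Return a list of violations; an empty list means the order may go out."""
    errors = []
    if abs(order.limit_price - reference_price) > collar * reference_price:
        errors.append("price outside collar")
    if order.quantity * order.limit_price > max_notional:
        errors.append("notional cap exceeded")
    return errors


class IdempotentSubmitter:
    """Send each distinct order at most once, even if submit() is retried."""

    def __init__(self, send):
        self._send = send  # hypothetical broker callback: (key, order) -> ack
        self._sent = {}    # in-memory only; a real daemon would persist this

    def submit(self, order: Order):
        key = order_key(order)
        if key not in self._sent:
            self._sent[key] = self._send(key, order)
        return self._sent[key]
```

Deriving the key from the order's content rather than from a counter means a retry after a timeout or crash resolves to the same key, so the duplicate can be detected instead of sent.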

§ 03 · Contact

For institutional
partners and
research correspondents.

Say hello.
We’d love to hear from you.

info@oldgrowthharbor.com

Office: Seattle, Washington
Discipline: Quantitative research · Proprietary trading
Capital: In-house only. No outside investors.
