Old Growth Harbor LLC · Pacific Northwest
Est. 2024 · Seattle, Washington

A generational
mindset for
adaptive markets.

Old Growth Harbor is a proprietary quantitative research firm. We design reinforcement-learning systems for US equities and ETFs and trade them with our own capital: patiently, transparently, and with discipline.

Focus
RL portfolio research
Universe
US large-cap equities & ETFs
Capital
Proprietary, in-house
§ 01 · Practice

Research disciplined
by long horizons
and short feedback loops.

01 / Method

Adaptive reinforcement learning. Soft Actor-Critic (SAC) and Proximal Policy Optimization (PPO) variants trained on a fixed 30-name large-cap universe, evaluated across regime-disjoint out-of-sample windows.

Adaptive RL,
regime-aware.

02 / Platform

Quantsys-RL — a config-driven ingest → train → walk-forward → paper-trade pipeline. Shared policy-inference code between backtest and live execution.

One stack,
research to live.

03 / Execution

Production asyncio daemon against Interactive Brokers. Pre-flight integrity checks, idempotent order submission, EOD reconciliation, hard-breach kill switch.

Stewardship,
operationalised.
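
As an illustration of idempotent order submission and a hard-breach kill switch, here is a minimal sketch. It does not call the Interactive Brokers API, and every name in it (`Order`, `client_order_id`, `Gateway`, the 10% drawdown limit) is a hypothetical stand-in rather than part of the production daemon.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Order:
    trade_date: str  # e.g. "2026-04-01"
    symbol: str
    side: str        # "BUY" or "SELL"
    quantity: int


def client_order_id(order: Order) -> str:
    """Deterministic key: the same order intent always hashes to the same id."""
    payload = f"{order.trade_date}|{order.symbol}|{order.side}|{order.quantity}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


@dataclass
class Gateway:
    """Stand-in for a broker connection; it only records what would be sent."""
    max_drawdown: float = 0.10  # hypothetical hard-breach limit
    halted: bool = False
    sent: Dict[str, Order] = field(default_factory=dict)

    def check_breach(self, drawdown: float) -> None:
        # Kill switch: once tripped, it stays tripped until reset by hand.
        if drawdown >= self.max_drawdown:
            self.halted = True

    def submit(self, order: Order) -> str:
        if self.halted:
            return "REJECTED_HALTED"
        key = client_order_id(order)
        if key in self.sent:
            # A retry after a timeout or restart does not create a second order.
            return "DUPLICATE_IGNORED"
        self.sent[key] = order
        return "SUBMITTED"
```

Keying each order on its intent rather than on a random id is one way to make retries safe: resubmitting after a dropped connection lands on the same key and is ignored.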

§ 02 · Research

Featured paper.
April 2026.

Title
Regime-Specific Architectural Advantages in RL Portfolio Optimization
Author
Old Growth Harbor LLC · Research
Published
April 2026
Universe
30 US large-cap equities, two regime-disjoint OOS windows

A convolutional encoder beats an MLP in a bear window — and trails it in a benign one. A causal regime router stitches both into a single policy.

We trained and compared four SAC/PPO architectural variants on a fixed 30-name large-cap universe across two regime-disjoint out-of-sample windows. The convolutional-encoder variant outperformed a baseline MLP by +0.275 Sharpe in the 2022 bear window (p = 0.073) but trailed by −0.096 in a benign 2024 window — a regime-specific advantage invisible under standard single-window evaluation.

A regime-adaptive routing policy combining both models via a causal SPY 200-day moving-average and VIX labeler achieved a stitched out-of-sample Sharpe of 0.862. The finding argues that the choice between architectures is not a global question, but a local one — conditional on the prevailing market regime.

The training environment is a Gymnasium trading harness with a differential-Sharpe reward augmented by drawdown and turnover penalties. MLflow tracking, SHA-256 manifest snapshot identifiers, and shared policy-inference code between backtest and live execution provide a reproducible bridge from research to production.
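
To make the reward concrete, here is a minimal sketch of a differential-Sharpe reward with drawdown and turnover penalties, using the exponential-moving-average form of the differential Sharpe ratio from Moody and Saffell. The class name, the adaptation rate `eta`, and both penalty weights are assumptions for illustration; the values used in the research are not stated here.

```python
from dataclasses import dataclass


@dataclass
class DifferentialSharpeReward:
    eta: float = 0.01               # EMA adaptation rate (assumed)
    dd_penalty: float = 0.5         # weight on current drawdown (assumed)
    turnover_penalty: float = 0.1   # weight on per-step turnover (assumed)
    eps: float = 1e-12
    a: float = 0.0                  # EMA of returns
    b: float = 0.0                  # EMA of squared returns
    equity: float = 1.0
    peak: float = 1.0

    def step(self, ret: float, turnover: float) -> float:
        """Return the penalised reward for one period's portfolio return."""
        delta_a = ret - self.a
        delta_b = ret * ret - self.b
        variance = self.b - self.a * self.a
        if variance > self.eps:
            dsr = (self.b * delta_a - 0.5 * self.a * delta_b) / variance ** 1.5
        else:
            dsr = 0.0  # undefined until the moving averages have warmed up

        # Update the moving averages only after computing the reward term.
        self.a += self.eta * delta_a
        self.b += self.eta * delta_b

        # Track drawdown from the running equity peak.
        self.equity *= 1.0 + ret
        self.peak = max(self.peak, self.equity)
        drawdown = 1.0 - self.equity / self.peak

        return dsr - self.dd_penalty * drawdown - self.turnover_penalty * turnover
```

The differential term can be large in the first few steps while the moving averages are still near zero, so a warm-up period or reward clipping is a common practical addition.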

A 20-day live paper-trading proving program against Interactive Brokers is currently underway as a third, independent out-of-sample window.

Fig. 02
Regime-disjoint evaluation. Two historical OOS windows, plus a live proving period.
[Figure: timeline, 2020 to 2026]
BEAR · 2022 · OOS Window A · Drawdown regime · SPY < 200d MA
BENIGN · 2024 · OOS Window B · Trend regime · low VIX
LIVE · 2026 · Paper · 20-day · In progress

Regime labels are produced causally from a SPY 200-day moving-average and VIX threshold. No future information enters the labeler at decision time.
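
A causal labeler of this kind can be sketched as follows. The rule for combining the two signals (bear if SPY closes below its moving average or VIX closes above a threshold), the threshold of 25, and the function name are assumptions for illustration; only the inputs, a SPY 200-day moving average and a VIX threshold, come from the text above.

```python
from collections import deque
from typing import Deque, Iterable, List, Optional


def causal_regime_labels(
    spy_close: Iterable[float],
    vix_close: Iterable[float],
    ma_window: int = 200,
    vix_threshold: float = 25.0,  # hypothetical threshold
) -> List[Optional[str]]:
    """Label each day using only closes up to and including that day.

    The label for day t is meant to drive the decision for the next session,
    so no information from after the decision time is used.
    """
    window: Deque[float] = deque(maxlen=ma_window)
    labels: List[Optional[str]] = []
    for spy, vix in zip(spy_close, vix_close):
        window.append(spy)  # the deque drops the oldest close automatically
        if len(window) < ma_window:
            labels.append(None)  # not enough history for the moving average
            continue
        moving_average = sum(window) / ma_window
        is_bear = spy < moving_average or vix > vix_threshold
        labels.append("bear" if is_bear else "benign")
    return labels
```

A simple leakage check is to relabel a truncated history and confirm the earlier labels do not change: `causal_regime_labels(spy[:k], vix[:k])` should equal the first `k` labels of the full run.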

Out-of-sample Sharpe, by regime.

Table 01 · annualised

Architecture                                            OOS encoder   2022 · Bear   2024 · Benign   Δ vs. MLP         p
SAC · MLP (baseline · feed-forward encoder)             MLP           0.612         1.043
SAC · Conv (convolutional temporal encoder)             Conv1D        +0.275        −0.096          regime-specific   0.073
SAC · Regime router (causal SPY 200d MA + VIX labeler)  Conv / MLP    stitched OOS
Fig. 03
Stitched out-of-sample performance of the regime-adaptive routing policy.
Window A · 2022 bear
Conv encoder wins by +0.275 Sharpe.
MLP 0.612 · Conv 0.887

p = 0.073 under regime-conditional resampling: suggestive, but short of the conventional 0.05 threshold.

Window B · 2024 benign
MLP recovers a small lead of 0.096 Sharpe.
MLP 1.043 · Conv 0.947

A reminder that the choice of encoder is not a global property of the model.

Stitched · routing policy
A causal regime router chooses between them.
0.862

Stitched OOS Sharpe across both regime windows under the SPY-200d / VIX labeler.

The result is not a winner.

It is a structural claim about evaluation. A model declared superior on a single OOS window can be inferior in another regime. Without regime-disjoint evaluation, this is invisible.

The remedy is composition.

A causal labeler — no future leakage — routes between the two specialists. The stitched policy carries the strengths of each into the window where they apply, and forfeits neither.
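
The composition can be sketched as a small routing function plus a Sharpe calculation over the stitched return stream. This is an illustration under stated assumptions: labels are a list of `"bear"`, `"benign"`, or `None`; the bear label routes to the convolutional specialist; the label known at the prior close picks today's specialist; and Sharpe is annualised with 252 periods, a zero risk-free rate, and the sample standard deviation. None of these conventions are confirmed details of the paper.

```python
import math
import statistics
from typing import List, Optional, Sequence


def annualised_sharpe(returns: Sequence[float], periods: int = 252) -> float:
    """Mean over sample standard deviation, scaled by sqrt(periods)."""
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # needs at least two observations
    return mean / stdev * math.sqrt(periods)


def stitch_returns(
    labels: Sequence[Optional[str]],
    conv_returns: Sequence[float],
    mlp_returns: Sequence[float],
) -> List[float]:
    """Pick each day's return from the specialist chosen by yesterday's label."""
    stitched: List[float] = []
    for t in range(1, len(labels)):
        regime = labels[t - 1]  # label available at the prior close
        if regime is None:
            continue  # no label yet, so the router takes no position
        stitched.append(conv_returns[t] if regime == "bear" else mlp_returns[t])
    return stitched
```

Lagging the label by one day keeps the router causal: the specialist for day t is selected before day t's return is observed.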

Quantsys-RL.
The platform underneath.

  1. 01 / Ingest
    Config-driven, snapshotted.

    Every run is keyed by a SHA-256 manifest snapshot identifier. Data, code, and policy artefacts are addressable and reproducible from any subsequent date.

  2. 02 / Train
    Gymnasium trading environment.

    Differential-Sharpe reward augmented by drawdown and turnover penalties. MLflow tracking across every architectural variant, seed, and regime window.

  3. 03 / Walk-forward
    Regime-disjoint windows.

    Out-of-sample evaluation is performed across windows chosen for regime structure, not chronology alone. Causal labels prevent leakage of future state into the router.

  4. 04 / Paper trade
    Production daemon, IBKR.

    Asyncio daemon with pre-flight integrity checks, pre-trade validation (price collars, notional and delta caps), idempotent order submission, EOD reconciliation, hard-breach kill switch. Validated via chaos drills.
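
For step 01, one way to derive a SHA-256 manifest snapshot identifier is to hash each artefact, collect the digests with the run configuration into a manifest, and hash a canonical serialisation of that manifest. The sketch below is an assumption about how such an identifier could be built; the function names and manifest layout are hypothetical and not taken from Quantsys-RL.

```python
import hashlib
import json
from typing import Dict


def file_digest(data: bytes) -> str:
    """SHA-256 hex digest of one artefact's bytes."""
    return hashlib.sha256(data).hexdigest()


def snapshot_id(artefacts: Dict[str, bytes], config: dict) -> str:
    """Hash a manifest of per-file digests plus the run configuration."""
    manifest = {
        "config": config,
        "files": {name: file_digest(blob) for name, blob in artefacts.items()},
    }
    # Sorted keys and fixed separators make the serialisation deterministic,
    # so the same inputs always yield the same identifier.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Because the identifier depends only on content, any change to a data file or a config value produces a different id, while reordering the inputs does not.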

§ 03 · Contact

For institutional
partners and
research correspondents.

Say hello.
We’d love to hear from you.

Office
Seattle, Washington
Discipline
Quantitative research · Proprietary trading
Capital
In-house only.
No outside investors.