Why this role exists
We need someone who can build and own the research engine: fast, reproducible pipelines that turn raw data into features, run leakage‑safe backtests with realistic costs, and ship robust signals to live trading.
What you’ll build (engineering-first)
- Research runtime & backtester: event‑driven core that is vectorized, profileable, and testable (purged CV, walk‑forward, costs/slippage, turnover/capacity).
- Data plumbing: reliable ingestion and validation (schema checks, late/dirty data handling, idempotent jobs; see the ingestion sketch after this list), dataset versioning, and columnar storage that makes experiments fast.
- Experiment infra: one‑command runs, seed control, artifact tracking, CI checks, reproducibility guarantees.
- Prod handoff: packaging signals with interfaces, telemetry, and alerts; safe rollout/rollback; performance + drift dashboards.
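To make "idempotent jobs" concrete, here is a minimal sketch of a schema‑checked, replay‑safe ingestion step; the column names, the `EXPECTED` schema, and the `(ts, symbol)` dedupe key are illustrative assumptions, not an existing API.

```python
import pandas as pd

# Hypothetical expected schema for a bar/tick table; adjust to the real feed.
EXPECTED = {"ts": "datetime64[ns]", "symbol": "object", "px": "float64", "qty": "int64"}

def validate(df: pd.DataFrame) -> pd.DataFrame:
    """Fail fast on schema drift, then filter dirty rows explicitly."""
    missing = set(EXPECTED) - set(df.columns)
    if missing:
        raise ValueError(f"schema check failed, missing columns: {missing}")
    df = df.astype(EXPECTED)
    # Dirty-data handling: drop non-positive prices (a real job would quarantine them).
    return df[df["px"] > 0]

def upsert(store: pd.DataFrame, batch: pd.DataFrame) -> pd.DataFrame:
    """Idempotent write: replaying the same batch leaves the store unchanged.
    Late data is keyed by (ts, symbol), so a late row updates instead of duplicating."""
    merged = pd.concat([store, validate(batch)], ignore_index=True)
    return (merged.drop_duplicates(subset=["ts", "symbol"], keep="last")
                  .sort_values("ts", ignore_index=True))
```

Because the write is keyed rather than appended, retries and backfills are safe: running the same job twice produces identical output.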
Day‑to‑day
- Prototype signals in Python, profile and remove bottlenecks (Numba/Polars/C++/Rust where it counts).
- Write deterministic tests (unit + property‑based) around feature lags, fills, and cost models; a sketch follows this list.
- Make the backtest faster and safer every week (tail latencies, memory, numerical stability).
- Partner with infra to get research code to staging → live with metrics and SLOs.
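As one concrete case of the property‑based tests mentioned above, a sketch using `hypothesis` that checks a lagged feature never looks ahead; the feature definition and window are illustrative, not the team's actual code.

```python
import pandas as pd
from hypothesis import given, strategies as st

def lagged_rolling_mean(px: pd.Series, window: int = 5) -> pd.Series:
    """Leakage-safe by construction: at time t it may only see prices up to t-1."""
    return px.shift(1).rolling(window).mean()

@given(st.lists(st.floats(min_value=1.0, max_value=1000.0,
                          allow_nan=False, allow_infinity=False),
                min_size=10, max_size=60))
def test_no_lookahead(prices):
    """Property: the feature at time t is identical whether or not prices
    after t exist. If truncating the future changes it, the feature peeked."""
    full = pd.Series(prices, dtype="float64")
    t = len(prices) // 2
    feat_full = lagged_rolling_mean(full).iloc[: t + 1]
    feat_trunc = lagged_rolling_mean(full.iloc[: t + 1])
    pd.testing.assert_series_equal(feat_full, feat_trunc)
```

The same pattern generalizes to fills and cost models: state the invariant (no lookahead, cost monotone in trade size, fills bounded by available liquidity) and let the generator hunt for counterexamples.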
Must‑haves
- Strong Python with vectorization/Numba (or willingness to learn fast); practical SQL.
- You’ve owned a data/ML/quant system end‑to‑end (ingest → compute → tests → deploy).
- You can explain and implement purged/embargoed CV, leakage guards, and transaction cost modeling (a purged‑CV sketch follows this list).
- Comfort with containers (Docker) and CI; you treat reproducibility as a feature.
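For concreteness, a minimal sketch of purged/embargoed CV splits, using a simplified symmetric purge; the fold count, embargo width, and split arithmetic are illustrative assumptions, not a prescribed implementation.

```python
import numpy as np

def purged_embargo_splits(n: int, n_splits: int = 5, embargo: int = 10):
    """Yield (train_idx, test_idx) pairs where training points within
    `embargo` bars of the test block are purged, so labels that overlap
    the boundary cannot leak information across the split."""
    test_size = n // (n_splits + 1)
    for k in range(1, n_splits + 1):
        test_start = k * test_size
        test_end = min(test_start + test_size, n)
        test_idx = np.arange(test_start, test_end)
        lo, hi = test_start - embargo, test_end + embargo
        # Train on everything outside the test block plus its embargo buffer.
        train_idx = np.concatenate([np.arange(0, max(lo, 0)),
                                    np.arange(min(hi, n), n)])
        yield train_idx, test_idx
```

In practice the purge width comes from the label horizon (how far forward returns are computed), and the embargo absorbs serial correlation beyond it.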