A personal financial intelligence system. NLP signals fused with quantitative market models into a single portfolio decision layer.
SYSTEM OVERVIEW
Two parallel tracks, one qualitative and one quantitative, that independently process information and converge into a single decision layer. Neither track alone produces reliable signals. The edge lives in the convergence.
TRACK A
Raw text → structured financial signal. What is the market narrative saying, and how strong is that signal?
TRACK B
Raw market data → pattern recognition. What is price behaviour telling us, and what regime are we in?
FUSION
When the news says X and price does Y, what typically happens next? A learned map of how information flows into price.
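The question above can be sketched as a lookup table: bucket each historical observation by the sign of the news signal and the sign of the price move, then average what happened next. This is a deliberately naive stand-in for the learned map, not the Fusion Layer itself. The function names, the sign-based bucketing, and the tuple layout are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def sign_bucket(x: float) -> str:
    """Collapse a number to 'up', 'down', or 'flat' by its sign."""
    if x > 0:
        return "up"
    if x < 0:
        return "down"
    return "flat"


def conditional_forward_returns(
    history: Iterable[Tuple[float, float, float]],
) -> Dict[Tuple[str, str], float]:
    """Average forward return for each (news direction, price direction) pair.

    Each history row is (sentiment_score, same_day_price_move, forward_return).
    This layout is hypothetical, chosen only to keep the sketch small.
    """
    sums: Dict[Tuple[str, str], float] = defaultdict(float)
    counts: Dict[Tuple[str, str], int] = defaultdict(int)
    for sentiment, price_move, forward_return in history:
        key = (sign_bucket(sentiment), sign_bucket(price_move))
        sums[key] += forward_return
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}
```

A real fusion model would condition on far more than two signs, but the table makes the idea concrete: the key is "news said X, price did Y" and the value is what followed.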
MODEL STACK
Each model serves a distinct purpose. They are designed to be built sequentially, each feeding the next. Do not skip Model 4, the risk model, to reach the exciting parts.
Model 1 · Fine-tuned FinBERT
Extracts structured signals from financial text: sentiment scores, mentioned tickers, topic classification, and urgency ratings. Not summaries. Machine-readable signal.
WHY THIS ARCHITECTURE
FinBERT is pre-trained on financial corpora. For this extraction task it produces roughly 90% of the output at roughly 10% of the complexity of a full LLM. Cheaper, faster, easier to debug.
OUTPUT
JSON per document: { ticker, sentiment_score, topic, urgency, entities }
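A minimal sketch of that per-document record as a typed structure. The field names come from the output spec above. The value ranges (sentiment in [-1, 1], urgency in [0, 1]) and the class name DocumentSignal are assumptions, since the spec does not fix them.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DocumentSignal:
    """One machine-readable signal extracted from one document."""

    ticker: str
    sentiment_score: float  # assumed range: -1.0 (bearish) to +1.0 (bullish)
    topic: str
    urgency: float  # assumed range: 0.0 (ignore) to 1.0 (act now)
    entities: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Reject out-of-range values early so bad extractions never
        # reach the downstream models.
        if not -1.0 <= self.sentiment_score <= 1.0:
            raise ValueError("sentiment_score must be in [-1, 1]")
        if not 0.0 <= self.urgency <= 1.0:
            raise ValueError("urgency must be in [0, 1]")

    def to_json(self) -> str:
        """Serialise to the JSON shape listed in the output spec."""
        return json.dumps(asdict(self))


# Placeholder values, not real data.
example = DocumentSignal(
    ticker="XYZ",
    sentiment_score=0.4,
    topic="earnings",
    urgency=0.7,
    entities=["Example Corp"],
)
```

Validating at construction time keeps the "machine-readable" promise honest: anything that parses is safe to feed forward.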
Model 2 · XGBoost classifier
Answers the meta-question before any signal is read: what kind of market are we in right now? A momentum strategy run in a ranging market destroys returns. Regime detection prevents that.
WHY THIS ARCHITECTURE
Hidden Markov models (HMMs), the usual choice for regime detection, assume fixed transition probabilities in their standard form. Real regimes are driven by macro catalysts that break stationarity entirely. XGBoost is more robust and produces interpretable feature importances.
OUTPUT
Regime label + confidence score daily: trending bull · trending bear · ranging low-vol · ranging high-vol
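A sketch of how the daily label and confidence could be read off the classifier's output. It assumes the model emits one probability per regime (for example, one row from a multiclass classifier's predict_proba) in the order listed above, and it defines confidence as the winning class's share of the total. Both the ordering and that definition of confidence are assumptions.

```python
from typing import Sequence, Tuple

# Order is an assumption; it must match the classifier's class encoding.
REGIMES = (
    "trending bull",
    "trending bear",
    "ranging low-vol",
    "ranging high-vol",
)


def label_regime(class_probs: Sequence[float]) -> Tuple[str, float]:
    """Turn per-class probabilities into (regime label, confidence)."""
    if len(class_probs) != len(REGIMES):
        raise ValueError("expected one probability per regime")
    if any(p < 0 for p in class_probs):
        raise ValueError("probabilities must be non-negative")
    total = sum(class_probs)
    if total <= 0:
        raise ValueError("probabilities must sum to a positive number")
    best = max(range(len(REGIMES)), key=lambda i: class_probs[i])
    # Normalise so the confidence is comparable even if the inputs
    # do not sum exactly to one.
    return REGIMES[best], class_probs[best] / total
```

A low confidence is itself a signal: downstream models can shrink position sizes when the regime call is close to a coin flip.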
Model 3 · Temporal Fusion Transformer
Learns temporal patterns in price and volume. When these quantitative conditions occurred historically, what happened over the next 1, 5, and 20 days?
WHY THIS ARCHITECTURE
Generally stronger than an LSTM for multivariate financial time series. Handles variable-length lookbacks and produces interpretable attention weights showing which features matter.
OUTPUT
Probability distribution of price movement over the three horizons (1, 5, and 20 days), conditioned on current regime
Model 4 · Rolling covariance + VaR + CVaR
Understands correlation between positions and estimates portfolio-level risk before any position is taken. Stops you from being 'diversified' across assets that crash together.
WHY THIS ARCHITECTURE
Well-understood mathematics, not deep learning. Build it before the optimizer. Risk management bolted on late gets bolted on badly.
OUTPUT
Risk score per position · Max drawdown estimate · Position size ceiling per asset
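The VaR and CVaR part of this model can be sketched with the historical-simulation method: sort past losses, take the worst tail, and read both numbers off it. This covers only that piece, not the rolling covariance. The function name, the 95% default, and the choice of historical simulation over a parametric method are assumptions.

```python
import math
from typing import Sequence, Tuple


def historical_var_cvar(
    returns: Sequence[float], alpha: float = 0.95
) -> Tuple[float, float]:
    """Historical VaR and CVaR, both returned as positive loss fractions.

    VaR is the smallest loss inside the worst (1 - alpha) share of
    observations. CVaR is the average loss across that same tail.
    """
    if not returns:
        raise ValueError("returns must not be empty")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be strictly between 0 and 1")

    # Convert returns to losses and put the worst outcome first.
    losses = sorted((-r for r in returns), reverse=True)

    # Tail size: worst (1 - alpha) share, at least one observation.
    # round() guards against float noise such as 1.9999999999999996.
    tail_n = max(1, math.floor(round(len(losses) * (1.0 - alpha), 9)))
    tail = losses[:tail_n]

    var = tail[-1]
    cvar = sum(tail) / tail_n
    return var, cvar
```

CVaR is never smaller than VaR at the same level, which is the point of tracking both: VaR says where the tail starts, CVaR says how bad it is once you are in it.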
Model 5 · Markowitz mean-variance + ML expected returns
Given signals from the Fusion Layer and risk constraints from Model 4, computes the optimal allocation across assets. When to rebalance and by how much.
WHY THIS ARCHITECTURE
Markowitz is sensitive to expected return inputs. Small Fusion Layer errors get amplified into extreme allocations. Hard weight constraints from Model 4 are mandatory, not optional.
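The sensitivity is easy to reproduce with two highly correlated assets. The sketch below computes unconstrained, fully invested mean-variance weights, proportional to the inverse covariance matrix times the expected returns, in closed form for the two-asset case. The function name and all numbers are made up for illustration.

```python
from typing import Sequence, Tuple


def two_asset_mv_weights(
    mu: Sequence[float], cov: Sequence[Sequence[float]]
) -> Tuple[float, float]:
    """Unconstrained mean-variance weights for two assets, summing to 1.

    Weights are proportional to inverse(cov) @ mu, using the closed-form
    inverse of a 2x2 matrix. There are no caps and no long-only rule.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    if a <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    raw0 = (d * mu[0] - b * mu[1]) / det
    raw1 = (-c * mu[0] + a * mu[1]) / det
    total = raw0 + raw1
    if total == 0:
        raise ValueError("weights cannot be normalised")
    return raw0 / total, raw1 / total


# Two assets with 20% volatility each and 0.95 correlation (illustrative).
COV = ((0.04, 0.038), (0.038, 0.04))

equal_views = two_asset_mv_weights((0.080, 0.080), COV)
tilted_views = two_asset_mv_weights((0.085, 0.080), COV)
```

With identical expected returns the split is 50/50. Raising one expected return by half a percentage point moves the weights to about 1.09 and -0.09: a leveraged long in one asset funded by shorting the other. Hard caps exist to stop exactly that.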
OUTPUT
Allocation percentages with hard caps · Rebalancing recommendations · Kelly-informed position sizing
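One way to read "Kelly-informed position sizing" with hard caps, as a sketch: start from the continuous-time Kelly fraction for a single asset (expected excess return divided by return variance), scale it down, and clip it to the ceiling supplied by Model 4. The half-Kelly default, the long-only floor at zero, and the function name are assumptions.

```python
def capped_kelly_weight(
    expected_excess_return: float,
    variance: float,
    ceiling: float,
    kelly_fraction: float = 0.5,
) -> float:
    """Fractional Kelly weight for one asset, clipped to [0, ceiling].

    Full Kelly for a single asset is expected_excess_return / variance.
    kelly_fraction scales that down (0.5 is half-Kelly, an assumed default).
    ceiling is the position size cap, assumed to come from the risk model.
    """
    if variance <= 0:
        raise ValueError("variance must be positive")
    if ceiling < 0:
        raise ValueError("ceiling must be non-negative")
    if not 0.0 < kelly_fraction <= 1.0:
        raise ValueError("kelly_fraction must be in (0, 1]")
    raw = kelly_fraction * expected_excess_return / variance
    # Long-only assumption: a negative edge means no position, not a short.
    return max(0.0, min(raw, ceiling))
```

For example, an 8% expected excess return with 4% variance gives a half-Kelly weight of 1.0, which a 25% ceiling cuts to 0.25. The cap binds precisely when the return estimate is most optimistic, which is when it is most likely to be wrong.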
DATA SOURCES · ALL FREE TIER DURING DEVELOPMENT
WHY THIS EXISTS
Most retail investors are flying blind. Generic tools, generic signals, no memory of what worked before.
Kairox Vector is my attempt to build infrastructure that compounds. Not a strategy. Not a screener. A system that gets incrementally better informed the longer it runs, because the data it accumulates and the correlations it learns are specific to how I think about markets. Every phase adds a layer of judgment I did not have before.
PHASE 3 OF 9 · REGIME DETECTION · IN PROGRESS