The RiskModels Hierarchical Factor Model

Welcome to the methodology wiki. This page explains the math and intuition behind our three-level hierarchical factor model. Whether you're a portfolio manager trying to understand your exposures or a quant looking to replicate the analysis, this guide will walk you through the key concepts.

This page focuses on the mathematics and portfolio interpretation of the model. If you want the engine-design view of time safety, Security Master discipline, and why the hedge outputs are built to remain executable with raw ETFs, read ERM3 Engine Design.

Use this split: ERM3 Engine explains the modeling assumptions and data-engineering choices; Methodology explains the regression cascade, orthogonalization, and hedge-ratio math.


Why This Method Matters

RiskModels is designed to be more than a descriptive analytics layer. The ERM3 engine is built around a few choices that matter for practical quantitative use:

  • Time-safe construction — the engine is designed to avoid common forward-contamination errors such as recycled tickers, snapshot shares, and retroactive universe contraction
  • Security Master discipline — ticker-level outputs sit on top of a point-in-time identity layer that supports symbol changes, classification lookup, and historically defensible shares data
  • Hierarchical structure — market, sector, and subsector are modeled explicitly rather than folded into a single flat factor view
  • Executable hedge ratios — the published hedge outputs are designed to work with liquid raw ETFs at trade time, not only with orthogonalized synthetic factors
  • Adjusted return series — split- and dividend-adjusted returns make the decomposition more economically consistent across long horizons and corporate actions

These design choices do not remove normal model risk, but they help explain why the API is suited to backtests, neutralization workflows, and portfolio diagnostics rather than just surface-level screening.


The Big Picture

What is a Hierarchical Factor Model?

A hierarchical factor model breaks down stock returns into layers of systematic risk, from broad to granular:

  1. Market (L1): How much does the overall market (S&P 500) explain?
  2. Sector (L2): How much additional variance comes from industry sectors (Tech, Financials, Healthcare, etc.)?
  3. Subsector (L3): How much comes from narrower industries within sectors (semiconductors, biotech, etc.)?
  4. Residual: What's left is idiosyncratic risk — the stock-specific component

At each level, the model produces three key metrics:

| Label | Name | What it tells you |
|---|---|---|
| HR | Hedge Ratio | Dollars of ETF to trade per $1 of stock to neutralize that factor |
| ER | Explained Risk | Percentage of the stock's variance explained by that factor |
| RR | Residual Risk | Portion of variance not explained by factors: the idiosyncratic remainder |

Why Three Levels?

  • Three levels align with how institutional investors think: market timing, sector rotation, and stock selection.
  • Too many levels lead to overfitting and unstable estimates.
  • Too few (just market) miss important sectoral dynamics — a tech stock isn't just "market + noise."

Key insight: Each level captures incremental explanatory power not already explained by higher levels, achieved through orthogonalization.


The Three Levels: L1, L2, L3

The Cascade

The core idea: regress, take the residual, regress again. Each level strips out one more layer of systematic risk, and the leftover feeds into the next level.

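| Step | Regression | What it captures |
|---|---|---|
| L1 | \( r_s = \beta_1 r_m + \varepsilon_1 \) | Broad market (SPY) exposure |
| L2 | \( \varepsilon_1 = \beta_2 \tilde{r}_{sec} + \varepsilon_2 \) | Sector exposure (incremental to market) |
| L3 | \( \varepsilon_2 = \beta_3 \tilde{r}_{sub} + \varepsilon_3 \) | Subsector exposure (incremental to L2) |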

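Notation: \( r_s \) = stock return, \( r_m \) = SPY return, \( \tilde{r}_{sec} \) = cleaned sector ETF return, \( \tilde{r}_{sub} \) = cleaned subsector ETF return, \( \varepsilon_k \) = residual left after level \( k \).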

Example: If NVDA has \( \beta_1 = 1.3 \) and SPY returns +1%, we expect NVDA to move +1.3% from market exposure alone. The leftover (\( \varepsilon_1 \)) goes to L2 (Tech sector, XLK). What remains (\( \varepsilon_2 \)) goes to L3 (semiconductors, SOXX). The final residual (\( \varepsilon_3 \)) is pure alpha.
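The regress-then-residualize cascade can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not the ERM3 implementation: the `beta` and `cascade` helpers, the simulated return series, and the no-intercept residual step are all assumptions made for the example, and the cleaned sector and subsector factors are simply drawn as independent series.

```python
import numpy as np

def beta(y, x):
    """Univariate regression slope: Cov(y, x) / Var(x)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

def cascade(r_s, r_m, r_sec_clean, r_sub_clean):
    """Regress, take the residual, regress again (L1 -> L2 -> L3)."""
    b1 = beta(r_s, r_m)
    e1 = r_s - b1 * r_m              # residual after the market level
    b2 = beta(e1, r_sec_clean)
    e2 = e1 - b2 * r_sec_clean       # residual after the sector level
    b3 = beta(e2, r_sub_clean)
    e3 = e2 - b3 * r_sub_clean       # final idiosyncratic residual
    return (b1, b2, b3), e3

# Synthetic daily returns (assumed values, for illustration only).
rng = np.random.default_rng(0)
n = 2000
r_m = rng.normal(0.0, 0.010, n)            # market factor
r_sec_clean = rng.normal(0.0, 0.006, n)    # cleaned sector factor
r_sub_clean = rng.normal(0.0, 0.004, n)    # cleaned subsector factor
noise = rng.normal(0.0, 0.008, n)
r_s = 1.3 * r_m + 0.5 * r_sec_clean + 0.4 * r_sub_clean + noise

(b1, b2, b3), eps = cascade(r_s, r_m, r_sec_clean, r_sub_clean)
print(f"beta_1={b1:.2f}  beta_2={b2:.2f}  beta_3={b3:.2f}")
```

The estimated betas land near the simulated loadings (1.3, 0.5, 0.4), and the stock return is recovered exactly as the sum of the three factor contributions plus `eps`.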

Orthogonalization: Why We Clean the Factors

Sector ETFs like XLK aren't independent of the market — XLK has its own market beta. Using raw XLK returns at L2 would double-count the market exposure already captured at L1.

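The fix: Before regressing at each level, strip out higher-level exposures using link betas (\( \lambda \)):

\[ \tilde{r}_{sec} = r_{sec} - \lambda_{sec \to mkt}\, r_m \]
\[ \tilde{r}_{sub} = r_{sub} - \lambda_{sub \to mkt}\, r_m - \lambda_{sub \to sec}\, \tilde{r}_{sec} \]

Here \( \lambda_{a \to b} \) is the beta of ETF \( a \) regressed on ETF \( b \) (or its cleaned version), precomputed from historical data.

This ensures each \( \beta_k \) captures only the incremental effect of its own level, with no double-counting.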
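To make the cleaning step concrete, here is a small sketch that estimates link betas on synthetic ETF returns and checks that the cleaned factors are uncorrelated with the levels above them. The simulated series and the variable names (`lam_sec_mkt`, `lam_sub_mkt`, `lam_sub_sec`) are assumptions for illustration. In the production engine the link betas are precomputed from historical data rather than fitted on the same sample.

```python
import numpy as np

def beta(y, x):
    """Univariate regression slope: Cov(y, x) / Var(x)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

# Synthetic raw ETF returns (assumed values, for illustration only).
rng = np.random.default_rng(1)
n = 2000
r_m = rng.normal(0.0, 0.010, n)                       # market ETF
sec_specific = rng.normal(0.0, 0.005, n)
r_sec = 1.1 * r_m + sec_specific                      # raw sector ETF carries market beta
r_sub = 1.2 * r_m + 0.8 * sec_specific + rng.normal(0.0, 0.006, n)

# Clean the sector ETF against the market.
lam_sec_mkt = beta(r_sec, r_m)
r_sec_clean = r_sec - lam_sec_mkt * r_m

# Clean the subsector ETF against the market and the cleaned sector.
lam_sub_mkt = beta(r_sub, r_m)
lam_sub_sec = beta(r_sub, r_sec_clean)
r_sub_clean = r_sub - lam_sub_mkt * r_m - lam_sub_sec * r_sec_clean

print(f"raw sector vs market corr:          {corr(r_sec, r_m):.2f}")
print(f"cleaned sector vs market corr:      {corr(r_sec_clean, r_m):.2e}")
print(f"cleaned subsector vs market corr:   {corr(r_sub_clean, r_m):.2e}")
print(f"cleaned subsector vs sector corr:   {corr(r_sub_clean, r_sec_clean):.2e}")
```

The raw sector ETF is strongly correlated with the market, while the cleaned series are uncorrelated with it up to floating-point error.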

Because factors are orthogonalized against one another, published L3 component hedge ratios (l3_sector_hr, l3_subsector_hr in SDK naming; l3_sec_hr, l3_sub_hr on the wire) can occasionally be negative even when the economic story is long the stock. That outcome is a mathematically valid feature of the neutralization construction, not a data bug — and it is why the Python SDK may emit validate="warn" on names like TSLA when sign checks are enabled.


Hedge Ratios: Making It Tradeable

What Are Hedge Ratios?

A Hedge Ratio (HR) tells you how many dollars of an ETF to trade per $1 of stock position to neutralize a specific factor exposure.

| HR Sign | Action | Meaning |
|---|---|---|
| Negative | Short the ETF | Hedge out factor exposure |
| Positive | Long the ETF | Add factor exposure |

L1: Market Only

\[ \mathrm{HR}^{L1}_{mkt} = -\beta_1 \]

For example, if \( \beta_1 = 1.3 \), short $1.30 of SPY per $1 long the stock.

L2: Market + Sector

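Sector HR is the direct beta, negative because the hedge is a short:

\[ \mathrm{HR}^{L2}_{sec} = -\beta_2 \]

The market HR must be adjusted because shorting the sector ETF also implicitly shorts the market:

\[ \mathrm{HR}^{L2}_{mkt} = -\left(\beta_1 - \beta_2\, \lambda_{sec \to mkt}\right) \]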

L3: Market + Sector + Subsector

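Subsector HR is the direct beta:

\[ \mathrm{HR}^{L3}_{sub} = -\beta_3 \]

Sector and market HRs are further adjusted for the market and sector exposure embedded in the subsector ETF:

\[ \mathrm{HR}^{L3}_{sec} = -\left(\beta_2 - \beta_3\, \lambda_{sub \to sec}\right) \]
\[ \mathrm{HR}^{L3}_{mkt} = -\left(\beta_1 - \left(\beta_2 - \beta_3\, \lambda_{sub \to sec}\right) \lambda_{sec \to mkt} - \beta_3\, \lambda_{sub \to mkt}\right) \]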

Consistency check: Applied to raw ETF returns, these adjusted HRs satisfy the replication identity — decomposition into factor contributions plus residual reconciles back to the actual stock return. Verified at runtime for every stock, every date.
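The L3 adjustment can be written as a short function. This is a sketch under stated assumptions: `l3_hedge_ratios` is a hypothetical helper (not an SDK function), the formulas come from substituting the cleaned-factor definitions back into raw ETF returns, and negative values mean "short the ETF". The inputs are the Walmart betas and link betas from the worked example later on this page.

```python
def l3_hedge_ratios(b1, b2, b3, lam_sec_mkt, lam_sub_mkt, lam_sub_sec):
    """Convert cascade betas and link betas into raw-ETF hedge ratios.

    b1, b2, b3   : betas on market, cleaned sector, cleaned subsector
    lam_sec_mkt  : link beta of the sector ETF on the market
    lam_sub_mkt  : link beta of the subsector ETF on the market
    lam_sub_sec  : link beta of the subsector ETF on the cleaned sector
    """
    # Net exposure to each raw ETF, built bottom-up (L3 -> L2 -> L1).
    x_sub = b3
    x_sec = b2 - b3 * lam_sub_sec
    x_mkt = b1 - x_sec * lam_sec_mkt - b3 * lam_sub_mkt
    # A hedge offsets the exposure, so each ratio takes the opposite sign.
    return {"market": -x_mkt, "sector": -x_sec, "subsector": -x_sub}

hr = l3_hedge_ratios(0.50, 0.30, 0.20,
                     lam_sec_mkt=0.60, lam_sub_mkt=0.40, lam_sub_sec=0.70)
print({name: round(value, 3) for name, value in hr.items()})
```

With these inputs the function reproduces the final hedge ratios of the worked example: about −0.324 for SPY, −0.16 for XLP, and −0.20 for PBJ.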


Explained Risk: Variance Decomposition

What Is Explained Risk (ER)?

Explained Risk (ER) measures what percentage of a stock's return variance comes from factor exposures. It is the R² from the factor regression:

\[ \mathrm{ER} = R^2 = 1 - \frac{\mathrm{Var}(\varepsilon)}{\mathrm{Var}(r_s)} \]

The Additive Property

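Because the factors are orthogonalized, the per-level ERs and the residual risk add up to the total variance:

\[ \mathrm{ER}_{mkt} + \mathrm{ER}_{sec} + \mathrm{ER}_{sub} + \mathrm{RR} = 100\% \]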

This is guaranteed by construction and is verified at runtime to within 0.1% as a data-integrity check.
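The additive property is easy to check numerically. The sketch below uses synthetic factors that are orthogonalized in-sample and computes each level's ER as \( \beta_k^2 \, \mathrm{Var}(f_k) / \mathrm{Var}(r_s) \). The `slope` helper, the simulated data, and the dictionary keys are assumptions made for illustration. The production check runs on real rolling windows.

```python
import numpy as np

def slope(y, x):
    """Univariate regression slope: Cov(y, x) / Var(x)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Synthetic raw factor returns (assumed values, for illustration only).
rng = np.random.default_rng(2)
n = 1500
f_mkt = rng.normal(0.0, 0.010, n)
raw_sec = 1.1 * f_mkt + rng.normal(0.0, 0.005, n)
raw_sub = 0.9 * f_mkt + 0.5 * raw_sec + rng.normal(0.0, 0.006, n)

# Orthogonalize: sector against market, subsector against both.
f_sec = raw_sec - slope(raw_sec, f_mkt) * f_mkt
f_sub = raw_sub - slope(raw_sub, f_mkt) * f_mkt
f_sub = f_sub - slope(f_sub, f_sec) * f_sec

r_s = 1.2 * f_mkt + 0.6 * f_sec + 0.5 * f_sub + rng.normal(0.0, 0.009, n)

# Run the cascade and record the variance share explained at each level.
total_var = np.var(r_s, ddof=1)
er = {}
resid = r_s
for name, factor in [("market", f_mkt), ("sector", f_sec), ("subsector", f_sub)]:
    b = slope(resid, factor)
    er[name] = b**2 * np.var(factor, ddof=1) / total_var
    resid = resid - b * factor

rr = np.var(resid, ddof=1) / total_var   # residual risk share
total = sum(er.values()) + rr

print({k: round(v, 3) for k, v in er.items()}, "RR:", round(rr, 3))
print("sum of shares:", round(total, 6))
```

Because the factors are uncorrelated in the sample, the cross terms vanish and the shares sum to 1 up to floating-point error.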

Interpretation

  • Low ER (e.g., below 50%): High idiosyncratic risk — more alpha opportunity or diversifiable risk.
  • High ER (e.g., 85%+): The stock is mostly a leveraged sector bet — systematic risk dominates.
  • If \( \mathrm{ER}_{sec} > \mathrm{ER}_{mkt} \), sector dynamics dominate market timing for this stock.

Putting It All Together: The Replication Equation

The Core Identity

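The ERM3 model lets us perfectly decompose any stock's return using only raw ETF returns, with no orthogonalization required at trading time:

\[ r_s = \sum_{k \in \{mkt,\, sec,\, sub\}} h_k\, r_k^{raw} + \varepsilon \]

| Symbol | Description |
|---|---|
| \( h_k \) | Hedge ratio (exposure) to ETF \( k \); the hedge trade itself carries the opposite sign, \( \mathrm{HR}_k = -h_k \) |
| \( r_k^{raw} \) | Raw total return of ETF \( k \), not residualized |
| \( \varepsilon \) | Residual idiosyncratic return, very small by construction |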

This is a mathematical identity. At runtime, we verify it holds to within 0.1% for every stock on every date.
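The identity can be reproduced end to end on synthetic data: clean the factors, run the cascade, convert the betas into raw-ETF exposures, and confirm that raw ETF returns plus the residual rebuild the stock return. Everything here (the simulated series, the helper, the variable names) is an assumption for illustration. It mirrors the structure described on this page and is not the engine's code.

```python
import numpy as np

def beta(y, x):
    """Univariate regression slope: Cov(y, x) / Var(x)."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

# Synthetic raw returns (assumed values, for illustration only).
rng = np.random.default_rng(7)
n = 1000
r_m = rng.normal(0.0, 0.010, n)
r_sec = 1.1 * r_m + rng.normal(0.0, 0.005, n)
r_sub = 0.2 * r_m + 0.9 * r_sec + rng.normal(0.0, 0.006, n)
r_s = 0.3 * r_m + 0.4 * r_sec + 0.8 * r_sub + rng.normal(0.0, 0.012, n)

# Link betas and cleaned factors.
lam_sec_mkt = beta(r_sec, r_m)
sec_clean = r_sec - lam_sec_mkt * r_m
lam_sub_mkt = beta(r_sub, r_m)
lam_sub_sec = beta(r_sub, sec_clean)
sub_clean = r_sub - lam_sub_mkt * r_m - lam_sub_sec * sec_clean

# Regression cascade on the cleaned factors.
b1 = beta(r_s, r_m)
e1 = r_s - b1 * r_m
b2 = beta(e1, sec_clean)
e2 = e1 - b2 * sec_clean
b3 = beta(e2, sub_clean)
eps = e2 - b3 * sub_clean

# Exposures expressed on RAW ETF returns (hedge ratios are their negatives).
h_sub = b3
h_sec = b2 - b3 * lam_sub_sec
h_mkt = b1 - h_sec * lam_sec_mkt - b3 * lam_sub_mkt

replicated = h_mkt * r_m + h_sec * r_sec + h_sub * r_sub + eps
max_error = np.max(np.abs(replicated - r_s))
print(f"max replication error: {max_error:.2e}")
```

The replication error is at floating-point level, which is the algebraic reason no orthogonalized series is needed at trade time.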

ε vs l3_residual_er — The residual ε in the return equation is a daily return (units: decimal return). It is not directly exposed as a per-day series on GET /ticker-returns. The API field l3_residual_er (wire: l3_res_er) is a variance fraction — the share of total variance not explained by the orthogonalized market, sector, and subsector factors in the rolling L3 window. Use l3_residual_er from /metrics or /ticker-returns for idiosyncratic risk sizing; use the return equation to understand the factor model conceptually.
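As a sketch of idiosyncratic risk sizing from a variance fraction: if `l3_residual_er` is the share of variance left unexplained, the idiosyncratic volatility is the total volatility times its square root. The helper names, the input numbers, and the simple budget-over-volatility sizing rule below are all assumptions for illustration. They are not SDK functions or recommended parameters.

```python
import math

def idiosyncratic_vol(total_vol, residual_er):
    """Idiosyncratic volatility implied by a residual variance fraction."""
    if not 0.0 <= residual_er <= 1.0:
        raise ValueError("residual_er must be a variance fraction in [0, 1]")
    return total_vol * math.sqrt(residual_er)

def weight_for_idio_budget(idio_budget, total_vol, residual_er):
    """Position weight whose idiosyncratic volatility equals the budget."""
    return idio_budget / idiosyncratic_vol(total_vol, residual_er)

# Hypothetical inputs: 40% annualized volatility, 36% residual variance share.
total_vol = 0.40
residual_er = 0.36

idio_vol = idiosyncratic_vol(total_vol, residual_er)
weight = weight_for_idio_budget(0.012, total_vol, residual_er)

print(f"idiosyncratic vol: {idio_vol:.2%}")
print(f"weight for a 1.2% idiosyncratic risk budget: {weight:.2%}")
```

The square root matters: a 36% residual variance share corresponds to 60% of total volatility, not 36%.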

Why this matters: You can hedge almost any stock or portfolio using only highly liquid ETFs. No custom baskets. No exotic derivatives. Just SPY, sector ETFs, and subsector ETFs.


Institutional Modellers: Multi-Dimensional Factor Cube

For advanced quantitative workflows, the Python SDK supports xarray to work with returns and hedge ratios as a multi-dimensional data structure (Tickers × Dates × Metrics). This enables broadcasted portfolio math without manual DataFrame pivots.

Install with xarray support

Install from PyPI (riskmodels-py).

pip install "riskmodels-py[xarray]"

Portfolio-level hedge ratio time series

from riskmodels import RiskModelsClient
import xarray as xr

client = RiskModelsClient.from_env()

# Fetch multi-ticker batch as xarray Dataset
tickers = ["NVDA", "AAPL", "MSFT", "META"]
ds = client.get_dataset(tickers, years=2, format="parquet")

# Define holdings weights (aligned on ticker dimension)
weights = xr.DataArray(
    [0.4, 0.3, 0.2, 0.1],
    dims=["ticker"],
    coords={"ticker": tickers}
)

# SDK names (after TICKER_RETURNS_COLUMN_RENAME for /ticker-returns tabular data):
#   l3_market_hr, l3_sector_hr, l3_subsector_hr
# Wire JSON keys:
#   l3_mkt_hr,    l3_sec_hr,    l3_sub_hr
# Broadcast multiply and sum to get portfolio-level L3 market HR time series
portfolio_market_hr = (ds["l3_market_hr"] * weights).sum(dim="ticker")

# Result: 1D time series of holdings-weighted L3 market hedge ratio
print(portfolio_market_hr)

This computes the holdings-weighted mean of per-ticker L3 market hedge ratios as a rolling time series. It is a descriptive aggregate aligned with the portfolio math in the SDK, not a full portfolio optimizer. For details on how the SDK aggregates batch results, see the portfolio_math.py module and .cursorrules.


Worked Example: Walmart (WMT)

Step 1 — Regression Betas

| Parameter | ETF | Value |
|---|---|---|
| \( \beta_1 \) | SPY | 0.50 |
| \( \beta_2 \) | XLP (Consumer Staples) | 0.30 |
| \( \beta_3 \) | PBJ (Food & Beverage) | 0.20 |

Step 2 — Link Betas

| Relationship | Meaning | Value |
|---|---|---|
| XLP → SPY | Consumer Staples' market beta | 0.60 |
| PBJ → SPY | Food & Beverage's market beta | 0.40 |
| PBJ → XLP | Food & Beverage's sector beta | 0.70 |

Step 3 — Build Hedge Ratios (Bottom-Up: L3 → L2 → L1)

L3 — Subsector (PBJ)

\[ \mathrm{HR}_{PBJ} = -\beta_3 = -0.20 \]

L2 — Sector (XLP)

\[ \mathrm{HR}_{XLP} = -\left(\beta_2 - \beta_3\, \lambda_{PBJ \to XLP}\right) = -(0.30 - 0.20 \times 0.70) = -(0.30 - 0.14) = -0.16 \]

L1 — Market (SPY)

\[ \mathrm{HR}_{SPY} = -\left(\beta_1 - \left(\beta_2 - \beta_3\, \lambda_{PBJ \to XLP}\right) \lambda_{XLP \to SPY} - \beta_3\, \lambda_{PBJ \to SPY}\right) = -(0.50 - 0.16 \times 0.60 - 0.20 \times 0.40) = -(0.50 - 0.096 - 0.080) = -0.324 \]

Final Hedge Ratios

| Level | ETF | Direct Beta | Link Adjustment | Final HR |
|---|---|---|---|---|
| L1 Market | SPY | 0.50 | +0.176 | −0.324 |
| L2 Sector | XLP | 0.30 | +0.14 | −0.16 |
| L3 Subsector | PBJ | 0.20 | none | −0.20 |

Verification — Sample Day

| Instrument | Return |
|---|---|
| SPY | +1.00% |
| XLP | +0.80% |
| PBJ | +1.20% |
| WMT | +1.10% |

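Adding the hedge P&L (hedge ratio times ETF return) to the stock return gives the residual:

\[ \varepsilon = r_{WMT} + \mathrm{HR}_{SPY}\, r_{SPY} + \mathrm{HR}_{XLP}\, r_{XLP} + \mathrm{HR}_{PBJ}\, r_{PBJ} = 1.10\% - 0.324\% - 0.128\% - 0.240\% = +0.408\% \]

This +0.408% is pure stock-specific return: idiosyncratic alpha after neutralizing all factor exposures.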
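The sample-day check is plain arithmetic and can be reproduced directly. The dictionaries below simply restate the final hedge ratios and sample-day returns from the tables in this worked example. The variable names are illustrative.

```python
# Final hedge ratios from the worked example (negative = short the ETF).
hedge_ratios = {"SPY": -0.324, "XLP": -0.16, "PBJ": -0.20}

# Sample-day returns, in percent.
etf_returns_pct = {"SPY": 1.00, "XLP": 0.80, "PBJ": 1.20}
wmt_return_pct = 1.10

# P&L of the hedge legs per $1 of stock, in percent.
hedge_pnl_pct = sum(hedge_ratios[etf] * etf_returns_pct[etf] for etf in hedge_ratios)

# Hedged return = stock return plus hedge P&L = idiosyncratic residual.
residual_pct = wmt_return_pct + hedge_pnl_pct

print(f"hedge P&L: {hedge_pnl_pct:+.3f}%")
print(f"residual:  {residual_pct:+.3f}%")
```

The hedge legs lose 0.692% on this up day, leaving the +0.408% residual.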


Why This Matters for Trading

Direct Hedging

Unlike academic factor models that output abstract "factor loadings," our model gives you actionable hedge ratios executable with liquid ETFs on any brokerage platform.

Tax-Efficient Risk Scaling

Want to reduce tech exposure without selling NVDA and triggering capital gains? Short XLK proportionally. The adjusted hedge ratios ensure you're not accidentally double-hedging the market.
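Turning hedge ratios into dollar trades is a multiplication by position size, optionally scaled down for a partial hedge. The sketch below uses a hypothetical helper, `hedge_notionals`, and reuses the Walmart hedge ratios from the worked example because this page gives no NVDA figures. The position size and the 50% hedge fraction are assumptions for illustration.

```python
def hedge_notionals(position_usd, hedge_ratios, fraction=1.0):
    """Dollar notional per ETF for hedging `fraction` of the factor exposure.

    Negative notionals are shorts, positive notionals are longs.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    return {etf: position_usd * hr * fraction for etf, hr in hedge_ratios.items()}

# Hypothetical $100,000 long position, hedging half of each factor exposure.
trades = hedge_notionals(
    100_000,
    {"SPY": -0.324, "XLP": -0.16, "PBJ": -0.20},
    fraction=0.5,
)

for etf, notional in trades.items():
    side = "short" if notional < 0 else "long"
    print(f"{etf}: {side} ${abs(notional):,.0f}")
```

Because the ratios are already adjusted for the market exposure embedded in the sector and subsector ETFs, scaling all legs by the same fraction keeps the hedge internally consistent.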

Alpha Measurement

The residual at L3 is your true idiosyncratic return. Positive residuals over time = alpha. Negative = underperformance that can't be attributed to "the market was down."


How We Compare to Traditional Risk Models

| Feature | RiskModels | Barra / Axioma |
|---|---|---|
| Factor composition | Directly tradeable liquid ETFs (SPY, XLK, etc.) | Synthetic factors (Value, Momentum, PCA-derived) |
| Hedging execution | Short the ETF directly | Requires custom factor-mimicking baskets |
| Model structure | Hierarchical, orthogonal per level | Multivariate, hundreds of factors |
| Responsiveness | Short lookback, responsive to market shifts | Long history, stable but slower to adapt |
| Primary use case | Active hedging, tactical PM | Institutional reporting, long-term attribution |

Glossary

| Term | Definition |
|---|---|
| L1 (Market) | First level: broad market (SPY) exposure |
| L2 (Sector) | Second level: sector-specific exposure (e.g., XLK, XLF) |
| L3 (Subsector) | Third level: granular industry exposure (e.g., SOXX, XBI) |
| β (beta) | Sensitivity coefficient: β = Cov(r_s, r_f) / Var(r_f) |
| HR | Dollar amount of ETF per $1 of stock to neutralize factor exposure |
| ER | Variance fraction explained by factors: 1 - Var(ε) / Var(r) |
| λ (link beta) | Beta between ETFs at different levels |
| Orthogonalization | Removing higher-level exposure from lower-level factors |
| ε (residual) | Idiosyncratic return unexplained by any factor: stock-specific alpha |


Last updated: February 2026

