Idiosyncratic volatility and correlation forecasting#
Use this notebook to extract a volatility forecast report for the idiosyncratic returns. The notebook also shows how to compute idiosyncratic correlations, which are not yet available directly from the system.
import datetime as dt
from itertools import combinations_with_replacement
import polars as pl
from bayesline.api.equity import (
    CategoricalExposureGroupSettings,
    ContinuousExposureGroupSettings,
    ExposureSettings,
    FactorRiskModelSettings,
    IdiosyncraticReturnReportSettings,
    IdiosyncraticVolatilityReportSettings,
    ModelConstructionSettings,
    PortfolioHierarchySettings,
    ReportSettings,
    UniverseSettings,
)
from bayesline.apiclient import BayeslineApiClient
bln = BayeslineApiClient.new_client(
endpoint="https://[ENDPOINT]",
api_key="[API-KEY]",
)
We begin by specifying a standard factor model with respect to which the idiosyncratic returns will be computed.
factorriskmodel_settings = FactorRiskModelSettings(
universe=UniverseSettings(dataset="Bayesline-US-All-1y"),
exposures=ExposureSettings(
exposures=[
ContinuousExposureGroupSettings(hierarchy="market"),
CategoricalExposureGroupSettings(hierarchy="trbc"),
ContinuousExposureGroupSettings(hierarchy="style"),
]
),
modelconstruction=ModelConstructionSettings(
estimation_universe=None,
zero_sum_constraints={"trbc": "mcap_weighted"}
),
)
Idiosyncratic risk calculations need a portfolio, so that the risk numbers can be delivered at the asset level in the right ID space. The portfolio can be entirely synthetic, and the weights don't really matter. Here we consider a portfolio of six stocks. The date is the earliest date for which we want to compute output.
portfolios_loader = bln.equity.portfolios
uploader = portfolios_loader.uploader
demo_portfolio_dataset = uploader.create_or_replace_dataset("Demo-Portfolio")
df = pl.DataFrame({
"portfolio_id": ["Test-Portfolio"]*6,
"asset_id": [
"02079K107", # Alphabet
"2592345", # Microsoft
"67066G10", # NVIDIA
"30303M102", # Meta
"57636Q104", # Mastercard
"92826C839", # Visa
],
"asset_id_type": ["cusip9", "sedol7", "cusip8", "cusip9", "cusip9", "cusip9"],
"date": [dt.date(2025, 1, 1)]*6, # earliest date to calculate things for
"value": [100]*6 # not important, we just need the unique identifiers
})
demo_portfolio_dataset.fast_commit(df, mode="append")
UploadCommitResult(version=1, committed_names=[])
demo_portfolio_dataset.get_data().collect()
| date | portfolio_id | asset_id | asset_id_type | value |
|---|---|---|---|---|
| date | str | str | str | f32 |
| 2025-01-01 | "Test-Portfolio" | "02079K107" | "cusip9" | 100.0 |
| 2025-01-01 | "Test-Portfolio" | "2592345" | "sedol7" | 100.0 |
| 2025-01-01 | "Test-Portfolio" | "30303M102" | "cusip9" | 100.0 |
| 2025-01-01 | "Test-Portfolio" | "57636Q104" | "cusip9" | 100.0 |
| 2025-01-01 | "Test-Portfolio" | "67066G10" | "cusip8" | 100.0 |
| 2025-01-01 | "Test-Portfolio" | "92826C839" | "cusip9" | 100.0 |
After uploading the portfolio, we put it in a hierarchy that contains just this portfolio. This is the form the report API expects.
ph_settings = PortfolioHierarchySettings.from_source(
source="Demo-Portfolio",
portfolio_ids=["Test-Portfolio"],
)
ph_loader = bln.equity.portfoliohierarchies
ph_api = ph_loader.load(ph_settings)
We can quickly take a look at the (drifted) portfolio constituents.
ph_api.get(None, None)
| date | portfolio_id | input_asset_id | input_asset_id_type | value | value_bench |
|---|---|---|---|---|---|
| date | str | str | str | f32 | f32 |
| 2025-01-01 | "Test-Portfolio" | "02079K107" | "cusip9" | 100.0 | null |
| 2025-01-01 | "Test-Portfolio" | "30303M102" | "cusip9" | 100.0 | null |
| 2025-01-01 | "Test-Portfolio" | "57636Q104" | "cusip9" | 100.0 | null |
| 2025-01-01 | "Test-Portfolio" | "67066G10" | "cusip8" | 100.0 | null |
| 2025-01-01 | "Test-Portfolio" | "92826C839" | "cusip9" | 100.0 | null |
| … | … | … | … | … | … |
| 2025-08-25 | "Test-Portfolio" | "02079K107" | "cusip9" | 110.090912 | null |
| 2025-08-25 | "Test-Portfolio" | "30303M102" | "cusip9" | 128.86467 | null |
| 2025-08-25 | "Test-Portfolio" | "57636Q104" | "cusip9" | 113.143555 | null |
| 2025-08-25 | "Test-Portfolio" | "67066G10" | "cusip8" | 133.917892 | null |
| 2025-08-25 | "Test-Portfolio" | "92826C839" | "cusip9" | 110.93512 | null |
Getting the idiosyncratic volatility forecasts#
For the idiosyncratic volatility, we can query the output directly. Below we extract a dataframe with the square root of the diagonal of the idiosyncratic risk matrix. We run with default settings here, but IdiosyncraticVolatilityReportSettings exposes many other options.
report_settings = ReportSettings(
report=IdiosyncraticVolatilityReportSettings(
# we can be flexible with the settings here, e.g. half-life
),
risk_model=factorriskmodel_settings,
)
report_engine = bln.equity.portfolioreport.load(
report_settings, hierarchy_ref_or_settings=ph_settings,
)
order = {"date": ["date"], "asset": ["input_asset_id"]}
report = report_engine.get_report(
order, date_start=dt.date(2025, 1, 2), date_end=dt.date(2025, 1, 31)
)
idio_vol_df = (
report.get_data([], expand=("date", "input_asset_id"), value_cols=report.metric_cols)
.rename({"input_asset_id": "asset_id", "IdiosyncraticVolatility": "idio_vol"})
.with_columns(pl.col("date").str.to_date())
)
idio_vol_df
| date | asset_id | idio_vol |
|---|---|---|
| date | str | f32 |
| 2025-01-02 | "02079K107" | 0.245802 |
| 2025-01-02 | "30303M102" | 0.204757 |
| 2025-01-02 | "57636Q104" | 0.145275 |
| 2025-01-02 | "67066G10" | 0.310011 |
| 2025-01-02 | "92826C839" | 0.153002 |
| … | … | … |
| 2025-01-31 | "02079K107" | 0.229656 |
| 2025-01-31 | "30303M102" | 0.229024 |
| 2025-01-31 | "57636Q104" | 0.165487 |
| 2025-01-31 | "67066G10" | 0.441108 |
| 2025-01-31 | "92826C839" | 0.144974 |
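For a quick visual check, the long-format forecasts can be pivoted into a wide date-by-asset matrix. This is a minimal sketch using the idio_vol_df dataframe from above; for large portfolios you would typically keep the long format.
# just for display: wide date-by-asset view of the volatility forecasts
idio_vol_df.pivot("asset_id", index="date", values="idio_vol", maintain_order=True, sort_columns=True)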
Computing the idiosyncratic correlations#
Sometimes it is necessary to allow for off-diagonal structure (density) in the idiosyncratic risk matrix: factor models may not be able to explain co-movement within small clusters of highly similar assets. We are working on integrations, but for now these off-diagonal correlations can only be computed manually from the idiosyncratic return time series as a post-processing step. In the code below, we extract the idiosyncratic returns and then compute the correlation matrix for two groups within our portfolio of six assets.
First, we run a report very similar to the one above to extract the idiosyncratic return time series.
report_settings = ReportSettings(
report=IdiosyncraticReturnReportSettings(),
risk_model=factorriskmodel_settings,
)
report_engine = bln.equity.portfolioreport.load(
report_settings, hierarchy_ref_or_settings=ph_settings,
)
order = {"date": ["date"], "asset": ["input_asset_id"]}
report = report_engine.get_report(
order, date_start=dt.date(2025, 1, 2), date_end=dt.date(2025, 1, 31)
)
idio_ret_df = (
report.get_data([], expand=("date", "input_asset_id"), value_cols=report.metric_cols)
.rename({"input_asset_id": "asset_id", "IdiosyncraticReturn": "idio_ret"})
.with_columns(pl.col("date").str.to_date(), pl.col("idio_ret").fill_nan(None))
)
idio_ret_df
| date | asset_id | idio_ret |
|---|---|---|
| date | str | f32 |
| 2025-01-02 | "02079K107" | 0.000638 |
| 2025-01-02 | "30303M102" | 0.022357 |
| 2025-01-02 | "57636Q104" | -0.003709 |
| 2025-01-02 | "67066G10" | 0.023162 |
| 2025-01-02 | "92826C839" | 0.001328 |
| … | … | … |
| 2025-01-31 | "02079K107" | 0.015713 |
| 2025-01-31 | "30303M102" | 0.003797 |
| 2025-01-31 | "57636Q104" | -0.017581 |
| 2025-01-31 | "67066G10" | -0.035644 |
| 2025-01-31 | "92826C839" | 0.00147 |
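Missing idiosyncratic returns were converted to nulls above and would thin out the correlation estimates that follow, so it can be useful to check how many observations are available per asset. A minimal sketch:
# sanity check: number of rows and nulls per asset in the idiosyncratic return series
idio_ret_df.group_by("asset_id").agg(
    pl.len().alias("n_obs"),
    pl.col("idio_ret").null_count().alias("n_null"),
)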
Next, we define the groups of similar assets. The groups need to be mutually exclusive, i.e. no asset can belong to more than one group; not every asset has to be part of a group.
In the example below, we put Alphabet, Microsoft and Meta in one group, and Mastercard and Visa in a separate group. NVIDIA is not part of any group.
groups = [
["02079K107", "2592345", "30303M102"], # Alphabet, Microsoft, Meta
["57636Q104", "92826C839"], # Mastercard, Visa
]
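Because the groups must not overlap, a small assertion can guard against accidentally listing an asset twice. This is a minimal sketch over the groups list defined above:
# sanity check: no asset may appear in more than one group
all_group_assets = [asset for group in groups for asset in group]
assert len(all_group_assets) == len(set(all_group_assets)), "groups are not mutually exclusive"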
# create a dataframe with all combinations
df_offdiag = pl.DataFrame(
[
(left, right)
for group in groups
for left, right in combinations_with_replacement(group, 2)
],
schema=["asset_id", "asset_id_right"],
orient="row",
)
# just for display; in realistic scenarios this is a very large dataframe
(
df_offdiag.sort("asset_id", "asset_id_right")
.with_columns(pl.lit(1))
.pivot("asset_id_right", index="asset_id", maintain_order=True, sort_columns=True)
)
| asset_id | 02079K107 | 2592345 | 30303M102 | 57636Q104 | 92826C839 |
|---|---|---|---|---|---|
| str | i32 | i32 | i32 | i32 | i32 |
| "02079K107" | 1 | 1 | 1 | null | null |
| "2592345" | null | 1 | 1 | null | null |
| "30303M102" | null | null | 1 | null | null |
| "57636Q104" | null | null | null | 1 | 1 |
| "92826C839" | null | null | null | null | 1 |
# join the time series such that we have each combination that we need to compute
idio_ret_df_joined = (
idio_ret_df.join(df_offdiag, on="asset_id")
.join(idio_ret_df, left_on=("date", "asset_id_right"), right_on=("date", "asset_id"))
)
idio_ret_df_joined
| date | asset_id | idio_ret | asset_id_right | idio_ret_right |
|---|---|---|---|---|
| date | str | f32 | str | f32 |
| 2025-01-02 | "02079K107" | 0.000638 | "02079K107" | 0.000638 |
| 2025-01-02 | "02079K107" | 0.000638 | "30303M102" | 0.022357 |
| 2025-01-02 | "30303M102" | 0.022357 | "30303M102" | 0.022357 |
| 2025-01-02 | "57636Q104" | -0.003709 | "57636Q104" | -0.003709 |
| 2025-01-02 | "57636Q104" | -0.003709 | "92826C839" | 0.001328 |
| … | … | … | … | … |
| 2025-01-31 | "02079K107" | 0.015713 | "30303M102" | 0.003797 |
| 2025-01-31 | "30303M102" | 0.003797 | "30303M102" | 0.003797 |
| 2025-01-31 | "57636Q104" | -0.017581 | "57636Q104" | -0.017581 |
| 2025-01-31 | "57636Q104" | -0.017581 | "92826C839" | 0.00147 |
| 2025-01-31 | "92826C839" | 0.00147 | "92826C839" | 0.00147 |
We compute the covariance matrix first and then standardize it into the correlation matrix. The covariance computation applies a rolling mean to the returns (overlapping returns, to account for autocorrelation) and then an exponentially weighted moving average to the cross products. Dividing by the standard deviations then yields the correlations.
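In formulas (a sketch of what the code below computes, ignoring the shorter windows at the start of the sample), with overlap window $w$ and EWMA half-life $h$:

$$
\tilde r_{i,t} = \frac{1}{w} \sum_{k=0}^{w-1} r_{i,t-k}, \qquad
\hat\sigma_{ij,t} = \mathrm{EWMA}_h\!\left(\tilde r_{i,t}\, \tilde r_{j,t}\right), \qquad
\hat\rho_{ij,t} = \frac{\hat\sigma_{ij,t}}{\sqrt{\hat\sigma_{ii,t}\,\hat\sigma_{jj,t}}},
$$

where $r_{i,t}$ is the idiosyncratic return of asset $i$ on date $t$.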
# compute the covariance matrix by first applying a rolling mean (for overlap),
# and then an exponentially weighted moving average (for smoothing)
overlap_window = 5
half_life = 126
idio_vcov_df = (
idio_ret_df_joined
.with_columns(
pl.col("idio_ret").rolling_mean(window_size=overlap_window, min_samples=1).over("asset_id"),
pl.col("idio_ret_right").rolling_mean(window_size=overlap_window, min_samples=1).over("asset_id_right"),
)
.with_columns(
(pl.col("idio_ret") * pl.col("idio_ret_right"))
.ewm_mean(half_life=half_life)
.over(("asset_id", "asset_id_right"))
.alias("idio_vcov")
)
)
idio_vcov_df
| date | asset_id | idio_ret | asset_id_right | idio_ret_right | idio_vcov |
|---|---|---|---|---|---|
| date | str | f32 | str | f32 | f32 |
| 2025-01-02 | "02079K107" | 0.000638 | "02079K107" | 0.000638 | 4.0730e-7 |
| 2025-01-02 | "02079K107" | 0.000638 | "30303M102" | 0.022357 | 0.000014 |
| 2025-01-02 | "30303M102" | 0.022357 | "30303M102" | 0.022357 | 0.0005 |
| 2025-01-02 | "57636Q104" | -0.003709 | "57636Q104" | -0.003709 | 0.000014 |
| 2025-01-02 | "57636Q104" | -0.003709 | "92826C839" | 0.001328 | -0.000005 |
| … | … | … | … | … | … |
| 2025-01-31 | "02079K107" | 0.01522 | "30303M102" | 0.005545 | -0.000002 |
| 2025-01-31 | "30303M102" | 0.010218 | "30303M102" | 0.005138 | 0.000102 |
| 2025-01-31 | "57636Q104" | 0.009262 | "57636Q104" | 0.005794 | 0.000023 |
| 2025-01-31 | "57636Q104" | 0.004519 | "92826C839" | 0.006249 | 0.000022 |
| 2025-01-31 | "92826C839" | 0.006149 | "92826C839" | 0.005329 | 0.000022 |
# to translate the covariance matrix to a correlation matrix,
# we need to select the variance of the idiosyncratic returns
idio_var_df = (
idio_vcov_df.filter(pl.col("asset_id") == pl.col("asset_id_right"))
.select("date", "asset_id", pl.col("idio_vcov").alias("idio_var"))
)
idio_var_df
| date | asset_id | idio_var |
|---|---|---|
| date | str | f32 |
| 2025-01-02 | "02079K107" | 4.0730e-7 |
| 2025-01-02 | "30303M102" | 0.0005 |
| 2025-01-02 | "57636Q104" | 0.000014 |
| 2025-01-02 | "92826C839" | 0.000002 |
| 2025-01-03 | "02079K107" | 0.000003 |
| … | … | … |
| 2025-01-30 | "92826C839" | 0.000022 |
| 2025-01-31 | "02079K107" | 0.00001 |
| 2025-01-31 | "30303M102" | 0.000102 |
| 2025-01-31 | "57636Q104" | 0.000023 |
| 2025-01-31 | "92826C839" | 0.000022 |
# by joining twice and normalizing, we get the correlation matrix
idio_corr_df = (
idio_vcov_df.join(idio_var_df, on=("date", "asset_id"))
.join(idio_var_df, left_on=("date", "asset_id_right"), right_on=("date", "asset_id"))
.select(
"date",
"asset_id",
"asset_id_right",
(pl.col("idio_vcov") / (pl.col("idio_var") * pl.col("idio_var_right")).sqrt()).alias("idio_corr"))
)
idio_corr_df
| date | asset_id | asset_id_right | idio_corr |
|---|---|---|---|
| date | str | str | f32 |
| 2025-01-02 | "02079K107" | "02079K107" | 1.0 |
| 2025-01-02 | "02079K107" | "30303M102" | 1.0 |
| 2025-01-02 | "30303M102" | "30303M102" | 1.0 |
| 2025-01-02 | "57636Q104" | "57636Q104" | 1.0 |
| 2025-01-02 | "57636Q104" | "92826C839" | -1.0 |
| … | … | … | … |
| 2025-01-31 | "02079K107" | "30303M102" | -0.053293 |
| 2025-01-31 | "30303M102" | "30303M102" | 1.0 |
| 2025-01-31 | "57636Q104" | "57636Q104" | 1.0 |
| 2025-01-31 | "57636Q104" | "92826C839" | 0.967835 |
| 2025-01-31 | "92826C839" | "92826C839" | 1.0 |
# for small portfolios, the dataframe is small enough to pivot and display
(
idio_corr_df.pivot("asset_id_right", index=("date", "asset_id"), maintain_order=True, sort_columns=True)
.filter(pl.col("date") > pl.col("date").min())
)
| date | asset_id | 02079K107 | 30303M102 | 57636Q104 | 92826C839 |
|---|---|---|---|---|---|
| date | str | f32 | f32 | f32 | f32 |
| 2025-01-03 | "02079K107" | 1.0 | -0.381608 | null | null |
| 2025-01-03 | "30303M102" | null | 1.0 | null | null |
| 2025-01-03 | "57636Q104" | null | null | 1.0 | 0.153835 |
| 2025-01-03 | "92826C839" | null | null | null | 1.0 |
| 2025-01-06 | "02079K107" | 1.0 | 0.190183 | null | null |
| … | … | … | … | … | … |
| 2025-01-30 | "92826C839" | null | null | null | 1.0 |
| 2025-01-31 | "02079K107" | 1.0 | -0.053293 | null | null |
| 2025-01-31 | "30303M102" | null | 1.0 | null | null |
| 2025-01-31 | "57636Q104" | null | null | 1.0 | 0.967835 |
| 2025-01-31 | "92826C839" | null | null | null | 1.0 |
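Finally, if covariance entries are needed rather than correlations, the correlation estimates can be combined with the idiosyncratic volatility forecasts from the first section via cov_ij = corr_ij * vol_i * vol_j. This is a minimal sketch that joins the idio_corr_df and idio_vol_df dataframes from above; the name idio_cov_df is purely illustrative, and the resulting covariances inherit whatever scaling convention the volatility report uses.
# combine the correlation estimates with the volatility forecasts into covariance entries;
# on the diagonal this reduces to vol_i**2
idio_cov_df = (
    idio_corr_df.join(idio_vol_df, on=("date", "asset_id"))
    .join(
        idio_vol_df.rename({"asset_id": "asset_id_right", "idio_vol": "idio_vol_right"}),
        on=("date", "asset_id_right"),
    )
    .select(
        "date",
        "asset_id",
        "asset_id_right",
        (pl.col("idio_corr") * pl.col("idio_vol") * pl.col("idio_vol_right")).alias("idio_cov"),
    )
)
idio_cov_df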