Factor Covariance Matrix Forecasts
In this tutorial we create a time series of factor return covariance matrix forecasts, as implied by a standard factor model. We also tie out the numbers against a pandas/numpy-based reimplementation. More specifically, we will:
Create a basic risk model.
Extract the factor returns.
Use the covariance matrix forecast report to compute the covariance matrix.
Use pandas to extract the factor volatility and correlation matrix time-series.
Replicate the volatility forecast.
Replicate the correlation forecast.
Throughout this notebook we work with a randomly generated dataset. The results should generalize to real data, but for legal reasons we do not show any real data on our public API. Bayesline clients can run this notebook on real data.
Imports & Setup
For this tutorial notebook, you will need to import the following packages.
import pandas as pd
import polars as pl
import numpy as np
from bayesline.api.equity import (
ExposureSettings,
FactorCovarianceReportSettings,
FactorRiskModelSettings,
ModelConstructionSettings,
ReportSettings,
UniverseSettings,
)
from bayesline.apiclient import BayeslineApiClient
We will also need to have a Bayesline API client configured.
bln = BayeslineApiClient.new_client(
endpoint="https://[ENDPOINT]",
api_key="[API-KEY]",
)
Creating the covariance matrix forecasts
Let’s first set up a basic risk model and use it to generate the forecasts. We choose to run with mostly default settings. The steps involved are:
Creating the settings of the risk model.
Loading the report model engine.
Running the engine to generate the covariance report.
The first step is creating the risk model settings.
factorriskmodel_settings = FactorRiskModelSettings(
exposures=ExposureSettings(regions=None),
universe=UniverseSettings(dataset="Bayesline-US-All-1y"),
modelconstruction=ModelConstructionSettings(),
)
Next, we create the report engine from the report settings. We run with the defaults here.
report_settings = ReportSettings(
report=FactorCovarianceReportSettings(),
risk_model=factorriskmodel_settings,
)
report_engine = bln.equity.portfolioreport.load(report_settings)
Let’s see what these settings really are by printing them out.
print(report_settings.report.model_dump_json(indent=2))
{
"type": "Factor Covariance report",
"measures": [
{
"type": "FactorCovariance"
}
],
"halflife_factor_vol": 42,
"halflife_factor_adj": null,
"halflife_factor_cor": 126,
"shrink_factor_cor_method": null,
"shrink_factor_cor_length": 1008,
"shrink_factor_cor_standardized": false,
"overlap_factor_vol": 0,
"overlap_factor_vol_halflife_override": null,
"overlap_factor_cor": 0
}
The different settings that jointly make up the covariance matrix are:
halflife_factor_vol The halflife of the factor volatility. The default is a 42-day halflife.
halflife_factor_adj The halflife of the cross-sectional factor volatility adjustment. The default is to not do any adjustment.
halflife_factor_cor The halflife of the factor correlation. The default is a 126-day halflife.
overlap_factor_vol The overlap or Newey-West lags to include in the factor volatility forecast. The default is zero, meaning no autocorrelation correction is performed.
overlap_factor_cor The overlap or Newey-West lags to include in the factor correlation forecast. The default is zero, meaning no autocorrelation correction is performed.
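To build some intuition for the halflife settings, here is a minimal sketch of how a halflife translates into a per-day decay factor. It assumes the usual EWMA convention (the same one `pandas.DataFrame.ewm(halflife=...)` uses, which the tie-out later in this notebook relies on); the helper name `halflife_to_decay` is made up for illustration and is not part of the Bayesline API.

```python
import math


def halflife_to_decay(halflife: float) -> float:
    """Per-period decay factor lam such that lam ** halflife == 0.5."""
    return 0.5 ** (1.0 / halflife)


# Defaults shown in the settings dump above.
vol_decay = halflife_to_decay(42)   # factor volatility halflife
cor_decay = halflife_to_decay(126)  # factor correlation halflife

# Relative weight of an observation that is `lag` days old.
weight_42_days_ago = vol_decay ** 42   # half the weight of today's observation
weight_84_days_ago = vol_decay ** 84   # a quarter of the weight

print(f"vol decay per day: {vol_decay:.6f}")
print(f"cor decay per day: {cor_decay:.6f}")
```

A longer halflife gives a decay factor closer to one, so the correlation estimate reacts more slowly to new data than the volatility estimate.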
Now we get the actual time-series of the covariance matrices.
# generate the report data
order = {
"date": ["date"],
"factor": ["factor_group", "factor"],
"factor_col": ["factor_group_col", "factor_col"],
}
report = report_engine.get_report(order=order)
# massage the data into a more usable format
df_report = (
report.get_data(
[],
expand=("date", "factor"),
pivot_cols=("factor_col",),
value_cols=("FactorCovariance",)
)
).with_columns(pl.col("date").cast(pl.Date)) # string to date
# convert to pandas
df_vcov = df_report.to_pandas().set_index(["date", "factor"]).rename(columns=lambda c: c.split("^")[0])
df_vcov
Market | Academic & Educational Services | Basic Materials | Consumer Cyclicals | Consumer Non-Cyclicals | Energy | Financials | Government Activity | Healthcare | Industrials | ... | Real Estate | Technology | Utilities | Dividend | Growth | Leverage | Momentum | Size | Value | Volatility | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | factor | |||||||||||||||||||||
2024-07-11 | Market | 0.138666 | 0.016354 | 0.055118 | 0.051772 | 0.018171 | 0.033746 | 0.015384 | 0.270761 | 0.066521 | 0.051116 | ... | 0.062896 | -0.063362 | 0.041642 | -0.020188 | -0.010864 | 0.028053 | -0.044825 | -0.041976 | 0.026601 | 0.018208 |
Academic & Educational Services | 0.016354 | 0.001929 | 0.006501 | 0.006106 | 0.002143 | 0.003980 | 0.001814 | 0.031934 | 0.007846 | 0.006029 | ... | 0.007418 | -0.007473 | 0.004911 | -0.002381 | -0.001281 | 0.003309 | -0.005287 | -0.004951 | 0.003137 | 0.002148 | |
Basic Materials | 0.055118 | 0.006501 | 0.021909 | 0.020579 | 0.007223 | 0.013414 | 0.006115 | 0.107625 | 0.026441 | 0.020318 | ... | 0.025001 | -0.025186 | 0.016552 | -0.008025 | -0.004318 | 0.011151 | -0.017817 | -0.016685 | 0.010574 | 0.007238 | |
Consumer Cyclicals | 0.051772 | 0.006106 | 0.020579 | 0.019330 | 0.006784 | 0.012600 | 0.005744 | 0.101091 | 0.024836 | 0.019085 | ... | 0.023483 | -0.023657 | 0.015547 | -0.007538 | -0.004056 | 0.010474 | -0.016736 | -0.015672 | 0.009932 | 0.006798 | |
Consumer Non-Cyclicals | 0.018171 | 0.002143 | 0.007223 | 0.006784 | 0.002381 | 0.004422 | 0.002016 | 0.035482 | 0.008717 | 0.006699 | ... | 0.008242 | -0.008303 | 0.005457 | -0.002646 | -0.001424 | 0.003676 | -0.005874 | -0.005501 | 0.003486 | 0.002386 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2025-07-08 | Leverage | 0.000208 | 0.000883 | 0.000224 | 0.000635 | 0.000685 | 0.000136 | -0.000041 | -0.002724 | 0.000265 | 0.000328 | ... | 0.000500 | -0.000457 | 0.000092 | -0.000102 | -0.000024 | 0.000475 | -0.000372 | -0.000133 | 0.000245 | -0.000730 |
Momentum | -0.001627 | 0.001046 | -0.000710 | -0.001791 | -0.000551 | 0.000655 | 0.000510 | 0.001589 | -0.001005 | -0.000337 | ... | 0.000580 | 0.000416 | 0.002197 | -0.000176 | 0.000060 | -0.000372 | 0.002799 | 0.000089 | -0.000734 | 0.000609 | |
Size | 0.001376 | -0.000385 | -0.000092 | -0.000295 | -0.000872 | 0.000137 | 0.000468 | 0.001641 | -0.000730 | 0.000013 | ... | -0.000587 | 0.000225 | -0.000838 | 0.000089 | 0.000123 | -0.000133 | 0.000089 | 0.000538 | -0.000040 | 0.001045 | |
Value | 0.000761 | 0.000126 | 0.000634 | 0.000697 | 0.000207 | 0.000692 | 0.000039 | -0.000847 | 0.000068 | 0.000343 | ... | -0.000274 | -0.000354 | -0.000865 | -0.000037 | -0.000046 | 0.000245 | -0.000734 | -0.000040 | 0.000680 | 0.000043 | |
Volatility | 0.019928 | -0.006335 | -0.001170 | -0.001183 | -0.007117 | -0.000575 | 0.004702 | -0.006853 | -0.006167 | 0.000188 | ... | -0.002735 | 0.000983 | -0.005031 | 0.000119 | 0.000277 | -0.000730 | 0.000609 | 0.001045 | 0.000043 | 0.013796 |
5208 rows × 21 columns
For downstream comparisons, we split these into factor volatilities and correlations.
# calculate actual factor volatilities from vcovs
df_vol = df_vcov.groupby(level="date").apply(
lambda df: pd.Series(np.diag(df) ** 0.5, df.index.droplevel("date"))
)
df_vol.columns.name = None
df_vol.tail()
Market | Academic & Educational Services | Basic Materials | Consumer Cyclicals | Consumer Non-Cyclicals | Energy | Financials | Government Activity | Healthcare | Industrials | ... | Real Estate | Technology | Utilities | Dividend | Growth | Leverage | Momentum | Size | Value | Volatility | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||||||||||||
2025-07-01 | 0.208483 | 0.159797 | 0.086144 | 0.081509 | 0.092384 | 0.186383 | 0.056335 | 0.759894 | 0.117661 | 0.051183 | ... | 0.097028 | 0.044640 | 0.117777 | 0.019171 | 0.013853 | 0.022476 | 0.051429 | 0.023902 | 0.026011 | 0.118883 |
2025-07-02 | 0.207833 | 0.166450 | 0.088461 | 0.080930 | 0.091790 | 0.185155 | 0.056033 | 0.753567 | 0.117053 | 0.050761 | ... | 0.096228 | 0.044402 | 0.118081 | 0.019696 | 0.013743 | 0.022290 | 0.051206 | 0.023741 | 0.025995 | 0.119687 |
2025-07-03 | 0.206767 | 0.166880 | 0.088488 | 0.080817 | 0.091278 | 0.183706 | 0.055782 | 0.747276 | 0.116425 | 0.050349 | ... | 0.095430 | 0.044172 | 0.117414 | 0.019781 | 0.013753 | 0.022104 | 0.051063 | 0.023580 | 0.025870 | 0.118949 |
2025-07-07 | 0.206550 | 0.166481 | 0.087749 | 0.080150 | 0.090519 | 0.182276 | 0.055399 | 0.741604 | 0.115696 | 0.050066 | ... | 0.094691 | 0.043853 | 0.116896 | 0.019624 | 0.013645 | 0.021932 | 0.051478 | 0.023384 | 0.025885 | 0.118296 |
2025-07-08 | 0.205368 | 0.165140 | 0.087015 | 0.079619 | 0.090579 | 0.181657 | 0.055160 | 0.735412 | 0.115128 | 0.049969 | ... | 0.094512 | 0.043797 | 0.117521 | 0.019959 | 0.013585 | 0.021798 | 0.052909 | 0.023193 | 0.026077 | 0.117457 |
5 rows × 21 columns
# calculate actual factor correlations from vcovs
df_cor = df_vcov.groupby(level="date").apply(
lambda df: df.droplevel("date")
/ np.outer(np.diag(df) ** 0.5, np.diag(df) ** 0.5)
)
df_cor.tail()
Market | Academic & Educational Services | Basic Materials | Consumer Cyclicals | Consumer Non-Cyclicals | Energy | Financials | Government Activity | Healthcare | Industrials | ... | Real Estate | Technology | Utilities | Dividend | Growth | Leverage | Momentum | Size | Value | Volatility | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | factor | |||||||||||||||||||||
2025-07-08 | Leverage | 0.046430 | 0.245365 | 0.117946 | 0.366173 | 0.346991 | 0.034452 | -0.033962 | -0.169955 | 0.105791 | 0.301447 | ... | 0.242765 | -0.479207 | 0.035737 | -0.233677 | -0.081877 | 1.000000 | -0.322483 | -0.263357 | 0.430742 | -0.285056 |
Momentum | -0.149727 | 0.119716 | -0.154159 | -0.425130 | -0.115010 | 0.068189 | 0.174666 | 0.040834 | -0.165018 | -0.127520 | ... | 0.115889 | 0.179706 | 0.353363 | -0.166688 | 0.083496 | -0.322483 | 1.000000 | 0.072603 | -0.532174 | 0.097972 | |
Size | 0.288918 | -0.100422 | -0.045453 | -0.159948 | -0.415024 | 0.032406 | 0.366078 | 0.096232 | -0.273565 | 0.010928 | ... | -0.267909 | 0.221163 | -0.307440 | 0.191753 | 0.390846 | -0.263357 | 0.072603 | 1.000000 | -0.066403 | 0.383744 | |
Value | 0.142173 | 0.029205 | 0.279307 | 0.335735 | 0.087634 | 0.146042 | 0.027141 | -0.044192 | 0.022658 | 0.263585 | ... | -0.111066 | -0.310323 | -0.282404 | -0.070781 | -0.131239 | 0.430742 | -0.532174 | -0.066403 | 1.000000 | 0.013884 | |
Volatility | 0.826134 | -0.326600 | -0.114482 | -0.126528 | -0.668937 | -0.026949 | 0.725727 | -0.079340 | -0.456035 | 0.032013 | ... | -0.246335 | 0.191103 | -0.364502 | 0.050701 | 0.173511 | -0.285056 | 0.097972 | 0.383744 | 0.013884 | 1.000000 |
5 rows × 21 columns
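The split into volatilities and correlations used above is the standard decomposition of a covariance matrix: the volatilities are the square roots of the diagonal, and the correlations are the covariances divided by the outer product of the volatilities. The sketch below checks this on a small made-up matrix (the numbers are illustrative, not Bayesline output).

```python
import numpy as np

# Toy 3x3 covariance matrix (made-up numbers for illustration only).
cov = np.array(
    [
        [0.0400, 0.0060, -0.0040],
        [0.0060, 0.0100, 0.0020],
        [-0.0040, 0.0020, 0.0225],
    ]
)

# Volatilities are the square roots of the variances on the diagonal.
vol = np.sqrt(np.diag(cov))

# Correlations: divide each covariance by the product of the two volatilities.
cor = cov / np.outer(vol, vol)

# The decomposition is lossless: scaling the correlations back recovers cov.
cov_rebuilt = cor * np.outer(vol, vol)
```

Because the decomposition is lossless, tying out the volatilities and the correlations separately is equivalent to tying out the full covariance matrix.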
Manually replicating the covariance forecasts
We can also estimate the risk model and get the factor returns directly. From these returns we can construct the covariance forecasts. Bayesline returns polars dataframes, but they can easily be converted to pandas dataframes. We also remove the factor group (market, style, industry, etc.) from the column names for convenience.
risk_model = bln.equity.riskmodels.load(factorriskmodel_settings).get()
df_factor_returns = risk_model.fret().to_pandas().set_index("date").rename(columns=lambda c: c.split(".")[1])
df_factor_returns.tail()
Market | Energy | Basic Materials | Industrials | Consumer Cyclicals | Consumer Non-Cyclicals | Financials | Healthcare | Technology | Utilities | ... | Institutions, Associations & Organizations | Government Activity | Academic & Educational Services | Size | Value | Growth | Volatility | Momentum | Dividend | Leverage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||||||||||||
2025-07-01 | 0.010428 | -0.010517 | 0.007999 | 0.000665 | 0.014148 | 0.007374 | -0.003812 | 0.000099 | -0.001947 | -0.008221 | ... | 0.0 | -0.049811 | 0.005245 | -0.000770 | 0.004142 | -0.000210 | -0.005483 | -0.012481 | 0.001278 | 0.004278 |
2025-07-02 | 0.010394 | 0.005407 | 0.011215 | 0.000372 | 0.001995 | 0.002794 | 0.002123 | -0.004578 | -0.001691 | -0.008491 | ... | 0.0 | 0.003242 | -0.024866 | -0.000671 | 0.001577 | -0.000194 | 0.010089 | -0.002247 | 0.002515 | -0.000147 |
2025-07-03 | 0.008128 | -0.002950 | -0.005673 | 0.000555 | -0.004651 | -0.003332 | 0.002408 | -0.004408 | 0.001723 | 0.004232 | ... | 0.0 | -0.002009 | -0.012002 | -0.000647 | -0.001069 | -0.000904 | 0.003856 | 0.002629 | -0.001528 | -0.000046 |
2025-07-07 | -0.012177 | 0.003026 | -0.000143 | 0.001810 | -0.000588 | -0.000408 | -0.001482 | -0.003671 | 0.001021 | 0.005076 | ... | 0.0 | 0.014257 | -0.008876 | 0.000077 | -0.001683 | -0.000205 | -0.004388 | 0.004527 | -0.000271 | 0.000365 |
2025-07-08 | 0.007301 | 0.008839 | -0.000022 | -0.002766 | -0.002293 | -0.005926 | -0.002426 | 0.004673 | 0.002541 | -0.009444 | ... | 0.0 | 0.001412 | -0.001998 | -0.000235 | 0.002245 | -0.000594 | 0.002895 | -0.006791 | 0.002167 | -0.000714 |
5 rows × 21 columns
From these returns we can run standard pandas functions to get the EWMAs.
# calculate expected factor volatilities using the ewma
df_vol_tieout = (
pd.DataFrame(df_factor_returns**2)
.ewm(halflife=report_settings.report.halflife_factor_vol)
.mean()
.astype(np.float32)
** 0.5
* 252**0.5
)
df_vol_tieout.tail()
Market | Energy | Basic Materials | Industrials | Consumer Cyclicals | Consumer Non-Cyclicals | Financials | Healthcare | Technology | Utilities | ... | Institutions, Associations & Organizations | Government Activity | Academic & Educational Services | Size | Value | Growth | Volatility | Momentum | Dividend | Leverage | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | |||||||||||||||||||||
2025-07-01 | 0.208483 | 0.186383 | 0.086144 | 0.051183 | 0.081509 | 0.092384 | 0.056335 | 0.117661 | 0.044640 | 0.117777 | ... | 0.0 | 0.759894 | 0.159797 | 0.023902 | 0.026011 | 0.013853 | 0.118883 | 0.051429 | 0.019171 | 0.022476 |
2025-07-02 | 0.207833 | 0.185155 | 0.088461 | 0.050761 | 0.080930 | 0.091790 | 0.056033 | 0.117053 | 0.044402 | 0.118081 | ... | 0.0 | 0.753567 | 0.166450 | 0.023741 | 0.025995 | 0.013743 | 0.119687 | 0.051206 | 0.019696 | 0.022290 |
2025-07-03 | 0.206767 | 0.183706 | 0.088488 | 0.050349 | 0.080817 | 0.091278 | 0.055782 | 0.116425 | 0.044172 | 0.117414 | ... | 0.0 | 0.747276 | 0.166880 | 0.023580 | 0.025870 | 0.013753 | 0.118948 | 0.051063 | 0.019781 | 0.022104 |
2025-07-07 | 0.206550 | 0.182276 | 0.087749 | 0.050066 | 0.080150 | 0.090519 | 0.055399 | 0.115696 | 0.043853 | 0.116896 | ... | 0.0 | 0.741604 | 0.166481 | 0.023384 | 0.025885 | 0.013645 | 0.118296 | 0.051478 | 0.019624 | 0.021932 |
2025-07-08 | 0.205368 | 0.181657 | 0.087015 | 0.049969 | 0.079619 | 0.090579 | 0.055160 | 0.115128 | 0.043797 | 0.117521 | ... | 0.0 | 0.735412 | 0.165140 | 0.023193 | 0.026077 | 0.013585 | 0.117457 | 0.052909 | 0.019959 | 0.021798 |
5 rows × 21 columns
pd.testing.assert_frame_equal(df_vol, df_vol_tieout, check_column_type=False, check_categorical=False, check_like=True)
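To make explicit what the pandas call in the volatility tie-out computes, the following sketch reproduces the last EWMA value by hand on synthetic returns. With the pandas default `adjust=True`, the EWMA is a weighted average in which an observation that is `lag` days old gets weight `0.5 ** (lag / halflife)`. The data here is randomly generated and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns (random numbers, for illustration only).
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(scale=0.01, size=250))

halflife = 42
decay = 0.5 ** (1.0 / halflife)

# pandas: EWMA of squared returns (zero-mean variance estimate).
var_pandas = (returns**2).ewm(halflife=halflife).mean()

# By hand, for the last date: the newest observation has lag 0.
lags = np.arange(len(returns))[::-1]
weights = decay**lags
var_manual = np.sum(weights * returns.to_numpy() ** 2) / np.sum(weights)

# Annualize assuming 252 trading days per year, as in the tie-out above.
vol_annualized = np.sqrt(var_manual * 252)
```

Note that the squared returns are not demeaned, so this is a variance estimate around zero rather than around the sample mean.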
The correlations are a bit more involved. We need to create the outer product of the factor returns for each date and then run an EWMA on each cell of the outer product matrix.
# calculate the ewma on the outer product (vcov with mean zero)
df_factor_returns_outer = df_factor_returns.groupby("date").apply(
lambda df: pd.DataFrame(np.outer(df, df), df.columns, df.columns)
)
df_factor_returns_outer.index.names = ["date", "factor"]
df_cor_tieout = (
pd.DataFrame(df_factor_returns_outer)
.unstack()
.ewm(halflife=report_settings.report.halflife_factor_cor)
.mean()
.stack(future_stack=True)
.reindex(df_factor_returns_outer.columns, axis=1)
.groupby("date")
.apply(
lambda df: df.droplevel("date")
/ np.outer(np.diag(df) ** 0.5, np.diag(df) ** 0.5)
)
.astype(np.float32)
)
df_cor_tieout
Market | Energy | Basic Materials | Industrials | Consumer Cyclicals | Consumer Non-Cyclicals | Financials | Healthcare | Technology | Utilities | ... | Institutions, Associations & Organizations | Government Activity | Academic & Educational Services | Size | Value | Growth | Volatility | Momentum | Dividend | Leverage | ||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
date | factor | |||||||||||||||||||||
2024-07-11 | Market | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -1.000000 | 1.000000 | ... | NaN | 1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | -1.000000 | 1.000000 |
Energy | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -1.000000 | 1.000000 | ... | NaN | 1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | -1.000000 | 1.000000 | |
Basic Materials | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -1.000000 | 1.000000 | ... | NaN | 1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | -1.000000 | 1.000000 | |
Industrials | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -1.000000 | 1.000000 | ... | NaN | 1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | -1.000000 | 1.000000 | |
Consumer Cyclicals | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | -1.000000 | 1.000000 | ... | NaN | 1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | 1.000000 | -1.000000 | -1.000000 | 1.000000 | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
2025-07-08 | Growth | 0.125730 | -0.008989 | -0.111574 | 0.126584 | -0.083716 | -0.259801 | 0.097236 | -0.333627 | 0.306650 | -0.139069 | ... | NaN | 0.054733 | 0.075627 | 0.390846 | -0.131239 | 1.000000 | 0.173511 | 0.083496 | 0.154401 | -0.081877 |
Volatility | 0.826133 | -0.026949 | -0.114482 | 0.032013 | -0.126528 | -0.668937 | 0.725727 | -0.456035 | 0.191103 | -0.364502 | ... | NaN | -0.079340 | -0.326600 | 0.383744 | 0.013884 | 0.173511 | 1.000000 | 0.097973 | 0.050701 | -0.285056 | |
Momentum | -0.149727 | 0.068189 | -0.154159 | -0.127520 | -0.425130 | -0.115010 | 0.174666 | -0.165018 | 0.179706 | 0.353363 | ... | NaN | 0.040834 | 0.119716 | 0.072603 | -0.532173 | 0.083496 | 0.097973 | 1.000000 | -0.166688 | -0.322483 | |
Dividend | -0.015343 | 0.053786 | 0.005147 | -0.243292 | -0.093723 | -0.032744 | -0.307982 | 0.086780 | 0.270403 | -0.010901 | ... | NaN | 0.010815 | -0.255823 | 0.191753 | -0.070781 | 0.154401 | 0.050701 | -0.166688 | 1.000000 | -0.233677 | |
Leverage | 0.046430 | 0.034452 | 0.117946 | 0.301447 | 0.366173 | 0.346992 | -0.033962 | 0.105791 | -0.479207 | 0.035737 | ... | NaN | -0.169955 | 0.245365 | -0.263358 | 0.430742 | -0.081877 | -0.285056 | -0.322483 | -0.233677 | 1.000000 |
5208 rows × 21 columns
pd.testing.assert_frame_equal(df_cor, df_cor_tieout, check_index_type=False, check_categorical=False, check_like=True, atol=1e-6)
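The same outer-product approach can be sketched compactly in numpy on synthetic data. The example also illustrates why the first date in the tie-out table shows correlations of exactly plus or minus one: with a single observation the outer product has rank one, so every normalized entry is just the sign of the product of the two returns (and 0/0 gives NaN for a factor whose return is zero). The data and column names below are made up for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic daily returns for three hypothetical factors.
rng = np.random.default_rng(1)
rets = pd.DataFrame(rng.normal(scale=0.01, size=(300, 3)), columns=["A", "B", "C"])

halflife = 126
decay = 0.5 ** (1.0 / halflife)

# One outer product per date, shape (T, K, K).
values = rets.to_numpy()
outer = np.einsum("ti,tj->tij", values, values)

# EWMA of the outer products at the last date (newest observation has lag 0).
lags = np.arange(len(rets))[::-1]
weights = decay**lags
cov_last = np.einsum("t,tij->ij", weights, outer) / weights.sum()

# Normalize the zero-mean covariance to a correlation matrix.
vol_last = np.sqrt(np.diag(cov_last))
cor_last = cov_last / np.outer(vol_last, vol_last)

# Cross-check one cell against the pandas EWMA of the elementwise product.
cov_ab_pandas = (rets["A"] * rets["B"]).ewm(halflife=halflife).mean().iloc[-1]

# First date: a single rank-one outer product gives correlations of +/-1.
first = outer[0]
first_vol = np.sqrt(np.diag(first))
cor_first = first / np.outer(first_vol, first_vol)
```

Working on a `(T, K, K)` array avoids the stack/unstack round trip, at the cost of losing the pandas labels; which form is preferable depends on whether the result needs to be aligned against another labeled frame, as in the tie-out above.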