# Uploading New Exposures to an Existing Dataset and Updating the Risk Dataset

Use this notebook to append new exposure data to an existing upload and update an existing risk dataset.

In [2]:
import datetime as dt
from pathlib import Path

from tqdm import tqdm

from bayesline.apiclient import BayeslineApiClient
from bayesline.api.equity import (
    ExposureSettings, 
    IndustrySettings, 
    RegionSettings,
    UniverseSettings,
)

In [7]:
bln = BayeslineApiClient.new_client(
    endpoint="https://[ENDPOINT]",
    api_key="[API-KEY]",
)

## Uploading New Exposure Files

In [3]:
exposure_dir = Path("PATH/TO/EXPOSURES")

In [4]:
exposure_dataset_name = "My-Exposures"

The cell below gets the exposure uploader for the chosen dataset name `My-Exposures`. The dataset is assumed to exist already, since this notebook demonstrates a catch-up upload. See the [Uploaders Tutorial](https://docs.bayesline.com/latest/notebooks/tutorial_uploaders.html) for a deep dive into the `Uploaders API`.

In [6]:
exposure_uploader = bln.equity.uploaders.get_data_type("exposures")
uploader = exposure_uploader.get_dataset(dataset=exposure_dataset_name)

Below we list the existing files in the provided folder and filter out all dates for which we already processed files in a previous run.

In [7]:
# list all csv files and parse the date from each file name
# expects file pattern "*_YYYY-MM-DD.csv"

files = list(exposure_dir.glob("*.csv"))
file_date_strs = [file.name.split("_")[-1].replace("-", "")[:8] for file in files]

available_dates = [
    dt.date(int(d[:4]), int(d[4:6]), int(d[6:8])) for d in file_date_strs
]

existing_dates = (
    uploader.get_data(columns=["date"], unique=True).collect().to_series().to_list()
)

new_dates = sorted(set(available_dates) - set(existing_dates))

print(f"Got {len(new_dates)} new dates")

Got 24 new dates
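
The date filtering above can also be written with `dt.date.fromisoformat`, which raises a `ValueError` for any file name that does not match the expected pattern. The following is a minimal sketch, not part of the Bayesline API: the helper names `parse_file_date` and `find_new_dates` are hypothetical, and the code assumes file names of the form `<prefix>_YYYY-MM-DD.csv`.

```python
import datetime as dt
from pathlib import Path


def parse_file_date(path: Path) -> dt.date:
    """Extract the date from a file named like '<prefix>_YYYY-MM-DD.csv'."""
    # Path.stem drops the '.csv' suffix; the date is the part after the last '_'.
    date_str = path.stem.rsplit("_", 1)[-1]
    # fromisoformat raises ValueError if the name does not end in a valid date.
    return dt.date.fromisoformat(date_str)


def find_new_dates(files, existing_dates):
    """Return (sorted dates not yet uploaded, mapping of date -> file)."""
    files_by_date = {parse_file_date(Path(f)): Path(f) for f in files}
    new_dates = sorted(set(files_by_date) - set(existing_dates))
    return new_dates, files_by_date
```

In this notebook it could be called as `new_dates, files_by_date = find_new_dates(files, existing_dates)`.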


In [8]:
files_by_date = {
    dt.date(int(d[:4]), int(d[4:6]), int(d[6:8])): f for d, f in zip(file_date_strs, files)
}

As a next step we iterate over each `csv` file and stage it. Note that for a large number of files it is much faster to upload `zip` files instead. See this [recipe](https://docs.bayesline.com/latest/notebooks/recipe_daily_exposure_upload.html) for details.

See the [Uploaders Tutorial](https://docs.bayesline.com/latest/notebooks/tutorial_uploaders.html#staging-data) for more details on the *staging* and *commit* concepts.
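
If you go the `zip` route, the files have to be bundled first. The sketch below shows only that bundling step, using Python's standard `zipfile` module. The helper name `bundle_csvs` is hypothetical, and the flat archive layout (one entry per file, no subfolders) is an assumption. Check the recipe linked above for the layout the uploader actually expects and for how to stage the archive.

```python
import zipfile
from pathlib import Path


def bundle_csvs(files, zip_path):
    """Write the given files into a single compressed zip archive."""
    zip_path = Path(zip_path)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for file in files:
            # Store each file under its bare name (assumed flat layout).
            zf.write(file, arcname=Path(file).name)
    return zip_path
```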

In [9]:
for date in tqdm(new_dates):
    file = files_by_date[date]
    result = uploader.stage_file(file)
    assert result.success

100%|██████████| 24/24 [00:18<00:00,  1.31it/s]
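
The loop above stops at the first failed file because of the `assert`, and assertions are skipped entirely when Python runs with `-O`. For longer catch-up runs it can be more convenient to stage everything and report the failures at the end. The following is a minimal sketch of that pattern. The helper `stage_all` is hypothetical, and the only thing it assumes about the staging result is the `success` attribute used in the cell above.

```python
def stage_all(stage_fn, files):
    """Stage every file and return the list of files whose staging failed.

    stage_fn is any callable that takes a file and returns an object
    with a boolean `success` attribute (e.g. `uploader.stage_file`).
    """
    failed = []
    for file in files:
        result = stage_fn(file)
        if not result.success:
            # Remember the failure and keep going instead of aborting.
            failed.append(file)
    return failed
```

In this notebook it could be called as `failed = stage_all(uploader.stage_file, [files_by_date[d] for d in new_dates])`, followed by a check that `failed` is empty before committing.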




In [10]:
uploader.commit(mode="append")

UploadCommitResult(version=2, committed_names=['exposures_2025-06-11', 'exposures_2025-06-01', 'exposures_2025-06-09', 'exposures_2025-06-02', 'exposures_2025-06-16', 'exposures_2025-06-10', 'exposures_2025-06-03', 'exposures_2025-06-15', 'exposures_2025-06-07', 'exposures_2025-06-12', 'exposures_2025-06-21', 'exposures_2025-06-14', 'exposures_2025-06-22', 'exposures_2025-06-17', 'exposures_2025-06-04', 'exposures_2025-06-20', 'exposures_2025-06-24', 'exposures_2025-06-05', 'exposures_2025-06-18', 'exposures_2025-06-13', 'exposures_2025-06-23', 'exposures_2025-06-06', 'exposures_2025-06-08', 'exposures_2025-06-19'])
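
The commit result lists the names that were committed, which in the output above look like the file names without the `.csv` suffix. Under that assumption, a quick cross-check against the staged files is possible. The helper `uncommitted_files` below is hypothetical. It also assumes that the result object exposes `committed_names` as an attribute, as its printed representation suggests.

```python
from pathlib import Path


def uncommitted_files(staged_files, committed_names):
    """Return the staged files whose stem is missing from committed_names."""
    committed = set(committed_names)
    return [f for f in staged_files if Path(f).stem not in committed]
```

For example, keep the return value of `uploader.commit(mode="append")` in a variable `commit_result` and call `uncommitted_files([files_by_date[d] for d in new_dates], commit_result.committed_names)`. An empty list means every staged file shows up in the commit.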

In [11]:
uploader.get_data_detail_summary()

| date (date) | n_assets (u32) | min_exposure (f32) | max_exposure (f32) | mean_exposure (f32) | median_exposure (f32) | std_exposure (f32) |
| --- | --- | --- | --- | --- | --- | --- |
| 2025-05-01 | 35804 | -4.0625 | 4.09375 | 0.241719 | 0.49707 | 1.013115 |
| 2025-05-02 | 35798 | -4.0625 | 4.09375 | 0.241691 | 0.497559 | 1.013026 |
| 2025-05-03 | 35793 | -4.0625 | 4.09375 | 0.2417 | 0.497559 | 1.012994 |
| 2025-05-04 | 35793 | -4.0625 | 4.09375 | 0.241718 | 0.497559 | 1.013005 |
| 2025-05-05 | 35796 | -4.0625 | 4.09375 | 0.241675 | 0.49707 | 1.012967 |
| … | … | … | … | … | … | … |
| 2025-06-20 | 35686 | -4.042969 | 4.113281 | 0.243499 | 0.494873 | 1.011643 |
| 2025-06-21 | 35650 | -4.042969 | 4.113281 | 0.243514 | 0.495117 | 1.011587 |
| 2025-06-22 | 35650 | -4.042969 | 4.113281 | 0.243508 | 0.495117 | 1.011592 |
| 2025-06-23 | 35656 | -4.042969 | 4.113281 | 0.243441 | 0.495361 | 1.011482 |
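
The visible rows of the summary show one row per calendar day, weekends included. If that holds for your data (an assumption, not something the API guarantees), a gap check on the uploaded dates is a cheap way to catch a missing file after a catch-up run. The helper `missing_days` below is a hypothetical sketch in plain Python.

```python
import datetime as dt


def missing_days(dates):
    """Return the calendar days between min(dates) and max(dates) that are absent."""
    present = set(dates)
    if not present:
        return []
    start, end = min(present), max(present)
    # Walk every calendar day in the closed range [start, end].
    all_days = (start + dt.timedelta(days=i) for i in range((end - start).days + 1))
    return [day for day in all_days if day not in present]
```

The input can be the same list of dates that was fetched earlier in this notebook with `uploader.get_data(columns=["date"], unique=True).collect().to_series().to_list()`.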


## Updating the Risk Dataset

To bring the newly uploaded exposures into the existing risk dataset we need to update it. The update pulls the most recent version of all referenced datasets and re-creates the risk dataset.

In [12]:
risk_dataset_name = "My-Risk-Dataset"

In [13]:
risk_datasets = bln.equity.riskdatasets
risk_dataset = risk_datasets.load(risk_dataset_name)

In [14]:
risk_dataset.update()

RiskDatasetUpdateResult()

In [15]:
risk_dataset.describe()

RiskDatasetProperties(calendar_settings_menu=CalendarSettingsMenu(exchanges=['ARCX', 'BVCA', 'BVMF', 'DIFX', 'DSMD', 'ETFP', 'FRAB', 'HSTC', 'JBUL', 'PFTS', 'ROCO', 'SHSC', 'SZSC', 'WBDM', 'XADS', 'XAMM', 'XAMS', 'XASE', 'XASX', 'XATH', 'XBAH', 'XBEL', 'XBEY', 'XBKF', 'XBKK', 'XBOG', 'XBOM', 'XBOS', 'XBRA', 'XBRU', 'XBRV', 'XBUD', 'XBUE', 'XCAI', 'XCAN', 'XCAS', 'XCSE', 'XCYS', 'XDUB', 'XEQY', 'XETB', 'XHEL', 'XHKG', 'XHNX', 'XICE', 'XIDX', 'XJAM', 'XJAS', 'XJSE', 'XKAR', 'XKLS', 'XKOS', 'XKRX', 'XKUW', 'XLIM', 'XLIS', 'XLIT', 'XLJU', 'XLON', 'XLUX', 'XMAD', 'XMAL', 'XMAU', 'XMEX', 'XMUS', 'XNAI', 'XNAM', 'XNAS', 'XNCM', 'XNSA', 'XNSE', 'XNYS', 'XNZE', 'XOSL', 'XPAE', 'XPAR', 'XPHS', 'XPRM', 'XPSX', 'XQUI', 'XRIS', 'XSAU', 'XSEC', 'XSES', 'XSGO', 'XSHE', 'XSHG', 'XSSC', 'XSTC', 'XSTO', 'XSWX', 'XTAE', 'XTAI', 'XTAL', 'XTKS', 'XTSE', 'XTSX', 'XTUN', 'XWAR', 'XZAG', 'XZIM']), universe_settings_menu=UniverseSettingsMenu(id_types=['bayesid'], exchanges=['ARCX', 'BVCA', 'BVMF', 'DIFX', 'DSM

In [16]:
exposures_api = bln.equity.exposures.load(
    ExposureSettings(
        industries=None,
        regions=None,
    )
)

In [17]:
# note that the industry and region hierarchy names must tie out with the
# factor groups that were specified when the risk dataset was created

df = exposures_api.get(
    UniverseSettings(
        dataset=risk_dataset_name, 
        industry=IndustrySettings(hierarchy="industry", include="All"),
        region=RegionSettings(hierarchy="region", include="All")
    )
)

df.tail()

| date (date) | bayesid (str) | market.market.Market (f32) | style.style.Dividend (f32) | style.style.Growth (f32) | style.style.Leverage (f32) | style.style.Momentum (f32) | style.style.Size (f32) | style.style.Value (f32) | style.style.Volatility (f32) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2025-06-24 | "ZSPC" | 1.0 | -0.597168 | 0.366211 | -0.072388 | -0.144531 | -1.766602 | -1.714844 | 1.740234 |
| 2025-06-24 | "ZTR" | 1.0 | 1.834961 | -0.287354 | -0.330566 | 1.40918 | -0.745605 | 1.161133 | -1.368164 |
| 2025-06-24 | "ZUMZ" | 1.0 | -1.166016 | -2.351562 | -0.849609 | -1.360352 | -0.789551 | 1.0625 | 0.803711 |
| 2025-06-24 | "ZVIA" | 1.0 | -0.050262 | -1.299805 | -1.982422 | 1.37793 | -1.433594 | -0.694824 | 1.260742 |
| 2025-06-24 | "ZVVT" | 1.0 | -0.501465 | -0.906738 | 1.451172 | 0.598145 | -1.37207 | -1.271484 | 2.162109 |
