{
  "cells": [
    {
      "cell_type": "code",
      "execution_count": 1,
      "metadata": {},
      "outputs": [],
      "source": [
        "\n",
        "# I am tagged as 'remove-input' and 'remove-output'\n",
        "import warnings\n",
        "warnings.simplefilter(action='ignore', category=FutureWarning)\n",
        "warnings.simplefilter(action='ignore', category=RuntimeWarning)\n",
        "warnings.simplefilter(action='ignore', category=UserWarning)\n",
        "\n",
        "import sys\n",
        "sys.path.append('../')\n",
        "from notebook_env import get_bln\n",
        "bln = get_bln()\n",
        "# below cell is marked as 'skip-execution'"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "# Data Uploaders\n",
        "\n",
        "In this tutorial we are going to demonstrate the usage of the Bayesline Uploaders API. The Uploaders API provides a generalized mechanism to bring different types of data into the Bayesline ecosystem.\n",
        "\n",
        "Specifically, we will introduce and explore:\n",
        "* *Data Types*\n",
        "* *Datasets*\n",
        "* *Schemas* and *Parsers*\n",
        "* The *staging* concept\n",
        "* The *commit* concept\n",
        "* *Staging* Validation\n",
        "* Data filtering and downloading\n",
        "* Housekeeping"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Imports & Setup\n",
        "\n",
        "For this tutorial notebook, you will need to import the following packages."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 2,
      "metadata": {},
      "outputs": [],
      "source": [
        "import tempfile\n",
        "from pathlib import Path\n",
        "\n",
        "import polars as pl\n",
        "\n",
        "from bayesline.apiclient import BayeslineApiClient"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We will also need to have a Bayesline API client configured."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "tags": [
          "skip-execution"
        ]
      },
      "outputs": [],
      "source": [
        "bln = BayeslineApiClient.new_client(\n",
        "    endpoint=\"https://[ENDPOINT]\",\n",
        "    api_key=\"[API-KEY]\",\n",
        ")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The main entrypoint for the Uploaders API sits on `bln.equity.uploaders`. All upload functionality can be reached from here on out.\n",
        "\n",
        "See here for relevant docs:\n",
        "* [Uploaders API Summary](https://docs.bayesline.com/0.12.1/_autosummary/bayesline.api.equity.UploadersApi.html)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 3,
      "metadata": {},
      "outputs": [],
      "source": [
        "uploaders = bln.equity.uploaders"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Data Types\n",
        "\n",
        "A *data type* distinguishes distinct types of data that can be brought into the Bayesline ecosystem and are pre-configured by Bayesline. \n",
        "These include *portfolio holdings*, *factor exposures*, etc. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 4,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "['exposures', 'factors', 'hierarchies', 'portfolios']"
            ]
          },
          "execution_count": 4,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "uploaders.get_data_types()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can obtain a specific uploader for a *data type*. In this tutorial, we will be working with the *exposure uploader*, but all other uploaders operate analogously.\n",
        "\n",
        "The `get_data_type` method will return a `DataTypeUploaderApi` instance which distinguishes the concept of datasets (see below).\n",
        "\n",
        "See here for relevant docs:\n",
        "* [Data Type Uploader API Summary](https://docs.bayesline.com/0.12.1/_autosummary/bayesline.api.equity.DataTypeUploaderApi.html)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 5,
      "metadata": {},
      "outputs": [],
      "source": [
        "exposure_uploader = uploaders.get_data_type(\"exposures\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Datasets\n",
        "\n",
        "For each data type (e.g. *exposures*) we can create isolated datasets. For instance, we might want to upload different sets of exposures. This can be achieved by creating a *dataset*. We can always retrieve existing datasets using the `get_datasets` method. Since we haven't created any datasets yet, this will be empty."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 6,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[]"
            ]
          },
          "execution_count": 6,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "exposure_uploader.get_datasets()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now, we start by creating a new dataset `\"tutorial\"`.\n",
        "\n",
        "See here for relevant docs:\n",
        "* [Uploader API Summary](https://docs.bayesline.com/0.12.1/_autosummary/bayesline.api.equity.UploaderApi.html)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 7,
      "metadata": {},
      "outputs": [],
      "source": [
        "dataset = exposure_uploader.create_dataset(\"tutorial\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Schemas and Parsers\n",
        "\n",
        "Every data type comes with its own dataframe schema. Every uploaded dataframe will be converted into this schema to ensure a uniform way to view the data for a specific data type."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 8,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'date': Date,\n",
              " 'asset_id': String,\n",
              " 'asset_id_type': String,\n",
              " 'factor_group': String,\n",
              " 'factor': String,\n",
              " 'exposure': Float32}"
            ]
          },
          "execution_count": 8,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_schema()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We may have input data in a different format than what the exposures data type declares as its schema (e.g. a wide format). \n",
        "We can either convert it ourselves or use one of the predefined input data parsers. \n",
        "\n",
        "A *parser*: \n",
        "* Is defined for an input format and will convert it to the schema that the uploader expects.\n",
        "* Will add operations such as null-filtering.\n",
        "* Will record error messages if a given input cannot be parsed.\n",
        "* Provides access to example dataframes for the expected input.\n",
        "* Will ensure that the dataframe is valid if the parsing succeeds."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 9,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "['Long-Format', 'Wide-Format']"
            ]
          },
          "execution_count": 9,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_parser_names()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "For demonstration, we will use the `Wide-Format` parser. When uploading dataframes, we can simply pass the name of the parser."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 10,
      "metadata": {},
      "outputs": [],
      "source": [
        "parser = dataset.get_parser(\"Wide-Format\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 11,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (3, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>style^momentum_6</th><th>style^momentum_12</th><th>style^growth</th><th>market^market</th><th>industry^consumer</th><th>industry^tech</th></tr><tr><td>date</td><td>str</td><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>-0.3</td><td>-0.2</td><td>1.2</td><td>1.0</td><td>null</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;AAPL&quot;</td><td>&quot;cusip9&quot;</td><td>0.1</td><td>0.5</td><td>1.1</td><td>1.0</td><td>1.0</td><td>null</td></tr><tr><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>-0.28</td><td>-0.19</td><td>1.21</td><td>1.0</td><td>null</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (3, 9)\n",
              "┌───────────┬──────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
              "│ date      ┆ asset_id ┆ asset_id_ ┆ style^mom ┆ … ┆ style^gro ┆ market^ma ┆ industry^ ┆ industry^ │\n",
              "│ ---       ┆ ---      ┆ type      ┆ entum_6   ┆   ┆ wth       ┆ rket      ┆ consumer  ┆ tech      │\n",
              "│ date      ┆ str      ┆ ---       ┆ ---       ┆   ┆ ---       ┆ ---       ┆ ---       ┆ ---       │\n",
              "│           ┆          ┆ str       ┆ f64       ┆   ┆ f64       ┆ f64       ┆ f64       ┆ f64       │\n",
              "╞═══════════╪══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
              "│ 2025-01-0 ┆ GOOG     ┆ cusip9    ┆ -0.3      ┆ … ┆ 1.2       ┆ 1.0       ┆ null      ┆ 1.0       │\n",
              "│ 6         ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
              "│ 2025-01-0 ┆ AAPL     ┆ cusip9    ┆ 0.1       ┆ … ┆ 1.1       ┆ 1.0       ┆ 1.0       ┆ null      │\n",
              "│ 6         ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
              "│ 2025-01-0 ┆ GOOG     ┆ cusip9    ┆ -0.28     ┆ … ┆ 1.21      ┆ 1.0       ┆ null      ┆ 1.0       │\n",
              "│ 7         ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
              "└───────────┴──────────┴───────────┴───────────┴───┴───────────┴───────────┴───────────┴───────────┘"
            ]
          },
          "execution_count": 11,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "example_df = parser.get_examples()[0]\n",
        "example_df"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Before running the parser, we can check if the data can be successfully parsed with the `can_handle` method."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 12,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadParserResult(parser='Wide-Format', success=True, messages=[])"
            ]
          },
          "execution_count": 12,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "parser.can_handle(example_df)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 13,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "(shape: (15, 6)\n",
              " ┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬──────────┐\n",
              " │ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure │\n",
              " │ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---      │\n",
              " │ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32      │\n",
              " ╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪══════════╡\n",
              " │ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.3     │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ 0.1      │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.28    │\n",
              " │ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.2     │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ 0.5      │\n",
              " │ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …        │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ industry     ┆ consumer    ┆ 1.0      │\n",
              " │ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              " └────────────┴──────────┴───────────────┴──────────────┴─────────────┴──────────┘,\n",
              " UploadParserResult(parser='Wide-Format', success=True, messages=[]))"
            ]
          },
          "execution_count": 13,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "parser.parse(example_df)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Staging Data\n",
        "\n",
        "*Staging* takes an input dataframe (or file), parses it, and keeps it in a separate area (stage). We can repeat this process of staging multiple times to stage multiple files (e.g. if we have daily files). The staging area can then be *committed* which concatenates all staged dataframes and writes them to versioned storage.\n",
        "\n",
        "### Adding to the Staging Area\n",
        "We use the example wide dataframe for staging. We define a name `example-1` to be able to tell the staged dataframes apart later on. We also specify a concrete parser we want to use. Note that the parser can be left blank in which case all available parsers will be tried and the first succeeding parser will be chosen."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 14,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadStagingResult(name='example-1', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 56, 198537, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])"
            ]
          },
          "execution_count": 14,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.stage_df(name=\"example-1\", df=example_df, parser=\"Wide-Format\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Below we're adding a second dataframe for demonstration purposes. For this we use the existing example dataframe and roll the dates by one week."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 15,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (3, 9)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>style^momentum_6</th><th>style^momentum_12</th><th>style^growth</th><th>market^market</th><th>industry^consumer</th><th>industry^tech</th></tr><tr><td>date</td><td>str</td><td>str</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>-0.3</td><td>-0.2</td><td>1.2</td><td>1.0</td><td>null</td><td>1.0</td></tr><tr><td>2025-01-13</td><td>&quot;AAPL&quot;</td><td>&quot;cusip9&quot;</td><td>0.1</td><td>0.5</td><td>1.1</td><td>1.0</td><td>1.0</td><td>null</td></tr><tr><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>-0.28</td><td>-0.19</td><td>1.21</td><td>1.0</td><td>null</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (3, 9)\n",
              "┌───────────┬──────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
              "│ date      ┆ asset_id ┆ asset_id_ ┆ style^mom ┆ … ┆ style^gro ┆ market^ma ┆ industry^ ┆ industry^ │\n",
              "│ ---       ┆ ---      ┆ type      ┆ entum_6   ┆   ┆ wth       ┆ rket      ┆ consumer  ┆ tech      │\n",
              "│ date      ┆ str      ┆ ---       ┆ ---       ┆   ┆ ---       ┆ ---       ┆ ---       ┆ ---       │\n",
              "│           ┆          ┆ str       ┆ f64       ┆   ┆ f64       ┆ f64       ┆ f64       ┆ f64       │\n",
              "╞═══════════╪══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
              "│ 2025-01-1 ┆ GOOG     ┆ cusip9    ┆ -0.3      ┆ … ┆ 1.2       ┆ 1.0       ┆ null      ┆ 1.0       │\n",
              "│ 3         ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
              "│ 2025-01-1 ┆ AAPL     ┆ cusip9    ┆ 0.1       ┆ … ┆ 1.1       ┆ 1.0       ┆ 1.0       ┆ null      │\n",
              "│ 3         ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
              "│ 2025-01-1 ┆ GOOG     ┆ cusip9    ┆ -0.28     ┆ … ┆ 1.21      ┆ 1.0       ┆ null      ┆ 1.0       │\n",
              "│ 4         ┆          ┆           ┆           ┆   ┆           ┆           ┆           ┆           │\n",
              "└───────────┴──────────┴───────────┴───────────┴───┴───────────┴───────────┴───────────┴───────────┘"
            ]
          },
          "execution_count": 15,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "example_df2 = example_df.with_columns(pl.col(\"date\").dt.add_business_days(5))\n",
        "example_df2"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 16,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadStagingResult(name='example-2', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 56, 370192, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])"
            ]
          },
          "execution_count": 16,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# note that if we used the same name \"example-1\" this cell would fail\n",
        "dataset.stage_df(name=\"example-2\", df=example_df2, parser=\"Wide-Format\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieving Staged Data\n",
        "\n",
        "Below we demonstrate how to obtain previously staged data. We can either obtain the staging results (as we saw above when calling the `stage_df` method) or the data itself."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 17,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'example-2': UploadStagingResult(name='example-2', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 56, 370192, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])]),\n",
              " 'example-1': UploadStagingResult(name='example-1', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 56, 198537, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])}"
            ]
          },
          "execution_count": 17,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_results()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 18,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (30, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>_name</th><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>str</td><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>&quot;example-2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.3</td></tr><tr><td>&quot;example-2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.2</td></tr><tr><td>&quot;example-2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.2</td></tr><tr><td>&quot;example-2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>&quot;example-2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.28</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.19</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.21</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;ma
rket&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (30, 7)\n",
              "┌───────────┬────────────┬──────────┬───────────────┬──────────────┬─────────────┬──────────┐\n",
              "│ _name     ┆ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure │\n",
              "│ ---       ┆ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---      │\n",
              "│ str       ┆ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32      │\n",
              "╞═══════════╪════════════╪══════════╪═══════════════╪══════════════╪═════════════╪══════════╡\n",
              "│ example-2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.3     │\n",
              "│ example-2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.2     │\n",
              "│ example-2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.2      │\n",
              "│ example-2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              "│ example-2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              "│ …         ┆ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …        │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.28    │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.19    │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.21     │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              "└───────────┴────────────┴──────────┴───────────────┴──────────────┴─────────────┴──────────┘"
            ]
          },
          "execution_count": 18,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_data().collect()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 19,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (15, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>_name</th><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>str</td><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>&quot;example-1&quot;</td><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.3</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.2</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.2</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.28</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.19</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.21</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;ma
rket&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (15, 7)\n",
              "┌───────────┬────────────┬──────────┬───────────────┬──────────────┬─────────────┬──────────┐\n",
              "│ _name     ┆ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure │\n",
              "│ ---       ┆ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---      │\n",
              "│ str       ┆ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32      │\n",
              "╞═══════════╪════════════╪══════════╪═══════════════╪══════════════╪═════════════╪══════════╡\n",
              "│ example-1 ┆ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.3     │\n",
              "│ example-1 ┆ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.2     │\n",
              "│ example-1 ┆ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.2      │\n",
              "│ example-1 ┆ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              "│ example-1 ┆ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              "│ …         ┆ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …        │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.28    │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.19    │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.21     │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              "│ example-1 ┆ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              "└───────────┴────────────┴──────────┴───────────────┴──────────────┴─────────────┴──────────┘"
            ]
          },
          "execution_count": 19,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_data(names=[\"example-1\"]).collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieving Summary Data\n",
        "\n",
        "We can also obtain predefined summary data for each staged file."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 20,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (2, 11)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>_name</th><th>n_dates</th><th>n_assets</th><th>min_date</th><th>max_date</th><th>n_factor_groups</th><th>n_factors</th><th>min_exposure</th><th>max_exposure</th><th>mean_exposure</th><th>std_exposure</th></tr><tr><td>str</td><td>u32</td><td>u32</td><td>date</td><td>date</td><td>u32</td><td>u32</td><td>f32</td><td>f32</td><td>f32</td><td>f32</td></tr></thead><tbody><tr><td>&quot;example-1&quot;</td><td>2</td><td>2</td><td>2025-01-06</td><td>2025-01-07</td><td>3</td><td>6</td><td>-0.3</td><td>1.21</td><td>0.609333</td><td>0.580189</td></tr><tr><td>&quot;example-2&quot;</td><td>2</td><td>2</td><td>2025-01-13</td><td>2025-01-14</td><td>3</td><td>6</td><td>-0.3</td><td>1.21</td><td>0.609333</td><td>0.580189</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (2, 11)\n",
              "┌───────────┬─────────┬──────────┬────────────┬───┬────────────┬───────────┬───────────┬───────────┐\n",
              "│ _name     ┆ n_dates ┆ n_assets ┆ min_date   ┆ … ┆ min_exposu ┆ max_expos ┆ mean_expo ┆ std_expos │\n",
              "│ ---       ┆ ---     ┆ ---      ┆ ---        ┆   ┆ re         ┆ ure       ┆ sure      ┆ ure       │\n",
              "│ str       ┆ u32     ┆ u32      ┆ date       ┆   ┆ ---        ┆ ---       ┆ ---       ┆ ---       │\n",
              "│           ┆         ┆          ┆            ┆   ┆ f32        ┆ f32       ┆ f32       ┆ f32       │\n",
              "╞═══════════╪═════════╪══════════╪════════════╪═══╪════════════╪═══════════╪═══════════╪═══════════╡\n",
              "│ example-1 ┆ 2       ┆ 2        ┆ 2025-01-06 ┆ … ┆ -0.3       ┆ 1.21      ┆ 0.609333  ┆ 0.580189  │\n",
              "│ example-2 ┆ 2       ┆ 2        ┆ 2025-01-13 ┆ … ┆ -0.3       ┆ 1.21      ┆ 0.609333  ┆ 0.580189  │\n",
              "└───────────┴─────────┴──────────┴────────────┴───┴────────────┴───────────┴───────────┴───────────┘"
            ]
          },
          "execution_count": 20,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_data_summary()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Drilling down, `get_staging_data_detail_summary` returns exposure statistics per staged name and date."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 21,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (4, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>_name</th><th>date</th><th>n_assets</th><th>min_exposure</th><th>max_exposure</th><th>mean_exposure</th><th>std_exposure</th></tr><tr><td>str</td><td>date</td><td>u32</td><td>f32</td><td>f32</td><td>f32</td><td>f32</td></tr></thead><tbody><tr><td>&quot;example-1&quot;</td><td>2025-01-06</td><td>2</td><td>-0.3</td><td>1.2</td><td>0.64</td><td>0.542586</td></tr><tr><td>&quot;example-1&quot;</td><td>2025-01-07</td><td>1</td><td>-0.28</td><td>1.21</td><td>0.548</td><td>0.644528</td></tr><tr><td>&quot;example-2&quot;</td><td>2025-01-13</td><td>2</td><td>-0.3</td><td>1.2</td><td>0.64</td><td>0.542586</td></tr><tr><td>&quot;example-2&quot;</td><td>2025-01-14</td><td>1</td><td>-0.28</td><td>1.21</td><td>0.548</td><td>0.644528</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (4, 7)\n",
              "┌───────────┬────────────┬──────────┬──────────────┬──────────────┬───────────────┬──────────────┐\n",
              "│ _name     ┆ date       ┆ n_assets ┆ min_exposure ┆ max_exposure ┆ mean_exposure ┆ std_exposure │\n",
              "│ ---       ┆ ---        ┆ ---      ┆ ---          ┆ ---          ┆ ---           ┆ ---          │\n",
              "│ str       ┆ date       ┆ u32      ┆ f32          ┆ f32          ┆ f32           ┆ f32          │\n",
              "╞═══════════╪════════════╪══════════╪══════════════╪══════════════╪═══════════════╪══════════════╡\n",
              "│ example-1 ┆ 2025-01-06 ┆ 2        ┆ -0.3         ┆ 1.2          ┆ 0.64          ┆ 0.542586     │\n",
              "│ example-1 ┆ 2025-01-07 ┆ 1        ┆ -0.28        ┆ 1.21         ┆ 0.548         ┆ 0.644528     │\n",
              "│ example-2 ┆ 2025-01-13 ┆ 2        ┆ -0.3         ┆ 1.2          ┆ 0.64          ┆ 0.542586     │\n",
              "│ example-2 ┆ 2025-01-14 ┆ 1        ┆ -0.28        ┆ 1.21         ┆ 0.548         ┆ 0.644528     │\n",
              "└───────────┴────────────┴──────────┴──────────────┴──────────────┴───────────────┴──────────────┘"
            ]
          },
          "execution_count": 21,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_data_detail_summary()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Staging from a File\n",
        "\n",
        "Instead of passing a dataframe directly, we can also stage data from an existing `csv`, `csv.gz`, `parquet` or `zip` file.\n",
        "\n",
        "Below we'll demonstrate how to:\n",
        "\n",
        "1. Stage a file\n",
        "2. Obtain the staged data\n",
        "3. Remove the file from the staging area\n",
        "\n",
        "First, we write our `example_df2` dataframe to a CSV file in a temporary directory."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 22,
      "metadata": {},
      "outputs": [],
      "source": [
        "path = Path(tempfile.mkdtemp()) / \"example2.csv\"\n",
        "example_df2.write_csv(path)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can then stage the file with `stage_file`, read the data back with `get_staging_data`, and remove it from the staging area with `wipe_staging`. Note that the staged name here is `example2`, matching the file name."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 23,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadStagingResult(name='example2', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 57, 167485, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])"
            ]
          },
          "execution_count": 23,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.stage_file(path, parser=\"Wide-Format\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 24,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (15, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>_name</th><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>str</td><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>&quot;example2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.3</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.2</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.2</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.28</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.19</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.21</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>&quot;example2&quot;</td><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (15, 7)\n",
              "┌──────────┬────────────┬──────────┬───────────────┬──────────────┬─────────────┬──────────┐\n",
              "│ _name    ┆ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure │\n",
              "│ ---      ┆ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---      │\n",
              "│ str      ┆ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32      │\n",
              "╞══════════╪════════════╪══════════╪═══════════════╪══════════════╪═════════════╪══════════╡\n",
              "│ example2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.3     │\n",
              "│ example2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.2     │\n",
              "│ example2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.2      │\n",
              "│ example2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              "│ example2 ┆ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              "│ …        ┆ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …        │\n",
              "│ example2 ┆ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.28    │\n",
              "│ example2 ┆ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.19    │\n",
              "│ example2 ┆ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.21     │\n",
              "│ example2 ┆ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0      │\n",
              "│ example2 ┆ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0      │\n",
              "└──────────┴────────────┴──────────┴───────────────┴──────────────┴─────────────┴──────────┘"
            ]
          },
          "execution_count": 24,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_data(names=[\"example2\"]).collect()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 25,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'example2': UploadStagingResult(name='example2', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 57, 167485, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])}"
            ]
          },
          "execution_count": 25,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.wipe_staging(names=[\"example2\"])"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Committing Data\n",
        "\n",
        "Once we're happy with the staged files, we can commit them into versioned storage. Versions are immutable, so every commit creates a new version, which lets us retrieve the data as of any earlier version.\n",
        "\n",
        "When committing staged data, we need to choose a `mode`, which defines how the staged data is merged into the existing versioned data."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 26,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'append': 'Appends new factor/date combinations to the existing data. Collisions will be ignored.',\n",
              " 'append_factor': 'Appends new factors to the existing data. Collisions with existing factors will be ignored.',\n",
              " 'overwrite': 'Overwrites the entire dataset with the new data.',\n",
              " 'overwrite_factor': 'Overwrites every factor present in the incoming data.',\n",
              " 'append_from': 'Appends new factor/date combinations to the existing data but only after the last date in the existing data. Collisions will be ignored.',\n",
              " 'overwrite_from': 'Overwrites the entire dataset with the new data but only after the last date in the existing data.'}"
            ]
          },
          "execution_count": 26,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_commit_modes()"
      ]
    },
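    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To build intuition for the difference between `append` and `append_from`, here is a minimal sketch in plain Polars. It is only one reading of the mode descriptions above, not the Bayesline implementation: the helper names `append_sketch` and `append_from_sketch`, the toy frames, and the assumption that a collision means an already existing `(date, factor)` pair are all hypothetical.\n",
        "\n",
        "```python\n",
        "from datetime import date\n",
        "\n",
        "import polars as pl\n",
        "\n",
        "# Assumption: a collision is an incoming row whose (date, factor) pair already exists.\n",
        "KEYS = ['date', 'factor']\n",
        "\n",
        "\n",
        "def append_sketch(existing: pl.DataFrame, incoming: pl.DataFrame) -> pl.DataFrame:\n",
        "    # Keep every existing row and add only the incoming rows with an unseen key.\n",
        "    new_rows = incoming.join(existing, on=KEYS, how='anti')\n",
        "    return pl.concat([existing, new_rows])\n",
        "\n",
        "\n",
        "def append_from_sketch(existing: pl.DataFrame, incoming: pl.DataFrame) -> pl.DataFrame:\n",
        "    # Same as above, but only consider incoming rows after the last existing date.\n",
        "    last_date = existing['date'].max()\n",
        "    return append_sketch(existing, incoming.filter(pl.col('date') > last_date))\n",
        "\n",
        "\n",
        "existing = pl.DataFrame({\n",
        "    'date': [date(2025, 1, 6), date(2025, 1, 7)],\n",
        "    'factor': ['growth', 'growth'],\n",
        "    'exposure': [1.2, 1.21],\n",
        "})\n",
        "incoming = pl.DataFrame({\n",
        "    'date': [date(2025, 1, 5), date(2025, 1, 7), date(2025, 1, 8)],\n",
        "    'factor': ['value', 'growth', 'growth'],\n",
        "    'exposure': [0.5, 9.9, 1.22],\n",
        "})\n",
        "\n",
        "appended = append_sketch(existing, incoming)\n",
        "appended_from = append_from_sketch(existing, incoming)\n",
        "print(appended.sort('date'))\n",
        "print(appended_from.sort('date'))\n",
        "```\n",
        "\n",
        "In this toy example the colliding `2025-01-07` row keeps its existing exposure in both cases, while the back-dated `2025-01-05` row is only added by the plain append."
      ]
    },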
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We'll follow the steps below to demonstrate the commit and versioning process:\n",
        "\n",
        "1. Commit entire staging area.\n",
        "2. Show empty staging area (committed staged names are cleared from the staging area).\n",
        "3. Show version history.\n",
        "4. Get data at latest version.\n",
        "5. Re-stage the example-1 dataframe and commit in `append` mode.\n",
        "6. Re-stage the example-2 dataframe and commit in `append_from` mode.\n",
        "7. Get data at different versions.\n",
        "\n",
        "We start by committing the entire staging area. When we do this, we see that the commit was created as version 1.\n"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 27,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadCommitResult(version=1, committed_names=['example-2', 'example-1'])"
            ]
          },
          "execution_count": 27,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.commit(mode=\"append\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "After committing, the staging area should now be empty."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 28,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (0, 7)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>_name</th><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>str</td><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (0, 7)\n",
              "┌───────┬──────┬──────────┬───────────────┬──────────────┬────────┬──────────┐\n",
              "│ _name ┆ date ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor ┆ exposure │\n",
              "│ ---   ┆ ---  ┆ ---      ┆ ---           ┆ ---          ┆ ---    ┆ ---      │\n",
              "│ str   ┆ date ┆ str      ┆ str           ┆ str          ┆ str    ┆ f32      │\n",
              "╞═══════╪══════╪══════════╪═══════════════╪══════════════╪════════╪══════════╡\n",
              "└───────┴──────┴──────────┴───────────────┴──────────────┴────────┴──────────┘"
            ]
          },
          "execution_count": 28,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_staging_data().collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can list all historical versions with the `version_history` method. Below we see two versions: version 0 corresponds to the automatic creation of the dataset, and version 1 to the commit we just made. Afterwards, calling `get_data` without a `version` argument returns the data at the latest version."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 29,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{1: datetime.datetime(2026, 4, 29, 20, 15, 57, 656000, tzinfo=datetime.timezone.utc),\n",
              " 0: datetime.datetime(2026, 4, 29, 20, 15, 57, 617000, tzinfo=datetime.timezone.utc)}"
            ]
          },
          "execution_count": 29,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.version_history()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 30,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (30, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-14</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (30, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-14 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 30,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data().collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Next, we re-stage both dataframes with their dates shifted forward by 10 business days and commit them one at a time. Each commit creates a new version."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 31,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadStagingResult(name='example-1', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 58, 876638, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])"
            ]
          },
          "execution_count": 31,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.stage_df(\n",
        "    \"example-1\", \n",
        "    example_df.with_columns(pl.col(\"date\").dt.add_business_days(10)), \n",
        "    parser=\"Wide-Format\"\n",
        ")"
      ]
    },
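    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The cell above uses `dt.add_business_days(10)` to move every date forward by ten business days, so the re-staged rows no longer collide with the dates that are already committed. A small self-contained sketch of that shift, using a hypothetical two-row frame rather than `example_df`:\n",
        "\n",
        "```python\n",
        "from datetime import date\n",
        "\n",
        "import polars as pl\n",
        "\n",
        "# Hypothetical stand-in for the date column of the tutorial dataframe.\n",
        "toy = pl.DataFrame({'date': [date(2025, 1, 6), date(2025, 1, 7)]})\n",
        "\n",
        "# Shift each date forward by 10 business days (weekends are skipped).\n",
        "shifted = toy.with_columns(pl.col('date').dt.add_business_days(10))\n",
        "\n",
        "# 2025-01-06 (Mon) -> 2025-01-20, 2025-01-07 (Tue) -> 2025-01-21\n",
        "print(shifted)\n",
        "```\n",
        "\n",
        "By default Polars only skips weekends here; public holidays are not taken into account unless they are passed through the `holidays` argument."
      ]
    },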
    {
      "cell_type": "code",
      "execution_count": 32,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadCommitResult(version=2, committed_names=['example-1'])"
            ]
          },
          "execution_count": 32,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.commit(mode=\"append\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 33,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{2: datetime.datetime(2026, 4, 29, 20, 15, 59, 177000, tzinfo=datetime.timezone.utc),\n",
              " 1: datetime.datetime(2026, 4, 29, 20, 15, 57, 656000, tzinfo=datetime.timezone.utc),\n",
              " 0: datetime.datetime(2026, 4, 29, 20, 15, 57, 617000, tzinfo=datetime.timezone.utc)}"
            ]
          },
          "execution_count": 33,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.version_history()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 34,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadStagingResult(name='example-2', timestamp=datetime.datetime(2026, 4, 29, 20, 15, 59, 951215, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])"
            ]
          },
          "execution_count": 34,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.stage_df(\n",
        "    \"example-2\", \n",
        "    example_df2.with_columns(pl.col(\"date\").dt.add_business_days(10)), \n",
        "    parser=\"Wide-Format\"\n",
        ")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 35,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadCommitResult(version=3, committed_names=['example-2'])"
            ]
          },
          "execution_count": 35,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.commit(mode=\"append_from\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 36,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{3: datetime.datetime(2026, 4, 29, 20, 16, 0, 350000, tzinfo=datetime.timezone.utc),\n",
              " 2: datetime.datetime(2026, 4, 29, 20, 15, 59, 177000, tzinfo=datetime.timezone.utc),\n",
              " 1: datetime.datetime(2026, 4, 29, 20, 15, 57, 656000, tzinfo=datetime.timezone.utc),\n",
              " 0: datetime.datetime(2026, 4, 29, 20, 15, 57, 617000, tzinfo=datetime.timezone.utc)}"
            ]
          },
          "execution_count": 36,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.version_history()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 37,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (60, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (60, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 37,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# both example-1 and example-2 at this version\n",
        "dataset.get_data(version=3).collect()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 38,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (45, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (45, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 38,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "# only example-1 at this version\n",
        "dataset.get_data(version=2).collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Note that if we append data that already exists, as identified by its primary key (i.e. there is nothing new to append), no new version is recorded."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 39,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadStagingResult(name='no-new-data', timestamp=datetime.datetime(2026, 4, 29, 20, 16, 1, 915606, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])"
            ]
          },
          "execution_count": 39,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.stage_df(\"no-new-data\", example_df)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 40,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadCommitResult(version=3, committed_names=['no-new-data'])"
            ]
          },
          "execution_count": 40,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.commit(mode=\"append\")"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 41,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{3: datetime.datetime(2026, 4, 29, 20, 16, 0, 350000, tzinfo=datetime.timezone.utc),\n",
              " 2: datetime.datetime(2026, 4, 29, 20, 15, 59, 177000, tzinfo=datetime.timezone.utc),\n",
              " 1: datetime.datetime(2026, 4, 29, 20, 15, 57, 656000, tzinfo=datetime.timezone.utc),\n",
              " 0: datetime.datetime(2026, 4, 29, 20, 15, 57, 617000, tzinfo=datetime.timezone.utc)}"
            ]
          },
          "execution_count": 41,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.version_history()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Retrieving Summary Data\n",
        "\n",
        "Just as we obtained summary statistics for the staging data, we can obtain them for the committed data, either for the latest version or for a specific one via the `version` argument."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 42,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (1, 10)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>n_dates</th><th>n_assets</th><th>min_date</th><th>max_date</th><th>n_factor_groups</th><th>n_factors</th><th>min_exposure</th><th>max_exposure</th><th>mean_exposure</th><th>std_exposure</th></tr><tr><td>u32</td><td>u32</td><td>date</td><td>date</td><td>u32</td><td>u32</td><td>f32</td><td>f32</td><td>f32</td><td>f32</td></tr></thead><tbody><tr><td>8</td><td>2</td><td>2025-01-06</td><td>2025-01-28</td><td>3</td><td>6</td><td>-19948.0</td><td>15575.0</td><td>5771.866699</td><td>15326.09668</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (1, 10)\n",
              "┌─────────┬──────────┬────────────┬────────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
              "│ n_dates ┆ n_assets ┆ min_date   ┆ max_date   ┆ … ┆ min_expos ┆ max_expos ┆ mean_expo ┆ std_expos │\n",
              "│ ---     ┆ ---      ┆ ---        ┆ ---        ┆   ┆ ure       ┆ ure       ┆ sure      ┆ ure       │\n",
              "│ u32     ┆ u32      ┆ date       ┆ date       ┆   ┆ ---       ┆ ---       ┆ ---       ┆ ---       │\n",
              "│         ┆          ┆            ┆            ┆   ┆ f32       ┆ f32       ┆ f32       ┆ f32       │\n",
              "╞═════════╪══════════╪════════════╪════════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
              "│ 8       ┆ 2        ┆ 2025-01-06 ┆ 2025-01-28 ┆ … ┆ -19948.0  ┆ 15575.0   ┆ 5771.8666 ┆ 15326.096 │\n",
              "│         ┆          ┆            ┆            ┆   ┆           ┆           ┆ 99        ┆ 68        │\n",
              "└─────────┴──────────┴────────────┴────────────┴───┴───────────┴───────────┴───────────┴───────────┘"
            ]
          },
          "execution_count": 42,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data_summary()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 43,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (1, 10)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>n_dates</th><th>n_assets</th><th>min_date</th><th>max_date</th><th>n_factor_groups</th><th>n_factors</th><th>min_exposure</th><th>max_exposure</th><th>mean_exposure</th><th>std_exposure</th></tr><tr><td>u32</td><td>u32</td><td>date</td><td>date</td><td>u32</td><td>u32</td><td>f32</td><td>f32</td><td>f32</td><td>f32</td></tr></thead><tbody><tr><td>6</td><td>2</td><td>2025-01-06</td><td>2025-01-21</td><td>3</td><td>6</td><td>-19948.0</td><td>15575.0</td><td>5771.866699</td><td>15326.09668</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (1, 10)\n",
              "┌─────────┬──────────┬────────────┬────────────┬───┬───────────┬───────────┬───────────┬───────────┐\n",
              "│ n_dates ┆ n_assets ┆ min_date   ┆ max_date   ┆ … ┆ min_expos ┆ max_expos ┆ mean_expo ┆ std_expos │\n",
              "│ ---     ┆ ---      ┆ ---        ┆ ---        ┆   ┆ ure       ┆ ure       ┆ sure      ┆ ure       │\n",
              "│ u32     ┆ u32      ┆ date       ┆ date       ┆   ┆ ---       ┆ ---       ┆ ---       ┆ ---       │\n",
              "│         ┆          ┆            ┆            ┆   ┆ f32       ┆ f32       ┆ f32       ┆ f32       │\n",
              "╞═════════╪══════════╪════════════╪════════════╪═══╪═══════════╪═══════════╪═══════════╪═══════════╡\n",
              "│ 6       ┆ 2        ┆ 2025-01-06 ┆ 2025-01-21 ┆ … ┆ -19948.0  ┆ 15575.0   ┆ 5771.8666 ┆ 15326.096 │\n",
              "│         ┆          ┆            ┆            ┆   ┆           ┆           ┆ 99        ┆ 68        │\n",
              "└─────────┴──────────┴────────────┴────────────┴───┴───────────┴───────────┴───────────┴───────────┘"
            ]
          },
          "execution_count": 43,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data_summary(version=2)"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 44,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (8, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>n_assets</th><th>min_exposure</th><th>max_exposure</th><th>mean_exposure</th><th>std_exposure</th></tr><tr><td>date</td><td>i64</td><td>f32</td><td>f32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>2025-01-06</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-07</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr><tr><td>2025-01-13</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-14</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr><tr><td>2025-01-20</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-21</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr><tr><td>2025-01-27</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-28</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (8, 6)\n",
              "┌────────────┬──────────┬──────────────┬──────────────┬───────────────┬──────────────┐\n",
              "│ date       ┆ n_assets ┆ min_exposure ┆ max_exposure ┆ mean_exposure ┆ std_exposure │\n",
              "│ ---        ┆ ---      ┆ ---          ┆ ---          ┆ ---           ┆ ---          │\n",
              "│ date       ┆ i64      ┆ f32          ┆ f32          ┆ f64           ┆ f64          │\n",
              "╞════════════╪══════════╪══════════════╪══════════════╪═══════════════╪══════════════╡\n",
              "│ 2025-01-06 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-07 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "│ 2025-01-13 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-14 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "│ 2025-01-20 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-21 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "│ 2025-01-27 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-28 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "└────────────┴──────────┴──────────────┴──────────────┴───────────────┴──────────────┘"
            ]
          },
          "execution_count": 44,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data_detail_summary()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 45,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (6, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>n_assets</th><th>min_exposure</th><th>max_exposure</th><th>mean_exposure</th><th>std_exposure</th></tr><tr><td>date</td><td>i64</td><td>f32</td><td>f32</td><td>f64</td><td>f64</td></tr></thead><tbody><tr><td>2025-01-06</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-07</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr><tr><td>2025-01-13</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-14</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr><tr><td>2025-01-20</td><td>2</td><td>-0.300049</td><td>1.200195</td><td>0.639978</td><td>0.542577</td></tr><tr><td>2025-01-21</td><td>1</td><td>-0.280029</td><td>1.209961</td><td>0.547998</td><td>0.644514</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (6, 6)\n",
              "┌────────────┬──────────┬──────────────┬──────────────┬───────────────┬──────────────┐\n",
              "│ date       ┆ n_assets ┆ min_exposure ┆ max_exposure ┆ mean_exposure ┆ std_exposure │\n",
              "│ ---        ┆ ---      ┆ ---          ┆ ---          ┆ ---           ┆ ---          │\n",
              "│ date       ┆ i64      ┆ f32          ┆ f32          ┆ f64           ┆ f64          │\n",
              "╞════════════╪══════════╪══════════════╪══════════════╪═══════════════╪══════════════╡\n",
              "│ 2025-01-06 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-07 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "│ 2025-01-13 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-14 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "│ 2025-01-20 ┆ 2        ┆ -0.300049    ┆ 1.200195     ┆ 0.639978      ┆ 0.542577     │\n",
              "│ 2025-01-21 ┆ 1        ┆ -0.280029    ┆ 1.209961     ┆ 0.547998      ┆ 0.644514     │\n",
              "└────────────┴──────────┴──────────────┴──────────────┴───────────────┴──────────────┘"
            ]
          },
          "execution_count": 45,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data_detail_summary(version=2)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### Validating Staging Data\n",
        "\n",
        "When staging multiple files, their combined contents may be invalid and therefore cannot be committed. For example, if the files introduce duplicate entries, the `commit` method will fail.\n",
        "\n",
        "The example below illustrates how to validate the staging area before committing. To simulate a validation failure, we intentionally stage the same example dataframe twice, resulting in duplicate records."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 46,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'example-2': UploadStagingResult(name='example-2', timestamp=datetime.datetime(2026, 4, 29, 20, 16, 4, 153486, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])]),\n",
              " 'example-1': UploadStagingResult(name='example-1', timestamp=datetime.datetime(2026, 4, 29, 20, 16, 4, 20602, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])}"
            ]
          },
          "execution_count": 46,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.stage_df(\"example-1\", example_df, parser=\"Wide-Format\")\n",
        "dataset.stage_df(\"example-2\", example_df, parser=\"Wide-Format\")\n",
        "\n",
        "dataset.get_staging_results()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The validation check below returns a dictionary keyed by check name, with a **non-empty** dataframe for each check that found validation errors. In this case, it contains the duplicated records together with a `_name` column that lists the names of the staged dataframes that introduced the duplication."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 47,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'Duplication Check': shape: (15, 7)\n",
              " ┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬─────────┬────────────┐\n",
              " │ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ n_dupes ┆ _name      │\n",
              " │ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---     ┆ ---        │\n",
              " │ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ u32     ┆ str        │\n",
              " ╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═════════╪════════════╡\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ industry     ┆ consumer    ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ market       ┆ market      ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ style        ┆ growth      ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-06 ┆ AAPL     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …       ┆ …          │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " │ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ 2       ┆ example-2, │\n",
              " │            ┆          ┆               ┆              ┆             ┆         ┆ example-1  │\n",
              " └────────────┴──────────┴───────────────┴──────────────┴─────────────┴─────────┴────────────┘}"
            ]
          },
          "execution_count": 47,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.validate_staging_data()"
      ]
    },
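    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To build intuition for what the duplication check reports, here is a minimal pure-Python sketch. It is not the Bayesline implementation: `find_duplicates`, the `KEY` tuple and the dict-of-lists stand-in for staged dataframes are assumptions made for illustration, with the key columns taken from the output above.\n",
        "\n",
        "```python\n",
        "from collections import defaultdict\n",
        "\n",
        "# Assumed primary-key columns, taken from the duplication check output above.\n",
        "KEY = ('date', 'asset_id', 'asset_id_type', 'factor_group', 'factor')\n",
        "\n",
        "\n",
        "def find_duplicates(staged):\n",
        "    # staged: hypothetical mapping of staging name -> list of row dicts.\n",
        "    seen = defaultdict(list)\n",
        "    for name, rows in staged.items():\n",
        "        for row in rows:\n",
        "            seen[tuple(row[col] for col in KEY)].append(name)\n",
        "    # Report only keys that occur more than once, with count and source names.\n",
        "    return {\n",
        "        key: {'n_dupes': len(names), '_name': ', '.join(sorted(set(names)))}\n",
        "        for key, names in seen.items()\n",
        "        if len(names) > 1\n",
        "    }\n",
        "\n",
        "\n",
        "row = {\n",
        "    'date': '2025-01-06',\n",
        "    'asset_id': 'GOOG',\n",
        "    'asset_id_type': 'cusip9',\n",
        "    'factor_group': 'market',\n",
        "    'factor': 'market',\n",
        "    'exposure': 1.0,\n",
        "}\n",
        "staged = {'example-1': [row], 'example-2': [dict(row)]}\n",
        "dupes = find_duplicates(staged)\n",
        "\n",
        "# Dropping one staged entry removes the duplication.\n",
        "clean = find_duplicates({'example-1': [row]})\n",
        "```\n",
        "\n",
        "In this toy model, `dupes` holds one entry per duplicated key, while `clean` is empty because a single staged entry cannot collide with itself."
      ]
    },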
    {
      "cell_type": "code",
      "execution_count": 48,
      "metadata": {},
      "outputs": [],
      "source": [
        "# this call would fail with: `UploadError: Staging data fails validation checks.`\n",
        "# dataset.commit(mode=\"append\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To resolve this error, we can delete one of the erroneously staged dataframes. The validation then returns an empty dictionary, indicating that all checks passed."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 49,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{'example-2': UploadStagingResult(name='example-2', timestamp=datetime.datetime(2026, 4, 29, 20, 16, 4, 153486, tzinfo=TzInfo(0)), success=True, results=[UploadParserResult(parser='Wide-Format', success=True, messages=[])])}"
            ]
          },
          "execution_count": 49,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.wipe_staging(names=[\"example-2\"])"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 50,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "{}"
            ]
          },
          "execution_count": 50,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.validate_staging_data()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Fast Commit\n",
        "\n",
        "We can skip the staging process and commit a dataframe straight into versioned storage as demonstrated below."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 51,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "UploadCommitResult(version=3, committed_names=[])"
            ]
          },
          "execution_count": 51,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.fast_commit(example_df, mode=\"append\", parser=\"Wide-Format\")"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Filtering Data\n",
        "\n",
        "`get_data` allows us to obtain committed data at different versions. We can further add filters and column selectors to minimize the payload that has to travel across the network.\n",
        "\n",
        "Filters are expressed in disjunctive normal form (DNF), as outlined in the `filters` section of the [PyArrow Documentation](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.ParquetDataset.html).\n",
        "That is, each condition is a triple `(column_name, operator, value)`. A flat list of triples is combined with *AND*. In a nested list, the triples within each inner list are combined with *AND*, and the inner lists are combined with *OR*.\n",
        "\n",
        "Examples:\n",
        "* `[(\"date\", \">=\", \"2025-01-15\"), (\"asset_id\", \"=\", \"GOOG\")]` produces `date >= 2025-01-15 AND asset_id = GOOG`.\n",
        "* `[[(\"date\", \">=\", \"2025-01-15\"), (\"asset_id\", \"=\", \"GOOG\")], [(\"exposure\", \"<\", 0)]]` produces `(date >= 2025-01-15 AND asset_id = GOOG) OR (exposure < 0)`."
      ]
    },
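    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The AND/OR semantics described above can be sketched in plain Python. This is only an illustration of how a DNF filter is evaluated against a row, not how `get_data` applies filters internally; the `OPS` table and the helpers `to_dnf` and `matches` are hypothetical names introduced for this sketch.\n",
        "\n",
        "```python\n",
        "import operator\n",
        "\n",
        "# Hypothetical operator table for this sketch; the real API may accept more.\n",
        "OPS = {\n",
        "    '=': operator.eq,\n",
        "    '==': operator.eq,\n",
        "    '!=': operator.ne,\n",
        "    '<': operator.lt,\n",
        "    '<=': operator.le,\n",
        "    '>': operator.gt,\n",
        "    '>=': operator.ge,\n",
        "}\n",
        "\n",
        "\n",
        "def to_dnf(filters):\n",
        "    # A flat list of triples is a single AND group; wrap it in an outer OR list.\n",
        "    if filters and isinstance(filters[0], tuple):\n",
        "        return [filters]\n",
        "    return filters\n",
        "\n",
        "\n",
        "def matches(row, filters):\n",
        "    # No filters means every row passes.\n",
        "    if not filters:\n",
        "        return True\n",
        "    # A row passes if any AND group is fully satisfied.\n",
        "    return any(\n",
        "        all(OPS[op](row[col], value) for col, op, value in group)\n",
        "        for group in to_dnf(filters)\n",
        "    )\n",
        "\n",
        "\n",
        "# ISO-formatted date strings compare correctly as plain strings.\n",
        "rows = [\n",
        "    {'date': '2025-01-20', 'asset_id': 'GOOG', 'exposure': 1.0},\n",
        "    {'date': '2025-01-06', 'asset_id': 'GOOG', 'exposure': -0.3},\n",
        "    {'date': '2025-01-06', 'asset_id': 'AAPL', 'exposure': 1.0},\n",
        "]\n",
        "\n",
        "and_filter = [('date', '>=', '2025-01-15'), ('asset_id', '=', 'GOOG')]\n",
        "or_filter = [and_filter, [('exposure', '<', 0)]]\n",
        "\n",
        "and_hits = [r for r in rows if matches(r, and_filter)]\n",
        "or_hits = [r for r in rows if matches(r, or_filter)]\n",
        "```\n",
        "\n",
        "Here `and_hits` keeps only the first row, while `or_hits` additionally keeps the second row because its exposure is negative."
      ]
    },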
    {
      "cell_type": "code",
      "execution_count": 52,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (5, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (5, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 52,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data(head=5).collect()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 53,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (20, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-27</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-27</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-27</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-27</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-27</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-21</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (20, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-27 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-27 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-27 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-27 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-27 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-21 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 53,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data(filters=[(\"date\", \">=\", \"2025-01-15\"), (\"asset_id\", \"=\", \"GOOG\")]).collect()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 54,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (28, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-07</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-13</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (28, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-07 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-13 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 54,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data(filters=[\n",
        "    [(\"date\", \">=\", \"2025-01-15\"), (\"asset_id\", \"=\", \"GOOG\")], \n",
        "    [(\"exposure\", \"<\", 0)]\n",
        "]).collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "We can also add the parameters `columns` and `unique`. For instance, to obtain a list of all assets (by asset id type) covered in this upload we would run below. "
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 55,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (2, 2)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>asset_id</th><th>asset_id_type</th></tr><tr><td>str</td><td>str</td></tr></thead><tbody><tr><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td></tr><tr><td>&quot;AAPL&quot;</td><td>&quot;cusip9&quot;</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (2, 2)\n",
              "┌──────────┬───────────────┐\n",
              "│ asset_id ┆ asset_id_type │\n",
              "│ ---      ┆ ---           │\n",
              "│ str      ┆ str           │\n",
              "╞══════════╪═══════════════╡\n",
              "│ GOOG     ┆ cusip9        │\n",
              "│ AAPL     ┆ cusip9        │\n",
              "└──────────┴───────────────┘"
            ]
          },
          "execution_count": 55,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "dataset.get_data(columns=[\"asset_id\", \"asset_id_type\"], unique=True).collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Downloading Data\n",
        "\n",
        "For very large datasets it could become prohibitive to download the entire dataset into memory. For that purpose we can stream the data into a flat file and then use Polars' lazy capabilities to read it."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 56,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[]"
            ]
          },
          "execution_count": 56,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "out_dir = Path(tempfile.mkdtemp())\n",
        "list(out_dir.iterdir())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 57,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (60, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (60, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 57,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "df = dataset.get_data(download_to=out_dir)\n",
        "\n",
        "# this dataframe is reading from the flat files that were downloaded to the out_dir\n",
        "df.collect()"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 58,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/plain": [
              "[PosixPath('/tmp/tmpq8j7_k2x/data-0.parquet')]"
            ]
          },
          "execution_count": 58,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "list(out_dir.iterdir())"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 59,
      "metadata": {},
      "outputs": [
        {
          "data": {
            "text/html": [
              "<div><style>\n",
              ".dataframe > thead > tr,\n",
              ".dataframe > tbody > tr {\n",
              "  text-align: right;\n",
              "  white-space: pre-wrap;\n",
              "}\n",
              "</style>\n",
              "<small>shape: (60, 6)</small><table border=\"1\" class=\"dataframe\"><thead><tr><th>date</th><th>asset_id</th><th>asset_id_type</th><th>factor_group</th><th>factor</th><th>exposure</th></tr><tr><td>date</td><td>str</td><td>str</td><td>str</td><td>str</td><td>f32</td></tr></thead><tbody><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.300049</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.199951</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.200195</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-06</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr><tr><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td><td>&hellip;</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_6&quot;</td><td>-0.280029</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;momentum_12&quot;</td><td>-0.189941</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;style&quot;</td><td>&quot;growth&quot;</td><td>1.209961</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;market&quot;</td><td>&quot;market&quot;</td><td>1.0</td></tr><tr><td>2025-01-28</td><td>&quot;GOOG&quot;</td><td>&quot;cusip9&quot;</td><td>&quot;industry&quot;</td><td>&quot;tech&quot;</td><td>1.0</td></tr></tbody></table></div>"
            ],
            "text/plain": [
              "shape: (60, 6)\n",
              "┌────────────┬──────────┬───────────────┬──────────────┬─────────────┬───────────┐\n",
              "│ date       ┆ asset_id ┆ asset_id_type ┆ factor_group ┆ factor      ┆ exposure  │\n",
              "│ ---        ┆ ---      ┆ ---           ┆ ---          ┆ ---         ┆ ---       │\n",
              "│ date       ┆ str      ┆ str           ┆ str          ┆ str         ┆ f32       │\n",
              "╞════════════╪══════════╪═══════════════╪══════════════╪═════════════╪═══════════╡\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.300049 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.199951 │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.200195  │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-06 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "│ …          ┆ …        ┆ …             ┆ …            ┆ …           ┆ …         │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_6  ┆ -0.280029 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ momentum_12 ┆ -0.189941 │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ style        ┆ growth      ┆ 1.209961  │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ market       ┆ market      ┆ 1.0       │\n",
              "│ 2025-01-28 ┆ GOOG     ┆ cusip9        ┆ industry     ┆ tech        ┆ 1.0       │\n",
              "└────────────┴──────────┴───────────────┴──────────────┴─────────────┴───────────┘"
            ]
          },
          "execution_count": 59,
          "metadata": {},
          "output_type": "execute_result"
        }
      ],
      "source": [
        "pl.scan_parquet(out_dir).collect()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## Housekeeping\n",
        "\n",
        "To delete a dataset entirely we can call its `destroy` method. Warning: This cannot be undone, so exercise caution when deleting datasets."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": 60,
      "metadata": {},
      "outputs": [],
      "source": [
        "dataset.destroy()"
      ]
    }
  ],
  "metadata": {
    "kernelspec": {
      "display_name": ".venv",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.11.15"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}