> ## Documentation Index
> Fetch the complete documentation index at: https://lightdash-mintlify-cccf65ca.mintlify.site/llms.txt
> Use this file to discover all available pages before exploring further.

# Pre-aggregates

> Speed up dashboards and reduce warehouse costs by serving queries from pre-computed, materialized summaries.

<Info>
  **Availability:** Pre-aggregates are a [Beta](/references/workspace/feature-maturity-levels) feature available on **Enterprise plans** only.
</Info>

Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks if the query can be answered from a pre-aggregate instead of querying your warehouse. If it matches, the query is served from the pre-computed results, making it significantly faster and reducing warehouse load.

This is especially useful for dashboards with high traffic or expensive aggregations that don't need real-time data.

Any query that goes through the Lightdash semantic layer can hit a pre-aggregate — this includes the Lightdash app, the [API](/api-reference/v1/introduction), [MCP](/references/integrations/lightdash-mcp), [AI agents](/guides/ai-agents), the [Embed SDK](/references/embedding), and the [React SDK](/references/react-sdk).

Watch this video walkthrough for an overview of how to get started with pre-aggregates:

<Frame>
  <iframe width="100%" height="420" src="https://www.loom.com/embed/91133871cd994723b4f9ca4b5b35228b" frameborder="0" webkitallowfullscreen mozallowfullscreen allowfullscreen />
</Frame>

<CardGroup cols={2}>
  <Card title="Getting started" icon="rocket" horizontal href="/references/pre-aggregates/getting-started">
    Define pre-aggregates in your dbt project and configure scheduling.
  </Card>

  <Card title="Monitoring and debugging" icon="chart-mixed" horizontal href="/references/pre-aggregates/monitoring">
    Track materialization status, debug query matching, and view hit/miss stats.
  </Card>

  <Card title="CLI audit" icon="terminal" horizontal href="/references/pre-aggregates/cli-audit">
    Inspect dashboard coverage from the terminal and gate CI on hit rates.
  </Card>
</CardGroup>

## How it works

Pre-aggregates follow a four-step cycle:

1. **Define** — You add a `pre_aggregates` block to your dbt model YAML, specifying which dimensions and metrics to include.
2. **Materialize** — Lightdash runs the aggregation query against your warehouse and stores the results. This happens automatically on compile, on a cron schedule you define, or when you trigger it manually.
3. **Match** — When a user runs a query, Lightdash checks if every requested dimension, metric, and filter is covered by a pre-aggregate.
4. **Serve** — If a match is found, the query is served from the materialized data instead of hitting your warehouse.

### Example

Suppose you have an `orders` table with thousands of rows, and you define a pre-aggregate with dimensions `status` and metrics `total_amount` (sum) and `order_count` (count), with a `day` granularity on `order_date`.

**Your warehouse data:**

| order\_date | status  | customer | amount |
| ----------- | ------- | -------- | ------ |
| 2024-01-15  | shipped | Alice    | \$100  |
| 2024-01-15  | shipped | Bob      | \$50   |
| 2024-01-15  | pending | Charlie  | \$75   |
| 2024-01-16  | shipped | Alice    | \$200  |
| 2024-01-16  | pending | Charlie  | \$30   |
| ...         | ...     | ...      | ...    |

**Lightdash materializes this into a pre-aggregate:**

| order\_date\_day | status  | total\_amount | order\_count |
| ---------------- | ------- | ------------- | ------------ |
| 2024-01-15       | shipped | \$150         | 2            |
| 2024-01-15       | pending | \$75          | 1            |
| 2024-01-16       | shipped | \$200         | 1            |
| 2024-01-16       | pending | \$30          | 1            |

Now when a user queries "total amount by status, grouped by **month**", Lightdash re-aggregates from the daily pre-aggregate instead of scanning the full table:

| order\_date\_month | status  | total\_amount |
| ------------------ | ------- | ------------- |
| January 2024       | shipped | \$350         |
| January 2024       | pending | \$105         |

This works because `sum` can be re-aggregated — summing daily sums gives the correct monthly sum.

## Query matching

When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when **all** of the following are true:

* Every dimension in the query is included in the pre-aggregate
* Every metric in the query is included in the pre-aggregate
* Every dimension used in **filters** is included in the pre-aggregate
* If the pre-aggregate itself defines `filters`, the query must include an equivalent or narrower filter
* All metrics use [supported metric types](#supported-metric-types)
* The query does not contain raw SQL table calculations
* The query does not use [Parameters](/references/lightdash-config-yml#parameters-configuration), [`sql_filter`](/references/tables#sql-filter-row-level-security), or [user attributes](/references/workspace/user-attributes) inside SQL
* If the query uses a time dimension, the requested granularity must be **equal to or coarser** than the pre-aggregate's granularity

<Tip>
  A **day** pre-aggregate serves `day`, `week`, `month`, `quarter`, and `year` queries. A **month** pre-aggregate serves `month`, `quarter`, and `year` — but **not** `day` or `week`, since those need finer-grained data.
</Tip>

When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, then fewest metrics as tiebreaker).

### Filtered pre-aggregates

Pre-aggregates can define static filters in their YAML definition. This lets you materialize a smaller slice of data for a common query pattern, such as `status = completed` or `order_date: inThePast 52 weeks`.

When a pre-aggregate has definition filters:

* Matching queries must include the same filter or a narrower one
* Queries without the filter, or with a broader or incompatible filter, fall back to another pre-aggregate or the warehouse

This prevents Lightdash from serving incomplete results from a filtered materialization.

### Dimensions from joined tables

Pre-aggregates support dimensions from joined tables. Reference them by their full name (for example, `customers.first_name`) in the `dimensions` list.

## Supported metric types

Pre-aggregates support metrics that can be re-aggregated from pre-computed results:

* `sum`
* `count`
* `min`
* `max`
* `average`

## Current limitations

Pre-aggregates support a narrower subset of the Lightdash semantic layer than regular warehouse queries.

### Not supported

Pre-aggregates do not support:

* [Personal warehouse connections](/references/workspace/personal-warehouse-connections). Materialization always runs under a single user's credentials, so warehouse-level access rules are not applied per viewer. If you rely on personal warehouse connections to enforce data access, use [results caching](/guides/developer/caching) instead.
* [Parameters](/references/lightdash-config-yml#parameters-configuration) — parameter values are picked at query time, so they cannot be resolved during materialization. Queries that use parameters fall back to the warehouse.
* [User attributes](/references/workspace/user-attributes) when referenced from SQL. [`required_attributes`](/references/tables#required-attributes) and [`any_attributes`](/references/tables#any-attributes) are still supported through [`materialization_role`](/references/pre-aggregates/getting-started#materialization-role).
* [Custom metrics](/guides/custom-fields#custom-metrics) created in the Explorer
* [Custom SQL dimensions](/guides/custom-fields#custom-sql) created in the Explorer ([Custom bin dimensions](/guides/custom-fields#bin) are supported)
* SQL table calculations ([Formula table calculations](/guides/formula-table-calculations#formula-table-calculations) are supported)

### SQL compatibility

[`sql_filter`](/references/tables#sql-filter-row-level-security) (and its alias `sql_where`) runs both at materialization time and at query time on top of the materialized data.

* **At materialization time**, the filter is evaluated against your warehouse. If the SQL references [Parameters](/references/lightdash-config-yml#parameters-configuration) or [user attributes](/references/workspace/user-attributes), the values injected come from the materialization context — you can pin this to a fixed identity or attribute set with [`materialization_role`](/references/pre-aggregates/getting-started#materialization-role) so the materialization captures the rows you need.
* **At query time**, the same filter is re-applied against the materialized data, which is served by DuckDB. If the `sql_filter` SQL uses warehouse-specific syntax that DuckDB doesn't understand, the query will fail to run against the pre-aggregate and fall back to the warehouse.

### Metrics that can't be pre-aggregated

Pre-aggregates do not support metric types that cannot be re-aggregated from pre-computed results.

For example, consider `count_distinct` on a daily pre-aggregate. If the pre-aggregate stores "2 distinct customers on 2024-01-15" and "1 distinct customer on 2024-01-16", you cannot sum those daily values to get the monthly distinct count, because the same customer can appear on multiple days.

| order\_date\_day | status  | distinct\_customers |
| ---------------- | ------- | ------------------- |
| 2024-01-15       | shipped | 2 (Alice, Bob)      |
| 2024-01-16       | shipped | 1 (Alice)           |

Re-aggregating gives `2 + 1 = 3`, but the correct monthly answer is `2` (`Alice`, `Bob`). The pre-aggregate no longer knows which customers were counted.

We're investigating supporting `count_distinct` through approximation algorithms. [Follow this issue](https://github.com/lightdash/lightdash/issues/21536) for updates.

For similar reasons, the following metric types are also not supported:

* `sum_distinct`, `average_distinct`
* `median`, `percentile`
* `percent_of_total`, `percent_of_previous`
* `running_total`
* Custom SQL / post-calculation metrics (including many `number` metrics) — [Follow this issue](https://github.com/lightdash/lightdash/issues/21537)
* `number`, `string`, `date`, `timestamp`, `boolean`

For metrics that can't be pre-aggregated, consider using [caching](/guides/developer/caching) instead.

## Pre-aggregates vs results caching

Lightdash has two independent systems for speeding up queries: **results caching** and **pre-aggregates**. They work differently and are designed to be used together, not as replacements for each other.

### Results caching

Results caching stores the exact result of any query that runs through Lightdash, keyed by a hash of the generated SQL. The first time a query runs, Lightdash executes it against your warehouse and caches the result in S3. Subsequent identical queries are served from the cache until it expires (24 hours by default).

Any change to the query — a different filter, column, limit, or user attribute — produces a new SQL hash, a new cache entry, and another warehouse query. Results caching covers every query shape, including custom metrics, table calculations, and SQL runner queries.

See the [caching guide](/guides/developer/caching) for details.

### Pre-aggregates

Pre-aggregates are summary tables you define in your dbt YAML. Lightdash materializes them on a schedule (or on compile, or manually) and stores the results in S3. When a user query matches the pre-aggregate's dimensions, metrics, filters, and granularity, Lightdash serves the query from the materialized data using in-memory DuckDB workers. The warehouse is not touched at query time, even on the first query.

A single pre-aggregate can serve many different queries. A daily pre-aggregate with five dimensions can answer day, week, month, quarter, and year queries across any subset of those dimensions and with any narrower filter. Results caching, in contrast, needs one cache entry per unique SQL.

### Key differences

|                                   | Results caching                                 | Pre-aggregates                                                       |
| --------------------------------- | ----------------------------------------------- | -------------------------------------------------------------------- |
| **Configuration**                 | Automatic once enabled for your instance        | Defined in dbt YAML                                                  |
| **Trigger**                       | First query runs against warehouse, then cached | Materialized on compile, cron, or manual refresh                     |
| **Storage**                       | Query result (row set)                          | Pre-computed summary table                                           |
| **Query execution**               | Exact cached result is returned                 | DuckDB workers re-aggregate at query time                            |
| **Warehouse hit on first query?** | Yes                                             | No — only materialization hits the warehouse, not query-time serving |
| **Coverage**                      | All metric types, all query shapes              | Only re-aggregatable metrics (sum, count, min, max, average)         |
| **Scope**                         | One cache entry per unique SQL                  | One pre-aggregate can serve many query shapes                        |
| **Availability**                  | Cloud Pro+ or self-hosted with license          | Enterprise (Beta)                                                    |

### When to use which

**Use pre-aggregates when:**

* You have high-traffic dashboards with predictable query patterns
* You want to reduce warehouse cost or improve latency on the first query, not just repeat visits
* The metrics are re-aggregatable (sum, count, min, max, average)
* You're willing to design and schedule the materializations

**Use results caching when:**

* Query patterns are ad-hoc or unpredictable
* You need unsupported features listed above, such as `count_distinct`, [Parameters](/references/lightdash-config-yml#parameters-configuration), [`sql_filter`](/references/tables#sql-filter-row-level-security), or raw SQL table calculations
* You're using the SQL runner
* You don't want upfront configuration work

<Tip>
  In most cases, both should be enabled. Pre-aggregates handle your heaviest, most predictable workloads. Results caching is the safety net for everything else.
</Tip>

### Using both together

When both systems are enabled, they act as two layers of caching. A query that matches a pre-aggregate is served from the materialized data by DuckDB workers. The result of that DuckDB query can then be stored in the results cache, so subsequent identical requests skip even the DuckDB step and return the cached result directly. This means pre-aggregates eliminate the warehouse hit, and results caching eliminates repeated computation on top of that.
