Transforming 64 SAP source tables into analytics-ready dimensional models through a three-layer dbt pipeline on BigQuery.
Analytics engineering certification project built on dbt Core and BigQuery. Adventure Works SAP data flows through three disciplined layers — staging standardises raw inputs, intermediate applies business logic, and marts deliver domain-organised dimensional models ready for BI consumption. Every layer covered by automated dbt tests and generated documentation.
From raw SAP exports to a dimensional schema — the analytics engineer's job.
The source data problem
SAP systems generate tables optimised for transactional throughput, not analytical querying. Column names are abbreviated, types are inconsistent across modules, null handling is implicit, and relationships are buried in surrogate key conventions that vary by SAP module. Adventure Works exposes all of this across 64 tables spanning six business domains.
The analytics engineer's job is to impose order without losing fidelity: standardise the grain, enforce types, name things for humans, and model the business logic once — so every downstream consumer sees a consistent, documented, testable surface.
Why the three-layer architecture
Separation of concerns. Staging is never business logic — it is just standardisation. Intermediate is never a BI model — it is preparation. Marts are the contract: the stable, domain-facing surface that BI tools and analysts depend on.
When a source changes schema, only the staging model needs to change. When business logic changes, only the intermediate model changes. The mart stays stable — and with it, the dashboards that depend on it.
Staging cleans. Intermediate joins. Marts deliver.
The three-layer pattern imposes strict separation of concerns. No staging model contains business logic. No mart model reads directly from source. Each layer has a single responsibility — and a test suite to prove it.
One BigQuery project, four datasets — each owned by a dbt layer.
dbt manages the physical dataset structure in BigQuery. Each layer materialises into its own dataset, giving analysts a clear contract: query mart tables, never staging or intermediate tables directly. The layer boundary is both a modelling convention and an access-control point.
A classic star schema — one fact, five dimensions.
The mart layer delivers a dimensional model centred on the sales fact — the most analytically valuable grain in the Adventure Works dataset. Every dimension is directly joinable via surrogate key, following Kimball's dimensional modelling principles.
Every model tested. Every column documented. No exceptions.
dbt's built-in test framework is applied at every layer. Tests are not optional bolt-ons — they are the mechanism by which the pipeline proves to downstream consumers that the data meets its documented contract.
Everything a BI consumer and an engineering reviewer would expect to find.
The project is open-source and documented to certification standard. Every deliverable below is present in the repository and can be verified without running the pipeline locally.
Analytics engineering is not about writing SQL — it is about building a reliable contract between raw data and the people who depend on it. Three layers. Automated tests. Generated documentation. No shortcuts.