Back to Portfolio
Project · Analytics Engineering · dbt + BigQuery

Transforming 64 SAP source tables into analytics-ready dimensional models through a three-layer dbt pipeline on BigQuery.

Analytics engineering certification project built on dbt Core and BigQuery. Adventure Works SAP data flows through three disciplined layers — staging standardises raw inputs, intermediate applies business logic, and marts deliver domain-organised dimensional models ready for BI consumption. Every layer covered by automated dbt tests and generated documentation.

TypeAnalytics Engineering
Sources64 SAP tables
WarehouseBigQuery
Layers3 · stg → int → mart
dbtBigQuerySQLPythonData Modeling
01 — Context & Approach

From raw SAP exports to a dimensional schema — the analytics engineer's job.

The source data problem

SAP systems generate tables optimised for transactional throughput, not analytical querying. Column names are abbreviated, types are inconsistent across modules, null handling is implicit, and relationships are buried in surrogate key conventions that vary by SAP module. Adventure Works exposes all of this across 64 tables spanning six business domains.

The analytics engineer's job is to impose order without losing fidelity: standardise the grain, enforce types, name things for humans, and model the business logic once — so every downstream consumer sees a consistent, documented, testable surface.

Why the three-layer architecture

Separation of concerns. Staging is never business logic — it is just standardisation. Intermediate is never a BI model — it is preparation. Marts are the contract: the stable, domain-facing surface that BI tools and analysts depend on.

When a source changes schema, only the staging model needs to change. When business logic changes, only the intermediate model changes. The mart stays stable — and with it, the dashboards that depend on it.

Infochart 01 · Source Table Inventory
64 SAP source tables across six business domains
Each domain maps to a distinct SAP module. The staging layer creates one-to-one models for every source table shown here.
SAP Module · Table Count · Sample TablesDomainTablesDistributionSample SourcesSales & Distribution14SalesOrderHeader · SalesOrderDetail · CustomerSalesPerson · SalesTerritory · SpecialOfferProducts & Inventory13Product · ProductCategory · ProductSubcategoryProductInventory · ProductPhoto · ProductReviewProduction12WorkOrder · WorkOrderRouting · ProductDocumentLocation · ProductCostHistory · BillOfMaterialsPurchasing11PurchaseOrderHeader · PurchaseOrderDetail · VendorProductVendor · ShipMethod · VendorAddressHuman Resources8Employee · Department · ShiftJobCandidate · EmployeeDepartmentHistory+ 6 shared / master data tables64 totalAddress · AddressType · ContactType · CountryRegion · Currency · StateProvince
02 — Three-Layer Architecture

Staging cleans. Intermediate joins. Marts deliver.

The three-layer pattern imposes strict separation of concerns. No staging model contains business logic. No mart model reads directly from source. Each layer has a single responsibility — and a test suite to prove it.

Infochart 02 · dbt Layer Architecture
Source → Staging → Intermediate → Marts · each layer's job
Reading left to right: data fidelity increases, business meaning increases, and the audience shifts from engineers to analysts.
Source (BigQuery Raw)64 tables · 6 domainssales.SalesOrderHeadersales.SalesOrderDetailproduct.Productproduct.ProductCategoryhr.Employeepurchasing.Vendor… 58 more tablesAudienceData EngineersStaging (stg_*)1-to-1 source mappingWhat happens here▸ Column renaming (snake_case)▸ Type casting (STRING → INT64)▸ Null default substitution▸ No business logic whatsoeverExample modelsstg_sales__ordersstg_product__productsstg_hr__employeesTestsnot_null · uniqueIntermediate (int_*)Business logic & joinsWhat happens here▸ Joining related staging models▸ Applying business rules▸ Enriching with derived fields▸ Preparing grain for mart joinsExample modelsint_orders__enrichedint_products__with_categoryint_employees__historyTestsnot_null · relationshipsMarts (fct_* + dim_*)Business-domain dimensional modelsFact tablesfct_salesone row per sales order lineDimension tablesdim_customerdim_productdim_employeedim_territorydim_dateAudienceAnalysts · BI ToolsDirection of abstractionRaw & verboseBusiness-ready & documented
03 — BigQuery Dataset Structure

One BigQuery project, four datasets — each owned by a dbt layer.

dbt manages the physical dataset structure in BigQuery. Each layer materialises into its own dataset, giving analysts a clear contract: query mart tables, never staging or intermediate tables directly. The layer boundary is both a modelling convention and an access-control point.

Infochart 03 · BigQuery Dataset Layout
Project → Dataset → Table hierarchy by dbt layer
dbt's target schema configuration maps directly to BigQuery dataset names — making layer boundaries visible at the infrastructure level, not just in code.
BigQuery Project · adventure-works-analyticsdbt-managed · dev + prod targetsDataset: raw_adventure_worksSource · read-onlySalesOrderHeaderSalesOrderDetailCustomerProductProductCategoryEmployeeDepartmentVendor… 56 more tables64 tablesDataset: stagingdbt materialisation: viewstg_sales__ordersstg_sales__orderdetailstg_product__productsstg_product__categoriesstg_hr__employeesstg_purchasing__vendors… 58 more models64 view modelsDataset: intermediatedbt materialisation: viewint_orders__enrichedint_products__with_catint_employees__historyint_customers__joinedint_territories__sales… additional modelsview modelsDataset: martsdbt materialisation: tablefct_salesdim_customerdim_productdim_employeedim_territorydim_dateStable contract for BI tools.Analysts never query upstream layers.table materialisations
04 — Dimensional Model

A classic star schema — one fact, five dimensions.

The mart layer delivers a dimensional model centred on the sales fact — the most analytically valuable grain in the Adventure Works dataset. Every dimension is directly joinable via surrogate key, following Kimball's dimensional modelling principles.

Infochart 04 · Mart Layer — Star Schema
fct_sales at the centre · five dimension tables
All joins occur via surrogate keys generated in the staging and intermediate layers. The fact table grain is one row per sales order line item.
fct_salesGrain: one row per order line itemorder_key (SK)customer_key (FK)product_key (FK)employee_key (FK)territory_key (FK)order_date_key (FK)unit_price · order_qty · line_total · discountdim_customercustomer_key (SK)customer_idfirst_name · last_nameemail_address · phonedim_productproduct_key (SK)product_idproduct_namecategory · subcategorylist_price · colordim_datedate_key (SK)date · year · quartermonth · week · day_nameis_weekend · fiscal_perioddim_employeeemployee_key (SK)employee_idfull_name · job_titledepartment · hire_datedim_territoryterritory_key (SK)territory_idterritory_namecountry_regiongroup (North America, etc.)Join via surrogate key (FK → SK)SK = surrogate key · FK = foreign key
05 — Test Coverage & Documentation

Every model tested. Every column documented. No exceptions.

dbt's built-in test framework is applied at every layer. Tests are not optional bolt-ons — they are the mechanism by which the pipeline proves to downstream consumers that the data meets its documented contract.

64
Source tables modeled
Every SAP source table represented in the staging layer with a 1-to-1 model
3
dbt layers
Staging → Intermediate → Marts, each with its own materialisation and test suite
100%
Model documentation
All models documented in schema.yml files and published via dbt docs generate
Infochart 05 · Automated Test Coverage
Test types applied per layer · dbt built-in + custom tests
Tests run on every dbt build. Failures block downstream materialisation — a broken mart cannot be produced from untested source data.
Test type · Coverage scope · Example assertionLayerTest Types AppliedCoverageWhat it provesStagingstg_* modelsnot_nulluniqueApplied to all primary key columns and all non-nullable business keysAll 64 staging models · every PK columnNo null surrogate keysreach the intermediate layerIntermediateint_* modelsnot_nullrelationshipsFK integrity validated — every join key must exist in its referenced staging modelFK columns · join key cardinalityReferential integrity holdsacross all join relationshipsMartsfct_* · dim_*not_nulluniqueaccepted_valuesrelationshipsFull contract validation — the stable BI surface is the most thoroughly tested layerAll mart columns · FK → dim integrity · enum fieldsBI consumers get onlyvalidated, documented datadbt docs generate— runs on every build. Every model, column, test, and relationship is captured in a browsable documentation site.Source freshness checks, model lineage graphs, and test result history are all included in the generated output.
06 — Deliverables

Everything a BI consumer and an engineering reviewer would expect to find.

The project is open-source and documented to certification standard. Every deliverable below is present in the repository and can be verified without running the pipeline locally.

dbt Project (staging → intermediate → marts)
64 staging models covering all SAP source tables, multiple intermediate transformations, and a full star-schema mart layer covering Sales, Products, HR, and Territory domains.
Automated Test Suite
not_null, unique, accepted_values, and relationships tests at every layer. Tests block downstream materialisation on failure — a broken mart cannot be produced from untested source data.
dbt Documentation Site
Generated via dbt docs generate — all models, columns, tests, and source-to-mart lineage graphs browsable without running the project locally.
Conceptual Data Model
Entity-relationship diagram documenting the mart layer star schema — published externally for stakeholder review and onboarding.
BI Dashboard
Dashboard built directly on the mart layer, demonstrating that the dimensional models are BI-tool-ready without further transformation.
profiles.yml + packages.json
Fully configured BigQuery connection profile and package manifest — the project is reproducible from a fresh clone with BigQuery credentials.

Analytics engineering is not about writing SQL — it is about building a reliable contract between raw data and the people who depend on it. Three layers. Automated tests. Generated documentation. No shortcuts.