E-commerce Data Pipeline with dbt and BigQuery


This project implements a data pipeline that extracts data from BigQuery, validates it using Pydantic models, and loads it into DuckDB.

dbt · BigQuery · Python · Pydantic · Streamlit

Overview

This project implements a data pipeline that extracts data from BigQuery public datasets, validates it using Pydantic models, and loads it into DuckDB. The pipeline supports multiple output destinations including local CSV files, Amazon S3, and MotherDuck for flexible data processing and analysis.

Key Highlights

  • Implemented a data pipeline that ingests data from the BigQuery public dataset `thelook_ecommerce`
  • Validated data using Pydantic models to ensure data quality and type safety
  • Loaded validated data into DuckDB for efficient processing and analysis
  • Supported multiple output destinations: local CSV files, Amazon S3, and MotherDuck
  • Orchestrated the entire ETL pipeline with modular Python components for BigQuery interactions, DuckDB operations, and data validation
  • Enabled flexible data processing workflows for e-commerce analytics and insights
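The validation step described above can be sketched with a minimal Pydantic model. This is an illustrative example only: the model name `Order` and its fields are assumptions loosely based on the `thelook_ecommerce` orders table, not the project's actual schema.

```python
from datetime import datetime
from pydantic import BaseModel

# Hypothetical model for rows from the orders table; field names
# are illustrative and not taken from the project's real schema.
class Order(BaseModel):
    order_id: int
    user_id: int
    status: str
    created_at: datetime
    num_of_item: int

# A raw row as it might arrive from the extract step. Pydantic
# coerces the string "1001" to int and parses the ISO timestamp,
# raising ValidationError if a field cannot be coerced.
raw_row = {
    "order_id": "1001",
    "user_id": 42,
    "status": "Complete",
    "created_at": "2023-05-01T12:30:00",
    "num_of_item": 2,
}

order = Order(**raw_row)
print(order.order_id, order.created_at.year)
```

Validating each row against a model like this is what gives the pipeline its type-safety guarantee: bad rows fail loudly at ingestion time instead of corrupting downstream tables.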

Technical Approach

The pipeline follows a clear data flow: extract data from BigQuery using specified table names; transform it by validating each row with Pydantic models; load the validated rows into DuckDB; and sink the results to one or more destinations (local CSV, Amazon S3, MotherDuck) depending on requirements.