Skip to content
Back to Case Studies
Portfolio Project
March 18, 2025
E-Commerce / Retail

Building an Open-Source E-Commerce Analytics Platform with Supabase, dbt & Metabase

How I built a cost-effective, fully containerized BI solution that gives SMEs enterprise-grade analytics without the enterprise price tag.

Data Engineering
Business Intelligence
dbt
Supabase
Metabase

Key Outcomes

  • 100% open-source stack with zero licensing costs
  • Star schema dimensional model for optimized BI queries
  • Interactive dashboards for customer & product analytics
  • Fully containerized with one-command deployment

The Challenge

E-commerce businesses live and die by their data. Yet most small and medium enterprises face a frustrating reality:

  • Enterprise BI tools are expensive — Tableau, Looker, and Power BI subscriptions quickly exceed $1,000/month
  • Data lives in silos — Sales, customer, and inventory data scattered across disconnected systems
  • Technical complexity — Setting up a proper analytics pipeline requires engineering resources most SMEs don't have
  • Scalability concerns — Solutions that work for 10K orders break at 100K

I wanted to prove that world-class e-commerce analytics doesn't require a Fortune 500 budget.

The Goal: Build a production-ready, fully open-source analytics platform that any e-commerce business can deploy in a day.

My Approach

I designed the E-Commerce Analytics Platform as a modular, containerized solution using battle-tested open-source tools. Here's how I built it:

Phase 1: Architectural Design

Challenge: Most open-source BI setups are fragile—duct-taped configurations that break with updates.

Solution:

  • Designed a clean separation of concerns:
    • Supabase (PostgreSQL) for data storage with built-in REST API and auth
    • dbt for transformation and business logic
    • Metabase for visualization and exploration
    • Docker Compose for orchestration
  • Created a three-layer data model:
    • Staging (raw data normalization)
    • Intermediate (business logic processing)
    • Mart (analytics-ready aggregations)

Phase 2: Data Modeling with dbt

Challenge: Raw e-commerce data is messy—inconsistent formats, missing values, no standardization.

Solution: Built a comprehensive dbt project with 18 models across three layers:

Staging Layer (9 models):

  • stg_olist__customers.sql — Customer demographics and geography
  • stg_olist__orders.sql — Order headers with status tracking
  • stg_olist__order_items.sql — Line-item details with pricing
  • stg_olist__order_payments.sql — Payment transactions and methods
  • stg_olist__order_reviews.sql — Customer feedback and ratings
  • stg_olist__products.sql — Product catalog with categories
  • stg_olist__sellers.sql — Seller profiles and locations
  • stg_olist__geolocation.sql — Geographic coordinates for mapping
  • stg_olist__product_categories.sql — Category hierarchy mapping

Intermediate Layer (4 models):

  • int_customer_orders.sql — Customer purchase history aggregation
  • int_orders_with_items.sql — Denormalized order details
  • int_product_performance.sql — Product sales metrics
  • int_seller_performance.sql — Seller KPIs and rankings

Mart Layer (3 models):

  • mart_customer_analytics.sql — Customer LTV, cohorts, and segmentation
  • mart_product_analytics.sql — Product profitability and trends
  • mart_seller_analytics.sql — Seller performance scorecards

Phase 3: Star Schema Implementation

Challenge: Flat tables are easy but terrible for analytics. You need dimensional modeling.

Solution: Implemented a proper star schema optimized for BI queries:

Model TypePurposeExample Metrics
Fact TablesTransactional eventsOrder items, payments, reviews
Dimension TablesDescriptive attributesCustomers, products, sellers, dates
Aggregation TablesPre-computed metricsDaily sales, customer LTV, product rankings

Key design decisions:

  • Slowly Changing Dimensions (SCD Type 2) for customer and product history
  • Date dimension with fiscal periods, weekday flags, and holiday markers
  • Conformed dimensions enabling cross-domain analysis

Phase 4: Dashboard Development

Challenge: Dashboards that require SQL knowledge are useless to business users.

Solution: Built two comprehensive Metabase dashboards:

Customer Analytics Dashboard 📊

  • Geographic distribution heat map
  • Customer acquisition trends (daily/weekly/monthly)
  • Lifetime value distribution histogram
  • Cohort retention analysis
  • RFM segmentation matrix

Product Analytics Dashboard 🛍️

  • Revenue by category treemap
  • Top-selling products leaderboard
  • Price point analysis scatter plot
  • Seasonal trends time series
  • Category performance breakdown

Phase 5: Containerization & Deployment

Challenge: "It works on my machine" doesn't cut it for production.

Solution: Created a one-command deployment with Docker Compose:

services:
  supabase-db:     # PostgreSQL with Supabase extensions
  supabase-api:    # REST API and authentication
  metabase:        # BI visualization
  dbt:             # Transformation runner
  airflow:         # Orchestration (optional)

Features:

  • Health checks for all services
  • Volume persistence for data durability
  • Network isolation for security
  • Environment-based configuration for dev/staging/prod

Architecture

The platform follows a modern ELT (Extract, Load, Transform) pattern:

┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Sources   │────▶│   Supabase  │────▶│     dbt     │────▶│  Metabase   │
│  (CSV/API)  │     │ (PostgreSQL)│     │ (Transform) │     │   (BI/Viz)  │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │                   │
       │                   ▼                   ▼                   ▼
       │            Raw Tables          Analytics Models     Dashboards
       │            (Staging)           (Marts)              (Self-serve)
       │                   │                   │                   │
       └───────────────────┴───────────────────┴───────────────────┘
                        Orchestrated by Airflow (optional)

Key Results

MetricResult
Licensing Cost$0 (100% open-source)
Deployment Time<30 minutes with Docker
Data Models18 dbt models (staging → mart)
Dashboard Load Time<2 seconds
Query PerformanceOptimized with proper indexing
MaintainabilitySQL-based, version-controlled

Key Technical Decisions

Why Supabase over Raw PostgreSQL?

Supabase provides PostgreSQL plus:

  • Built-in REST API — No need to build backend endpoints
  • Row-Level Security (RLS) — GDPR/CCPA compliance made easy
  • Real-time subscriptions — Live dashboard updates
  • Authentication — User access control out of the box
  • Dashboard & SQL Editor — Great for debugging

Why dbt for Transformations?

  • Version control — All logic is SQL in Git
  • Dependency management — Automatic DAG execution order
  • Testing — Data quality checks built-in
  • Documentation — Auto-generated lineage graphs
  • Modularity — Reusable macros and packages

Why Metabase over Superset?

  • Lower learning curve — Business users can build dashboards
  • Better default visualizations — Looks good out of the box
  • Simpler deployment — Single Docker container
  • Active community — Great plugin ecosystem

Lessons Learned

1. Dimensional Modeling Matters

Flat tables are a trap. Spending time on proper star schema design:

  • Reduced query times by 10x
  • Made dashboards self-serve
  • Enabled future scalability

2. dbt Tests Saved Hours of Debugging

Built-in tests caught issues early:

models:
  - name: mart_customer_analytics
    tests:
      - unique:
          column_name: customer_id
      - not_null:
          column_name: customer_id
      - accepted_values:
          column_name: customer_segment
          values: ['VIP', 'Regular', 'At-Risk', 'New']

3. Docker Compose is Underrated

Instead of manual installation guides:

  • docker compose up -d — Done
  • Works identically on any machine
  • Easy to upgrade individual services

4. Start with Core Metrics

Resist the temptation to track everything. Focus on:

  • Customer Lifetime Value (LTV)
  • Customer Acquisition Cost (CAC)
  • Average Order Value (AOV)
  • Monthly Recurring Revenue (MRR)
  • Churn Rate

Extending the Platform

The modular architecture makes it easy to add:

Additional Data Sources:

  • Shopify/WooCommerce APIs
  • Google Analytics events
  • Marketing platform data (Facebook, Google Ads)
  • Inventory management systems

Advanced Analytics:

  • dbt Python models for ML predictions
  • Customer churn prediction
  • Demand forecasting
  • Recommendation engines

Production Features:

  • Airflow for scheduled orchestration
  • Great Expectations for advanced data quality
  • Kubernetes for scalable deployment

View the complete source code on GitHub.

Technology Stack

Python
PostgreSQL
Supabase
dbt
Metabase
Docker
Airflow

Have a similar challenge?

Let's discuss how we can help you achieve comparable results.