## The Challenge
E-commerce businesses live and die by their data. Yet most small and medium enterprises face a frustrating reality:
- Enterprise BI tools are expensive — Tableau, Looker, and Power BI subscriptions quickly exceed $1,000/month
- Data lives in silos — Sales, customer, and inventory data scattered across disconnected systems
- Technical complexity — Setting up a proper analytics pipeline requires engineering resources most SMEs don't have
- Scalability concerns — Solutions that work for 10K orders break at 100K
I wanted to prove that world-class e-commerce analytics doesn't require a Fortune 500 budget.
The Goal: Build a production-ready, fully open-source analytics platform that any e-commerce business can deploy in a day.
## My Approach
I designed the E-Commerce Analytics Platform as a modular, containerized solution using battle-tested open-source tools. Here's how I built it:
### Phase 1: Architectural Design
Challenge: Most open-source BI setups are fragile—duct-taped configurations that break with updates.
Solution:
- Designed a clean separation of concerns:
  - Supabase (PostgreSQL) for data storage with built-in REST API and auth
  - dbt for transformation and business logic
  - Metabase for visualization and exploration
  - Docker Compose for orchestration
- Created a three-layer data model:
  - Staging (raw data normalization)
  - Intermediate (business logic processing)
  - Mart (analytics-ready aggregations)
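In dbt, each layer is simply a model that selects from the layer below it via `ref()`. A minimal sketch of how the chain works (only the model names come from this project; the columns are assumptions):

```sql
-- marts/mart_customer_analytics.sql (illustrative sketch)
-- Selecting via ref() lets dbt infer the staging -> intermediate -> mart
-- execution order automatically; column names here are assumed.
with customer_orders as (
    select * from {{ ref('int_customer_orders') }}  -- intermediate layer
)

select
    customer_id,
    count(order_id)  as lifetime_orders,
    sum(order_total) as lifetime_value
from customer_orders
group by customer_id
```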
### Phase 2: Data Modeling with dbt
Challenge: Raw e-commerce data is messy—inconsistent formats, missing values, no standardization.
Solution: Built a comprehensive dbt project with 18 models across three layers:
Staging Layer (9 models):
- `stg_olist__customers.sql` — Customer demographics and geography
- `stg_olist__orders.sql` — Order headers with status tracking
- `stg_olist__order_items.sql` — Line-item details with pricing
- `stg_olist__order_payments.sql` — Payment transactions and methods
- `stg_olist__order_reviews.sql` — Customer feedback and ratings
- `stg_olist__products.sql` — Product catalog with categories
- `stg_olist__sellers.sql` — Seller profiles and locations
- `stg_olist__geolocation.sql` — Geographic coordinates for mapping
- `stg_olist__product_categories.sql` — Category hierarchy mapping
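A staging model at this layer typically does nothing more than rename and type-cast raw columns. A minimal sketch (the raw source and column names are assumptions, not the project's actual schema):

```sql
-- staging/stg_olist__orders.sql (sketch; source columns are assumed)
with source as (
    select * from {{ source('olist', 'orders') }}
)

select
    order_id,
    customer_id,
    lower(order_status)                              as order_status,
    cast(order_purchase_timestamp as timestamp)      as purchased_at,
    cast(order_delivered_customer_date as timestamp) as delivered_at
from source
```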
Intermediate Layer (4 models):
- `int_customer_orders.sql` — Customer purchase history aggregation
- `int_orders_with_items.sql` — Denormalized order details
- `int_product_performance.sql` — Product sales metrics
- `int_seller_performance.sql` — Seller KPIs and rankings
Mart Layer (3 models):
- `mart_customer_analytics.sql` — Customer LTV, cohorts, and segmentation
- `mart_product_analytics.sql` — Product profitability and trends
- `mart_seller_analytics.sql` — Seller performance scorecards
### Phase 3: Star Schema Implementation
Challenge: Flat tables are easy but terrible for analytics. You need dimensional modeling.
Solution: Implemented a proper star schema optimized for BI queries:
| Model Type | Purpose | Example Metrics |
|---|---|---|
| Fact Tables | Transactional events | Order items, payments, reviews |
| Dimension Tables | Descriptive attributes | Customers, products, sellers, dates |
| Aggregation Tables | Pre-computed metrics | Daily sales, customer LTV, product rankings |
Key design decisions:
- Slowly Changing Dimensions (SCD Type 2) for customer and product history
- Date dimension with fiscal periods, weekday flags, and holiday markers
- Conformed dimensions enabling cross-domain analysis
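To make the star schema concrete: a typical BI query joins one fact table to several conformed dimensions and aggregates. The table and column names below are illustrative, not the project's actual DDL:

```sql
-- Illustrative star-schema query; fact/dimension names are assumptions.
-- One narrow fact table joined to conformed dimensions keeps the query
-- simple and lets the planner use the dimension keys' indexes.
select
    d.year_month,
    c.customer_state,
    p.product_category,
    sum(f.item_price) as revenue,
    count(*)          as items_sold
from fct_order_items f
join dim_date     d on f.order_date_key = d.date_key
join dim_customer c on f.customer_key   = c.customer_key
join dim_product  p on f.product_key    = p.product_key
where d.is_holiday = false
group by 1, 2, 3
```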
### Phase 4: Dashboard Development
Challenge: Dashboards that require SQL knowledge are useless to business users.
Solution: Built two comprehensive Metabase dashboards:
Customer Analytics Dashboard 📊
- Geographic distribution heat map
- Customer acquisition trends (daily/weekly/monthly)
- Lifetime value distribution histogram
- Cohort retention analysis
- RFM segmentation matrix
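The RFM matrix behind that dashboard reduces to a single windowed query over the customer mart. A sketch, assuming the mart exposes `last_order_date`, `order_count`, and `total_spent` (these column names are assumptions):

```sql
-- RFM scoring sketch: quartile each customer on the three axes (4 = best).
-- Assumes mart_customer_analytics exposes last_order_date, order_count,
-- and total_spent; adjust names to the real schema.
select
    customer_id,
    ntile(4) over (order by last_order_date) as recency_score,   -- most recent buyers score 4
    ntile(4) over (order by order_count)     as frequency_score, -- most frequent buyers score 4
    ntile(4) over (order by total_spent)     as monetary_score   -- biggest spenders score 4
from mart_customer_analytics
```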
Product Analytics Dashboard 🛍️
- Revenue by category treemap
- Top-selling products leaderboard
- Price point analysis scatter plot
- Seasonal trends time series
- Category performance breakdown
### Phase 5: Containerization & Deployment
Challenge: "It works on my machine" doesn't cut it for production.
Solution: Created a one-command deployment with Docker Compose:
```yaml
services:
  supabase-db:   # PostgreSQL with Supabase extensions
  supabase-api:  # REST API and authentication
  metabase:      # BI visualization
  dbt:           # Transformation runner
  airflow:       # Orchestration (optional)
```
Features:
- Health checks for all services
- Volume persistence for data durability
- Network isolation for security
- Environment-based configuration for dev/staging/prod
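As an example of what the health checks look like, a compose service entry might resemble the following (the image tag, intervals, and service names are illustrative, not the project's exact file). Metabase exposes an `/api/health` endpoint that suits this well:

```yaml
# Illustrative health-checked service; values are assumptions.
metabase:
  image: metabase/metabase:latest
  ports:
    - "3000:3000"
  depends_on:
    supabase-db:
      condition: service_healthy   # wait until the database reports healthy
  healthcheck:
    test: ["CMD", "curl", "-f", "http://localhost:3000/api/health"]
    interval: 30s
    timeout: 5s
    retries: 5
```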
## Architecture
The platform follows a modern ELT (Extract, Load, Transform) pattern:
```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Sources   │────▶│  Supabase   │────▶│     dbt     │────▶│  Metabase   │
│  (CSV/API)  │     │ (PostgreSQL)│     │ (Transform) │     │  (BI/Viz)   │
└─────────────┘     └─────────────┘     └─────────────┘     └─────────────┘
       │                   │                   │                   │
       │                   ▼                   ▼                   ▼
       │              Raw Tables       Analytics Models       Dashboards
       │              (Staging)            (Marts)           (Self-serve)
       │                   │                   │                   │
       └───────────────────┴───────────────────┴───────────────────┘
                    Orchestrated by Airflow (optional)
```
## Key Results
| Metric | Result |
|---|---|
| Licensing Cost | $0 (100% open-source) |
| Deployment Time | <30 minutes with Docker |
| Data Models | 18 dbt models (staging → mart) |
| Dashboard Load Time | <2 seconds |
| Query Performance | Optimized with proper indexing |
| Maintainability | SQL-based, version-controlled |
## Key Technical Decisions
### Why Supabase over Raw PostgreSQL?
Supabase provides PostgreSQL plus:
- Built-in REST API — No need to build backend endpoints
- Row-Level Security (RLS) — GDPR/CCPA compliance made easy
- Real-time subscriptions — Live dashboard updates
- Authentication — User access control out of the box
- Dashboard & SQL Editor — Great for debugging
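Row-Level Security, for instance, is only a few lines of SQL in Postgres/Supabase. A sketch (the table and column names are assumptions; `auth.uid()` is Supabase's helper for the current authenticated user):

```sql
-- Hypothetical RLS policy: each authenticated user sees only their own orders.
alter table orders enable row level security;

create policy orders_owner_read on orders
    for select
    using (customer_id = auth.uid());  -- auth.uid() returns the caller's user id
```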
### Why dbt for Transformations?
- Version control — All logic is SQL in Git
- Dependency management — Automatic DAG execution order
- Testing — Data quality checks built-in
- Documentation — Auto-generated lineage graphs
- Modularity — Reusable macros and packages
### Why Metabase over Superset?
- Lower learning curve — Business users can build dashboards
- Better default visualizations — Looks good out of the box
- Simpler deployment — Single Docker container
- Active community — Great plugin ecosystem
## Lessons Learned
### 1. Dimensional Modeling Matters
Flat tables are a trap. Investing time in a proper star schema design paid off:
- Reduced query times by 10x
- Made dashboards self-serve
- Enabled future scalability
### 2. dbt Tests Saved Hours of Debugging
Built-in tests caught issues early:
```yaml
models:
  - name: mart_customer_analytics
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: customer_segment
        tests:
          - accepted_values:
              values: ['VIP', 'Regular', 'At-Risk', 'New']
```
### 3. Docker Compose is Underrated
Instead of manual installation guides:
- `docker compose up -d` — done
- Works identically on any machine
- Easy to upgrade individual services
### 4. Start with Core Metrics
Resist the temptation to track everything. Focus on:
- Customer Lifetime Value (LTV)
- Customer Acquisition Cost (CAC)
- Average Order Value (AOV)
- Monthly Recurring Revenue (MRR)
- Churn Rate
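Most of these reduce to short aggregations over the order data. For example, AOV over a trailing 30-day window (assuming an `orders` table with `order_id`, `order_total`, and `order_date` columns; the schema is illustrative):

```sql
-- Average Order Value over the last 30 days (table/column names assumed).
select
    sum(order_total) / count(distinct order_id) as aov_30d
from orders
where order_date >= current_date - interval '30 days'
```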
## Extending the Platform
The modular architecture makes it easy to add:
Additional Data Sources:
- Shopify/WooCommerce APIs
- Google Analytics events
- Marketing platform data (Facebook, Google Ads)
- Inventory management systems
Advanced Analytics:
- dbt Python models for ML predictions
- Customer churn prediction
- Demand forecasting
- Recommendation engines
Production Features:
- Airflow for scheduled orchestration
- Great Expectations for advanced data quality
- Kubernetes for scalable deployment
View the complete source code on GitHub.