Building a data stack from scratch is complex. Use this checklist to ensure you don't miss critical steps—whether you're using Supabase + dbt + Metabase (like my E-Commerce Analytics project) or any modern data stack combination.
Phase 1: Infrastructure Setup (8 items)
1.1 Database Selection & Configuration
- Choose your data warehouse based on scale and budget:
- Supabase/PostgreSQL: <100GB, $0-50/month
- BigQuery: 100GB-10TB, pay-per-query
- Snowflake: 10TB+, enterprise features
- Redshift: AWS ecosystem, predictable pricing
- Configure database security:
- Strong passwords (16+ characters, no defaults)
- SSL/TLS encryption enabled
- IP allowlisting for production
- Separate users for different applications
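For the last item, here is a minimal PostgreSQL sketch of what separate application users can look like. The role names, database name, schemas, and passwords are illustrative assumptions, not part of the project:

```sql
-- Read-only role for the BI tool (e.g., Metabase); names are assumptions.
CREATE ROLE metabase_reader LOGIN PASSWORD 'use-a-long-random-password';
GRANT CONNECT ON DATABASE analytics TO metabase_reader;
GRANT USAGE ON SCHEMA marts TO metabase_reader;
GRANT SELECT ON ALL TABLES IN SCHEMA marts TO metabase_reader;
ALTER DEFAULT PRIVILEGES IN SCHEMA marts GRANT SELECT ON TABLES TO metabase_reader;

-- Read/write role for the transformation tool (e.g., dbt).
CREATE ROLE dbt_writer LOGIN PASSWORD 'use-a-different-long-random-password';
GRANT CONNECT ON DATABASE analytics TO dbt_writer;
GRANT USAGE, CREATE ON SCHEMA staging, marts TO dbt_writer;
```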
- Set up connection pooling if using PostgreSQL:
- PgBouncer or Supabase's built-in pooler
- Transaction mode for short-lived connections
- Session mode for long-running transformations
- Enable query logging and monitoring:
- `pg_stat_statements` extension for PostgreSQL
- Query duration thresholds for alerting
- Slow query log analysis scheduled
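As a starting point for the monitoring items above, a hedged PostgreSQL sketch; it assumes `pg_stat_statements` is loaded via `shared_preload_libraries` (Supabase typically enables it by default) and uses PostgreSQL 13+ column names:

```sql
-- Enable the extension once per database.
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Ten slowest statements by average execution time.
SELECT
    calls,
    ROUND(mean_exec_time::numeric, 2)  AS avg_ms,
    ROUND(total_exec_time::numeric, 2) AS total_ms,
    LEFT(query, 120)                   AS query_preview
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
```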
1.2 Containerization & Orchestration
- Create Docker Compose configuration:

```yaml
version: '3.8'
services:
  database:
    image: postgres:15
    healthcheck:
      test: ["CMD-SHELL", "pg_isready"]
      interval: 10s
      timeout: 5s
      retries: 5
  transformation:
    build: ./dbt
    depends_on:
      database:
        condition: service_healthy
  visualization:
    image: metabase/metabase
    depends_on:
      - database
```

- Configure persistent volumes:
- Database data directory
- Metabase application database
- Log files for debugging
- Set up environment variable management:
- `.env` file for local development
- `.env.example` with dummy values in Git
- Secrets manager for production (Vault, AWS Secrets Manager)
- Document deployment commands:
- `docker compose up -d` — Start all services
- `docker compose logs -f` — View logs
- `docker compose down -v` — Full cleanup
Phase 2: Data Modeling (6 items)
2.1 Source System Analysis
- Inventory all data sources:
| Source | Type | Refresh | Volume | Owner |
|---|---|---|---|---|
| Orders | PostgreSQL | Real-time | 100K/day | E-commerce |
| Customers | API | Daily | 10K | CRM |
| Products | CSV | Weekly | 5K | Catalog |
- Document source schemas:
- Column names and data types
- Primary keys and foreign keys
- Known data quality issues
- Business definitions for ambiguous fields
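To avoid documenting source schemas by hand, a small sketch that pulls column names, types, and nullability from PostgreSQL's catalog; the `public` schema is an assumption, and most other warehouses expose a similar `information_schema`:

```sql
-- Dump column names, types, and nullability for every table in a schema.
SELECT
    table_name,
    column_name,
    data_type,
    is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'          -- assumed raw-data schema
ORDER BY table_name, ordinal_position;
```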
2.2 Dimensional Model Design
- Identify fact tables (transactional events):
- `fact_orders` — Order transactions
- `fact_order_items` — Line items
- `fact_payments` — Payment events
- `fact_reviews` — Customer feedback
- Identify dimension tables (descriptive attributes):
- `dim_customers` — Customer demographics
- `dim_products` — Product catalog
- `dim_sellers` — Seller profiles
- `dim_dates` — Calendar dimension
- `dim_geography` — Location hierarchy
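Of these, `dim_dates` is usually generated rather than loaded. A sketch using PostgreSQL's `generate_series` (dbt users often reach for `dbt_utils.date_spine` instead); the date range is an assumption:

```sql
-- Calendar dimension covering 2016 through 2026 (adjust the range to your data).
SELECT
    d::date                          AS date_day,
    EXTRACT(YEAR FROM d)::int        AS year,
    EXTRACT(MONTH FROM d)::int       AS month,
    EXTRACT(ISODOW FROM d)::int      AS day_of_week,
    EXTRACT(ISODOW FROM d) IN (6, 7) AS is_weekend
FROM generate_series('2016-01-01'::date, '2026-12-31'::date, INTERVAL '1 day') AS d;
```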
- Define grain for each fact table:
- `fact_orders` → One row per order
- `fact_order_items` → One row per order line
- `fact_payments` → One row per payment installment
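A quick way to verify a declared grain is a duplicate check on the grain key; this sketch assumes the `fact_orders` table and its one-row-per-order grain from the list above:

```sql
-- Returns zero rows when fact_orders really is one row per order.
SELECT
    order_id,
    COUNT(*) AS row_count
FROM fact_orders
GROUP BY order_id
HAVING COUNT(*) > 1;
```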
- Document key business metrics:
| Metric | Definition | Calculation |
|---|---|---|
| GMV | Gross Merchandise Value | SUM(order_item.price) |
| AOV | Average Order Value | GMV / COUNT(DISTINCT orders) |
| LTV | Customer Lifetime Value | SUM(customer_orders.total_spent) |
| CAC | Customer Acquisition Cost | Marketing Spend / New Customers |
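To make the GMV and AOV definitions above concrete, a hedged SQL sketch against the mart tables; column names such as `price` and `purchased_at` are assumptions about the final schema:

```sql
-- GMV and AOV for the last 30 days, assuming fact_order_items(order_id, price)
-- and fact_orders(order_id, purchased_at).
SELECT
    SUM(oi.price)                              AS gmv,
    COUNT(DISTINCT o.order_id)                 AS orders,
    SUM(oi.price) / COUNT(DISTINCT o.order_id) AS aov
FROM fact_order_items AS oi
JOIN fact_orders AS o ON o.order_id = oi.order_id
WHERE o.purchased_at >= CURRENT_DATE - INTERVAL '30 days';
```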
Phase 3: Transformation Layer (8 items)
3.1 dbt Project Structure
- Initialize dbt project:

```bash
dbt init my_project
```

- Configure `dbt_project.yml`:

```yaml
name: 'ecommerce_analytics'
version: '1.0.0'
config-version: 2

model-paths: ["models"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

models:
  ecommerce_analytics:
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      +schema: marts
```

- Create folder structure:

```
models/
├── staging/
│   ├── _sources.yml
│   ├── stg_orders.sql
│   └── stg_customers.sql
├── intermediate/
│   ├── int_customer_orders.sql
│   └── int_product_performance.sql
└── marts/
    ├── core/
    │   ├── dim_customers.sql
    │   └── fact_orders.sql
    └── marketing/
        └── customer_segments.sql
```
3.2 Staging Layer Implementation
- Create staging models for each source:

```sql
-- models/staging/stg_orders.sql
WITH source AS (
    SELECT * FROM {{ source('raw', 'orders') }}
),

renamed AS (
    SELECT
        order_id,
        customer_id,
        order_status AS status,
        order_purchase_timestamp::timestamp AS purchased_at,
        order_delivered_customer_date::timestamp AS delivered_at
    FROM source
)

SELECT * FROM renamed
```

- Define sources in `_sources.yml`:

```yaml
sources:
  - name: raw
    database: analytics
    schema: public
    tables:
      - name: orders
        description: "Raw orders from e-commerce platform"
        columns:
          - name: order_id
            description: "Unique order identifier"
            tests:
              - unique
              - not_null
```
3.3 Business Logic Layer
- Create intermediate models for complex joins:

```sql
-- models/intermediate/int_customer_orders.sql
SELECT
    customer_id,
    COUNT(DISTINCT order_id) AS total_orders,
    SUM(order_total) AS lifetime_value,
    MIN(purchased_at) AS first_order_at,
    MAX(purchased_at) AS last_order_at,
    DATEDIFF('day', MIN(purchased_at), MAX(purchased_at)) AS customer_tenure_days
FROM {{ ref('stg_orders') }}
GROUP BY customer_id
```

- Implement data quality tests:

```yaml
models:
  - name: int_customer_orders
    columns:
      - name: customer_id
        tests:
          - unique
          - not_null
      - name: total_orders
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 1
      - name: lifetime_value
        tests:
          - not_null
          - dbt_utils.accepted_range:
              min_value: 0
```
Phase 4: Visualization Layer (4 items)
4.1 Metabase Configuration
- Set up Metabase:
- Create admin account with strong password
- Connect to data warehouse
- Configure email settings for alerts
- Set up SSO if using enterprise
- Create curated collections:

```
Collections/
├── Executive Dashboards/
│   ├── Daily KPIs
│   └── Monthly Business Review
├── Sales & Marketing/
│   ├── Customer Acquisition
│   └── Campaign Performance
└── Operations/
    ├── Fulfillment Metrics
    └── Inventory Levels
```
4.2 Dashboard Design
- Build core dashboards:
Executive Dashboard:
- Total Revenue (with period comparison)
- Order Count
- Average Order Value
- Customer Count (new vs returning)
- Top Products table
- Geographic map
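For the "Total Revenue (with period comparison)" card above, one possible underlying query, written against the assumed `fact_orders`/`fact_order_items` schema. Metabase can also compute period comparisons natively, so treat this as a sketch:

```sql
-- Monthly revenue with previous-month comparison.
WITH monthly AS (
    SELECT
        DATE_TRUNC('month', o.purchased_at) AS month,
        SUM(oi.price)                       AS revenue
    FROM fact_orders AS o
    JOIN fact_order_items AS oi ON oi.order_id = o.order_id
    GROUP BY 1
)
SELECT
    month,
    revenue,
    LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue,
    ROUND(100.0 * (revenue - LAG(revenue) OVER (ORDER BY month))
          / NULLIF(LAG(revenue) OVER (ORDER BY month), 0), 1) AS pct_change
FROM monthly
ORDER BY month;
```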
Customer Analytics Dashboard:
- Customer segments pie chart
- LTV distribution histogram
- Cohort retention heatmap
- RFM segmentation matrix
- Acquisition trend line
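The RFM segmentation matrix above usually starts from a query like this sketch; it assumes `fact_orders` carries `customer_id`, `purchased_at`, and `order_total`, and quartile scoring is just one common choice:

```sql
-- Quartile-based RFM scores per customer (1 = best quartile on each dimension).
WITH customer_rfm AS (
    SELECT
        customer_id,
        MAX(purchased_at)        AS last_order_at,
        COUNT(DISTINCT order_id) AS frequency,
        SUM(order_total)         AS monetary
    FROM fact_orders
    GROUP BY customer_id
)
SELECT
    customer_id,
    NTILE(4) OVER (ORDER BY last_order_at DESC) AS recency_score,
    NTILE(4) OVER (ORDER BY frequency DESC)     AS frequency_score,
    NTILE(4) OVER (ORDER BY monetary DESC)      AS monetary_score
FROM customer_rfm;
```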
Product Analytics Dashboard:
- Revenue by category treemap
- Top 20 products leaderboard
- Price vs volume scatter
- Category performance bars
- Seasonal trends
- Configure dashboard filters:
- Date range (preset options + custom)
- Product category
- Customer segment
- Geography (country/state/city)
Phase 5: Data Quality & Governance (4 items)
5.1 Automated Testing
- Implement dbt tests:
- Uniqueness: Primary key columns
- Not null: Required fields
- Accepted values: Enum fields
- Relationships: Foreign key integrity
- Custom SQL tests for business rules
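For the last item, custom SQL tests in dbt are queries that return the rows violating a rule; any returned row counts as a failure. A hedged example of a singular test, assuming the `stg_orders` columns from the staging layer above:

```sql
-- tests/assert_delivery_after_purchase.sql
-- Flags orders whose delivery timestamp precedes the purchase timestamp.
SELECT
    order_id,
    purchased_at,
    delivered_at
FROM {{ ref('stg_orders') }}
WHERE delivered_at IS NOT NULL
  AND delivered_at < purchased_at
```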
- Add source freshness checks:

```yaml
sources:
  - name: raw
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: orders
        loaded_at_field: _loaded_at
```
5.2 Documentation
- Document all models:

```yaml
models:
  - name: dim_customers
    description: "Customer dimension with demographics and segments"
    columns:
      - name: customer_id
        description: "Unique customer identifier (UUID)"
      - name: customer_segment
        description: "RFM-based segment: VIP, Regular, At-Risk, Churned"
```

- Generate and publish docs:

```bash
dbt docs generate
dbt docs serve --port 8080
```
Phase 6: Deployment & Operations (6 items)
6.1 CI/CD Pipeline
- Create GitHub Actions workflow:

```yaml
name: dbt CI
on:
  pull_request:
    branches: [main]
jobs:
  dbt-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install dbt-postgres
      - run: dbt deps
      - run: dbt build --target ci
```

- Set up branch protection:
- Require PR reviews before merge
- Require CI checks to pass
- No direct pushes to main
6.2 Scheduling & Orchestration
- Configure refresh schedule:
| Job | Schedule | Dependencies |
|---|---|---|
| Source freshness check | Every hour | None |
| Staging models | Every 4 hours | Source freshness |
| Mart models | Daily 6am UTC | Staging complete |
| Dashboard cache refresh | Daily 6:30am UTC | Mart complete |
- Set up Airflow DAG (if needed):
```python
from airflow import DAG
from airflow.operators.bash import BashOperator
from datetime import datetime

with DAG('dbt_daily', start_date=datetime(2024, 1, 1), schedule='0 6 * * *'):
    dbt_run = BashOperator(
        task_id='dbt_run',
        bash_command='cd /dbt && dbt run --target prod'
    )
    dbt_test = BashOperator(
        task_id='dbt_test',
        bash_command='cd /dbt && dbt test --target prod'
    )
    dbt_run >> dbt_test
```
6.3 Monitoring & Alerting
- Set up alerting:
- dbt test failures → Slack/email
- Source freshness violations → PagerDuty
- Dashboard query timeouts → Ops channel
- Database storage thresholds → Infrastructure team
- Create runbooks:
- How to rerun failed dbt models
- How to backfill historical data
- How to add new data sources
- How to debug slow dashboards
Quick Reference: Technology Recommendations
For Startups (<$5M revenue, <10 employees)
| Component | Recommendation | Cost |
|---|---|---|
| Warehouse | Supabase Free Tier | $0 |
| Transformation | dbt Core | $0 |
| Visualization | Metabase Open Source | $0 |
| Orchestration | GitHub Actions | $0 |
| Total | | $0/month |
For SMEs ($5M-50M revenue, 10-100 employees)
| Component | Recommendation | Cost |
|---|---|---|
| Warehouse | Supabase Pro or BigQuery | $25-200/month |
| Transformation | dbt Core + dbt Cloud Developer | $0-100/month |
| Visualization | Metabase Pro | $85/user/month |
| Orchestration | Airflow on Cloud Run | $50-200/month |
| Total | | $200-1000/month |
For Growth Companies ($50M+ revenue)
| Component | Recommendation | Cost |
|---|---|---|
| Warehouse | Snowflake or Databricks | $1000+/month |
| Transformation | dbt Cloud Team | $100/seat/month |
| Visualization | Looker or Tableau | $70+/user/month |
| Orchestration | Airflow (managed) or Dagster | $500+/month |
| Total | | $3000+/month |
Troubleshooting Common Issues
Database Connection Failures
```bash
# Check if database is running
docker compose ps

# View database logs
docker compose logs database

# Test connection manually
psql -h localhost -U postgres -d analytics
```
dbt Model Errors
```bash
# Debug specific model
dbt run --select model_name --target dev

# View compiled SQL
cat target/compiled/project/models/model_name.sql

# Check for circular dependencies
dbt ls --resource-type model --output json | jq
```
Metabase Query Timeouts
- Check query execution plan in database
- Add appropriate indexes
- Consider materializing slow models as tables
- Increase Metabase query timeout setting
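For the indexing and materialization items above, a hedged PostgreSQL sketch; table, column, and index names are illustrative:

```sql
-- Inspect how the slow dashboard query is executed.
EXPLAIN ANALYZE
SELECT * FROM fact_orders
WHERE purchased_at >= CURRENT_DATE - INTERVAL '90 days';

-- Index the columns the dashboard filters on most (names are assumptions).
CREATE INDEX IF NOT EXISTS idx_fact_orders_purchased_at
    ON fact_orders (purchased_at);
```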
Check out my E-Commerce Analytics Platform case study to see this checklist in action.
Need help implementing your data stack? Let's talk about building a solution tailored to your business.