This project demonstrates a complete analytics pipeline built with dbt, Snowflake, and Power BI, showcasing production-grade data modeling patterns used by data teams at top companies to drive real business decisions.
Key Analytics Questions Answered
Customer Segmentation & Lifetime Value
- RFM Analysis: Segment customers into actionable groups (Champions, At-Risk, Lost) based on Recency, Frequency, and Monetary value
- Customer Lifetime Value (CLV): Predict lifetime value of customers and identify high-ROI cohorts
- Churn Prediction: Identify customers at risk of churning based on behavioral signals and dormancy periods
Revenue & Performance Analysis
- Pareto Analysis (80/20 Rule): Identify which 20% of products and customers generate 80% of revenue
- Funnel & Conversion Analysis: Track conversion rates from order placement to delivery confirmation
- Seller Performance Scoring: Rank sellers using composite scores based on delivery time, reviews, and volume
Cohort & Trend Analysis
- Cohort Analysis: Compare purchasing behavior across monthly acquisition cohorts and track retention curves
- Time Intelligence: Analyze MoM/YoY growth with 7-day and 30-day moving averages
- Geographic Performance: Identify regional opportunities based on average order value and delivery performance
Market Basket Analysis
- Co-purchase Patterns: Discover products frequently purchased together for cross-sell optimization
- Product Bundling: Identify bundling opportunities based on co-occurrence matrices
Advanced SQL Patterns
| Pattern | Business Value | Key SQL Features |
|---|---|---|
| RFM Analysis | Customer segmentation | NTILE(), CASE WHEN scoring |
| Pareto Analysis | Focus on high-impact items | SUM() OVER, cumulative percentages |
| Customer Lifetime Value | Revenue forecasting | Cohort averages, predictive aggregations |
| Funnel Analysis | Conversion optimization | COUNT(CASE WHEN...), stage ratios |
| Cohort Analysis | Retention tracking | DATE_TRUNC, cohort pivots |
| Market Basket | Cross-sell opportunities | Self-joins, co-occurrence matrices |
| Churn Indicators | Proactive retention | DATEDIFF, LAG(), behavioral flags |
| Time Intelligence | Trend analysis | LAG(), moving averages, YoY/MoM |
| Seller Scoring | Vendor management | PERCENT_RANK(), weighted composites |
| Geo Performance | Regional strategy | Location-based aggregations |
Architecture
The project implements a Medallion Architecture with a 2-database pattern for environment isolation:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Kaggle │ │ Snowflake │ │ dbt │ │ Power BI │
│ (Source) │─────▶│ (Warehouse) │─────▶│ (Transform) │─────▶│ (Visualize) │
└──────────────┘ └──────────────┘ └──────────────┘ └──────────────┘
Key Design Decisions:
- Bronze + Silver in DEV only — No data duplication, cost efficient
- Gold layer separated — Dev and Prod environments fully isolated
- Cross-database reference — PROD reads from DEV.STAGING (single source of truth)
- CI/CD with GitHub Actions — Automated testing and deployment
Data Layers
| Layer | Purpose |
|---|---|
| Bronze (RAW) | Raw source data, immutable |
| Silver (STAGING) | Cleaned, typed, validated views |
| Gold (INTERMEDIATE) | Joined and enriched models |
| Gold (MARTS) | Fact and dimension tables for BI |
Tech Stack
| Component | Tool |
|---|---|
| Warehouse | Snowflake |
| Transform | dbt |
| Orchestration | GitHub Actions CI/CD |
| Visualization | Power BI |
| Package Manager | uv (Python) |
Data Model
Core Fact Tables:
fct_orders— One row per order with metricsfct_order_items— One row per order item
Customer Analytics:
fct_rfm_segments— RFM scores and customer segmentsfct_cohort_retention— Cohort retention metricsfct_clv_customer— Customer lifetime value calculationsfct_churn_risk— Churn risk scores
Finance & Marketing:
fct_daily_revenue— Daily revenue aggregatesfct_category_performance— Category metrics by monthfct_geo_performance— Geographic performance by state
Dimensions:
dim_customers,dim_cohorts,dim_dates,dim_products,dim_sellers