Retail Logistics: Real-Time Data Platform Processing 2M+ Events/Day


Client: Retail logistics platform · 800+ stores across India · ~$180M GMV

[Architecture diagram] Real-time lakehouse: 800 stores · 2M+ events/day · <2 min freshness. 14 sources (POS systems, inventory, logistics API, CRM, ERP, + 10 more) → Apache Kafka with schema registry (2M events/day) → Delta Lake on S3 (Bronze/Silver/Gold, dbt) → Snowflake BI query engine (sub-50ms p95) → Power BI (800 store dashboards). Result: 12-hour latency → <2-minute freshness · 99.99% pipeline uptime.
The end-to-end architecture Ohveda built for an 800-store retail logistics platform in 12 weeks.

Engagement: 12-week lakehouse build · November 2025–February 2026

The Challenge

Fragmented data infrastructure: 14 source systems, 12-hour data latency, and no unified analytics layer. Operations teams were making decisions on stale data. The 2025 holiday season exposed the cost: $2M+ in inventory mismatches.

Approach

01

Weeks 1–2 · Audit + use-case prioritization

Inventoried 14 source systems. Workshops with operations + finance leadership to rank the top 8 use cases by P&L impact. Real-time inventory and demand forecasting led the list.

02

Weeks 3–5 · Streaming pipeline

Apache Kafka with schema registry. Debezium CDC connectors against MySQL/Postgres source DBs. Delta Lake on S3 (Bronze/Silver/Gold) for unified storage. End-to-end exactly-once semantics.
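The CDC side of this step can be illustrated with a Debezium MySQL connector registration. This is a sketch, not the client's actual configuration: hostnames, credentials, topic prefix, and table names below are placeholders.

```json
{
  "name": "pos-mysql-connector",
  "config": {
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "pos-db.internal",
    "database.port": "3306",
    "database.user": "cdc_user",
    "database.password": "${file:/secrets/db.properties:password}",
    "database.server.id": "184054",
    "topic.prefix": "pos",
    "table.include.list": "retail.pos_events,retail.inventory_snapshots",
    "schema.history.internal.kafka.bootstrap.servers": "kafka:9092",
    "schema.history.internal.kafka.topic": "schema-changes.pos",
    "key.converter": "io.confluent.connect.avro.AvroConverter",
    "key.converter.schema.registry.url": "http://schema-registry:8081",
    "value.converter": "io.confluent.connect.avro.AvroConverter",
    "value.converter.schema.registry.url": "http://schema-registry:8081"
  }
}
```

Registering the Avro converters against the schema registry is what enforces schema compatibility on every change event entering Kafka.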

03

Weeks 6–9 · Transformations + serving

140 dbt models for the analytics layer. Snowflake as BI-facing query engine via Iceberg external tables. Power BI dashboards for store managers; refresh via Direct Lake.
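Wiring Snowflake to the lakehouse as described above can be sketched as a catalog-linked Iceberg table. The external volume and catalog integration names here are illustrative placeholders, not the client's objects:

```sql
-- Illustrative: expose a Silver-layer table to Snowflake's query engine
-- via an externally managed Iceberg catalog (e.g. AWS Glue).
-- 's3_lakehouse_vol' and 'glue_catalog_int' are placeholder names.
create iceberg table silver_pos_events
  external_volume = 's3_lakehouse_vol'
  catalog = 'glue_catalog_int'
  catalog_table_name = 'silver_pos_events';
```

This keeps a single copy of the data on S3 while letting Snowflake serve the BI workload against it.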

04

Weeks 10–12 · Productionization

Grafana + PagerDuty monitoring with SLO-based alerting. Team enablement workshops. Runbooks. Pipeline tested with simulated holiday-volume traffic.
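The freshness side of the SLO alerting can be sketched in Python. Function name, table names, and the 2-minute threshold (taken from the engagement's freshness target) are illustrative, not the production monitoring code:

```python
from datetime import datetime, timedelta, timezone

FRESHNESS_SLO = timedelta(minutes=2)  # end-to-end freshness target

def freshness_breaches(last_event_times, now=None, slo=FRESHNESS_SLO):
    """Return table names whose newest event is older than the SLO.

    last_event_times: dict mapping table name -> datetime of most recent event.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        table for table, ts in last_event_times.items()
        if now - ts > slo
    )

# Example: one table is 5 minutes stale, the other is 30 seconds fresh.
now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
stale = freshness_breaches(
    {
        "silver_pos_events": now - timedelta(minutes=5),
        "silver_inventory_snapshots": now - timedelta(seconds=30),
    },
    now=now,
)
print(stale)  # ['silver_pos_events']
```

A check like this, run on a schedule and exported to Grafana, is what turns the "<2-minute freshness" number into an alertable SLO rather than a one-off benchmark.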

Sample: Inventory Reconciliation dbt Model

-- models/marts/inventory_realtime.sql
{{ config(
    materialized='incremental',
    unique_key=['store_id', 'sku'],
    on_schema_change='sync_all_columns',
    incremental_strategy='merge'
) }}

with pos_movements as (
    select
        store_id,
        sku,
        sum(quantity_change) as net_movement,
        max(event_ts) as last_event
    from {{ ref('silver_pos_events') }}
    {% if is_incremental() %}
        where event_ts > (select max(last_updated) from {{ this }})
    {% endif %}
    group by 1, 2
),

current_inventory as (
    select
        store_id,
        sku,
        on_hand_qty,
        reserved_qty,
        on_hand_qty - reserved_qty as available_qty,
        last_count_ts
    from {{ ref('silver_inventory_snapshots') }}
)

select
    coalesce(p.store_id, c.store_id) as store_id,
    coalesce(p.sku, c.sku)            as sku,
    coalesce(c.available_qty, 0) as available_qty,
    coalesce(p.net_movement, 0) as net_movement,
    coalesce(c.available_qty, 0) + coalesce(p.net_movement, 0) as projected_qty,
    case
        when coalesce(c.available_qty, 0) + coalesce(p.net_movement, 0) < 0 then 'OVERSOLD'
        when coalesce(c.available_qty, 0) + coalesce(p.net_movement, 0) < 5 then 'LOW'
        else 'OK'
    end as status,
    -- greatest() returns null if any argument is null; with a full outer join
    -- one side can be missing, so fall back to whichever timestamp exists
    coalesce(greatest(p.last_event, c.last_count_ts), p.last_event, c.last_count_ts) as last_updated
from pos_movements p
full outer join current_inventory c
    on p.store_id = c.store_id and p.sku = c.sku
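A model like this would normally ship with schema tests. A minimal dbt test spec might look like the following; the file path and test selection are illustrative:

```yaml
# models/marts/inventory_realtime.yml (illustrative)
version: 2
models:
  - name: inventory_realtime
    columns:
      - name: store_id
        tests: [not_null]
      - name: sku
        tests: [not_null]
      - name: status
        tests:
          - accepted_values:
              values: ['OVERSOLD', 'LOW', 'OK']
```

Running these on every incremental build catches bad upstream data before it reaches the store dashboards.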

Results

Metric                       Before           After
Data freshness               12 hours         <2 minutes
Events/day processed         ~250K (batch)    2M+ (streaming)
Operational query p95        4–8 s            <50 ms
Pipeline uptime (90 days)    n/a              99.99%

“We went from 12-hour stale data to real-time operational dashboards in under 12 weeks. The platform Ohveda built is the foundation of every decision we make today.” — Rahul Sharma, Head of Engineering