Data Foundation

USA · Packaging Manufacturing

Korpack

“The production manager would open Power BI at 8am. The data was from yesterday.”

Semantic model refresh slashed 90% - from 1.5 hours to 15 minutes with Microsoft Fabric.

Key Results

Semantic model refresh: 15 min (was 1.5 hours)

Pipeline runtime: −85% (was 3–4 hours)

Data freshness: near real-time (as soon as data lands)

Storage layer: OneLake replaces 4 separate stores

Tech Stack

Microsoft Fabric · OneLake · Direct Lake · Azure SQL · Azure Data Factory · CDC · Power BI

The Situation

At Korpack, the analytics platform existed - Power BI was deployed, dashboards were built, KPIs were defined. The problem was that by the time anyone opened a dashboard, the data was already hours old. The semantic model took 1.5 hours to refresh. The Azure Data Factory pipeline that fed it ran for 3–4 hours. And because there was no Change Data Capture, every single run processed the entire dataset from scratch - regardless of how much had actually changed.

The result was a BI platform that the business had lost confidence in. "Is this live data or yesterday's?" became a standard question before anyone acted on a number. When BI data can't be trusted to be current, people stop using it - and go back to calling the ERP directly or asking someone in operations.

In manufacturing and packaging operations, this pattern is extremely common:

✓ Your dashboards take 1–2 hours to refresh - so they're almost never showing live data
✓ Your ETL or data pipelines run for hours even when very little has actually changed
✓ People have stopped trusting the BI tool because they're never sure if the data is current
✓ You're running full dataset refreshes because nobody set up incremental loading from the start
✓ You've moved to a modern BI tool but the underlying architecture is still batch-based and slow
✓ Operations decisions default back to ERP or direct system queries because "the dashboard is probably out of date"

If three or more of these describe your operation, you're looking at the right case study.

The Root Problem

1. Power BI semantic model refresh took 1.5 hours - meaning dashboards were stale for most of the working day
2. Azure Data Factory pipelines processed the entire dataset from scratch on every run (3–4 hours each)
3. No Change Data Capture meant 85–90% of each pipeline run was redundant - reprocessing unchanged records
4. Business users had lost confidence in the BI platform because they couldn't rely on data being current
5. Operations and finance decisions were reverting to direct ERP queries, undermining the BI investment

How We Fixed It

Diagnose the architecture before touching anything

The first step was to establish exactly why the pipeline was slow, rather than assume. We traced data from source to dashboard and found two root causes: the ADF pipeline had no incremental logic (full table scans on every run), and the Power BI semantic model used Import mode, which required a full data reload on every refresh. Both were fixable without replacing the underlying data.
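One way to make the "full table scans on every run" finding concrete is to compare how many rows a run reads with how many had actually changed. The sketch below is illustrative only, not the diagnostic used at Korpack: `PipelineRun` and its two fields are hypothetical names, and the row counts are the ones quoted elsewhere in this case study. Note that it measures redundant rows, not redundant runtime, so the figure it prints need not match the time savings reported here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineRun:
    """One pipeline execution; both counts are hypothetical log fields."""
    rows_read: int      # rows the run scanned at the source
    rows_changed: int   # rows that actually differed from the previous run


def redundant_fraction(run: PipelineRun) -> float:
    """Share of scanned rows that had not changed since the last run."""
    if run.rows_read == 0:
        return 0.0
    return 1 - run.rows_changed / run.rows_read


# Row counts quoted in this case study: 40 million scanned, 15,000 changed.
full_scan = PipelineRun(rows_read=40_000_000, rows_changed=15_000)
print(f"{redundant_fraction(full_scan):.4%} of scanned rows were unchanged")
```

If a figure like this stays close to 100% run after run, the pipeline is a strong candidate for incremental loading.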

Migrate to Microsoft Fabric - OneLake as the single storage layer

We migrated the platform to Microsoft Fabric, collapsing what had been separate compute and storage layers into one. OneLake became the single storage destination for all data, eliminating the copy-move-transform cycles that were adding latency at every step. Fabric's native integration between ADF, Spark, and Power BI also removed the handoff overhead between tools.

One storage layer. No more copy-move-transform cycles between systems.

Implement Change Data Capture on Azure SQL

CDC was enabled on all Azure SQL source tables. Now ADF pipelines only process rows that have actually changed since the last run - not the full dataset. A pipeline that previously scanned 40 million rows now scans 15,000 changed rows. That is where the 85% time reduction came from.

40M rows scanned → 15K changed rows. Same result. 85% of the time gone.
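The idea behind incremental processing can be sketched in a few lines. This is a simplified, in-memory illustration under stated assumptions, not the ADF pipeline itself: the `Change` record, its field names, and the integer `seq` watermark are hypothetical stand-ins for whatever the change feed actually exposes, and the sample rows are made up.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Change:
    """One captured change. Field names are illustrative, not the Azure SQL CDC schema."""
    seq: int                    # increasing position in the change log
    op: str                     # "upsert" or "delete"
    key: int                    # primary key of the affected row
    row: Optional[dict] = None  # new row image for upserts


def apply_changes(
    target: Dict[int, dict], log: Iterable[Change], watermark: int
) -> Tuple[int, int]:
    """Apply only changes newer than `watermark`; return (new_watermark, applied)."""
    applied = 0
    pending = sorted((c for c in log if c.seq > watermark), key=lambda c: c.seq)
    for change in pending:
        if change.op == "delete":
            target.pop(change.key, None)
        else:
            target[change.key] = change.row
        watermark = change.seq  # remember how far we have processed
        applied += 1
    return watermark, applied


# Made-up sample data: two existing rows and three captured changes.
target = {1: {"sku": "BOX-A", "qty": 10}, 2: {"sku": "BOX-B", "qty": 5}}
log = [
    Change(seq=101, op="upsert", key=2, row={"sku": "BOX-B", "qty": 7}),
    Change(seq=102, op="upsert", key=3, row={"sku": "BOX-C", "qty": 1}),
    Change(seq=103, op="delete", key=1),
]
watermark, applied = apply_changes(target, log, watermark=100)
print(watermark, applied)  # 103 3
```

Running `apply_changes` again with the returned watermark applies nothing, which is the point: the work per run scales with what changed, not with the size of the table.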

Rebuild semantic models in Direct Lake mode

Power BI semantic models were rebuilt using Direct Lake mode - which queries OneLake directly without importing data into a separate engine. The 1.5-hour import refresh cycle was eliminated entirely. Dashboards now reflect data as soon as it lands in OneLake - no separate refresh job needed.
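The difference between the two modes can be shown with a toy model. This is a conceptual sketch only: the classes below are hypothetical, they are not a Power BI or Fabric API, and they ignore how the real engine loads and caches column data. They only show why a model that holds its own copy goes stale between refreshes while one that reads the lake does not.

```python
class Lake:
    """Stand-in for a lake table: a list of rows that pipelines append to."""

    def __init__(self):
        self.rows = []

    def land(self, row):
        self.rows.append(row)


class ImportModel:
    """Import-style model: answers queries from a private copy taken at refresh time."""

    def __init__(self, lake):
        self._lake = lake
        self._copy = []

    def refresh(self):
        self._copy = list(self._lake.rows)  # full reload of the data

    def total(self, column):
        return sum(r[column] for r in self._copy)


class DirectModel:
    """Direct-Lake-style model: reads the lake's current rows at query time."""

    def __init__(self, lake):
        self._lake = lake

    def total(self, column):
        return sum(r[column] for r in self._lake.rows)


lake = Lake()
lake.land({"units": 100})
imported, direct = ImportModel(lake), DirectModel(lake)
imported.refresh()

lake.land({"units": 40})            # new data lands after the last refresh
stale = imported.total("units")     # 100: still the old copy
fresh = direct.total("units")       # 140: sees the new row immediately
```

The import-style model only catches up after another `refresh()`, which is the step that used to take 1.5 hours.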

Implement Hub & Spoke to enforce clean data boundaries

A Hub & Spoke architecture was put in place within Fabric: a central Hub containing curated, governed data, with department-specific Spokes consuming from it. This ensured incremental processing was maintained at every layer and gave the business a scalable pattern for adding new data sources without rebuilding the core.
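As a rough illustration of the boundary this pattern enforces, the sketch below models a hub that owns curated tables and spokes that can only read from it. All class, table, and field names are hypothetical and the rows are made up; a real Fabric implementation would express this with workspaces, lakehouses, and permissions rather than Python objects.

```python
from typing import Callable, Dict, List

Row = dict


class Hub:
    """Central store of curated tables; the only place spokes read from."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[Row]] = {}

    def publish(self, name: str, rows: List[Row]) -> None:
        self._tables[name] = list(rows)

    def read(self, name: str) -> List[Row]:
        # Shallow copy, so a spoke cannot add or remove rows in the hub's table.
        return list(self._tables[name])


class Spoke:
    """Department view defined as a filter over one hub table."""

    def __init__(self, hub: Hub, table: str, keep: Callable[[Row], bool]) -> None:
        self._hub, self._table, self._keep = hub, table, keep

    def rows(self) -> List[Row]:
        return [r for r in self._hub.read(self._table) if self._keep(r)]


hub = Hub()
hub.publish("orders", [
    {"dept": "ops", "amount": 120},
    {"dept": "finance", "amount": 300},
])
ops = Spoke(hub, "orders", lambda r: r["dept"] == "ops")

# Adding a new source touches only the hub; existing spokes are unaffected.
hub.publish("scrap", [{"dept": "ops", "kg": 12}])
ops_rows = ops.rows()
```

Because every spoke reads through the hub, a new data source or a new department is an addition, not a rebuild.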

Measured Outcomes

Metric | Before | After
Semantic model refresh | 1.5 hours | 15 minutes (−90%) ↑ Key win
ADF pipeline runtime | 3–4 hours | 15–20 minutes (−85%) ↑ Key win
Data freshness | Hours stale | Near real-time
Processing method | Full table scans every run | CDC incremental only
BI trust / adoption | Low - "is this current?" | High - live data, verified

What This Means For You

What this means if your BI platform feels slow or untrustworthy

Slow pipelines and stale dashboards are almost never a tooling problem - they're an architecture problem. Power BI, Fabric, and ADF are all capable of near-real-time data at scale. But if the underlying pattern is "full table scan → import refresh", speed is structurally impossible no matter how much compute you throw at it. The fix - CDC plus Direct Lake - is well-established and deployable without replacing your existing BI investments. If your team has stopped trusting the dashboard, that's the symptom. The root cause is almost certainly in the pipeline architecture.

Next Step

Is this your situation?

Book a 30-minute call. No slides, no pitch. We'll look at your specific setup, tell you what's causing the problem, and what a realistic fix looks like - including timeline and cost range.

Book a Discovery Call