Kuala Lumpur · Malaysia · APAC

Modern Data Stack in Kuala Lumpur

The modern data stack conversation in most organisations starts with "we need Snowflake." It shouldn't. It should start with "what problem are we actually trying to solve and what does our current stack prevent us from doing?"

The question isn't whether to adopt Snowflake or Databricks. The question is whether your data problem is actually a tool problem, and most of the time it isn't.

Malaysian manufacturing operations have typically invested in ERP but underinvested in the analytics layer. Power BI deployments are common; a governed semantic model underneath them is not. Most Kuala Lumpur-based manufacturers we engage with have Power BI dashboards that pull from Excel files or direct database connections rather than a governed data model. The ERP data is good. The analytics layer doesn't reflect it properly.

What we hear from operators

The problems we solve

These aren't hypothetical pain points assembled from industry reports. They're observations from actual plant floors, warehouse ops, and finance desks — written down because they come up in almost every first conversation.

01

The tool was bought before the architecture was designed

Snowflake or Databricks licences are purchased — often after a vendor demo or a board directive — before the data engineering team has designed the ingestion layer, the transformation model, or the semantic layer on top. The tool sits mostly idle, being used as an expensive substitute for what a well-configured cloud data warehouse already does. The gap between the licence cost and the business value is wide and growing.

02

Transformation logic lives in stored procedures nobody understands

Most organisations that have been running SQL-based analytics for more than three years have transformation logic buried in stored procedures, views, and scheduled jobs that predate the current team. Nobody fully understands them. Changes break things downstream. Testing doesn't exist. Introducing dbt isn't just a tooling decision; it's a process change that brings version control, testing, and documentation to transformation code.

03

Ingestion is bespoke, brittle, and constantly breaking

Custom-built ELT pipelines for every source system — each one slightly different, each one owned by the person who built it, each one breaking when the source system changes. Fivetran and Airbyte exist to solve this: managed connectors for 300+ sources, schema change handling built in, monitoring included. The decision to build vs buy these connectors is usually obvious in retrospect.
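To make "schema change handling" concrete, here is a minimal sketch of the check that bespoke pipelines usually lack: comparing the columns a pipeline expects against what the source actually sends. It is an illustration only, not how Fivetran or Airbyte implement it; the names `SchemaDrift` and `detect_drift` and the sample columns are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SchemaDrift:
    """Differences between the expected schema and the observed one."""
    added: list[str] = field(default_factory=list)
    removed: list[str] = field(default_factory=list)
    retyped: dict[str, tuple[str, str]] = field(default_factory=dict)

    @property
    def is_breaking(self) -> bool:
        # New columns can usually be ignored; dropped or retyped ones cannot.
        return bool(self.removed or self.retyped)


def detect_drift(expected: dict[str, str], observed: dict[str, str]) -> SchemaDrift:
    """Compare two {column_name: type_name} mappings."""
    drift = SchemaDrift()
    drift.added = sorted(observed.keys() - expected.keys())
    drift.removed = sorted(expected.keys() - observed.keys())
    for col in sorted(expected.keys() & observed.keys()):
        if expected[col] != observed[col]:
            drift.retyped[col] = (expected[col], observed[col])
    return drift


# Hypothetical example: the source renamed one column and changed a type.
expected = {"id": "int", "qty": "int", "plant": "str"}
observed = {"id": "int", "qty": "float", "line": "str"}

drift = detect_drift(expected, observed)
if drift.is_breaking:
    print(f"Breaking drift: removed={drift.removed}, retyped={drift.retyped}")
```

A hand-built pipeline without a check like this tends to fail only when the load job crashes or a dashboard shows the wrong number.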

How we work

Our approach

01

Design the stack before buying the tools

We start with the data volumes, the required refresh frequency, the team's existing capability, and the downstream use cases. Then we recommend the right combination: Fivetran or Airbyte for ingestion, Snowflake or Databricks or Microsoft Fabric for storage and compute, dbt or SQLMesh for transformation, and the right semantic layer for the reporting tools in use. Stack selection is an output of the architecture process, not an input to it.

02

Build the transformation layer in dbt or SQLMesh

Transformation logic in version-controlled SQL models. Every model documented, tested, and lineage-tracked. Incremental models for large tables. Tests defined for every business-critical metric — null checks, uniqueness, referential integrity, and custom business logic tests. The first time a transformation breaks, it fails with a clear test error rather than silently corrupting a dashboard.
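The four kinds of test named above can be sketched in plain Python to show what each one catches. In a real project they would be declared in the transformation tool rather than hand-written; the function names, the `orders` and `customers` tables, and the margin rule below are assumptions made up for illustration.

```python
def not_null(rows, column):
    """Return the rows where `column` is missing or None."""
    return [r for r in rows if r.get(column) is None]


def unique(rows, column):
    """Return the non-null values that occur more than once in `column`."""
    seen, dupes = set(), set()
    for r in rows:
        value = r.get(column)
        if value is None:
            continue
        if value in seen:
            dupes.add(value)
        seen.add(value)
    return sorted(dupes)


def relationships(child_rows, child_col, parent_rows, parent_col):
    """Return non-null child keys that have no matching parent key."""
    parent_keys = {r[parent_col] for r in parent_rows}
    child_keys = {r[child_col] for r in child_rows if r[child_col] is not None}
    return sorted(child_keys - parent_keys)


def margin_exceeds_revenue(rows):
    """Custom business rule (hypothetical): margin is never larger than revenue."""
    return [
        r for r in rows
        if r["revenue"] is not None
        and r["margin"] is not None
        and r["margin"] > r["revenue"]
    ]


def run_checks(orders, customers):
    """Run every check and return only the ones that found failures."""
    results = {
        "orders.revenue not_null": not_null(orders, "revenue"),
        "orders.order_id unique": unique(orders, "order_id"),
        "orders.customer_id -> customers": relationships(
            orders, "customer_id", customers, "customer_id"
        ),
        "orders margin <= revenue": margin_exceeds_revenue(orders),
    }
    return {name: failures for name, failures in results.items() if failures}


# Hypothetical sample data with one defect per check.
customers = [{"customer_id": 10}, {"customer_id": 11}]
orders = [
    {"order_id": 1, "customer_id": 10, "revenue": 500.0, "margin": 120.0},
    {"order_id": 2, "customer_id": 11, "revenue": 300.0, "margin": 350.0},
    {"order_id": 2, "customer_id": 99, "revenue": None, "margin": 10.0},
]

failures = run_checks(orders, customers)
for name, bad in failures.items():
    print(f"FAIL {name}: {bad}")
```

Each failure names the check and the offending rows or keys, which is the difference between a clear test error and a dashboard that is quietly wrong.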

03

Connect the semantic layer and the reporting tools

The warehouse or lakehouse is the source of truth. On top of it, a semantic layer (whether that's the dbt Semantic Layer, AtScale, or a Power BI semantic model) defines the business metrics once, in one place, used consistently across every reporting tool. The situation where the same metric returns different numbers in different reports becomes structurally impossible.
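The idea of "defined once, used everywhere" can be shown with a small sketch. This is not the API of any semantic layer product; `Metric`, `METRICS`, `evaluate`, and the gross margin formula are hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Mapping

Row = Mapping[str, float]


@dataclass(frozen=True)
class Metric:
    """A business metric with exactly one definition."""
    name: str
    description: str
    compute: Callable[[Iterable[Row]], float]


def _gross_margin_pct(rows: Iterable[Row]) -> float:
    rows = list(rows)
    revenue = sum(r["revenue"] for r in rows)
    cogs = sum(r["cogs"] for r in rows)
    # Assumed definition: (revenue - COGS) / revenue, as a percentage.
    return 0.0 if revenue == 0 else 100.0 * (revenue - cogs) / revenue


# The single place where metric definitions live.
METRICS = {
    m.name: m
    for m in [
        Metric(
            name="gross_margin_pct",
            description="Revenue minus cost of goods sold, as a share of revenue.",
            compute=_gross_margin_pct,
        ),
    ]
}


def evaluate(metric_name: str, rows: Iterable[Row]) -> float:
    """Every report calls this instead of re-implementing the formula."""
    return METRICS[metric_name].compute(rows)


# Two hypothetical reports reading the same rows get the same number.
sales_rows = [
    {"revenue": 100.0, "cogs": 60.0},
    {"revenue": 300.0, "cogs": 180.0},
]
finance_report = evaluate("gross_margin_pct", sales_rows)
plant_dashboard = evaluate("gross_margin_pct", sales_rows)
print(finance_report, plant_dashboard)
```

Because neither report carries its own copy of the formula, a change to the definition is made in one place and reaches both.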

What changes

Outcomes

These are specific, measurable shifts — not benefit statements. Every outcome listed here has been achieved with a client.

Ingestion pipelines: bespoke and brittle → managed connectors with monitoring

Fivetran or Airbyte connectors replace custom ingestion code. Source schema changes handled automatically. Pipeline failures alerted within minutes, not discovered when a dashboard is wrong.

Transformation code: undocumented stored procedures → version-controlled dbt models

Every transformation in Git. Every model tested. Data lineage documented automatically. New team members understand the transformation layer from the dbt project, not from tribal knowledge.

Metric consistency: different numbers in different reports → single governed semantic layer

One definition of revenue, margin, OTIF, OEE — wherever the metric is displayed. The semantic layer becomes the single source of truth for metric definitions across the organisation.

Technology stack

Snowflake · Databricks · dbt (dbt Core / dbt Cloud) · SQLMesh · Fivetran · Airbyte · Apache Spark · Delta Lake · Power BI · Looker · Microsoft Fabric

Common questions

What buyers ask us

These are questions that come up in almost every first or second conversation. If yours isn't here, we'll cover it in the first call.

Snowflake or Databricks — which should we choose?

It depends on your primary use case. Snowflake is better suited for SQL-first analytics workloads — structured data, business intelligence, ad-hoc querying by analysts. Databricks is better for data science, machine learning, and streaming — Python-heavy workloads, Spark processing, ML model training and serving. Many organisations need both. If your primary requirement is operational analytics on structured ERP/transactional data, Snowflake or Microsoft Fabric is usually the better starting point.

We already use dbt but our models are a mess. Can you fix that?

A dbt project that has grown organically without governance typically has duplicated models, inconsistent naming, missing tests, and transformation logic that doesn't reflect current business definitions. We audit the project, identify what's used vs what's orphaned, document the business logic, add tests, and refactor the model structure. It's less glamorous than a greenfield build, but the outcome (a maintainable, tested dbt project) is more valuable.
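The "used vs orphaned" step can be sketched as a graph walk: start from the models that reports actually read and mark everything they depend on. The sketch assumes the dependency graph has already been extracted (in practice it would come from the project's own lineage metadata); `find_orphans` and the model names are hypothetical.

```python
def find_orphans(parents: dict[str, list[str]], exposed: set[str]) -> list[str]:
    """Return models that no exposed model depends on, directly or indirectly.

    parents: maps each model to the models it selects from.
    exposed: models that dashboards or exports read directly.
    """
    used: set[str] = set()
    stack = list(exposed)
    while stack:
        model = stack.pop()
        if model in used:
            continue
        used.add(model)
        stack.extend(parents.get(model, []))
    return sorted(set(parents) - used)


# Hypothetical project: one live fact table and two leftovers.
parents = {
    "stg_orders": [],
    "stg_customers": [],
    "fct_orders": ["stg_orders", "stg_customers"],
    "fct_orders_v2_old": ["stg_orders"],
    "tmp_margin_check": ["fct_orders_v2_old"],
}
exposed = {"fct_orders"}

print(find_orphans(parents, exposed))
```

Anything the walk never reaches is a candidate for removal, subject to confirming with the team that nothing outside the project queries it.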

We have Fivetran already but we're not extracting everything we need. What's missing?

Fivetran covers the connection. What's usually missing is the transformation layer on top of the raw data it lands. Fivetran delivers raw source tables, schema-matched to the source system, with no business logic applied. dbt transforms that raw data into the dimensional model, aggregations, and calculated metrics that analytics requires. Fivetran without dbt gives you data availability. Fivetran with dbt gives you analytics.
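The gap between raw landed tables and something an analyst can use is easiest to see in a small example. The sketch below is plain Python rather than a transformation model, and the column names, the `CANCELLED` status, and the revenue rule are assumptions for illustration only.

```python
from collections import defaultdict


def monthly_revenue(raw_order_lines):
    """Turn raw order lines into revenue per (customer, month).

    Business logic applied here, none of which exists in the raw table:
    cancelled lines are excluded, revenue is qty * unit_price,
    and dates are rolled up to calendar months.
    """
    totals = defaultdict(float)
    for line in raw_order_lines:
        if line["status"] == "CANCELLED":
            continue
        month = line["order_date"][:7]  # "2024-03-15" -> "2024-03"
        totals[(line["customer_id"], month)] += line["qty"] * line["unit_price"]
    return dict(totals)


# Hypothetical raw rows, shaped like the source system's table.
raw_order_lines = [
    {"customer_id": "C1", "order_date": "2024-03-02", "qty": 2, "unit_price": 10.0, "status": "SHIPPED"},
    {"customer_id": "C1", "order_date": "2024-03-20", "qty": 1, "unit_price": 5.0, "status": "SHIPPED"},
    {"customer_id": "C1", "order_date": "2024-03-21", "qty": 9, "unit_price": 10.0, "status": "CANCELLED"},
    {"customer_id": "C2", "order_date": "2024-04-01", "qty": 3, "unit_price": 4.0, "status": "SHIPPED"},
]

print(monthly_revenue(raw_order_lines))
```

The raw rows are "available" in the warehouse either way; the decisions about cancellations and revenue are what turn them into a number the business can trust.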

Ready to move

Start with a conversation, not a proposal

First call is 45 minutes. No deck. We ask about your systems, your team, and your most pressing operational problem. You get a clear view of where the gap is and what closing it looks like. No obligation.