Analytics Architecture

Why I Prefer BigQuery, dbt, and Airflow

By Kris Kokomoor
Role: Principal Data Engineer / System Architect
Published: May 28, 2026
Last updated: May 28, 2026
Tags: gcp, bigquery, airflow, dbt, data-engineering, architecture, cloud-native
Read time: 6 minutes

Let the cloud-native data plane scale. Keep the control plane understandable.

Modern data platforms are becoming increasingly complex. Over the past several years I have repeatedly encountered architectures containing distributed Spark clusters, multiple orchestration layers, proprietary transformation engines, Kubernetes fleets, vendor-specific SQL extensions, and operational patterns that require entire teams simply to keep the platform functioning.

I understand how we arrived here. At sufficiently large scale, distributed compute systems absolutely have their place. But in many organizations, I believe the modern data stack has drifted toward unnecessary complexity and operational overhead.

My own architectural bias has increasingly shifted toward a simpler cloud-native ELT model built around:

BigQuery as the compute engine
dbt as the transformation framework
Airflow as the orchestration/control plane
Cloud Run or Cloud Functions as stateless execution wrappers
Object storage as the landing and raw persistence layer
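To make "stateless execution wrapper" a little more concrete, here is a minimal sketch of a Cloud Run job task that works out its share of the work purely from its environment. It relies on the `CLOUD_RUN_TASK_INDEX` and `CLOUD_RUN_TASK_COUNT` variables that Cloud Run jobs set for each task. The object keys, the `shard_for_task` helper, and the processing step are hypothetical placeholders, not part of any real pipeline.

```python
import os
from typing import List, Sequence


def shard_for_task(keys: Sequence[str], task_index: int, task_count: int) -> List[str]:
    """Return the keys this task owns, using a round-robin split over sorted keys.

    Each task derives its slice only from its index and the total task count,
    so no task needs to coordinate with, or remember anything about, the others.
    """
    if task_count < 1 or not 0 <= task_index < task_count:
        raise ValueError("task_index must be in [0, task_count)")
    return [key for i, key in enumerate(sorted(keys)) if i % task_count == task_index]


def main() -> None:
    # Cloud Run jobs set these for every task; default to one task for local runs.
    task_index = int(os.environ.get("CLOUD_RUN_TASK_INDEX", "0"))
    task_count = int(os.environ.get("CLOUD_RUN_TASK_COUNT", "1"))

    # Hypothetical landing-zone keys; a real job would list objects in a bucket.
    raw_keys = [f"landing/2026-05-28/part-{n:04d}.json" for n in range(10)]

    for key in shard_for_task(raw_keys, task_index, task_count):
        # Placeholder for the real work, e.g. validating and loading one object.
        print(f"task {task_index}/{task_count} processing {key}")


if __name__ == "__main__":
    main()
```

Because the task holds no state of its own, a failed task can simply be rerun, and scaling out is a matter of raising the task count.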

The core principle is simple:

Let the cloud-native data plane scale. Keep the control plane understandable.

BigQuery Already Solves the Hard Part

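BigQuery provides massively parallel distributed execution as a managed service. That matters enormously: the hardest part of large-scale analytics, distributing and scheduling the computation itself, is already handled by the platform.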

When transformations are fundamentally SQL-oriented, introducing a second distributed compute layer often duplicates capabilities that already exist within the warehouse itself.

In many environments, Spark clusters are introduced before anyone has demonstrated that BigQuery cannot perform the required work efficiently. The result is frequently:

additional infrastructure
additional operational burden
duplicated optimization concerns
additional security surface area
additional failure modes
additional onboarding complexity

If the warehouse already provides elastic distributed execution, I prefer to start by exploiting that capability fully before introducing another distributed processing layer.

This is not an anti-Spark argument. Spark is an excellent system. There are workloads for which it is clearly the right answer.

It is an argument against introducing distributed compute reflexively.
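As a small illustration of exploiting the warehouse before adding another compute layer, the sketch below expresses a transformation as a single `CREATE TABLE ... AS SELECT` and lets the engine execute it; the client only submits SQL. Python's built-in `sqlite3` stands in for BigQuery so the example runs locally, and the `raw_events` and `daily_event_counts` tables are hypothetical. In BigQuery the same kind of statement would be submitted as a query job, with the scan, aggregation, and write all running on the managed service.

```python
import sqlite3

# The whole transformation is one declarative statement. The engine plans and
# executes it; no rows are pulled into application memory.
TRANSFORM_SQL = """
CREATE TABLE daily_event_counts AS
SELECT
    event_date,
    event_type,
    COUNT(*) AS event_count
FROM raw_events
GROUP BY event_date, event_type
"""


def build_daily_counts(conn: sqlite3.Connection) -> None:
    """Push the aggregation down to the engine instead of computing it client-side."""
    with conn:  # commit on success, roll back on error
        conn.execute("DROP TABLE IF EXISTS daily_event_counts")
        conn.execute(TRANSFORM_SQL)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_events (event_date TEXT, event_type TEXT)")
    conn.executemany(
        "INSERT INTO raw_events VALUES (?, ?)",
        [("2026-05-28", "click"), ("2026-05-28", "click"), ("2026-05-28", "view")],
    )
    build_daily_counts(conn)
    for row in conn.execute("SELECT * FROM daily_event_counts ORDER BY event_type"):
        print(row)
```

Dropping and recreating the table keeps the step idempotent, so an orchestrator can safely retry it.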

Airflow as a Control Plane, Not a Compute Engine

I frequently hear concerns that an Airflow VM "won't scale."

Usually this reflects a misunderstanding of the role Airflow is intended to play.

In this model:

Airflow orchestrates
BigQuery computes
Cloud Run executes stateless jobs
dbt manages transformations and lineage

The Airflow server itself is not processing terabytes of data. It is coordinating dependencies, execution order, retries, scheduling, and observability.

The orchestration layer should not need to scale at the same rate as the data plane.

A modest Airflow VM can comfortably orchestrate substantial enterprise workloads when the heavy lifting occurs in managed cloud-native systems.
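The division of labor above can be illustrated with a deliberately tiny, stdlib-only model of a control plane. It holds only task names, dependencies, and attempt counts, and hands each task to a `submit` callable that, in a real deployment, would start a BigQuery or Cloud Run job. This is not Airflow's API: `run_dag`, `submit`, and the task names are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(
    dependencies: Dict[str, Set[str]],
    submit: Callable[[str], None],
    max_attempts: int = 3,
) -> List[str]:
    """Run tasks in dependency order, retrying each failed submission.

    `dependencies` maps each task to the set of tasks it depends on. The
    orchestrator only tracks metadata (names, order, attempts); the heavy
    lifting happens inside `submit`, which stands in for a managed service.
    """
    completed: List[str] = []
    for task in TopologicalSorter(dependencies).static_order():
        for attempt in range(1, max_attempts + 1):
            try:
                submit(task)
                break
            except Exception as exc:  # broad on purpose: any failure counts as an attempt
                if attempt == max_attempts:
                    # Stop here so downstream tasks never run on missing inputs.
                    raise RuntimeError(f"{task} failed after {attempt} attempts") from exc
        completed.append(task)
    return completed


if __name__ == "__main__":
    # Hypothetical pipeline: load raw files, build staging models, then a fact table.
    dag = {
        "load_raw": set(),
        "stg_orders": {"load_raw"},
        "stg_customers": {"load_raw"},
        "fct_orders": {"stg_orders", "stg_customers"},
    }
    print(run_dag(dag, submit=lambda task: print(f"submitting {task}")))
```

Nothing in this loop grows with data volume, which is the point: in Airflow itself the equivalent knobs are task arguments such as `retries` and `retry_delay`, and the scheduler rather than a Python loop decides when each task runs.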

If orchestration load eventually grows beyond a single VM, there are well-understood migration paths:

CeleryExecutor
KubernetesExecutor
Managed Airflow services such as Google Cloud Composer

But I strongly prefer to earn that complexity rather than assume it from day one.

The Hidden Cost of Platform Sprawl

Every distributed system introduced into an architecture creates operational gravity.

Clusters require:

monitoring
upgrades
patching
security management
IAM integration
tuning
disaster recovery planning
staffing expertise

This overhead is frequently underestimated during initial architecture discussions because the platform is evaluated primarily through the lens of capability rather than operational lifetime cost.

I prefer architectures that reduce:

moving parts
persistent infrastructure
operational coupling
specialized operational knowledge

Simplicity is not merely aesthetic.

Simplicity improves:

reliability
onboarding
maintainability
troubleshooting
auditability
security posture
organizational adaptability

Why dbt Fits This Model So Well

dbt occupies a particularly elegant position in this architecture.

It provides:

transformation structure
dependency management
lineage
testing
documentation
modular SQL development

while still allowing the warehouse itself to perform execution.

This separation is important.

I prefer systems where:

orchestration remains orchestration
transformation remains transformation
compute remains compute

dbt encourages this separation naturally.
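To show why `ref()`-based modeling yields dependency management and lineage almost for free, here is a toy sketch that scans model SQL for `{{ ref('...') }}` calls, builds the dependency graph, and derives a build order. It is not how dbt is implemented (dbt renders models with Jinja and records the graph in its manifest); the regex handles only the single-argument form of `ref`, and the model names are hypothetical.

```python
import re
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Matches {{ ref('model') }} or {{ ref("model") }}; a simplification of real Jinja rendering.
REF_PATTERN = re.compile(r"\{\{\s*ref\(\s*['\"]([A-Za-z0-9_]+)['\"]\s*\)\s*\}\}")


def lineage(models: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each model name to the set of models it selects from."""
    return {name: set(REF_PATTERN.findall(sql)) for name, sql in models.items()}


def build_order(models: Dict[str, str]) -> List[str]:
    """Order models so every upstream model is built before its dependents."""
    return list(TopologicalSorter(lineage(models)).static_order())


if __name__ == "__main__":
    # Hypothetical project: two staging models feeding one fact model.
    models = {
        "fct_orders": """
            select o.order_id, c.customer_id, o.amount
            from {{ ref('stg_orders') }} o
            join {{ ref('stg_customers') }} c using (customer_id)
        """,
        "stg_orders": "select * from raw.orders",
        "stg_customers": "select * from raw.customers",
    }
    print(lineage(models))
    print(build_order(models))
```

The SQL itself still runs in the warehouse; the framework only contributes structure. In a real dbt project the raw tables would normally be declared as sources and referenced with `source()`, and the same graph is what `dbt docs` renders as lineage.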

Avoiding Vendor Lock-In

One of my concerns with some modern data platforms is the increasing reliance on proprietary abstractions and vendor-specific SQL semantics.

The more business logic becomes intertwined with platform-specific behavior:

the harder migration becomes
the harder reasoning becomes
the more organizational leverage shifts toward the vendor

I strongly prefer:

standard SQL
portable orchestration
explicit infrastructure
open execution semantics

This does not eliminate vendor dependence entirely. No cloud architecture truly does. But it reduces unnecessary coupling.
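One way to keep that coupling visible, rather than pretend it away, is a small CI check that flags dialect-specific functions in model SQL. The sketch below is illustrative only: its deny-list holds a few BigQuery-specific functions (`SAFE_DIVIDE`, `FARM_FINGERPRINT`, `GENERATE_UUID`), is far from complete, and is not a recommendation to ban them; a team would curate its own list and allow deliberate exceptions. The function and model names are hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Illustrative, incomplete deny-list of BigQuery-specific functions, each mapped
# to a more portable alternative a reviewer might consider.
DIALECT_SPECIFIC = {
    "SAFE_DIVIDE": "x / NULLIF(y, 0) (NULL on a zero divisor)",
    "FARM_FINGERPRINT": "a hash function agreed on per target engine",
    "GENERATE_UUID": "IDs generated upstream or per engine",
}

# Match a listed name used as a function call, case-insensitively.
_CALL = re.compile(r"\b(" + "|".join(DIALECT_SPECIFIC) + r")\s*\(", re.IGNORECASE)


def find_dialect_calls(models: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (model, line number, function) for each dialect-specific call found."""
    findings = []
    for name, sql in sorted(models.items()):
        for lineno, line in enumerate(sql.splitlines(), start=1):
            for match in _CALL.finditer(line):
                findings.append((name, lineno, match.group(1).upper()))
    return findings


if __name__ == "__main__":
    models = {
        "fct_margin": "select order_id,\n  safe_divide(profit, revenue) as margin\nfrom orders",
        "fct_margin_portable": "select order_id,\n  profit / nullif(revenue, 0) as margin\nfrom orders",
    }
    for model, lineno, func in find_dialect_calls(models):
        print(f"{model}:{lineno}: {func} -> consider {DIALECT_SPECIFIC[func]}")
```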

The Cognitive Dimension of Architecture

One influence on my thinking comes from earlier exposure to RISC philosophy during my years working around Sun Microsystems technologies.

The insight was not merely that smaller instruction sets were efficient.

It was that simpler systems are often easier to reason about deeply.

I believe the same principle applies to modern data architecture.

A platform should be understandable.

An engineer should be able to explain:

where data lives
how it moves
where computation occurs
how failures propagate
how recovery works
how costs are generated

without requiring a whiteboard session involving twelve distributed subsystems.
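"How costs are generated" is one of the easier questions to answer in this model, at least under BigQuery's on-demand pricing, which bills by bytes processed. The sketch below turns a byte count, such as the `total_bytes_processed` a dry-run query job reports, into a rough cost. The price per TiB is a parameter rather than a quoted figure because list prices vary by region and change over time; the 6.25 in the example is an assumed placeholder, the workload numbers are hypothetical, and capacity-based (slot) pricing works differently.

```python
TIB = 1024 ** 4  # bytes in one tebibyte


def on_demand_cost(bytes_processed: int, usd_per_tib: float) -> float:
    """Rough on-demand query cost: bytes scanned times the per-TiB price.

    Ignores billing minimums, rounding, free tiers, and capacity (slot) pricing.
    """
    if bytes_processed < 0 or usd_per_tib < 0:
        raise ValueError("inputs must be non-negative")
    return bytes_processed / TIB * usd_per_tib


def monthly_cost(bytes_per_run: int, runs_per_day: int, usd_per_tib: float, days: int = 30) -> float:
    """Project a scheduled model's monthly cost from a single run's scan size."""
    return on_demand_cost(bytes_per_run * runs_per_day * days, usd_per_tib)


if __name__ == "__main__":
    ASSUMED_USD_PER_TIB = 6.25  # placeholder; check current regional pricing
    per_run = 200 * 1024 ** 3  # hypothetical model scanning 200 GiB per run
    print(f"per run:   ${on_demand_cost(per_run, ASSUMED_USD_PER_TIB):.2f}")
    print(f"per month: ${monthly_cost(per_run, 24, ASSUMED_USD_PER_TIB):.2f}")
```

Because cost scales with bytes scanned, partitioning and clustering that shrink the scan show up directly in this arithmetic.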

Final Thoughts

Cloud-native managed services have fundamentally changed the economics of data infrastructure.

In many organizations, we no longer need to build and manage large portions of the distributed compute machinery ourselves.

That should influence architecture.

My preference is therefore:

keep orchestration lightweight
keep transformations declarative
let the warehouse scale elastically
introduce distributed complexity only when justified by demonstrated need

Not because simpler architectures are fashionable.

Because they are often more robust, more economical, easier to operate, and easier for organizations to evolve over time.

Kris Kokomoor is a principal-level engineer and architect with experience spanning healthcare analytics, clinical data systems, operational data platforms, and cloud-native infrastructure. His work focuses on operationally sustainable analytics architectures, orchestration systems, data quality, observability, and AI-augmented engineering systems.

Additional essays and technical notes are available at Pysynapse. Consulting and advisory information is available through PalmerCove LLC.
