
The Deeper Truth: Platform Complexity

By Kris Kokomoor
Role: Principal Data Engineer / System Architect
Published: June 3, 2026
Last updated: June 3, 2026
Tags: data-engineering, architecture, platform-engineering, cloud-native, bigquery, airflow, dbt, spark, databricks, complexity
Read time: 6 minutes

In a recent article, I described my preference for a cloud-native ELT architecture centered around BigQuery, dbt, Airflow, and lightweight execution layers such as Cloud Run or Cloud Functions.

The core argument was straightforward:

Modern data platforms are often more operationally complex than they need to be.

That remains my default architectural posture.

But there is a deeper truth worth discussing:

Complexity is not inherently bad.
Complexity is sometimes earned.

The challenge for architects and engineering organizations is distinguishing between:

necessary complexity
premature complexity
accidental complexity

That distinction matters enormously.

Simplicity Is a Strategy, Not a Religion

When I argue for simpler architectures, I am not arguing that distributed systems, Spark, Databricks, Snowflake, Kubernetes, or lakehouse architectures are unnecessary technologies.

Many are exceptional technologies.

The question is not whether they are powerful.

The question is:

Under what conditions does their additional complexity become economically and operationally justified?

That is a much more interesting engineering discussion.

Architectural maturity is not about avoiding complexity at all costs.

It is about introducing complexity deliberately, proportionally, and with clear justification.

Workload Characteristics Matter

One reason architectural discussions often become polarized is that people generalize from their own workloads.

But workloads vary enormously.

A SQL-centric ELT pipeline performing relational transformations against structured business data has very different requirements from:

real-time telemetry systems
AI feature engineering pipelines
graph analytics
scientific computing
recommendation engines
stateful streaming applications

In many organizations, BigQuery or another cloud-native warehouse can absorb the overwhelming majority of transformation workloads efficiently and elegantly.

But there are legitimate cases where additional distributed compute layers become appropriate.

Examples include:

iterative machine learning workflows
large-scale Python or Scala processing
graph traversal algorithms
stateful streaming joins
GPU-oriented processing
workloads that do not naturally map to declarative SQL

At that point, systems like Spark begin solving real problems rather than hypothetical ones.
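
As one illustration of a workload that does not map naturally onto declarative SQL, here is a minimal sketch of a bounded graph traversal in plain Python. The function name `reachable_within`, the sample edge list, and the hop limit are all assumptions made for this example. Some warehouses can express bounded traversals with recursive queries, so treat this as a picture of the shape of the problem rather than a claim that SQL cannot do it.

```python
from collections import deque


def reachable_within(edges, start, max_hops):
    """Breadth-first traversal: map each reachable node to its hop count.

    `edges` is an iterable of (source, destination) pairs. Nodes further
    than `max_hops` from `start` are not included in the result.
    """
    # Build an adjacency list once so each expansion step is a dict lookup.
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue  # do not expand past the hop limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


# Hypothetical sample graph: a -> b -> c -> d, plus a -> e.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")]
result = reachable_within(edges, "a", 2)
print(result)
```

The number of loop iterations here depends on the data itself, not on a fixed query plan. On a small graph this runs comfortably on one machine. It is only when the graph and the iteration count grow large that a distributed compute layer may begin to earn its place.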

Streaming Changes the Conversation

Streaming architectures are one of the clearest examples of justified complexity.

Batch-oriented ELT pipelines and continuously operating event-processing systems have fundamentally different characteristics.

Once an organization begins dealing with:

sub-second latency requirements
event-time semantics
watermarking
stateful stream processing
massive continuous ingestion

the architecture often changes substantially.

The engineering tradeoffs shift: operational complexity increases, but so does the value of specialized streaming infrastructure.

In those environments, additional distributed processing frameworks may become entirely reasonable.
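
To make event-time semantics and watermarking concrete, the following is a minimal single-process sketch, not how any particular streaming framework implements them. The window size, the allowed lateness, the function name `process`, and the sample events are assumptions chosen for illustration. Events arrive out of order. The watermark trails the highest event time seen so far, and a window is emitted once the watermark passes its end.

```python
from collections import defaultdict

WINDOW_SECONDS = 60      # assumed tumbling-window size
ALLOWED_LATENESS = 30    # assumed lag of the watermark behind the max event time


def process(events):
    """Count events per tumbling event-time window.

    `events` is an iterable of (event_time_seconds, label) in arrival order.
    Returns (emitted, dropped, still_open):
      emitted    - list of (window_start, count) for closed windows
      dropped    - events that arrived after their window had closed
      still_open - dict of window_start -> count for windows not yet closed
    """
    open_windows = defaultdict(int)
    emitted, dropped = [], []
    max_event_time = None

    for event_time, label in events:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - ALLOWED_LATENESS

        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        if window_start + WINDOW_SECONDS <= watermark:
            # The window for this event has already been closed: too late.
            dropped.append((event_time, label))
        else:
            open_windows[window_start] += 1

        # Emit every window whose end the watermark has now passed.
        for start in sorted(open_windows):
            if start + WINDOW_SECONDS <= watermark:
                emitted.append((start, open_windows.pop(start)))

    return emitted, dropped, dict(open_windows)


# Hypothetical arrival order; note the out-of-order events at t=55 and t=20.
sample = [(10, "a"), (50, "b"), (70, "c"), (55, "d"), (95, "e"), (20, "f")]
emitted, dropped, still_open = process(sample)
print(emitted, dropped, still_open)
```

In this sample the event at t=55 arrives late but is still counted, because the watermark has not yet passed the end of its window. The event at t=20 arrives after its window has closed and is dropped. Even this toy version has to hold state across events and decide when results are final, which is the kind of concern that specialized streaming infrastructure exists to manage at scale.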

Organizational Scale Matters Too

Technology choices are rarely driven solely by technical characteristics.

Organizations themselves impose constraints.

Large enterprises may require:

strict workload isolation
chargeback models
dedicated compute domains
regulatory segmentation
multi-tenant governance
independently managed engineering environments

In those situations, platforms that provide explicit compute segmentation and resource governance may become attractive even when the underlying workload itself is not especially exotic.

Architecture exists within organizations, not in isolation from them.

This is one reason why engineering discussions that focus exclusively on technical purity are often incomplete.

Operational governance is itself a technical requirement.

The Operational Cost of Complexity

At the same time, complexity always carries a price.

Every additional distributed subsystem introduces:

additional monitoring
additional IAM configuration
additional deployment processes
additional upgrade paths
additional operational expertise
additional troubleshooting surface area
additional organizational coupling

These costs are frequently underestimated because architectural diagrams tend to emphasize capability rather than the operational burden carried over a system's lifetime.

This matters more than many teams initially realize.

Operational simplicity improves:

reliability
onboarding
debugging
security posture
maintainability
long-term adaptability

A platform that fewer engineers fully understand may be more technically sophisticated and, at the same time, more organizationally fragile.

Open Formats and Data Locality

There are also legitimate strategic reasons organizations move toward lakehouse-oriented ecosystems.

Open table formats such as:

Iceberg
Delta Lake
Hudi

provide attractive properties:

decoupled storage and compute
cross-engine interoperability
long-term data portability
flexible processing models

Similarly, some organizations reach data scales where moving large datasets repeatedly into centralized warehouse environments becomes economically inefficient.

Processing data closer to object storage may then become the rational choice.

Again, the point is not that one model is universally superior.

It is that workload economics eventually shape architecture.

Architectural Defaults Still Matter

Even after acknowledging all of this, my own default posture remains largely unchanged.

I still believe many organizations prematurely adopt:

distributed compute layers
Kubernetes-based data infrastructure
complex streaming systems
heavily fragmented platform architectures

before their actual workload characteristics justify doing so.

In many cases:

simpler orchestration
declarative transformations
managed cloud-native compute
warehouse-centric execution

remain entirely sufficient.

The existence of edge cases does not invalidate the value of simplicity as a starting principle.

Quite the opposite.

A simpler architecture creates a clearer baseline from which additional complexity can later be justified.

The Real Goal

Ultimately, the goal of architecture is not simplicity.

Nor is it sophistication.

The goal is fitness.

A good architecture:

matches workload characteristics
matches organizational maturity
minimizes unnecessary operational burden
evolves proportionally to demonstrated need

That evolution should ideally be intentional rather than fashionable.

The deeper truth is that complexity is neither virtue nor failure.

It is a tool.

And like all tools, it should be used carefully, deliberately, and with full awareness of its cost.

Kris Kokomoor is a principal-level engineer and architect with experience spanning healthcare analytics, clinical data systems, operational data platforms, and cloud-native infrastructure. His work focuses on operationally sustainable analytics architectures, orchestration systems, data quality, observability, and AI-augmented engineering systems.

Additional essays and technical notes are available at Pysynapse. Consulting and advisory information is available through PalmerCove LLC.
