Data Product as Service: Building Smart Analytics with Data Mesh Principles

A comprehensive guide to implementing Data Mesh architecture using Databricks SQL endpoints, Streamlit micro-frontends, and DAPR for cloud-native analytics that bridges the gap between traditional BI and modern data applications.

Gaurav Malhotra
January 15, 2024 · 12 min read
Python · Streamlit · Databricks · DAPR · Delta Lake · Docker

The Evolution of Analytics Architecture

Data engineering has fundamentally become a software engineering problem. While the industry has long recognized this truth, the practical implementation of software development principles in analytics remains elusive for many organizations. The question that drives modern data architecture is simple yet profound: How can we apply battle-tested software engineering practices to analytics?

In the world of application development, we have embraced API-first approaches, microservices architecture, and micro-frontend patterns. These patterns enable teams to rapidly develop and release new features with agility and end-to-end automation. This is the essence of the 12-factor app methodology that has revolutionized software delivery.

The analytics domain stands at a similar crossroads today. The rise of frameworks like Streamlit and Dash has made building interactive visualizations in Python increasingly attractive. Data teams can now leverage their programming language of choice to create powerful, interactive data applications.

What is Data Product as Service?

At its core, Data Product as Service is an architectural pattern that treats data products as first-class services, accessible through protocol-agnostic interfaces. This approach enables:

  • API/service-first approach for data product access
  • Microservices architecture for data processing
  • Micro-frontend/modular UI for visualization
  • Serverless/containerization for deployment automation
  • OAM (Open Application Model) compliance via DAPR

The key insight is that data in itself has no value. It is the smart analytics layer that provides "data storytelling" and unlocks business insights.

Reference Architecture

The following architecture represents the complete Smart Analytics solution, combining data mesh principles with modern frontend patterns:

Medallion Data Architecture


Core Components Deep Dive

Data Product Layer

A Data Product in this architecture lives in a data lakehouse, typically stored on object storage like AWS S3 or Azure ADLS. Using Databricks, data products are materialized as Delta tables on Delta Lake, providing:

  • ACID transactions for data reliability
  • Time travel for data versioning
  • Schema enforcement and evolution
  • Optimized query performance

-- Creating a Data Product as a Delta table
CREATE TABLE IF NOT EXISTS default.nyctaxi_yellow
USING DELTA
LOCATION "dbfs:/databricks-datasets/nyctaxi/tables/nyctaxi_yellow";

Databricks SQL Endpoints

The SQL Endpoint is the critical abstraction that transforms data products into services. Databricks SQL endpoints provide:

  • Protocol-agnostic data access
  • Built-in authentication and authorization
  • Connection support for traditional BI tools (Tableau, Power BI, Qlik)
  • Native drivers for Python, JDBC, and ODBC
  • Serverless compute options for cost efficiency

This layer handles horizontal concerns including:

  • Authentication: Integration with OAuth, OKTA, and enterprise identity providers
  • Authorization: Fine-grained access control via Apache Ranger policies
  • Audit: Complete query logging and lineage tracking
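Once an endpoint is provisioned, consuming a data product from Python is a short exercise. The following is a minimal sketch using the open-source `databricks-sql-connector` package; the `fetch_sample` helper and the `DATABRICKS_HTTP_PATH` variable are illustrative assumptions (the environment configuration later in this post uses an endpoint id rather than an HTTP path):

```python
import os


def delta_query(table: str, limit: int = 10) -> str:
    # Build a bounded SELECT; the table name matches the article's Delta example.
    return f"SELECT * FROM {table} LIMIT {limit}"


def fetch_sample(table: str = "default.nyctaxi_yellow"):
    # Lazy import so the query builder above stays usable offline.
    from databricks import sql

    with sql.connect(
        server_hostname=os.environ["DATABRICKS_HOST"],
        http_path=os.environ["DATABRICKS_HTTP_PATH"],  # assumption: warehouse HTTP path
        access_token=os.environ["DATABRICKS_TOKEN"],
    ) as conn:
        with conn.cursor() as cursor:
            cursor.execute(delta_query(table))
            return cursor.fetchall()
```

Because the endpoint speaks standard SQL over a documented driver, the same data product is reachable from Tableau, Power BI, or a notebook without any changes on the producer side.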

DAPR Integration

DAPR (Distributed Application Runtime) brings the Open Application Model to analytics workloads. By running Streamlit applications as DAPR-enabled services, we gain:

  • Service discovery across the analytics mesh
  • State management for session handling
  • Pub/Sub messaging for real-time updates
  • Secrets management for secure credential handling
  • Observability with built-in tracing and metrics

# Running the analytics app with DAPR
dapr run --app-id smartapp --app-port 9999 --dapr-http-port 3500 -- python main.py
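The state-management building block listed above maps to a plain HTTP API on the sidecar, which is how a Streamlit app can persist session data without binding to a specific store. A sketch under stated assumptions: `statestore` is the default component name created by `dapr init`, and `save_session` is a hypothetical helper, not part of the project:

```python
import json
import os
import urllib.request

DAPR_PORT = os.getenv("DAPR_HTTP_PORT", "3500")  # sidecar's HTTP port


def state_url(store: str, key: str = "") -> str:
    # Dapr v1.0 state API: POST to the store, GET/DELETE on store/<key>.
    base = f"http://localhost:{DAPR_PORT}/v1.0/state/{store}"
    return f"{base}/{key}" if key else base


def save_session(key: str, value: dict, store: str = "statestore") -> None:
    # Persist a key/value pair via the sidecar (requires a running Dapr sidecar).
    body = json.dumps([{"key": key, "value": value}]).encode()
    req = urllib.request.Request(
        state_url(store), data=body, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req).close()
```

Swapping Redis for another state store is then a component configuration change, not an application change.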

Streamlit Micro-Frontends

Streamlit enables data teams to build interactive visualizations using Python. As a micro-frontend, the Streamlit app:

  • Focuses on a single analytics domain
  • Deploys independently of other frontends
  • Integrates with the broader application through standard web patterns
  • Leverages the full Python data science ecosystem

Data Mesh Alignment

This architecture embodies key Data Mesh principles as defined by Zhamak Dehghani:

Domain Ownership

Each data product is owned by a domain team that understands both the data and its business context. The SQL endpoint layer provides standardized access without requiring cross-team coordination.

Data as a Product

By exposing data through service endpoints, data products become self-describing, discoverable, and consumable. Teams can access data as easily as calling an API.

Self-Serve Data Platform

The combination of Databricks, DAPR, and containerization creates a self-serve platform where domain teams can:

  • Create and publish data products
  • Build visualization frontends
  • Deploy without infrastructure expertise

Federated Computational Governance

The authentication and authorization layer (OAuth, Ranger policies) enforces governance policies consistently across all data products while allowing domain-specific customization.

Implementation Guide

Prerequisites

Before implementing this architecture, ensure you have:

  • Databricks workspace with SQL endpoint capability
  • Docker for local development and containerization
  • Mapbox token for geo-visualization features
  • DAPR runtime installed locally or on Kubernetes/EKS

Environment Configuration

Create a .env file with your configuration:

DATABRICKS_HOST=<your-databricks-host>
DATABRICKS_TOKEN=<your-personal-access-token>
DATABRICKS_SQL_ENDPOINT=<your-sql-endpoint-id>
MAPBOX_TOKEN=<your-mapbox-token>
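These variables can be loaded with a library such as python-dotenv, or with a few lines of standard library code. The following hand-rolled parser is a minimal sketch (python-dotenv's `load_dotenv` is the usual choice in practice):

```python
import os


def load_env(path: str = ".env") -> dict:
    # Parse KEY=VALUE lines, skipping blanks and comments, into os.environ.
    values = {}
    if os.path.exists(path):
        with open(path) as fh:
            for line in fh:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, _, val = line.partition("=")
                    values[key.strip()] = val.strip()
    os.environ.update(values)
    return values
```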

Running Locally

For local development with Docker:

# Build and run the container
make docker-run

# Access the application
open http://localhost:9999

For DAPR-enabled deployment:

# Install DAPR CLI and initialize
dapr init

# Run with DAPR sidecar
dapr run --app-id smartapp --app-port 9999 --dapr-http-port 3500 -- python main.py
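With the sidecar running, other services reach the app through Dapr's service-invocation API rather than a hard-coded address, which is what makes service discovery across the mesh work. A sketch; `invocation_url` and `call_app` are hypothetical helpers, and 3500 is the sidecar's default HTTP port:

```python
import urllib.request


def invocation_url(app_id: str, method: str, dapr_port: int = 3500) -> str:
    # Dapr v1.0 service-invocation route: /v1.0/invoke/<app-id>/method/<method>
    return f"http://localhost:{dapr_port}/v1.0/invoke/{app_id}/method/{method}"


def call_app(app_id: str, method: str) -> bytes:
    # Requires a running sidecar; Dapr resolves app_id to the right instance.
    with urllib.request.urlopen(invocation_url(app_id, method)) as resp:
        return resp.read()
```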

Micro-Frontend Integration Patterns

The architecture supports multiple integration strategies for composing analytics into larger applications:

| Pattern | Use Case | Complexity |
| --- | --- | --- |
| Routing | Separate pages for each analytics module | Low |
| iFrame | Embedding analytics in existing portals | Low |
| Micro-apps | Independent deployment with shared shell | Medium |
| Web Components | Reusable analytics widgets | Medium |
| Module Federation | Shared dependencies across frontends | High |

Consider frameworks like Single SPA, Module Federation, Bit, or Piral for advanced micro-frontend orchestration.

Production Deployment Options

Since the application is containerized via Dockerfile, deployment options include:

Serverless Container Platforms

  • AWS Fargate for serverless container execution
  • Azure Container Instances for quick deployments
  • Google Cloud Run for auto-scaling

Kubernetes with DAPR

apiVersion: apps/v1
kind: Deployment
metadata:
  name: smart-analytics
spec:
  replicas: 3
  selector:
    matchLabels:
      app: smart-analytics
  template:
    metadata:
      labels:
        app: smart-analytics
      annotations:
        dapr.io/enabled: "true"
        dapr.io/app-id: "smartapp"
        dapr.io/app-port: "9999"
    spec:
      containers:
        - name: analytics
          image: smart-analytics:latest
          ports:
            - containerPort: 9999

Industry Use Cases

This architecture pattern applies across industries:

Retail Analytics

  • Sales performance dashboards
  • Inventory optimization
  • Logistics tracking
  • Customer behavior analysis

Healthcare and Insurance

  • Claims administration analytics
  • Policy performance tracking
  • Product definition with sample claim processing
  • Population health insights

Financial Services

  • Payment tracking and reconciliation
  • Multi-channel activity monitoring
  • Merchant analytics
  • Fraud detection dashboards

The Path Forward

This architecture represents a fundamental shift from "report building departments" to "smart analytics application building departments." By treating data engineering as a software problem, organizations can:

  1. Reduce complexity through domain-driven design
  2. Accelerate delivery via independent deployments
  3. Improve quality with software engineering practices
  4. Scale efficiently using cloud-native patterns

The 90s era of monolithic analytics-in-a-box is over. Modern cloud-native analytics, powered by Data Product as Service, brings the same agility, scale, and automation that has transformed application development to the analytics domain.

Conclusion

The journey from traditional analytics to Smart Analytics requires embracing software engineering principles wholesale. By combining:

  • Delta Lake for reliable data products
  • Databricks SQL Endpoints for service abstraction
  • DAPR for distributed application runtime
  • Streamlit for rapid frontend development
  • Micro-frontend patterns for composable UIs

Organizations can build analytics platforms that match the velocity and quality of modern application development. The investment in this architecture pays dividends through reduced time-to-insight, improved data quality, and empowered domain teams.

As the original author eloquently states: "Be the change you want to see in the world of advanced analytics."


This post is based on the data-product-as-service project, which demonstrates these concepts using real-time geo-location tracking similar to ride-sharing applications like Uber.