Open Source · Data 3.0 · Apache 2.0

Data Mesh
Made Real

Where human expertise, AI agents, and governance converge.

Provenance is an open-source platform that operationalizes the data mesh - enabling domain teams to publish governed, federated data products with built-in lineage, policy enforcement, and AI-native participation.

The Problem

Data mesh is a philosophy.
Provenance makes it software.

Most enterprises have the right ideas about data ownership and federation - but no platform to enforce them. They fall back on central pipelines, undocumented lineage, and governance that exists only in spreadsheets.

  • 01 Domain teams lack self-service tooling to publish and own data products with enforced contracts.
  • 02 Governance frameworks live in documents, not in code - untraced, unenforced, and invisible to downstream consumers.
  • 03 AI agents are bolted on after the fact, operating outside the governance model rather than as first-class participants.
  • 04 Data lineage is documented manually at best, missing entirely at worst - creating risk for regulated environments.

"The data mesh is a sociotechnical paradigm that takes a distributed, domain-oriented approach to ownership and architecture... with the platform as an enabler."

- Zhamak Dehghani, Data Mesh (O'Reilly, 2022)



Provenance is purpose-built to be that enabler - translating the four principles of data mesh (domain ownership, data as a product, the self-serve data platform, and federated computational governance) into working, deployable software with OPA-enforced governance, federated data product APIs, and an agent-aware architecture ready for Data 3.0.

Architecture

Three pillars.
One governed mesh.

Provenance models the data ecosystem as an interplay of three distinct participants - each with identity, accountability, and governed access.

Domain Teams

Human Domain Ownership

Domain teams own their data products end-to-end - schema, SLOs, metadata, and publishing cadence - through a self-service interface that enforces standards without mandating implementation.

AI Agents

AI as First-Class Participants

AI agents are not merely consumers - they are governed participants with identity, lineage, and policy constraints. Every agent action is traceable. Every output is attributable. No black boxes in production.

Federated Governance

Policy as Code, Not Process

Governance is enforced at publish time via Open Policy Agent - not documented in wikis. Organizations define their own governance contracts; Provenance makes them executable, auditable, and real.
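As a rough illustration of what "policy as code" means here, the TypeScript sketch below expresses an organization's governance contract as a list of executable rules, each returning a violation message or nothing, similar in spirit to the deny-rule style common in Rego. It is not Provenance's API: `ProductDraft`, `GovernanceContract`, `evaluate`, the rule set, and the sensitivity labels other than CUI are hypothetical. In Provenance itself, the rules would be written in Rego and evaluated by OPA.

```typescript
// Hypothetical shape of a data product submitted for publishing.
interface ProductDraft {
  domain: string;
  name: string;
  owner?: string;
  sensitivity?: string;
  slo?: { availability?: number };
}

// A rule returns a violation message, or null when the draft complies.
type Rule = (draft: ProductDraft) => string | null;

// An organization-defined contract: an identified, versioned list of rules.
interface GovernanceContract {
  id: string;
  version: string;
  rules: Rule[];
}

// Hypothetical label set; only 'CUI' appears in the original example.
const ALLOWED_SENSITIVITY = ['PUBLIC', 'INTERNAL', 'CUI'];

// Example contract (hypothetical): every product needs an owner, a sensitivity
// label from the allowed set, and a declared availability SLO.
const exampleContract: GovernanceContract = {
  id: 'example-org-standards',
  version: '1.0.0',
  rules: [
    (d) => (d.owner ? null : 'owner is required'),
    (d) =>
      d.sensitivity !== undefined && ALLOWED_SENSITIVITY.includes(d.sensitivity)
        ? null
        : `sensitivity must be one of ${ALLOWED_SENSITIVITY.join(', ')}`,
    (d) => (d.slo?.availability !== undefined ? null : 'availability SLO is required'),
  ],
};

// Run every rule and collect all violations; an empty list means "allow".
function evaluate(contract: GovernanceContract, draft: ProductDraft): string[] {
  return contract.rules
    .map((rule) => rule(draft))
    .filter((v): v is string => v !== null);
}

// This draft has an owner but no sensitivity label and no availability SLO,
// so it fails two of the three rules.
const violations = evaluate(exampleContract, {
  domain: 'finance',
  name: 'ledger-daily',
  owner: 'team-fin',
});
```

Collecting every violation rather than stopping at the first one gives a publishing team the full list of fixes in a single round trip.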

Capabilities

What Provenance delivers

A full-stack platform for governed data product operations - from schema registration to marketplace discovery to agent-aware lineage capture.

Data Product Lifecycle Management

End-to-end tooling to register, version, publish, and deprecate data products with enforced contracts - schemas, SLOs, quality thresholds, and ownership all tracked in a single system of record.
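One way to picture contract-enforced lifecycle management is as a small state machine. The sketch below assumes three states (`registered`, `published`, `deprecated`) and a version counter that is bumped on each publish. The state names, the `transition` function, and the rule that deprecated products cannot be re-published are illustrative assumptions, not Provenance's documented model.

```typescript
// Hypothetical lifecycle states; Provenance's actual model may differ.
type LifecycleState = 'registered' | 'published' | 'deprecated';

interface DataProductRecord {
  name: string;
  owner: string;
  state: LifecycleState;
  version: number; // incremented on every publish
}

// Allowed transitions. Re-publishing an already published product
// produces a new version; deprecated products cannot be re-published.
const ALLOWED: Record<LifecycleState, readonly LifecycleState[]> = {
  registered: ['published'],
  published: ['published', 'deprecated'],
  deprecated: [],
};

// Returns a new record instead of mutating the old one, so earlier states
// can be kept as an audit trail in the system of record.
function transition(record: DataProductRecord, next: LifecycleState): DataProductRecord {
  if (!ALLOWED[record.state].includes(next)) {
    throw new Error(`illegal transition for ${record.name}: ${record.state} -> ${next}`);
  }
  const version = next === 'published' ? record.version + 1 : record.version;
  return { ...record, state: next, version };
}

let product: DataProductRecord = {
  name: 'entity-resolution',
  owner: 'team-geoint',
  state: 'registered',
  version: 0,
};
product = transition(product, 'published');  // version 1
product = transition(product, 'published');  // version 2
product = transition(product, 'deprecated'); // stays at version 2
```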

OPA-Enforced Policy Engine

Governance policies defined in Rego and evaluated at every critical boundary. No product publishes without passing governance checks. Policies are versioned, auditable, and customizable per organization.
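OPA exposes policy decisions over its REST Data API: a client POSTs `{"input": ...}` to `/v1/data/<path>` and reads the decision from the `result` field, which is omitted when the rule is undefined for that input. A minimal publish gate built on that API might look like the following. The policy path `provenance/publish/allow` and the `isAllowed` helper are hypothetical, and the fetch function is injected so the gate can be exercised without a running OPA server.

```typescript
// Minimal fetch-like signature, so the gate can run against a real OPA server
// (via the global fetch) or against a stub in tests.
type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string },
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Ask OPA for a decision through its Data API:
// POST {opaUrl}/v1/data/{policyPath} with body {"input": ...}.
async function isAllowed(
  fetcher: Fetcher,
  opaUrl: string,
  policyPath: string, // hypothetical rule path, e.g. 'provenance/publish/allow'
  input: unknown,
): Promise<boolean> {
  const res = await fetcher(`${opaUrl}/v1/data/${policyPath}`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input }),
  });
  if (!res.ok) {
    // Fail closed: an erroring policy engine raises instead of allowing the publish.
    throw new Error(`OPA request failed with status ${res.status}`);
  }
  const body = (await res.json()) as { result?: unknown };
  // OPA omits "result" when the rule is undefined for this input;
  // only an explicit `true` counts as allow.
  return body.result === true;
}
```

Against a local server started with `opa run --server` (which listens on port 8181 by default), the global `fetch` in Node 18+ can be passed directly, e.g. `await isAllowed(fetch, 'http://localhost:8181', 'provenance/publish/allow', draft)`.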

Federated Connector Framework

Native connectors for PostgreSQL, Snowflake, S3, and more - with an extensible framework for custom sources. Schemas are snapshotted at registration, not at query time, so the published contract stays stable.
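Snapshotting at registration means the stored snapshot, not the live source, becomes the contract. A sketch of that idea under simplifying assumptions: the `Column` shape, `takeSnapshot`, and `breakingChanges` are hypothetical (a real PostgreSQL connector could read column metadata from `information_schema.columns`). The snapshot is copied once, and later comparisons flag removed columns, type changes, and NOT NULL columns that became nullable as breaking, while newly added columns are allowed.

```typescript
// Simplified, hypothetical column description as a connector might report it.
interface Column {
  name: string;
  type: string;
  nullable: boolean;
}

// A snapshot captured once, at registration; it becomes part of the contract.
interface SchemaSnapshot {
  takenAt: string; // ISO 8601 timestamp
  columns: Column[];
}

// Copy (and sort) the columns so later changes to the source metadata
// cannot silently alter the registered contract.
function takeSnapshot(columns: Column[], takenAt: Date = new Date()): SchemaSnapshot {
  const copy = columns
    .map((c) => ({ ...c }))
    .sort((a, b) => a.name.localeCompare(b.name));
  return { takenAt: takenAt.toISOString(), columns: copy };
}

// Compare the live schema with the registered snapshot. Removed columns,
// type changes, and NOT NULL columns becoming nullable break the contract;
// added columns do not.
function breakingChanges(contract: SchemaSnapshot, live: Column[]): string[] {
  const liveByName = new Map(live.map((c) => [c.name, c] as const));
  const problems: string[] = [];
  for (const col of contract.columns) {
    const current = liveByName.get(col.name);
    if (current === undefined) {
      problems.push(`column removed: ${col.name}`);
    } else if (current.type !== col.type) {
      problems.push(`type changed: ${col.name} ${col.type} -> ${current.type}`);
    } else if (!col.nullable && current.nullable) {
      problems.push(`column became nullable: ${col.name}`);
    }
  }
  return problems;
}
```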

Data Product Marketplace

A self-service catalog where domain teams publish and consumers discover governed data products. Ratings, SLO visibility, ownership contact, and access request flows - all in one place.
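Discovery in a catalog like this largely comes down to filtering on declared metadata. A hypothetical sketch, assuming each entry carries its owner, its declared availability SLO, and an average consumer rating: match a text query against domain and name, drop products whose SLO falls below the consumer's requirement, and rank what remains by rating. `CatalogEntry`, `DiscoveryQuery`, and `discover` are illustrative names, not Provenance's marketplace API.

```typescript
// Hypothetical catalog entry: the metadata a consumer sees before requesting access.
interface CatalogEntry {
  domain: string;
  name: string;
  owner: string;        // ownership contact
  availability: number; // declared availability SLO, e.g. 0.999
  rating: number;       // average consumer rating, 0 to 5
}

interface DiscoveryQuery {
  text?: string;            // matched against "domain/name", case-insensitive
  minAvailability?: number; // consumer's minimum acceptable availability SLO
}

// Filter by text and SLO, then rank the remaining products by rating.
function discover(catalog: readonly CatalogEntry[], query: DiscoveryQuery): CatalogEntry[] {
  const text = query.text?.toLowerCase();
  const minAvailability = query.minAvailability ?? 0;
  return catalog
    .filter((e) => text === undefined || `${e.domain}/${e.name}`.toLowerCase().includes(text))
    .filter((e) => e.availability >= minAvailability)
    .sort((a, b) => b.rating - a.rating);
}

const catalog: CatalogEntry[] = [
  { domain: 'geoint', name: 'entity-resolution-v2', owner: 'team-geoint', availability: 0.999, rating: 4.5 },
  { domain: 'finance', name: 'ledger-daily', owner: 'team-fin', availability: 0.99, rating: 4.8 },
  { domain: 'geoint', name: 'imagery-index', owner: 'team-geoint', availability: 0.9995, rating: 4.9 },
];

// Both geoint products meet the 0.999 SLO; imagery-index ranks first on rating.
const matches = discover(catalog, { text: 'geoint', minAvailability: 0.999 });
```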

Automated Lineage Capture

Lineage is captured automatically at the platform level - not documented manually by engineers. Every transformation, access, and publication is recorded and queryable, satisfying audit requirements by default.
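Platform-level lineage capture amounts to appending an event for every transformation, access, and publication, then answering questions over the resulting graph. The sketch below is a hypothetical in-memory version: `LineageLog`, the event shape, and the node naming are assumptions, and a production system would persist events rather than hold them in memory. It answers two typical audit questions: which upstream sources feed a given node, and what a given actor did.

```typescript
// Hypothetical node naming, e.g. 'postgres:raw_entities' or 'product:entity-resolution-v2'.
type NodeId = string;

interface LineageEvent {
  from: NodeId;
  to: NodeId;
  kind: 'transform' | 'access' | 'publish';
  actor: string; // human user or agent identity that caused the event
  at: string;    // ISO 8601 timestamp
}

class LineageLog {
  private readonly events: LineageEvent[] = [];
  // For each node, the set of nodes it was derived from or read.
  private readonly upstream = new Map<NodeId, Set<NodeId>>();

  // Called by the platform on every transformation, access, and publication.
  record(event: LineageEvent): void {
    this.events.push(event);
    const parents = this.upstream.get(event.to) ?? new Set<NodeId>();
    parents.add(event.from);
    this.upstream.set(event.to, parents);
  }

  // All transitive upstream sources of a node (breadth-first search).
  sourcesOf(node: NodeId): Set<NodeId> {
    const seen = new Set<NodeId>();
    const queue: NodeId[] = [node];
    while (queue.length > 0) {
      const current = queue.shift()!;
      for (const parent of this.upstream.get(current) ?? []) {
        if (!seen.has(parent)) {
          seen.add(parent);
          queue.push(parent);
        }
      }
    }
    return seen;
  }

  // Audit view: every event attributed to one actor.
  eventsBy(actor: string): LineageEvent[] {
    return this.events.filter((e) => e.actor === actor);
  }
}

const log = new LineageLog();
log.record({ from: 'postgres:raw_entities', to: 'staging:entities_clean', kind: 'transform', actor: 'team-geoint', at: '2024-01-01T00:00:00Z' });
log.record({ from: 'staging:entities_clean', to: 'product:entity-resolution-v2', kind: 'publish', actor: 'team-geoint', at: '2024-01-01T01:00:00Z' });
log.record({ from: 'product:entity-resolution-v2', to: 'agent:analyst-01', kind: 'access', actor: 'agent:analyst-01', at: '2024-01-01T02:00:00Z' });
```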

Agent Identity & MCP Layer (Phase 4)

First-class support for AI agents as governed data mesh participants via the Model Context Protocol. Agents are issued identities, their data access is policy-gated, and every inference is traceable back to source.
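The agent layer is planned for Phase 4, so the following sketches only the underlying idea, not MCP itself: each agent carries an identity tied to an accountable sponsor, every read passes through a gate, and every attempt, allowed or denied, lands in an audit log. `AgentIdentity`, `AgentGateway`, and the simple allowlist (standing in for an OPA policy decision) are all hypothetical.

```typescript
// Hypothetical agent identity: the agent, its accountable sponsor, and the
// data products it may read (a stand-in for a real policy decision).
interface AgentIdentity {
  agentId: string;
  sponsor: string;
  allowedProducts: ReadonlySet<string>;
}

interface AccessRecord {
  agentId: string;
  sponsor: string;
  product: string;
  allowed: boolean;
  at: string; // ISO 8601 timestamp
}

class AgentGateway {
  readonly audit: AccessRecord[] = [];

  // Every read goes through the gate. Denied attempts are logged too,
  // so the audit trail shows what an agent tried to do, not only what it did.
  read<T>(agent: AgentIdentity, product: string, load: (product: string) => T): T {
    const allowed = agent.allowedProducts.has(product);
    this.audit.push({
      agentId: agent.agentId,
      sponsor: agent.sponsor,
      product,
      allowed,
      at: new Date().toISOString(),
    });
    if (!allowed) {
      throw new Error(`${agent.agentId} may not read ${product}`);
    }
    return load(product);
  }
}

const gateway = new AgentGateway();
const analyst: AgentIdentity = {
  agentId: 'agent:analyst-01',
  sponsor: 'team-geoint',
  allowedProducts: new Set(['entity-resolution-v2']),
};
const rows = gateway.read(analyst, 'entity-resolution-v2', () => [{ id: 1 }]);
```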

Audience

Built for organizations
that move at mission speed

Defense & Intelligence

DoD & Federal Agencies

Designed with DoD data standards and auditability requirements in mind. Provenance gives mission-critical environments the traceable, defensible data lineage needed for AI-assisted operations.

Enterprise

Regulated Industries

Financial services, healthcare, and energy organizations with compliance mandates benefit from governance-by-default - policy enforcement that happens at the platform level, not as an afterthought.

Data Engineering

Platform Engineering Teams

Engineering teams that have outgrown central data pipelines and are ready to move to a domain-owned, federated model - without building the governance infrastructure from scratch.

Open Source

Built in public.
Governed by design.

Provenance is Apache 2.0 licensed and developed openly on GitHub. The platform is a reference implementation of data mesh principles - built to be forked, extended, and deployed.

  • NestJS monorepo - clean, modular, extensible

    TypeScript throughout. Nx workspace. Designed for domain teams to contribute their own connectors and governance policies.

  • One-command deployment to AWS via Terraform

    Infrastructure as code for EC2, RDS, and networking. From zero to a running platform in under 30 minutes.

  • Roadmap published. Contributions welcome.

    Phase 3 (advanced connectors, SLO monitoring) and Phase 4 (MCP agent layer) are on the public roadmap. Come build with us.

provenance / data-product.ts
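// Register a governed data product.
// Assumes `provenance` (the platform client) and `connector` (a configured
// source connector) are already initialized.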
const product = await provenance.publish({
  domain:   'intelligence-fusion',
  name:     'entity-resolution-v2',
  owner:    'team-geoint',

  schema: connector.snapshot('postgres://...'),

  slo: {
    freshness:    'PT6H',   // ISO 8601 duration: data at most 6 hours old
    availability: 0.999,    // 99.9%
    quality:      0.98      // minimum quality threshold
  },

  governance: {
    policy:      'opa://federal-data-standards',
    sensitivity: 'CUI',
    lineage:     true
  }
})

// OPA evaluates. Lineage is captured.
// Product is live in the marketplace.
console.log(product.status)  // → 'published'

Roadmap

Where we are.
Where we're going.

Complete

Phase 1 - Foundation

Core data product API, domain management, organization model, Flyway migrations, React frontend scaffold, Docker Compose development environment, Terraform AWS deployment.

Complete

Phase 2 - Governance & Mesh

OPA policy integration, connector framework (PostgreSQL, Snowflake, S3), schema snapshotting, publish workflow with governance gating, data product marketplace, governance UI and audit log.

In Progress

Phase 3 - Production Hardening

SLO monitoring and alerting, expanded connector library, lineage graph visualization, access control and consumer request flows, observability instrumentation, performance optimization.

Planned

Phase 4 - Agent Layer (Data 3.0)

Model Context Protocol (MCP) integration, AI agent identity and access management, governed agent participation in the data mesh, AI-assisted metadata generation with human-in-the-loop approval, agent lineage capture.

Get Involved

The mesh needs builders.

Provenance is early-stage and actively developed. Whether you're an enterprise evaluating a data mesh platform, a data engineer looking to contribute, or an organization with governed data requirements - we'd like to talk.

Star on GitHub · Contact Provenance Logic