Coordination and Contract Platform  ·  Data 3.0  ·  Apache 2.0

Data Mesh
Made Real

The first platform built for AI agents as first-class data mesh participants.

Provenance is an open-source platform that operationalizes the data mesh. Domain teams publish governed data products. Policies enforce themselves at every boundary. AI agents participate with identity and full lineage. Every decision is traceable.

[Diagram: Human, Agent, Gov, Mesh, Data, Provenance]
The Problem

The philosophy exists.
The platform did not. Until now.

Most organizations have the right instincts about domain ownership and federation. What they lack is a platform that enforces governance in code, captures lineage automatically, and treats AI agents as governed participants rather than ungoverned consumers bolted on after the fact.

  • 01 Domain teams have no self-service tooling to publish data products with enforced contracts. Ownership is nominal, not operational.
  • 02 Governance lives in documents and spreadsheets, not in code. It is unenforced, untraced, and invisible to downstream consumers.
  • 03 AI agents are bolted on outside the governance model. Their data access is opaque, their outputs untraceable.
  • 04 Data lineage is documented manually at best, missing entirely at worst, creating serious risk in regulated environments.

"The data mesh is a sociotechnical paradigm that takes a distributed, domain-oriented approach to ownership and architecture, with the platform as an enabler."

Zhamak Dehghani, Data Mesh (O'Reilly, 2022)

Provenance is that enabler. It is a coordination and contract platform: it stores metadata and contracts, not data. Governance runs in OPA and is evaluated at every publish boundary. Lineage is captured automatically in Neo4j. AI agents operate with identity, policy constraints, and full lineage attribution via the Model Context Protocol. It is not a catalog, not a warehouse, and not a pipeline tool.

Architecture

Three pillars.
One governed mesh.

Provenance models the data ecosystem as three distinct participants, each with identity, accountability, and governed access.

Domain Teams

Human Domain Ownership

Domain teams own their data products end-to-end: schema, SLOs, metadata, and publishing cadence. Self-service tooling enforces standards without mandating implementation.

AI Agents

AI as First-Class Participants

AI agents are governed participants with identity, lineage, and policy constraints. Every agent action is traceable. Every output is attributable. No black boxes in production data environments.

Federated Governance

Policy as Code, Not Process

Governance is enforced at publish time via Open Policy Agent, not documented in wikis. Organizations define their own governance contracts. Provenance makes them executable and auditable.

Capabilities

What Provenance delivers

A full-stack platform for governed data product operations, from schema registration to marketplace discovery to agent-aware lineage capture.

Data Product Lifecycle Management

Register, version, publish, and deprecate data products with enforced contracts. Schemas, SLOs, quality thresholds, and ownership are tracked in a single system of record.
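
The lifecycle above can be read as a small state machine. The following is a minimal sketch under stated assumptions, not Provenance's actual model: only the `published` status appears in this page's own example, while `registered`, `deprecated`, `ProductRecord`, `transition`, and `newVersion` are hypothetical names chosen for illustration.

```typescript
// 'published' appears in the page's own example; the other status names are assumptions.
type Status = 'registered' | 'published' | 'deprecated';

// Which status changes this sketch permits.
const allowedTransitions: Record<Status, Status[]> = {
  registered: ['published'],
  published: ['deprecated'],
  deprecated: [], // terminal: a deprecated product is not republished in this sketch
};

// Hypothetical record shape for one version of a data product.
interface ProductRecord {
  name: string;
  version: number;
  status: Status;
}

// Move a product to a new status, rejecting transitions the table does not allow.
function transition(product: ProductRecord, next: Status): ProductRecord {
  if (allowedTransitions[product.status].indexOf(next) === -1) {
    throw new Error(`Cannot move ${product.name} from ${product.status} to ${next}`);
  }
  return { ...product, status: next };
}

// A new version starts over at 'registered', so it has to pass publish checks again.
function newVersion(product: ProductRecord): ProductRecord {
  return { ...product, version: product.version + 1, status: 'registered' };
}
```

Keeping the allowed transitions in one table makes the rule "nothing skips the publish step" visible in a single place, which is the property an enforced contract depends on.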

OPA-Enforced Policy Engine

Governance policies are defined in Rego and evaluated at every critical boundary. No product publishes without passing governance checks. Policies are versioned, auditable, and customizable per organization.
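
To make the publish gate concrete, here is a hedged sketch of how a service might query OPA before allowing a publish. It relies on OPA's documented Data API (a POST to `/v1/data/<path>` with an `input` document, answered with a `result` field). Everything else is an assumption for illustration: the policy path `provenance/publish`, the `PublishInput` fields, and the idea that the Rego policy returns `allow` and `violations`.

```typescript
// Hypothetical policy input; the fields mirror the publish example on this page.
interface PublishInput {
  domain: string;
  name: string;
  owner: string;
  sensitivity: string;
}

// OPA wraps the policy's answer in `result`; the key is absent when the rule is undefined.
// The { allow, violations } shape is an assumption about how the Rego policy is written.
interface OpaResponse {
  result?: { allow?: boolean; violations?: string[] };
}

interface Decision {
  allowed: boolean;
  violations: string[];
}

// Build the HTTP request for OPA's Data API: POST /v1/data/<policy path> with {"input": ...}.
function buildOpaRequest(opaUrl: string, policyPath: string, input: PublishInput) {
  return {
    url: `${opaUrl.replace(/\/$/, '')}/v1/data/${policyPath}`,
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ input }),
  };
}

// Fail closed: a missing result or a missing `allow` is treated as a denial.
function interpretDecision(response: OpaResponse): Decision {
  const allowed = response.result?.allow === true;
  const violations = response.result?.violations ?? [];
  return { allowed, violations };
}
```

The caller would send the built request with any HTTP client and pass the parsed JSON to `interpretDecision`. Treating an undefined result as a denial means a misnamed or unloaded policy blocks publishing rather than silently permitting it.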

Federated Connector Framework

Native connectors for PostgreSQL, Snowflake, S3, and more, with an extensible framework for custom sources. Schemas snapshot at registration, ensuring contract stability for downstream consumers.
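
One way to picture schema snapshotting is as a frozen copy of the source's columns that later reads are compared against. This is an illustrative sketch only: `Column`, `SchemaSnapshot`, `snapshotSchema`, and `detectDrift` are hypothetical names, and a real connector would read the live columns from the source system rather than receive them as an argument.

```typescript
// Hypothetical column description; a real connector would read this from the source.
interface Column {
  name: string;
  type: string;
  nullable: boolean;
}

interface SchemaSnapshot {
  takenAt: string; // ISO 8601 timestamp of registration
  columns: Column[];
}

// Copy each column so later changes to the live schema cannot alter the snapshot.
function snapshotSchema(live: Column[], now: Date = new Date()): SchemaSnapshot {
  return {
    takenAt: now.toISOString(),
    columns: live.map((c) => ({ ...c })),
  };
}

// List every way the live schema differs from the registered snapshot.
function detectDrift(snapshot: SchemaSnapshot, live: Column[]): string[] {
  const drift: string[] = [];
  for (const col of snapshot.columns) {
    const current = live.filter((c) => c.name === col.name)[0];
    if (!current) {
      drift.push(`removed: ${col.name}`);
    } else if (current.type !== col.type) {
      drift.push(`type changed: ${col.name} ${col.type} -> ${current.type}`);
    } else if (current.nullable !== col.nullable) {
      drift.push(`nullability changed: ${col.name}`);
    }
  }
  for (const col of live) {
    if (!snapshot.columns.some((c) => c.name === col.name)) {
      drift.push(`added: ${col.name}`);
    }
  }
  return drift;
}
```

An empty result from `detectDrift` means the contract consumers signed up for still holds. Any entry is a signal that the product needs a new version rather than a silent change.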

Data Product Marketplace

A self-service catalog where domain teams publish and consumers discover governed data products. SLO visibility, ownership contact, and access request flows, all in one governed interface.

Automated Lineage Capture

Lineage is captured automatically at the platform level, not documented manually. Every transformation, access, and publication is recorded and queryable, satisfying audit requirements by default.
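
What "recorded and queryable" could mean in practice is sketched below. The page states that lineage is stored in Neo4j. This example deliberately swaps in a plain in-memory edge list so it stays self-contained. `LineageEdge`, `LineageLog`, and `upstreamOf` are hypothetical names, not Provenance's API.

```typescript
// Hypothetical event shape; the platform's actual lineage model is not specified here.
interface LineageEdge {
  from: string;  // upstream product or source
  to: string;    // downstream product or output
  actor: string; // human or agent identity responsible for the edge
  kind: 'transformation' | 'access' | 'publication';
}

// In-memory stand-in for a graph store such as Neo4j.
class LineageLog {
  private edges: LineageEdge[] = [];

  record(edge: LineageEdge): void {
    this.edges.push(edge);
  }

  // Walk upstream edges transitively to answer "where did this come from?".
  upstreamOf(node: string): string[] {
    const seen: string[] = [];
    const queue: string[] = [node];
    while (queue.length > 0) {
      const current = queue.shift() as string;
      for (const edge of this.edges) {
        // The `seen` check keeps the walk finite even if the graph has a cycle.
        if (edge.to === current && seen.indexOf(edge.from) === -1) {
          seen.push(edge.from);
          queue.push(edge.from);
        }
      }
    }
    return seen;
  }
}
```

Because every edge carries an `actor`, the same log can answer both "which sources fed this product" and "who or what touched it", which is the kind of question an audit tends to ask.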

Agent Identity and MCP Layer (Phase 4)

First-class support for AI agents as governed mesh participants via the Model Context Protocol. Agents are issued identities, their data access is policy-gated, and every inference is traceable to source.
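
The sketch below illustrates the idea of policy-gated agent access with attribution. It does not use or depict the Model Context Protocol itself, and every name in it (`AgentIdentity`, `ProductRef`, `AccessRecord`, `requestAccess`, `sourcesFor`) is a hypothetical stand-in. The clearance check is a deliberately simple placeholder for whatever policy the platform actually evaluates.

```typescript
// All names here are illustrative assumptions, not Provenance's API.
interface AgentIdentity {
  agentId: string;
  clearances: string[]; // sensitivity labels the agent may read
}

interface ProductRef {
  productId: string;
  sensitivity: string;
}

interface AccessRecord {
  agentId: string;
  productId: string;
  allowed: boolean;
}

// Decide access and record the attempt either way, so denials are auditable too.
function requestAccess(agent: AgentIdentity, product: ProductRef, audit: AccessRecord[]): boolean {
  const allowed = agent.clearances.indexOf(product.sensitivity) !== -1;
  audit.push({ agentId: agent.agentId, productId: product.productId, allowed });
  return allowed;
}

// Products an agent's output can be attributed to: only those it was actually granted.
function sourcesFor(agentId: string, audit: AccessRecord[]): string[] {
  return audit
    .filter((r) => r.agentId === agentId && r.allowed)
    .map((r) => r.productId);
}
```

Recording the decision at the point of access, rather than reconstructing it afterwards, is what lets an agent's output be traced back to the specific products it was allowed to read.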

Built For

Built for environments where
accountability is not optional

Defense and Intelligence

DoD and Federal Agencies

Built with DoD data standards and auditability requirements in mind. Provenance gives mission-critical environments the traceable, defensible data lineage needed for AI-assisted operations.

Enterprise

Regulated Industries

Financial services, healthcare, and energy organizations with compliance mandates benefit from governance by default: policy enforcement happens at the platform level, not as an afterthought.

Data Engineering

Platform Engineering Teams

Teams that have outgrown central data pipelines and are ready to move to a domain-owned, federated model without building the governance infrastructure from scratch.

Clarity

What Provenance is not

Provenance is a coordination and contract layer. It sits alongside your existing infrastructure rather than replacing it. Knowing what it is not is as important as knowing what it is.

Not a data warehouse or data lake

Provenance does not store your data. Data stays in domain infrastructure. The platform stores metadata, contracts, and lineage only.

Not a pipeline orchestrator or ETL engine

Provenance does not move or transform data. It governs the contracts under which data products are defined and consumed.

Not a traditional data catalog

Catalogs document. Provenance enforces. Governance policies are evaluated in code at every publish boundary, not maintained in spreadsheets.

Not a centralized query engine

Consumers query data through domain-owned ports. Provenance governs access and captures lineage when they do. It is not the query layer itself.

Open Source

Built in public.
Governed by design.

Provenance is Apache 2.0 licensed and developed openly on GitHub. It is a reference implementation of data mesh principles, built to be forked, extended, and deployed.

  • NestJS monorepo, clean, modular, extensible

    TypeScript throughout. Nx workspace. Designed for domain teams to contribute their own connectors and governance policies.

  • One-command deployment to AWS via Terraform

    Infrastructure as code for EC2, RDS, and networking. From zero to running platform in under 30 minutes.

  • Public roadmap, contributions welcome

    Phase 3 and Phase 4 are on the public roadmap. Come build with us.

provenance / data-product.ts
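// Register a governed data product.
// Assumes `provenance` (the platform client) and `connector` (a configured
// source connector) were initialized earlier; their setup is not shown here.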
const product = await provenance.publish({
  domain:  'intelligence-fusion',
  name:    'entity-resolution-v2',
  owner:   'team-geoint',

  schema: connector.snapshot('postgres://...'),

  slo: {
    freshness:    'PT6H',
    availability: 0.999,
    quality:      0.98
  },

  governance: {
    policy:      'opa://federal-data-standards',
    sensitivity: 'CUI',
    lineage:     true
  }
})

// OPA evaluates. Lineage is captured.
// Product is live in the marketplace.
console.log(product.status) // 'published'
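
The `freshness` value in the example above, `'PT6H'`, is written as an ISO 8601 duration (six hours). As a rough sketch of how such an SLO could be checked, the helper below parses a subset of that format and compares it with a product's last update time. `parseDurationMs` and `isFresh` are hypothetical helpers, not part of Provenance, and the parser handles only days, hours, minutes, and seconds.

```typescript
// Parse a subset of ISO 8601 durations (days, hours, minutes, seconds) into milliseconds.
function parseDurationMs(iso: string): number {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$/.exec(iso);
  // Reject strings that do not match, and bare 'P' or 'PT' with no components.
  if (!match || match.slice(1).every((part) => part === undefined)) {
    throw new Error(`Unsupported duration: ${iso}`);
  }
  const [, d, h, m, s] = match;
  const minutes = (Number(d || 0) * 24 + Number(h || 0)) * 60 + Number(m || 0);
  return minutes * 60000 + Number(s || 0) * 1000;
}

// A product meets its freshness SLO if its last update is no older than the allowed window.
function isFresh(lastUpdated: Date, freshness: string, now: Date = new Date()): boolean {
  return now.getTime() - lastUpdated.getTime() <= parseDurationMs(freshness);
}
```

Passing `now` as a parameter keeps the check deterministic, which makes it straightforward to exercise in tests or replay against historical timestamps.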
Roadmap

Where we are.
Where we're going.

Complete

Phase 1 - Foundation

Core data product API, domain management, organization model, Flyway migrations, React frontend scaffold, Docker Compose development environment, Terraform AWS deployment.

Complete

Phase 2 - Governance and Mesh

OPA policy integration, connector framework (PostgreSQL, Snowflake, S3), schema snapshotting, publish workflow with governance gating, data product marketplace, governance UI and audit log.

Complete

Phase 3 - Production Hardening

SLO monitoring and alerting, expanded connector library, lineage graph visualization, access control and consumer request flows, observability instrumentation, performance optimization.

In Progress

Phase 4 - Agent Layer (Data 3.0)

Model Context Protocol integration, AI agent identity and access management, governed agent participation in the data mesh, AI-assisted metadata generation with human-in-the-loop approval, agent lineage capture.