The first platform built for AI agents as first-class data mesh participants.
Provenance is an open-source platform that operationalizes the data mesh. Domain teams publish governed data products. Policies enforce themselves at every boundary. AI agents participate with identity and full lineage. Every decision is traceable.
Most organizations have the right instincts about domain ownership and federation. What they lack is a platform that enforces governance in code, captures lineage automatically, and treats AI agents as governed participants rather than ungoverned consumers bolted on after the fact.
"The data mesh is a sociotechnical paradigm that takes a distributed, domain-oriented approach to ownership and architecture, with the platform as an enabler."
Zhamak Dehghani, Data Mesh (O'Reilly, 2022)
Provenance is that enabler. It is a coordination and contract platform: it stores metadata and contracts, not data. Governance runs in OPA and is evaluated at every publish boundary. Lineage is captured automatically in Neo4j. AI agents operate with identity, policy constraints, and full lineage attribution via the Model Context Protocol. It is not a catalog, not a warehouse, not a pipeline tool.
Provenance models the data ecosystem as three distinct participants, each with identity, accountability, and governed access.
Domain teams own their data products end-to-end: schema, SLOs, metadata, and publishing cadence. Self-service tooling enforces standards without mandating implementation.
AI agents are governed participants with identity, lineage, and policy constraints. Every agent action is traceable. Every output is attributable. No black boxes in production data environments.
Governance is enforced at publish time via Open Policy Agent, not documented in wikis. Organizations define their own governance contracts. Provenance makes them executable and auditable.
A full-stack platform for governed data product operations, from schema registration to marketplace discovery to agent-aware lineage capture.
Register, version, publish, and deprecate data products with enforced contracts. Schemas, SLOs, quality thresholds, and ownership are tracked in a single system of record.
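One way to picture the register, version, publish, and deprecate lifecycle is as a small state machine. The sketch below is illustrative only: the state names, the `ProductRecord` shape, and the rule that each publish increments the version are assumptions, not the platform's actual model.

```typescript
// Hypothetical lifecycle states for a data product.
type Lifecycle = "registered" | "published" | "deprecated";

// Hypothetical transition table: publishing an already published product
// is treated as releasing a new version; deprecated products are terminal.
const allowedTransitions: Record<Lifecycle, Lifecycle[]> = {
  registered: ["published"],
  published: ["published", "deprecated"],
  deprecated: [],
};

interface ProductRecord {
  name: string;
  version: number;
  state: Lifecycle;
}

// Returns a new record in the requested state, or throws if the move is not allowed.
function transition(product: ProductRecord, next: Lifecycle): ProductRecord {
  if (allowedTransitions[product.state].indexOf(next) === -1) {
    throw new Error(`Cannot move ${product.name} from ${product.state} to ${next}`);
  }
  // Only a publish bumps the version; deprecation keeps the last published one.
  const version = next === "published" ? product.version + 1 : product.version;
  return { name: product.name, version, state: next };
}
```

Keeping the transition table as data makes the allowed moves easy to audit and easy to extend with additional states.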
Governance policies defined in Rego, evaluated at every critical boundary. No product publishes without passing governance checks. Policies are versioned, auditable, and customizable per organization.
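A publish-time governance check can be sketched against OPA's Data API, which evaluates a policy document at `/v1/data/<path>` for a JSON body of the form `{"input": ...}`. The `PublishInput` shape, the policy path, and the helper names below are assumptions made for illustration; they are not Provenance's actual interface.

```typescript
// Hypothetical subset of a publish request that a policy might inspect.
interface PublishInput {
  domain: string;
  name: string;
  owner: string;
  sensitivity: string;
}

interface OpaRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Builds the HTTP request for OPA's Data API without sending it.
function buildOpaRequest(baseUrl: string, policyPath: string, input: PublishInput): OpaRequest {
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/v1/data/${policyPath}`,
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ input }),
  };
}

// OPA omits "result" when the queried rule is undefined, so anything other
// than an explicit `true` is treated as a denial (fail closed).
function isAllowed(response: { result?: unknown }): boolean {
  return response.result === true;
}
```

The request object can be handed to any HTTP client. Failing closed means a missing or misnamed policy blocks the publish rather than silently allowing it.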
Native connectors for PostgreSQL, Snowflake, S3, and more, with an extensible framework for custom sources. Schemas snapshot at registration, ensuring contract stability for downstream consumers.
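To illustrate why a snapshot taken at registration gives consumers a stable contract, the sketch below compares a registered snapshot with the live source schema and flags breaking drift. The snapshot shape (column name mapped to a type string) and the rule that only removed or retyped columns count as breaking are assumptions of this example.

```typescript
// Hypothetical snapshot shape: column name -> declared type, frozen at registration.
type SchemaSnapshot = Readonly<Record<string, string>>;

interface ContractDiff {
  removed: string[];
  retyped: string[];
  added: string[];
}

// Own-property check, so column names like "toString" are handled correctly.
function hasColumn(snapshot: SchemaSnapshot, column: string): boolean {
  return Object.prototype.hasOwnProperty.call(snapshot, column);
}

// Compares the registered contract with the schema currently seen at the source.
function diffSnapshot(registered: SchemaSnapshot, live: SchemaSnapshot): ContractDiff {
  const removed: string[] = [];
  const retyped: string[] = [];
  for (const column of Object.keys(registered)) {
    if (!hasColumn(live, column)) removed.push(column);
    else if (live[column] !== registered[column]) retyped.push(column);
  }
  const added = Object.keys(live).filter((column) => !hasColumn(registered, column));
  return { removed, retyped, added };
}

// Assumption: additive columns are safe for consumers; removals and type changes are not.
function isBreaking(diff: ContractDiff): boolean {
  return diff.removed.length > 0 || diff.retyped.length > 0;
}
```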
A self-service catalog where domain teams publish and consumers discover governed data products. SLO visibility, ownership contact, and access request flows, all in one governed interface.
Lineage is captured automatically at the platform level, not documented manually. Every transformation, access, and publication is recorded and queryable, satisfying audit requirements by default.
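As a rough picture of how a lineage event could land in a graph store such as Neo4j, the sketch below turns an event into a parameterized Cypher statement. The node labels (`Actor`, `DataProduct`), the relationship names, and the event shape are hypothetical; the real graph model may differ.

```typescript
// Hypothetical lineage event emitted by the platform.
interface LineageEvent {
  actorId: string; // a domain team, consumer, or agent identity
  action: "TRANSFORMED" | "ACCESSED" | "PUBLISHED";
  productId: string;
  at: string; // ISO 8601 timestamp
}

interface CypherStatement {
  text: string;
  params: Record<string, string>;
}

// Builds the statement without executing it; a Neo4j driver would run it
// with the params object.
function toCypher(event: LineageEvent): CypherStatement {
  // A relationship type cannot be supplied as an ordinary query parameter,
  // so the action is interpolated. The union type above limits it to known values.
  const text = [
    "MERGE (actor:Actor {id: $actorId})",
    "MERGE (product:DataProduct {id: $productId})",
    `CREATE (actor)-[:${event.action} {at: $at}]->(product)`,
  ].join("\n");
  return {
    text,
    params: { actorId: event.actorId, productId: event.productId, at: event.at },
  };
}
```

`MERGE` keeps actors and products unique while `CREATE` records each event as its own edge, so repeated accesses stay individually queryable for audit.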
First-class support for AI agents as governed mesh participants via the Model Context Protocol. Agents are issued identities, their data access is policy-gated, and every inference is traceable to source.
Built with DoD data standards and auditability requirements in mind. Provenance gives mission-critical environments the traceable, defensible data lineage needed for AI-assisted operations.
Financial services, healthcare, and energy organizations with compliance mandates benefit from governance by default: policy enforcement happens at the platform level, not as an afterthought.
Teams that have outgrown central data pipelines and are ready to move to a domain-owned, federated model without building the governance infrastructure from scratch.
Provenance is a coordination and contract layer. It sits alongside your existing infrastructure rather than replacing it. Knowing what it is not is as important as knowing what it is.
Provenance does not store your data. Data stays in domain infrastructure. The platform stores metadata, contracts, and lineage only.
Provenance does not move or transform data. It governs the contracts under which data products are defined and consumed.
Catalogs document. Provenance enforces. Governance policies are evaluated in code at every publish boundary, not maintained in spreadsheets.
Consumers query data through domain-owned ports. Provenance governs access and captures lineage when they do. It is not the query layer itself.
Provenance is Apache 2.0 licensed and developed openly on GitHub. It is a reference implementation of data mesh principles, built to be forked, extended, and deployed.
TypeScript throughout. Nx workspace. Designed for domain teams to contribute their own connectors and governance policies.
Infrastructure as code for EC2, RDS, and networking. From zero to running platform in under 30 minutes.
Phase 3 and Phase 4 are on the public roadmap. Come build with us.
// Register a governed data product
const product = await provenance.publish({
  domain: 'intelligence-fusion',
  name: 'entity-resolution-v2',
  owner: 'team-geoint',
  schema: connector.snapshot('postgres://...'),
  slo: {
    freshness: 'PT6H',
    availability: 0.999,
    quality: 0.98
  },
  governance: {
    policy: 'opa://federal-data-standards',
    sensitivity: 'CUI',
    lineage: true
  }
})

// OPA evaluates. Lineage is captured.
// Product is live in the marketplace.
console.log(product.status) // 'published'
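The `freshness: 'PT6H'` value in the example reads as an ISO 8601 duration of six hours. Assuming that interpretation, a freshness SLO check could look like the sketch below. The helper names are hypothetical, and the parser covers only the day and time components (`PnDTnHnMnS`), not months or years.

```typescript
// Converts a subset of ISO 8601 durations (days, hours, minutes, seconds) to milliseconds.
function durationToMs(iso: string): number {
  const match = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/.exec(iso);
  // Reject non-matching strings and empty durations such as "P" or "PT".
  if (!match || match.slice(1).every((part) => part === undefined)) {
    throw new Error(`Unsupported ISO 8601 duration: ${iso}`);
  }
  const days = Number(match[1] ?? 0);
  const hours = Number(match[2] ?? 0);
  const minutes = Number(match[3] ?? 0);
  const seconds = Number(match[4] ?? 0);
  return (((days * 24 + hours) * 60 + minutes) * 60 + seconds) * 1000;
}

// A product meets its freshness SLO if its last update is no older than the allowed duration.
function isFresh(lastUpdated: Date, freshness: string, now: Date): boolean {
  return now.getTime() - lastUpdated.getTime() <= durationToMs(freshness);
}
```

Passing `now` explicitly, instead of reading the clock inside the function, keeps the check deterministic and easy to test.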
Core data product API, domain management, organization model, Flyway migrations, React frontend scaffold, Docker Compose development environment, Terraform AWS deployment.
OPA policy integration, connector framework (PostgreSQL, Snowflake, S3), schema snapshotting, publish workflow with governance gating, data product marketplace, governance UI and audit log.
SLO monitoring and alerting, expanded connector library, lineage graph visualization, access control and consumer request flows, observability instrumentation, performance optimization.
Model Context Protocol integration, AI agent identity and access management, governed agent participation in the data mesh, AI-assisted metadata generation with human-in-the-loop approval, agent lineage capture.
Microservices split, migration to managed AWS services (Aurora, Neptune, MSK, OpenSearch), Kubernetes deployment on EKS, security hardening, Kong gateway, high availability and disaster recovery.