Where human expertise, AI agents, and governance converge.
Provenance is an open-source platform that operationalizes the data mesh - enabling domain teams to publish governed, federated data products with built-in lineage, policy enforcement, and AI-native participation.
Most enterprises have the right ideas about data ownership and federation - but no platform to enforce them. They fall back on central pipelines, undocumented lineage, and governance that exists only in spreadsheets.
"The data mesh is a sociotechnical paradigm that takes a distributed, domain-oriented approach to ownership and architecture... with the platform as an enabler."
- Zhamak Dehghani, Data Mesh (O'Reilly, 2022)
Provenance is purpose-built to be that enabler - translating the four principles of data mesh into working, deployable software with OPA-enforced governance, federated data product APIs, and an agent-aware architecture ready for Data 3.0.
Provenance models the data ecosystem as an interplay of three distinct participants - each with identity, accountability, and governed access.
Domain teams own their data products end-to-end - schema, SLOs, metadata, and publishing cadence - through a self-service interface that enforces standards without mandating implementation.
AI agents are not consumers - they are governed participants with identity, lineage, and policy constraints. Every agent action is traceable. Every output is attributable. No black boxes in production.
Governance is enforced at publish time via Open Policy Agent - not documented in wikis. Organizations define their own governance contracts; Provenance makes them executable, auditable, and real.
A full-stack platform for governed data product operations - from schema registration to marketplace discovery to agent-aware lineage capture.
End-to-end tooling to register, version, publish, and deprecate data products with enforced contracts - schemas, SLOs, quality thresholds, and ownership all tracked in a single system of record.
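The register, version, publish, deprecate lifecycle can be pictured as a small state machine. The following is a minimal sketch under stated assumptions: the state names, the `ProductVersion` shape, and the `transition` and `newVersion` helpers are hypothetical illustrations, not Provenance's actual data model or API.

```typescript
// Hypothetical lifecycle states; the names are assumptions for illustration.
type LifecycleState = 'registered' | 'published' | 'deprecated';

interface ProductVersion {
  name: string;
  version: number;
  state: LifecycleState;
}

// Assumed rule: a version moves forward only (registered, published, deprecated).
const allowedTransitions: Record<LifecycleState, LifecycleState[]> = {
  registered: ['published'],
  published: ['deprecated'],
  deprecated: [],
};

// Returns a copy of the product in the next state, or throws on an illegal move.
function transition(product: ProductVersion, next: LifecycleState): ProductVersion {
  if (allowedTransitions[product.state].indexOf(next) === -1) {
    throw new Error(`Cannot move ${product.name} from ${product.state} to ${next}`);
  }
  return { name: product.name, version: product.version, state: next };
}

// A new version starts its own lifecycle; the previous version is left untouched.
function newVersion(previous: ProductVersion): ProductVersion {
  return { name: previous.name, version: previous.version + 1, state: 'registered' };
}
```

Keeping transitions in one table means an illegal move, such as re-registering a published version, fails loudly instead of silently rewriting the record.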
Governance policies are defined in Rego and evaluated at every critical boundary. No product publishes without passing governance checks. Policies are versioned, auditable, and customizable per organization.
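As a rough illustration of publish-time gating, the sketch below builds a request for OPA's REST Data API (`POST /v1/data/<path>` with an `input` document) and interprets the response. The policy path, the `PublishInput` fields, and the `allow`/`violations` result shape are assumptions that depend entirely on how an organization writes its Rego; they are not taken from Provenance itself.

```typescript
// Hypothetical input document handed to the policy.
interface PublishInput {
  domain: string;
  name: string;
  owner: string;
  sensitivity: string;
}

// Assumed result shape; a real policy defines its own.
interface PolicyResult {
  allow?: boolean;
  violations?: string[];
}

// OPA omits `result` when the queried rule is undefined.
interface OpaResponse {
  result?: PolicyResult;
}

interface Decision {
  allowed: boolean;
  violations: string[];
}

// Builds the URL and JSON body for a Data API query against `policyPath`.
function buildOpaRequest(opaUrl: string, policyPath: string, input: PublishInput): { url: string; body: string } {
  return { url: `${opaUrl}/v1/data/${policyPath}`, body: JSON.stringify({ input }) };
}

// Fails closed: anything other than an explicit `allow: true` is a denial.
function interpretOpaResponse(response: OpaResponse): Decision {
  const result: PolicyResult = response.result || {};
  return { allowed: result.allow === true, violations: result.violations || [] };
}
```

The request can be sent with any HTTP client as a POST with a JSON content type. Treating a missing `result` as a denial keeps a misconfigured or missing policy from silently letting a product through.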
Native connectors for PostgreSQL, Snowflake, S3, and more - with an extensible framework for custom sources. Schemas are snapshotted at registration, not at query time, ensuring contract stability.
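One way to picture snapshot-at-registration is to store a fingerprint of the column set when a product is registered and compare it against the live source later. This is a sketch only: the `Column` and `SchemaSnapshot` shapes and the helper names are hypothetical, and the FNV-1a hash is a dependency-free stand-in for whatever hashing a real implementation would use.

```typescript
interface Column {
  name: string;
  type: string;
  nullable: boolean;
}

interface SchemaSnapshot {
  columns: Column[];
  capturedAt: string;
  fingerprint: string;
}

// Order-independent fingerprint of a column set (32-bit FNV-1a, illustrative only).
function fingerprint(columns: Column[]): string {
  const canonical = columns
    .slice()
    .sort((a, b) => (a.name < b.name ? -1 : a.name > b.name ? 1 : 0))
    .map((c) => `${c.name}:${c.type}:${c.nullable}`)
    .join('|');
  let hash = 0x811c9dc5;
  for (let i = 0; i < canonical.length; i++) {
    hash ^= canonical.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ('0000000' + hash.toString(16)).slice(-8);
}

// Captures the schema once, at registration time.
function snapshotSchema(columns: Column[], now: Date): SchemaSnapshot {
  return { columns: columns.slice(), capturedAt: now.toISOString(), fingerprint: fingerprint(columns) };
}

// True when the live source no longer matches the registered contract.
function hasDrifted(snapshot: SchemaSnapshot, liveColumns: Column[]): boolean {
  return fingerprint(liveColumns) !== snapshot.fingerprint;
}
```

Because consumers read against the stored snapshot, a column change in the source shows up as detectable drift rather than as a silent contract break at query time.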
A self-service catalog where domain teams publish and consumers discover governed data products. Ratings, SLO visibility, ownership contact, and access request flows - all in one place.
Lineage is captured automatically at the platform level - not documented manually by engineers. Every transformation, access, and publication is recorded and queryable, satisfying audit requirements by default.
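To make "recorded and queryable" concrete, here is a minimal sketch of lineage as a list of events with an upstream query over it. The `LineageEvent` fields and the `upstreamOf` helper are assumptions for illustration, not the platform's actual storage model or query interface.

```typescript
// Hypothetical lineage record: something flowed from `source` to `target`.
interface LineageEvent {
  kind: 'transformation' | 'access' | 'publication';
  source: string;
  target: string;
  actor: string;
  at: string;
}

// Walks the event list breadth-first and returns every upstream source of `product`.
function upstreamOf(events: LineageEvent[], product: string): string[] {
  const seen: Record<string, boolean> = {};
  seen[product] = true; // guards against cycles leading back to the start
  const queue: string[] = [product];
  const result: string[] = [];
  while (queue.length > 0) {
    const current = queue.shift() as string;
    events.forEach((event) => {
      if (event.target === current && !seen[event.source]) {
        seen[event.source] = true;
        result.push(event.source);
        queue.push(event.source);
      }
    });
  }
  return result;
}
```

An audit question such as "which sources fed this product?" then becomes a traversal over recorded events rather than a search through documentation.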
First-class support for AI agents as governed data mesh participants via the Model Context Protocol. Agents are issued identities, their data access is policy-gated, and every inference is traceable back to source.
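The idea of an agent with an identity and policy-gated access can be sketched without reference to any particular protocol. Everything below is hypothetical: the `AgentIdentity` fields, the sensitivity-based rule, and the `authorizeAgentRead` helper are illustrative assumptions and do not represent the Model Context Protocol or Provenance's agent layer.

```typescript
// Hypothetical identity issued to an agent, tied to an accountable owner.
interface AgentIdentity {
  agentId: string;
  owner: string;
  allowedSensitivities: string[];
}

interface GovernedProduct {
  name: string;
  sensitivity: string;
}

interface AccessRecord {
  agentId: string;
  product: string;
  allowed: boolean;
  at: string;
}

// Decides whether the agent may read the product and records the attempt either way.
function authorizeAgentRead(
  agent: AgentIdentity,
  product: GovernedProduct,
  auditLog: AccessRecord[],
  now: Date
): boolean {
  const allowed = agent.allowedSensitivities.indexOf(product.sensitivity) !== -1;
  auditLog.push({ agentId: agent.agentId, product: product.name, allowed, at: now.toISOString() });
  return allowed;
}
```

Recording denied attempts alongside granted ones is what makes every agent action traceable, not only the successful ones.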
Designed with DoD data standards and auditability requirements in mind. Provenance gives mission-critical environments the traceable, defensible data lineage needed for AI-assisted operations.
Financial services, healthcare, and energy organizations with compliance mandates benefit from governance-by-default - policy enforcement that happens at the platform level, not as an afterthought.
Engineering teams that have outgrown central data pipelines and are ready to move to a domain-owned, federated model - without building the governance infrastructure from scratch.
Provenance is Apache 2.0 licensed and developed openly on GitHub. The platform is a reference implementation of data mesh principles - built to be forked, extended, and deployed.
TypeScript throughout. Nx workspace. Designed for domain teams to contribute their own connectors and governance policies.
Infrastructure as code for EC2, RDS, and networking. From zero to a running platform in under 30 minutes.
Phase 3 (advanced connectors, SLO monitoring) and Phase 4 (MCP agent layer) are on the public roadmap. Come build with us.
// Register a governed data product
const product = await provenance.publish({
  domain: 'intelligence-fusion',
  name: 'entity-resolution-v2',
  owner: 'team-geoint',
  schema: connector.snapshot('postgres://...'),
  slo: {
    freshness: 'PT6H',
    availability: 0.999,
    quality: 0.98
  },
  governance: {
    policy: 'opa://federal-data-standards',
    sensitivity: 'CUI',
    lineage: true
  }
})

// OPA evaluates. Lineage is captured.
// Product is live in the marketplace.
console.log(product.status) // → 'published'
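The `freshness: 'PT6H'` value in the example above uses ISO 8601 duration syntax for six hours. Assuming it expresses the maximum acceptable age of the data (an interpretation, not something the example confirms), a freshness check could be sketched as follows. The `durationToMs` and `isFresh` helpers are hypothetical, and the parser handles only the day, hour, minute, and second components.

```typescript
// Matches a subset of ISO 8601 durations: PnDTnHnMnS (no years, months, or weeks).
const DURATION = /^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?)?$/;

// Converts a supported duration string to milliseconds, or throws if unsupported.
function durationToMs(iso: string): number {
  const match = DURATION.exec(iso);
  if (!match || iso === 'P' || iso.charAt(iso.length - 1) === 'T') {
    throw new Error(`Unsupported duration: ${iso}`);
  }
  // Absent components are undefined in the match and count as zero.
  const part = (index: number): number => parseInt(match[index] || '0', 10);
  return (((part(1) * 24 + part(2)) * 60 + part(3)) * 60 + part(4)) * 1000;
}

// True when the data was last updated within the freshness window.
function isFresh(lastUpdated: Date, freshness: string, now: Date): boolean {
  return now.getTime() - lastUpdated.getTime() <= durationToMs(freshness);
}
```

Passing `now` explicitly keeps the check deterministic, which makes the same function usable in a monitoring job and in tests.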
Phase 1: Core data product API, domain management, organization model, Flyway migrations, React frontend scaffold, Docker Compose development environment, Terraform AWS deployment.
Phase 2: OPA policy integration, connector framework (PostgreSQL, Snowflake, S3), schema snapshotting, publish workflow with governance gating, data product marketplace, governance UI and audit log.
Phase 3: SLO monitoring and alerting, expanded connector library, lineage graph visualization, access control and consumer request flows, observability instrumentation, performance optimization.
Phase 4: Model Context Protocol (MCP) integration, AI agent identity and access management, governed agent participation in the data mesh, AI-assisted metadata generation with human-in-the-loop approval, agent lineage capture.
Provenance is early-stage and actively developed. Whether you're an enterprise evaluating a data mesh platform, a data engineer looking to contribute, or an organization with governed data requirements - we'd like to talk.