Dear Airflow Community,
I have been working on a proposal to bring an OpenLineage provider to
Airflow:
<https://docs.google.com/document/d/1aN5i8WV2Za7XiHTtyrewZscQ-4eXs1ZNfPw58JscFEw/edit#>
I am looking for feedback, with the goal of posting an official AIP.
Please feel free to comment in the doc above.
Thank you,
Julien (OpenLineage project lead)

*For convenience, here is the rationale from the doc:*

Operational lineage collection is a common need for understanding
dependencies between data pipelines and tracking end-to-end provenance of
data. It enables many use cases, from ensuring reliable delivery of data
through observability to compliance and cost management.

Publishing operational lineage is a core Airflow capability to enable
troubleshooting and governance.

OpenLineage is a project of the LF AI & Data Foundation that provides a
spec standardizing operational lineage collection and sharing across the
data ecosystem. Although it provides plugins for popular open source
projects, its intent is very similar to that of OpenTelemetry (also under
the Linux Foundation umbrella): to remain a spec for lineage exchange that
projects, open source or proprietary, implement.

Built-in OpenLineage support in Airflow will make it easier and more
reliable for Airflow users to publish their operational lineage through the
OpenLineage ecosystem.

The current external plugin maintained in the OpenLineage project depends
on the internals of Airflow and its operators, and breaks when those
internals change. A built-in integration provides first-class support for
exposing lineage: it gets tested alongside other Airflow changes and is
therefore more stable.
