Dear Airflow Community,

I have been working on a proposal to bring an OpenLineage provider to Airflow <https://docs.google.com/document/d/1aN5i8WV2Za7XiHTtyrewZscQ-4eXs1ZNfPw58JscFEw/edit#>. I am looking for feedback, with the goal of posting an official AIP. Please feel free to comment in the doc above.

Thank you,
Julien (OpenLineage project lead)
*For convenience, here is the rationale from the doc:*

Operational lineage collection is a common need for understanding dependencies between data pipelines and tracking the end-to-end provenance of data. It enables many use cases, from ensuring reliable delivery of data through observability to compliance and cost management. Publishing operational lineage is a core Airflow capability to enable troubleshooting and governance.

OpenLineage is a project of the LF AI & Data Foundation that provides a spec standardizing operational lineage collection and sharing across the data ecosystem. Although it provides plugins for popular open source projects, its intent is very similar to that of OpenTelemetry (also under the Linux Foundation umbrella): to remain a spec for lineage exchange that projects, open source or proprietary, implement.

Built-in OpenLineage support in Airflow will make it easier and more reliable for Airflow users to publish their operational lineage through the OpenLineage ecosystem. The current external plugin maintained in the OpenLineage project depends on Airflow and operator internals, and it breaks when those internals change. A built-in integration provides first-class support for exposing lineage: it gets tested alongside other changes and is therefore more stable.