+1 binding, this should make lineage a first-class citizen for Airflow
users. Excited for this one.

On Sun, 12 Feb 2023 at 07:57, Jarek Potiuk <ja...@potiuk.com> wrote:

> A little side-track: a small comment on what Shubham wrote.
>
> Yeah. I also noticed AIP-47 mentioned, but I considered that an
> implementation detail. I read that those will rather be regular unit
> tests (so not reaching out to external systems, as that makes little
> sense, and we definitely want the OpenLineage tests to run regularly
> with every PR; otherwise we would end up in the same boat as now,
> where the repos are separated out). I believe the AIP-47 mention
> there was more an attempt to say "the test coverage will be high".
> Julien, am I right?
>
> On Sat, Feb 11, 2023 at 11:57 PM Mehta, Shubham
> <shu...@amazon.com.invalid> wrote:
> >
> > +1 non-binding. I'll be on the lookout for initial PRs to learn more
> about the implementation details of how System Tests will be extended to
> cover these changes, as well as the ongoing maintenance required from
> providers. The proposed changes should definitely make it easier for
> Airflow customers to adopt lineage and improve stability. I'm looking
> forward to seeing how customers will end up using it!
> >
> >
> > Shubham
> >
> >
> >
> > From: Julien Le Dem <jul...@astronomer.io.INVALID>
> > Reply-To: "dev@airflow.apache.org" <dev@airflow.apache.org>
> > Date: Friday, February 10, 2023 at 3:28 PM
> > To: "dev@airflow.apache.org" <dev@airflow.apache.org>
> > Subject: [EXTERNAL] [VOTE] AIP-53 OpenLineage in Airflow
> >
> >
> >
> >
> >
> > Dear Airflow community,
> >
> >
> >
> > Following the discussion thread over the past few weeks, I'd like to
> call a vote on AIP-53 OpenLineage in Airflow:
> >
> >
> https://cwiki.apache.org/confluence/display/AIRFLOW/AIP-53+OpenLineage+in+Airflow
> >
> >
> >
> > The discussion thread is linked in the confluence doc if you wish to
> consult the history of the conversation. Thank you to all who contributed!
> >
> >
> >
> > This is my (non-binding!) +1, the vote will last until midnight (UTC) on
> Friday 17th February.
> >
> >
> >
> > Thanks,
> >
> > Julien
> >
> >
> >
> > For reference, the Motivation section in the doc:
> >
> > Operational lineage collection is a common need to understand
> dependencies between data pipelines and track end-to-end provenance of
> data. It enables many use cases from ensuring reliable delivery of data
> through observability to compliance and cost management.
> >
> > Publishing operational lineage is a core Airflow capability to enable
> troubleshooting and governance.
> >
> > OpenLineage is a project of the LF AI & Data Foundation that provides
> a spec standardizing operational lineage collection and sharing across the
> data ecosystem. Although it provides plugins for popular open source
> projects, its intent is very similar to that of OpenTelemetry (also under
> the Linux Foundation umbrella): to remain a spec for lineage exchange that
> projects, open source or proprietary, implement.
> >
> > Built-in OpenLineage support in Airflow will make it easier and more
> reliable for Airflow users to publish their operational lineage through the
> OpenLineage ecosystem.
> >
> > The current external plugin maintained in the OpenLineage project
> depends on Airflow and operator internals and breaks when those change.
> A built-in integration ensures better, first-class support for exposing
> lineage: it gets tested alongside other changes and is therefore more
> stable.
> >
> > Today, OpenLineage consumers in the ecosystem include: Egeria (bank
> compliance), Marquez (build your own metadata platform for compliance for
> example), Microsoft Purview (Governance, …), Astro (data observability),
> Amundsen. AWS recently blogged about using OpenLineage in the AWS
> ecosystem. Other projects are at various levels of progress.
> >
> > On the producer side, there is support for open source projects like
> Airflow, dbt, Spark, Flink, and Great Expectations, and for proprietary
> warehouses like Snowflake, BigQuery, and Redshift, through API integration
> or SQL parsing.
> >
> > Examples of users talking about their usage of OpenLineage can be found
> on the OpenLineage blog.
> >
> > This integration will also stimulate the continued growth of the
> OpenLineage ecosystem and create more value for Airflow users.
>
