Hey Charles,

I'm the co-creator of OpenLineage and project lead of Marquez (reference
implementation of the OLin spec). I read your proposal. Very exciting stuff!
Let me know how I can help in the initial design phase (or development). I
worked on the Airflow and Spark integrations for OLin and just opened a
proposal to extend the spec to support ML training (see ML support for
OpenLineage <https://github.com/OpenLineage/OpenLineage/issues/4035>),
which I think would be very relevant for Beam.

Cool to see this work kick off!

On Tue, Oct 14, 2025 at 12:48 PM Charles Nguyen <[email protected]> wrote:

> Hey everyone,
>
> There is this feature [1] to integrate Beam with OpenLineage, and a while
> back I was planning to work on this but never got around to it [2]. I've
> been revisiting this feature and want to take this up again. Please take a
> look at the proposal [3] to support building a lineage graph for Beam's
> local runner(s) and integration with OpenLineage's open standard for
> lineage collection. Any feedback is much appreciated.
>
> The proposal targets Python and Java Direct Runners and Prism Runner, for
> the sake of completeness. But I'm looking for input on which local runner
> we should start with. Of course, the ultimate goal is to
> support Prism Runner, and there's work already underway to make Prism
> Runner the default local runner for some Python pipelines (which kinda
> makes Python Direct Runner not worthwhile to start with). At the same time,
> Prism Runner is also actively in development and there might be blockers
> that I'm not aware of...
>
> Best,
> Charles
>
> [1] https://github.com/apache/beam/issues/33981
> [2] https://lists.apache.org/thread/wwm8qnymvoy80lvdkr4p8hwrpdrot9do
> [3] https://docs.google.com/document/d/1Styamoo35QSn0mp4iaL8MUfE2r0p1iqmoDDMZdpSlGQ/edit?usp=sharing
>
