Hey Charles, I'm the co-creator of OpenLineage and project lead of Marquez (reference implementation of the OLin spec). I read your proposal. Very exciting stuff! Let me know how I can help in the initial design phase (or development). I worked on the Airflow and Spark integrations for OLin and just opened a proposal to extend the spec to support ML training (see ML support for OpenLineage <https://github.com/OpenLineage/OpenLineage/issues/4035>) that I think would be very relevant for Beam.
Cool to see this work kick off!

On Tue, Oct 14, 2025 at 12:48 PM Charles Nguyen <[email protected]> wrote:

> Hey everyone,
>
> There is this feature [1] to integrate Beam with OpenLineage, and a while
> back I was planning to work on this but never got around to it [2]. I've
> been revisiting this feature and want to take this up again. Please take a
> look at the proposal [3] to support building a lineage graph for Beam's
> local runner(s) and integration with OpenLineage's open standard for
> lineage collection. Any feedback is much appreciated.
>
> The proposal targets Python and Java Direct Runners and Prism Runner, for
> the sake of completeness. But I'm looking for input on which local runner
> should we proceed with as a start. Of course the ultimate goal is to
> support Prism Runner, and there's work already underway to make Prism
> Runner the default local runner for some Python pipelines (which kinda
> makes Python Direct Runner not worthwhile to start with). At the same time,
> Prism Runner is also actively in development and there might be blockers
> that I'm not aware of...
>
> Best,
> Charles
>
> [1] https://github.com/apache/beam/issues/33981
> [2] https://lists.apache.org/thread/wwm8qnymvoy80lvdkr4p8hwrpdrot9do
> [3]
> https://docs.google.com/document/d/1Styamoo35QSn0mp4iaL8MUfE2r0p1iqmoDDMZdpSlGQ/edit?usp=sharing
>
