Thank you very much for the support and feedback! I've put some more thought into it over the past week and decided to go forward with Prism Runner now, given the focus on making it the default local runner.
It's a bit of a problem that the OpenLineage Go client is only available through a third party and is not maintained by OpenLineage, though it does work, based on some initial experiments. There's also some work to be done (https://github.com/apache/beam/pull/36578) to support the lineage metrics for Prism Runner, though hopefully it should be straightforward.

Best,
Charles

On Wed, Oct 15, 2025 at 6:09 PM Robert Burke <[email protected]> wrote:

> I'm happy to help review/advise on how to integrate it with Prism, as time
> permits.
>
> Based on the design, and that metrics are sent back to the runner, it
> should be straightforward to add special handling for the lineage metrics
> to collect them for querying from the Job service handle.
>
> There's also an option to perhaps render them with the (very rudimentary)
> Web UI that prism also provides in standalone mode. Though a cursory look
> only shows a single OpenLineage implementation in Go, generated from the
> spec.
>
> On Wed, Oct 15, 2025, 11:15 AM Willy Lulciuc <[email protected]>
> wrote:
>
>> Hey Charles,
>>
>> I'm the co-creator of OpenLineage and project lead of Marquez (reference
>> implementation of the OLin spec). I read your proposal. Very exciting stuff!
>> Let me know how I can help in the initial design phase (or development).
>> I worked on the Airflow and Spark integrations for OLin and just opened a
>> proposal to extend the spec to support ML training (see ML support for
>> OpenLineage <https://github.com/OpenLineage/OpenLineage/issues/4035>)
>> that I think would be very relevant for Beam.
>>
>> Cool to see this work kick off!
>>
>> On Tue, Oct 14, 2025 at 12:48 PM Charles Nguyen <[email protected]>
>> wrote:
>>
>>> Hey everyone,
>>>
>>> There is this feature [1] to integrate Beam with OpenLineage, and a
>>> while back I was planning to work on this but never got around to it [2].
>>> I've been revisiting this feature and want to take this up again. Please
>>> take a look at the proposal [3] to support building a lineage graph for
>>> Beam's local runner(s) and integration with OpenLineage's open standard for
>>> lineage collection. Any feedback is much appreciated.
>>>
>>> The proposal targets Python and Java Direct Runners and Prism Runner,
>>> for the sake of completeness. But I'm looking for input on which local
>>> runner should we proceed with as a start. Of course the ultimate goal is to
>>> support Prism Runner, and there's work already underway to make Prism
>>> Runner the default local runner for some Python pipelines (which kinda
>>> makes Python Direct Runner not worthwhile to start with). At the same time,
>>> Prism Runner is also actively in development and there might be blockers
>>> that I'm not aware of...
>>>
>>> Best,
>>> Charles
>>>
>>> [1] https://github.com/apache/beam/issues/33981
>>> [2] https://lists.apache.org/thread/wwm8qnymvoy80lvdkr4p8hwrpdrot9do
>>> [3]
>>> https://docs.google.com/document/d/1Styamoo35QSn0mp4iaL8MUfE2r0p1iqmoDDMZdpSlGQ/edit?usp=sharing
