Re: [I] Automatic Column lineage Tracking for Hudi Tables [hudi]

via GitHub Wed, 18 Mar 2026 11:10:00 -0700


vinothchandar commented on issue #18298:
URL: https://github.com/apache/hudi/issues/18298#issuecomment-4084600730


   > However, after HoodieStreamer converts that transformed dataset into 
JavaRDD / GenericRecord and writes through the Hudi write path, that plan-level 
lineage is no longer directly visible to lineage systems
   
   We could move more of that into DataFrame code (it's a worthy pursuit), but 
I also wonder whether this should be a standard ask of Spark lineage tracking 
tools. For example, 
https://docs.datahub.com/docs/metadata-integration/java/acryl-spark-lineage 
uses Spark listeners to do this generically at the Spark layer. 
   
   All in all, I think this is not a Hudi problem. You probably want to track 
this across all Spark jobs, not just Hudi writes. Could we do that cleanly 
with some listeners? I believe that is the standard approach across 
OpenLineage, DataHub, etc. 
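   
   As a rough sketch of what the listener approach looks like in practice, 
the fragment below registers a lineage listener through Spark's 
`spark.extraListeners` setting. The listener class and `spark.datahub.*` key 
are taken from the DataHub page linked above as I recall them, and the server 
URL and `<version>` are placeholders, so please verify all of them against 
that page before use. 
   
   ```
   # spark-defaults.conf (illustrative; check names against the linked docs)
   # Pull in the lineage agent jar; <version> is a placeholder, not a real version
   spark.jars.packages        io.acryl:acryl-spark-lineage:<version>
   # Register the listener for every job that uses this configuration
   spark.extraListeners       datahub.spark.DatahubSparkListener
   # Assumed local DataHub endpoint; replace with your own server
   spark.datahub.rest.server  http://localhost:8080
   ```
   
   Because this lives in Spark configuration rather than in Hudi code, the 
same settings would apply to any Spark job, which is the point of doing it at 
the Spark layer. 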


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
