fm100 opened a new issue, #22944:
URL: https://github.com/apache/datafusion/issues/22944

   ### Is your feature request related to a problem or challenge?
   
   OpenLineage has become a common standard for collecting lineage metadata from processing engines. DataFusion is increasingly used to build query engines, but each DataFusion-based project currently needs to implement lineage extraction independently. This leads to duplicated effort and inconsistent OpenLineage support.
   
   ### Describe the solution you'd like
   
   I would like DataFusion to provide OpenLineage support, either directly or through stable APIs/hooks that downstream engines can use.
   
   Useful metadata to capture would include:
   * Resolved input and output datasets
   * Dataset schemas
   * Column-level lineage, where possible
   * Logical and/or physical plans, if appropriate
   * Query metadata such as query ID, status, timing, and errors
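   To make the list above concrete, here is a minimal Rust sketch of a per-query record holding that metadata and rendering it as a simplified OpenLineage-style `RunEvent`. The `LineageEvent`, `Dataset`, `ColumnEdge`, and `RunState` types and the `to_json` method are hypothetical. They are not existing DataFusion APIs. The JSON keys (`eventType`, `run.runId`, `inputs`/`outputs`, and the `schema`, `columnLineage`, and `errorMessage` facets) follow the general shape of the OpenLineage spec, but the output is a simplification. It omits required fields such as `producer` and `schemaURL`, leaves out the per-facet `_producer`/`_schemaURL` keys, and reuses the query ID as the run ID, although OpenLineage expects a UUID there.

```rust
use std::collections::BTreeMap;
use std::fmt::Write;

/// Subset of OpenLineage `eventType` values used in this sketch.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum RunState { Start, Complete, Fail }

impl RunState {
    fn as_str(self) -> &'static str {
        match self { RunState::Start => "START", RunState::Complete => "COMPLETE", RunState::Fail => "FAIL" }
    }
}

/// A resolved dataset: namespace, name, and schema as (column, type) pairs.
#[derive(Clone, Debug)]
pub struct Dataset { pub namespace: String, pub name: String, pub fields: Vec<(String, String)> }

/// Column-level edge: `output_column` of the first output is derived from
/// `input_column` of `inputs[input_dataset]`.
#[derive(Clone, Debug)]
pub struct ColumnEdge { pub output_column: String, pub input_dataset: usize, pub input_column: String }

/// Hypothetical per-query lineage record (not an existing DataFusion type).
#[derive(Clone, Debug)]
pub struct LineageEvent {
    pub query_id: String,
    pub event_time: String, // ISO-8601 timestamp supplied by the caller
    pub state: RunState,
    pub job_namespace: String,
    pub job_name: String,
    pub inputs: Vec<Dataset>,
    pub outputs: Vec<Dataset>,
    pub column_lineage: Vec<ColumnEdge>,
    pub error: Option<String>,
}

/// Quote and escape `s` as a JSON string literal.
fn esc(s: &str) -> String {
    let mut out = String::from("\"");
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => { let _ = write!(out, "\\u{:04x}", c as u32); }
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

impl LineageEvent {
    /// One dataset entry with a `schema` facet and, if edges are given, a
    /// `columnLineage` facet grouping input fields per output column.
    fn dataset_json(&self, d: &Dataset, edges: &[ColumnEdge]) -> String {
        let fields: Vec<String> = d.fields.iter()
            .map(|(n, t)| format!("{{\"name\":{},\"type\":{}}}", esc(n), esc(t)))
            .collect();
        let mut facets = format!("\"schema\":{{\"fields\":[{}]}}", fields.join(","));
        let mut by_col: BTreeMap<&str, Vec<String>> = BTreeMap::new();
        for e in edges {
            let src = &self.inputs[e.input_dataset];
            by_col.entry(e.output_column.as_str()).or_default().push(format!(
                "{{\"namespace\":{},\"name\":{},\"field\":{}}}",
                esc(&src.namespace), esc(&src.name), esc(&e.input_column)));
        }
        if !by_col.is_empty() {
            let cols: Vec<String> = by_col.iter()
                .map(|(c, ins)| format!("{}:{{\"inputFields\":[{}]}}", esc(c), ins.join(",")))
                .collect();
            let _ = write!(facets, ",\"columnLineage\":{{\"fields\":{{{}}}}}", cols.join(","));
        }
        format!("{{\"namespace\":{},\"name\":{},\"facets\":{{{}}}}}", esc(&d.namespace), esc(&d.name), facets)
    }

    /// Render a simplified OpenLineage-style RunEvent. Required fields such as
    /// `producer` and `schemaURL` are omitted to keep the sketch short.
    pub fn to_json(&self) -> String {
        let no_edges: &[ColumnEdge] = &[];
        let inputs: Vec<String> = self.inputs.iter().map(|d| self.dataset_json(d, no_edges)).collect();
        // Column lineage is attached to the first output only in this sketch.
        let outputs: Vec<String> = self.outputs.iter().enumerate()
            .map(|(i, d)| self.dataset_json(d, if i == 0 { &self.column_lineage[..] } else { no_edges }))
            .collect();
        let run_facets = match &self.error {
            Some(m) => format!("{{\"errorMessage\":{{\"message\":{},\"programmingLanguage\":\"Rust\"}}}}", esc(m)),
            None => "{}".to_string(),
        };
        format!(
            "{{\"eventType\":{},\"eventTime\":{},\"run\":{{\"runId\":{},\"facets\":{}}},\"job\":{{\"namespace\":{},\"name\":{}}},\"inputs\":[{}],\"outputs\":[{}]}}",
            esc(self.state.as_str()), esc(&self.event_time), esc(&self.query_id), run_facets,
            esc(&self.job_namespace), esc(&self.job_name), inputs.join(","), outputs.join(","))
    }
}
```

   The sketch groups input fields per output column because a `columnLineage` facet lists every contributing input under a single output field. A production version would more likely build the event with `serde_json` than with hand-formatted strings.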
   
   I do not have a strong preference on the implementation. A separate crate, feature flag, or stable lineage extraction API would all be reasonable options.
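   For the stable-extraction-API option, here is a rough sketch of one possible entry point. It walks the plan tree, records every scanned table as an input and every written table as an output, and passes the result to a hook supplied by the engine. `Plan`, `Lineage`, `extract_lineage`, and `LineageListener` are hypothetical names. `Plan` is a toy stand-in for DataFusion's `LogicalPlan`. A real version would presumably match variants such as `TableScan` and `Dml`, and it would also need to descend into subqueries nested inside expressions, which this toy does not model.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a logical plan (hypothetical; DataFusion's real
/// `LogicalPlan` has many more variants).
#[derive(Debug)]
pub enum Plan {
    /// Reads a table.
    TableScan { table: String },
    /// Writes its input into a table (e.g. INSERT INTO ... SELECT).
    Insert { table: String, input: Box<Plan> },
    Projection { columns: Vec<String>, input: Box<Plan> },
    Filter { predicate: String, input: Box<Plan> },
    Join { left: Box<Plan>, right: Box<Plan> },
}

/// Resolved input and output datasets for one query.
#[derive(Debug, Default, PartialEq)]
pub struct Lineage {
    pub inputs: BTreeSet<String>,
    pub outputs: BTreeSet<String>,
}

/// Hypothetical extraction entry point: iterative depth-first walk that
/// records every table scanned (input) or written (output).
pub fn extract_lineage(plan: &Plan) -> Lineage {
    let mut lineage = Lineage::default();
    let mut stack = vec![plan];
    while let Some(node) = stack.pop() {
        match node {
            Plan::TableScan { table } => {
                lineage.inputs.insert(table.clone());
            }
            Plan::Insert { table, input } => {
                lineage.outputs.insert(table.clone());
                stack.push(input);
            }
            Plan::Projection { input, .. } | Plan::Filter { input, .. } => stack.push(input),
            Plan::Join { left, right } => {
                stack.push(left);
                stack.push(right);
            }
        }
    }
    lineage
}

/// Hypothetical hook a downstream engine implements, e.g. to build and emit
/// OpenLineage events from the resolved datasets.
pub trait LineageListener {
    fn on_lineage(&self, query_id: &str, lineage: &Lineage);
}
```

   The sketch collects names into sets, so a table read more than once (for example, in a self-join) appears only once. The listener receives only resolved dataset names, so in this sketch an engine that implements it never touches plan internals.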
   
   ### Describe alternatives you've considered
   
   Each DataFusion-based engine could implement OpenLineage support independently by inspecting SQL, logical plans, or physical plans. However, this duplicates work, may depend on unstable internals, and can produce inconsistent lineage semantics.
   
   ### Additional context
   
   OpenLineage integration would make DataFusion more useful as a foundation for production query engines and data platforms, especially for projects that want lineage and observability support without building it from scratch.
   
   I am not very familiar with the DataFusion codebase yet, but I would be happy to collaborate with the DataFusion community on the OpenLineage side and help shape the expected metadata/modeling requirements.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

