[I] Remove `recompute_schema` usage from optimizer [datafusion]

via GitHub Wed, 29 Jan 2025 05:11:40 -0800


findepi opened a new issue, #14357:
URL: https://github.com/apache/datafusion/issues/14357


   The basic assumption that an operator's schema can be recomputed from the 
schemas of its inputs is unsound.
   
   - metadata: for plans constructed from SQL, metadata will usually be empty, 
but an application can attach additional metadata to a schema or a field. The 
metadata can be assigned on the relational operator (its schema or one of its 
fields) and may not be derivable from the inputs.
     - for examples of metadata usage see 
https://github.com/apache/datafusion/issues/14247, 
https://github.com/apache/datafusion/issues/12644, but there are also other, 
non-type-related use cases, like primary ID tracking
   - field qualification: a plan node may retain field qualification from its 
inputs, erase it, or reassign it. At optimization time, we cannot simply 
assume one way or the other.
     - DataFusion deals with plans created by its own frontend, but DataFusion 
is also a library. It also deals with plans constructed by other frontends 
(https://github.com/apache/datafusion/issues/12723). Optimizers need to take 
any valid plan and produce a valid plan.
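
   A minimal sketch of the problem, using toy types rather than DataFusion's 
real `DFSchema` or `LogicalPlan`. The `Field` and `Schema` structs and the 
functions `recompute_from_input`, `frontend_node_schema` and `input_schema` 
are all hypothetical and exist only for illustration. It shows a node whose 
schema carries a reassigned qualifier and extra metadata that cannot be 
derived from the input alone:

```rust
use std::collections::BTreeMap;

// Toy stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: BTreeMap<String, String>,
}

// Hypothetical input schema: one column qualified by table `t`, no metadata.
fn input_schema() -> Schema {
    Schema {
        fields: vec![Field {
            qualifier: Some("t".to_string()),
            name: "id".to_string(),
        }],
        metadata: BTreeMap::new(),
    }
}

// Hypothetical frontend: builds the schema of a pass-through node, but
// reassigns the qualifier and attaches metadata on the node itself.
fn frontend_node_schema(input: &Schema) -> Schema {
    let mut schema = input.clone();
    for field in &mut schema.fields {
        field.qualifier = Some("alias".to_string());
    }
    schema
        .metadata
        .insert("primary_id".to_string(), "id".to_string());
    schema
}

// Hypothetical "recompute" for the same pass-through node: it only sees the
// input, so it cannot know about the alias or the node-level metadata.
fn recompute_from_input(input: &Schema) -> Schema {
    input.clone()
}
```

Under these assumptions, recomputing yields a schema that differs from the one 
the frontend assigned, so a rewrite that recomputes silently changes the plan.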
   
   The usage of `recompute_schema` within the optimizer should be replaced 
with explicit node schema updates.
   For example, when pruning inputs with `RequiredIndices`, the node's schema 
should be pruned the same way, not recomputed anew. 
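
   The pruning approach could look roughly like the sketch below. It uses toy 
types, not DataFusion's API: `Field`, `Schema` and `prune_schema` are 
hypothetical, and the plain `&[usize]` slice merely stands in for whatever 
`RequiredIndices` holds. The point is that the node's existing schema is 
narrowed in place, so qualifiers and metadata survive:

```rust
use std::collections::BTreeMap;

// Toy stand-ins for illustration only; these are NOT DataFusion types.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
}

#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: BTreeMap<String, String>,
}

/// Keeps only the fields at the `required` positions, in the given order.
/// Each kept field is copied as-is (qualifier included) and the schema-level
/// metadata is carried over unchanged. Returns `None` if an index is out of
/// range.
fn prune_schema(schema: &Schema, required: &[usize]) -> Option<Schema> {
    let fields = required
        .iter()
        .map(|&i| schema.fields.get(i).cloned())
        .collect::<Option<Vec<Field>>>()?;
    Some(Schema {
        fields,
        metadata: schema.metadata.clone(),
    })
}
```

Because the function starts from the node's own schema rather than from its 
inputs, it makes no assumption about how that schema was originally built.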
   
   The usage of `recompute_schema` within the analyzer is left for a 
different issue.
   


