findepi opened a new issue, #14357: URL: https://github.com/apache/datafusion/issues/14357
The basic assumption that, for a given operator, we can recompute its schema from its inputs' schemas is unsound.

- metadata: for plans constructed from SQL, metadata will usually be empty, but an application can attach additional metadata to a schema or field. The metadata can be assigned on the relational operator (its schema or one of the fields) and may not be derivable from inputs.
  - for examples of metadata usage see https://github.com/apache/datafusion/issues/14247 and https://github.com/apache/datafusion/issues/12644, but there are also other, non-type-related use cases, like primary ID tracking
- field qualification: a plan node may have field qualification retained from inputs, erased, or reassigned. At optimizer time, we cannot simply assume one way or the other.
- DataFusion deals with plans created by its own frontend, but DataFusion is also a library. It also deals with plans constructed by other frontends (https://github.com/apache/datafusion/issues/12723). Optimizers need to take any valid plan and produce a valid plan.

The usage of `recompute_schema` within the optimizer should be replaced with explicit node schema updates. For example, when pruning inputs with `RequiredIndices`, the node's schema should be pruned the same way, not recomputed anew.

The usage of `recompute_schema` within the analyzer is left for a different issue.
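To illustrate the difference between pruning a node's schema and recomputing it from the input, here is a minimal, self-contained sketch. It does not use DataFusion's API: `Field`, `Schema`, `project`, `field`, and the two `example_*` functions are hypothetical stand-ins introduced only for this illustration, and the `primary_id` and `origin` metadata keys are made up.

```rust
use std::collections::BTreeMap;

// Hypothetical stand-in for a qualified field; NOT DataFusion's actual type.
#[derive(Clone, Debug, PartialEq)]
struct Field {
    qualifier: Option<String>,
    name: String,
    metadata: BTreeMap<String, String>,
}

// Hypothetical stand-in for a plan node's schema.
#[derive(Clone, Debug, PartialEq)]
struct Schema {
    fields: Vec<Field>,
    metadata: BTreeMap<String, String>,
}

impl Schema {
    // Explicit schema update: keep only the fields at `indices`, preserving
    // each field's qualifier and metadata and the schema-level metadata.
    fn project(&self, indices: &[usize]) -> Schema {
        Schema {
            fields: indices.iter().map(|&i| self.fields[i].clone()).collect(),
            metadata: self.metadata.clone(),
        }
    }
}

// Small constructor helper (hypothetical) to keep the examples short.
fn field(qualifier: Option<&str>, name: &str, metadata: &[(&str, &str)]) -> Field {
    Field {
        qualifier: qualifier.map(str::to_string),
        name: name.to_string(),
        metadata: metadata
            .iter()
            .map(|(k, v)| (k.to_string(), v.to_string()))
            .collect(),
    }
}

// Input schema: fields qualified with `t`, no metadata.
fn example_input_schema() -> Schema {
    Schema {
        fields: vec![
            field(Some("t"), "a", &[]),
            field(Some("t"), "b", &[]),
            field(Some("t"), "c", &[]),
        ],
        metadata: BTreeMap::new(),
    }
}

// Node schema as built by some other frontend: qualifiers erased, and
// metadata attached that cannot be derived from the input schema.
fn example_node_schema() -> Schema {
    Schema {
        fields: vec![
            field(None, "a", &[]),
            field(None, "b", &[("primary_id", "true")]),
            field(None, "c", &[]),
        ],
        metadata: [("origin".to_string(), "custom_frontend".to_string())]
            .into_iter()
            .collect(),
    }
}
```

With required indices `[1, 2]`, `example_node_schema().project(&[1, 2])` keeps the erased qualifiers and the attached metadata, whereas deriving the schema from the input via `example_input_schema().project(&[1, 2])` brings back the `t` qualifier and loses the metadata. That gap is the unsoundness described above, shown under the assumptions of this toy model.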