adriangb opened a new issue, #16528:

URL: https://github.com/apache/datafusion/issues/16528
Several feature requests / issues have come up that I think can all be addressed with the groundwork being laid in https://github.com/apache/datafusion/pull/16461:

- https://github.com/apache/datafusion/pull/15261
- https://github.com/apache/datafusion/pull/15057
- https://github.com/apache/datafusion/issues/16004
- https://github.com/apache/datafusion/issues/14993

In particular, https://github.com/apache/datafusion/pull/16461 introduces a general framework for adapting an expression to a file's schema, handling any necessary casts and missing columns. We can expand this by:

- Optimizing the expressions to minimize the cost of casts (WIP in https://github.com/pydantic/datafusion/pull/31). Closes https://github.com/apache/datafusion/issues/16004.
- Other optimization passes, such as evaluating literals / nulls. Also related to https://github.com/apache/datafusion/issues/16004.
- A hook to handle missing columns, e.g. to do something other than filling them with nulls based on `Field` metadata (this could be a user-defined default value). Closes https://github.com/apache/datafusion/pull/15261.
- A hook to transform an expression before or after it is rewritten for the physical file schema. Closes https://github.com/apache/datafusion/pull/15057.
- An optimization to eliminate casts altogether when two types share the same Parquet physical type (change the schema the data is read with and remove the cast).
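To make the "adapt an expression to a file's schema" idea and the missing-column hook a bit more concrete, here is a minimal, self-contained sketch. Everything in it (`DataType`, `Scalar`, `Expr`, `Field`, `MissingColumnHandler`, `NullFill`, `DefaultValues`, `Adapter`) is a hypothetical simplification for illustration only; it is not the API introduced in #16461 and not an existing DataFusion or Arrow type. It only shows the shape: columns are re-pointed at the file schema, a cast is inserted when the file type differs from the table type, and a pluggable hook decides what replaces a column the file does not have.

```rust
use std::collections::HashMap;

// Hypothetical, simplified stand-ins for Arrow/DataFusion types.
#[derive(Debug, Clone, PartialEq)]
enum DataType {
    Int32,
    Int64,
    Utf8,
}

#[derive(Debug, Clone, PartialEq)]
enum Scalar {
    Null(DataType),
    Int64(i64),
    Utf8(String),
}

#[derive(Debug, Clone, PartialEq)]
enum Expr {
    // `index` is the column's position in the schema the expression targets.
    Column { name: String, index: usize },
    Literal(Scalar),
    Cast { expr: Box<Expr>, to: DataType },
    Eq(Box<Expr>, Box<Expr>),
}

#[derive(Debug, Clone)]
struct Field {
    name: String,
    data_type: DataType,
}

/// Hypothetical hook: decides what replaces a column that the file lacks.
trait MissingColumnHandler {
    fn handle(&self, table_field: &Field) -> Expr;
}

/// Default behavior: fill the missing column with a typed null.
struct NullFill;

impl MissingColumnHandler for NullFill {
    fn handle(&self, table_field: &Field) -> Expr {
        Expr::Literal(Scalar::Null(table_field.data_type.clone()))
    }
}

/// Alternative: user-defined defaults per column, falling back to null.
struct DefaultValues(HashMap<String, Scalar>);

impl MissingColumnHandler for DefaultValues {
    fn handle(&self, table_field: &Field) -> Expr {
        match self.0.get(&table_field.name) {
            Some(value) => Expr::Literal(value.clone()),
            None => NullFill.handle(table_field),
        }
    }
}

/// Rewrites an expression written against the table schema so it can be
/// evaluated against a single file's schema.
struct Adapter<'a> {
    table_schema: &'a [Field],
    file_schema: &'a [Field],
    missing: &'a dyn MissingColumnHandler,
}

impl<'a> Adapter<'a> {
    fn rewrite(&self, expr: &Expr) -> Expr {
        match expr {
            Expr::Column { name, .. } => {
                let table_field = self
                    .table_schema
                    .iter()
                    .find(|f| &f.name == name)
                    .expect("column must exist in the table schema");
                match self.file_schema.iter().position(|f| &f.name == name) {
                    // Column absent from this file: delegate to the hook.
                    None => self.missing.handle(table_field),
                    Some(idx) => {
                        // Re-point the column at its position in the file schema.
                        let col = Expr::Column { name: name.clone(), index: idx };
                        if self.file_schema[idx].data_type == table_field.data_type {
                            col
                        } else {
                            // Types differ: cast the file column to the table type.
                            Expr::Cast { expr: Box::new(col), to: table_field.data_type.clone() }
                        }
                    }
                }
            }
            Expr::Literal(_) => expr.clone(),
            Expr::Cast { expr: inner, to } => Expr::Cast {
                expr: Box::new(self.rewrite(inner)),
                to: to.clone(),
            },
            Expr::Eq(l, r) => Expr::Eq(Box::new(self.rewrite(l)), Box::new(self.rewrite(r))),
        }
    }
}
```

For example, with a table schema `a: Int64, b: Utf8` and a file that only contains `a: Int32`, rewriting `a = 5` yields `CAST(a AS Int64) = 5`, while a bare reference to `b` becomes a typed null under `NullFill`, or a configured default such as `"unknown"` under `DefaultValues`. The cast-minimizing and literal-evaluation passes from the list above would then run over this output (e.g. casting the literal rather than the column); they are left out of the sketch.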