adriangb opened a new issue, #16800: URL: https://github.com/apache/datafusion/issues/16800
As discussed in https://github.com/apache/datafusion/pull/16791, the long-term plan in my mind (and one that I would like to discuss with the community) is to replace `SchemaAdapter` with `PhysicalExprAdapter`. There are multiple reasons for this:

- We can better optimize scenarios like missing columns or casts. For example, it's cheaper to cast a literal and evaluate it against the data as read from the file than it is to read the data from the file and cast that to the type of the literal. It is also cheaper to evaluate the expression `1 > col1` as `1 > null` when `col1` is missing than it is to create an array of nulls. Since we can also simplify `PhysicalExpr`s, we can even simplify `1 > null` into just `null`.
- It's easier to manipulate `PhysicalExpr`s than it is to manipulate arrays. We already have machinery (`TreeNode` APIs, etc.) to do so.
- This is necessary to be able to push down projections into file scans, which we need for upcoming [Variant work](https://github.com/apache/arrow-rs/issues/6736) and which will also allow us to read single fields in a struct without reading the entire struct into memory.
- It paves the path for other advanced optimizations, e.g. we could do crazy stuff like only read the dictionary page from a parquet column for a filter `col = 'a'` and, if `'a'` is not in the dictionary, not even bother reading the keys.

We've already implemented a replacement system for predicate pushdown via `PhysicalExprAdapter` and have examples showing how to do some of the things a custom `SchemaAdapter` can do. Once we implement https://github.com/apache/datafusion/issues/14993 we'll be able to deprecate `SchemaAdapter` for the most part.
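
To make the missing-column case from the first reason above concrete, here is a minimal sketch in plain Rust. It deliberately does not use DataFusion's real `PhysicalExpr` or `PhysicalExprAdapter` types: the `Expr` enum and the `adapt` and `simplify` functions are hypothetical stand-ins, and the column names are made up for illustration. It only shows the shape of the idea: rewrite the expression against the file's columns, then fold `1 > null` into `null`, without ever building an array of nulls.

```rust
use std::collections::HashSet;

/// Toy expression tree. A hypothetical stand-in for a physical expression,
/// NOT DataFusion's actual `PhysicalExpr` API.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Int(i64),
    Null,
    Gt(Box<Expr>, Box<Expr>),
}

/// Rewrite step (hypothetical): replace references to columns that are
/// absent from the file with a null literal, instead of materializing
/// an all-null array for them.
fn adapt(expr: Expr, file_columns: &HashSet<&str>) -> Expr {
    match expr {
        Expr::Column(name) if !file_columns.contains(name.as_str()) => Expr::Null,
        Expr::Gt(l, r) => Expr::Gt(
            Box::new(adapt(*l, file_columns)),
            Box::new(adapt(*r, file_columns)),
        ),
        other => other,
    }
}

/// Simplification step (hypothetical): a comparison with a null operand
/// is itself null under SQL three-valued logic, so fold it away.
fn simplify(expr: Expr) -> Expr {
    match expr {
        Expr::Gt(l, r) => {
            let (l, r) = (simplify(*l), simplify(*r));
            if l == Expr::Null || r == Expr::Null {
                Expr::Null
            } else {
                Expr::Gt(Box::new(l), Box::new(r))
            }
        }
        other => other,
    }
}

fn main() {
    // Assumed scenario: the table schema has `col1` and `col2`,
    // but this particular file only contains `col2`.
    let file_columns: HashSet<&str> = ["col2"].iter().copied().collect();

    // The predicate `1 > col1` from the text above.
    let predicate = Expr::Gt(
        Box::new(Expr::Int(1)),
        Box::new(Expr::Column("col1".to_string())),
    );

    let adapted = adapt(predicate, &file_columns);
    let simplified = simplify(adapted.clone());
    println!("adapted: {:?}, simplified: {:?}", adapted, simplified);
}
```

In this toy model the predicate for a file without `col1` collapses to a constant before any data is read. I'd expect the same split (adapt against the file schema, then simplify) to be where cast-related rewrites would go as well, but that is not shown here.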