adriangb commented on issue #14993: URL: https://github.com/apache/datafusion/issues/14993#issuecomment-3033503866
I started looking into this, and here is where it gets messy:

1. Partition columns. I think this needs a rethink. I suggest pushing partition column generation down into the actual scan of the data using projection pushdown, so that everything above the scan doesn't need to special-case them. But that might mean losing this nifty optimization: https://github.com/apache/datafusion/blob/5a0ddbf00ed1336079444cb9217ab2069b6780fc/datafusion/datasource/src/file_scan_config.rs#L1140-L1143
2. Computing all of the equivalence properties, etc. The good news is that I think this will all come out simpler: we should essentially re-use what `ProjectionExec` does instead of having a different path for when the projection is a `Vec<usize>`.
3. A good number of breaking changes will be needed for folks using `FileScanConfig` & co directly.
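
To make point 1 a bit more concrete, here is a rough sketch in plain Rust of what "partition columns as part of the scan's projection" could look like. It uses no DataFusion types: `Value`, `ProjExpr`, `FileData`, `scan`, and `example_rows` are all hypothetical names invented for illustration, and the row-oriented layout is a simplification. The idea it shows is only that, once the projection is a list of expressions rather than a `Vec<usize>`, a partition column is just another expression the scan evaluates, so nothing above the scan has to treat it specially.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a scalar value; not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Str(String),
}

/// Hypothetical projection expression evaluated inside the scan.
#[derive(Debug, Clone)]
enum ProjExpr {
    /// Read the column at this index from the file's own schema.
    FileColumn(usize),
    /// Emit the per-file constant for this partition column
    /// (for example, the value parsed from `year=2024` in the file path).
    PartitionValue(String),
}

/// One "file": its rows plus the partition values that apply to all of them.
struct FileData {
    rows: Vec<Vec<Value>>,
    partition_values: HashMap<String, Value>,
}

/// Apply the projection while scanning. File columns and partition columns
/// go through the same code path, so callers see ordinary output columns.
fn scan(file: &FileData, projection: &[ProjExpr]) -> Vec<Vec<Value>> {
    file.rows
        .iter()
        .map(|row| {
            projection
                .iter()
                .map(|expr| match expr {
                    ProjExpr::FileColumn(i) => row[*i].clone(),
                    ProjExpr::PartitionValue(name) => file
                        .partition_values
                        .get(name)
                        .expect("unknown partition column")
                        .clone(),
                })
                .collect()
        })
        .collect()
}

/// Small usage example: project the second file column plus the `year`
/// partition column.
fn example_rows() -> Vec<Vec<Value>> {
    let mut partition_values = HashMap::new();
    partition_values.insert("year".to_string(), Value::Int(2024));
    let file = FileData {
        rows: vec![
            vec![Value::Int(1), Value::Str("a".to_string())],
            vec![Value::Int(2), Value::Str("b".to_string())],
        ],
        partition_values,
    };
    let projection = vec![
        ProjExpr::FileColumn(1),
        ProjExpr::PartitionValue("year".to_string()),
    ];
    scan(&file, &projection)
}
```

A real implementation would presumably work on Arrow arrays and physical expressions rather than rows, and the sketch says nothing about how the optimization linked above would be preserved.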