adriangb commented on PR #17273: URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3218814835
> I think one proposal that may work is:
>
> 1. Try to arrange the files as non-overlapping ordered into the number of partitions requested.
> 2. If that is not possible (there is overlap) simply place them as ordered into the number of partitions requested.
> 3. If that's not possible (e.g. no statistics) then order them by some deterministic property (file path).
>
> That keeps backwards compatibility in all cases I believe but unlocks state (2) which will be beneficial to many.

I've now implemented this, but the diff is much larger.

I propose that, if we want to move forward, this top-level PR gets some review, and if we agree on the big picture I'll split it up into:

1. Refactor `scan` -> `scan_with_args`
2. Implement the sort pushdown rule and add the field to `TableScan`
3. Update `ListingTable` to use this preferred sort order
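
For readers skimming the thread, the three-step fallback in the quoted proposal could be sketched roughly as below. This is an illustrative sketch only, not the code in the PR: `FileInfo`, `Strategy`, `round_robin`, and `assign_files` are hypothetical names, the statistics are reduced to a single `i64` min/max pair for one sort column, and round-robin placement in steps 2 and 3 is an assumption about what "place them as ordered" means.

```rust
/// Hypothetical file descriptor: a path plus optional (min, max)
/// statistics for the sort column. Not a DataFusion type.
#[derive(Debug, Clone, PartialEq)]
struct FileInfo {
    path: String,
    range: Option<(i64, i64)>,
}

impl FileInfo {
    fn new(path: &str, range: Option<(i64, i64)>) -> Self {
        FileInfo { path: path.to_string(), range }
    }
}

/// Which of the three steps produced the layout.
#[derive(Debug, PartialEq)]
enum Strategy {
    NonOverlapping,
    OrderedWithOverlap,
    ByPath,
}

/// Deal already-ordered files into `n` partitions, preserving the
/// relative order inside each partition. Empty partitions are dropped.
fn round_robin(files: Vec<FileInfo>, n: usize) -> Vec<Vec<FileInfo>> {
    let mut parts: Vec<Vec<FileInfo>> = vec![Vec::new(); n];
    for (i, f) in files.into_iter().enumerate() {
        parts[i % n].push(f);
    }
    parts.retain(|p| !p.is_empty());
    parts
}

fn assign_files(mut files: Vec<FileInfo>, n: usize) -> (Strategy, Vec<Vec<FileInfo>>) {
    assert!(n > 0, "at least one partition must be requested");

    // Step 3: if any file lacks statistics, fall back to path order.
    if files.iter().any(|f| f.range.is_none()) {
        files.sort_by(|a, b| a.path.cmp(&b.path));
        return (Strategy::ByPath, round_robin(files, n));
    }

    // Every file has statistics: order by (min, max), then path as a tiebreak.
    files.sort_by(|a, b| (a.range, &a.path).cmp(&(b.range, &b.path)));

    // Step 1: greedy first-fit into at most `n` groups whose ranges do not
    // overlap (a file joins a group only if its min is strictly greater
    // than the max of the group's last file).
    let mut groups: Vec<Vec<FileInfo>> = Vec::new();
    let mut fits = true;
    for f in &files {
        let (min, _) = f.range.unwrap();
        let slot = groups
            .iter()
            .position(|g| g.last().unwrap().range.unwrap().1 < min);
        match slot {
            Some(i) => groups[i].push(f.clone()),
            None if groups.len() < n => groups.push(vec![f.clone()]),
            None => {
                fits = false;
                break;
            }
        }
    }
    if fits {
        return (Strategy::NonOverlapping, groups);
    }

    // Step 2: overlap is unavoidable with `n` partitions, so keep the
    // sorted order and spread the files across the partitions.
    (Strategy::OrderedWithOverlap, round_robin(files, n))
}
```

Calling `assign_files(files, n)` returns both the chosen strategy and the per-partition file lists, so a caller could tell whether the resulting partitions are fully ordered (step 1) or only ordered by file minimum (step 2).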