adriangb commented on PR #17273: URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3215815211
I think one proposal that may work is:

1. Try to arrange the files, ordered and non-overlapping, into the number of partitions requested.
2. If that is not possible (there is overlap), simply place them, in order, into the number of partitions requested.
3. If that's not possible (e.g. no statistics), order them by some deterministic property (file path).

I believe that keeps backwards compatibility in all cases, but unlocks state (2), which will be beneficial to many.
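To make the three-step fallback concrete, here is a rough, standalone sketch. It is not DataFusion code: `FileMeta`, `Strategy`, `plan_partitions` and `split_contiguous` are hypothetical names, the statistics are reduced to a single `(min, max)` pair of `i64` per file, and "non-overlapping" is assumed to mean that each file's min is strictly greater than the max of the file before it in the same partition.

```rust
/// Hypothetical stand-in for a file plus the min/max of its sort key.
/// `range` is `None` when no statistics are available.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    path: String,
    range: Option<(i64, i64)>,
}

/// Which of the three fallback steps produced the layout.
#[derive(Debug, PartialEq)]
enum Strategy {
    NonOverlapping,
    OrderedWithOverlap,
    ByPath,
}

/// Split an already-sorted list into at most `n` contiguous groups.
fn split_contiguous(files: Vec<FileMeta>, n: usize) -> Vec<Vec<FileMeta>> {
    let n = n.max(1);
    let chunk = ((files.len() + n - 1) / n).max(1);
    files.chunks(chunk).map(|c| c.to_vec()).collect()
}

fn plan_partitions(mut files: Vec<FileMeta>, n: usize) -> (Strategy, Vec<Vec<FileMeta>>) {
    let n = n.max(1);

    // Step 3: without statistics for every file, fall back to a
    // deterministic order (here: the file path).
    if files.iter().any(|f| f.range.is_none()) {
        files.sort_by(|a, b| a.path.cmp(&b.path));
        return (Strategy::ByPath, split_contiguous(files, n));
    }

    // Sort by (min, max), breaking ties on path so the result is deterministic.
    files.sort_by(|a, b| a.range.cmp(&b.range).then_with(|| a.path.cmp(&b.path)));

    // Step 1: first-fit each file into a partition whose last file ends
    // strictly before this one starts; open a new partition while fewer
    // than `n` are in use.
    let mut groups: Vec<Vec<FileMeta>> = Vec::new();
    let mut fits = true;
    for f in &files {
        let (min, _) = f.range.unwrap();
        let slot = groups
            .iter()
            .position(|g| g.last().unwrap().range.unwrap().1 < min);
        match slot {
            Some(i) => groups[i].push(f.clone()),
            None if groups.len() < n => groups.push(vec![f.clone()]),
            None => {
                fits = false;
                break;
            }
        }
    }

    if fits {
        (Strategy::NonOverlapping, groups)
    } else {
        // Step 2: overlap cannot be avoided with `n` partitions, so keep the
        // sorted order and split it into contiguous groups.
        (Strategy::OrderedWithOverlap, split_contiguous(files, n))
    }
}
```

Note that the first-fit pass in step 1 may use fewer than `n` partitions (fully disjoint files all land in one), so a real implementation would probably also want to balance files across partitions; that is left out here to keep the sketch short.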