adriangb commented on PR #17273:
URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3215815211

   I think one proposal that may work is:
   1. Try to arrange the files into the requested number of partitions so 
that each partition is ordered and its files do not overlap.
   2. If that is not possible (there is overlap), simply place them, in 
order, into the requested number of partitions.
   3. If that's not possible either (e.g. no statistics), order them by some 
deterministic property (file path).
   I believe that keeps backwards compatibility in all cases but unlocks case 
(2), which will be beneficial to many.
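
   A rough sketch of the three-step fallback, to make the proposal concrete. 
This is not DataFusion's API: `FileMeta`, `Layout`, `arrange` and 
`round_robin` are hypothetical names, the sort key is simplified to a single 
`i64` range, and distributing files round-robin in steps 2 and 3 is an 
assumption about what "place them as ordered" would mean in practice.

```rust
/// Hypothetical, simplified stand-in for a file plus its sort-key statistics.
#[derive(Debug, Clone, PartialEq)]
pub struct FileMeta {
    pub path: String,
    /// (min, max) of the sort key, or None when statistics are missing.
    pub range: Option<(i64, i64)>,
}

/// Which of the three cases produced the layout.
#[derive(Debug, PartialEq)]
pub enum Layout {
    NonOverlapping,
    Ordered,
    ByPath,
}

pub fn arrange(mut files: Vec<FileMeta>, n: usize) -> (Layout, Vec<Vec<FileMeta>>) {
    assert!(n > 0, "need at least one partition");

    // Case 3: any file without statistics -> fall back to path order.
    if files.iter().any(|f| f.range.is_none()) {
        files.sort_by(|a, b| a.path.cmp(&b.path));
        return (Layout::ByPath, round_robin(files, n));
    }

    // Sort by (min, max, path) so ties are resolved deterministically.
    files.sort_by(|a, b| (a.range, &a.path).cmp(&(b.range, &b.path)));

    // Case 1: first-fit. A file joins a partition only if it starts after
    // that partition's last file ends; otherwise open a new partition,
    // as long as fewer than `n` are in use.
    let mut groups: Vec<Vec<FileMeta>> = Vec::new();
    let mut fits = true;
    for f in &files {
        let (min, _) = f.range.unwrap();
        let slot = groups
            .iter()
            .position(|g| g.last().unwrap().range.unwrap().1 < min);
        match slot {
            Some(i) => groups[i].push(f.clone()),
            None if groups.len() < n => groups.push(vec![f.clone()]),
            None => {
                fits = false;
                break;
            }
        }
    }
    if fits {
        groups.resize(n, Vec::new()); // pad with empty partitions
        return (Layout::NonOverlapping, groups);
    }

    // Case 2: overlap is unavoidable with `n` partitions. Dealing the sorted
    // files out round-robin keeps each partition ordered by min.
    (Layout::Ordered, round_robin(files, n))
}

/// Deal files out in their current order, one partition at a time.
fn round_robin(files: Vec<FileMeta>, n: usize) -> Vec<Vec<FileMeta>> {
    let mut groups = vec![Vec::new(); n];
    for (i, f) in files.into_iter().enumerate() {
        groups[i % n].push(f);
    }
    groups
}
```

   With files covering [0, 10], [11, 20] and [5, 15], asking for 2 partitions 
lands in case 1, while asking for 1 partition falls through to case 2. Any 
file with missing statistics sends the whole set to case 3.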


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

