adriangb commented on PR #17273:
URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3218814835

   > I think one proposal that may work is:
   > 
   > 1. Try to arrange the files, ordered and non-overlapping, into the number of partitions requested.
   > 2. If that is not possible (there is overlap), simply place them, ordered, into the number of partitions requested.
   > 3. If that's not possible (e.g. no statistics), then order them by some deterministic property (file path).
   > 
   > That keeps backwards compatibility in all cases, I believe, but unlocks case (2), which will be beneficial to many.
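
The three-tier fallback quoted above could be sketched roughly as follows. This is a standalone illustration, not the code in this PR and not DataFusion's API: `FileMeta`, `Layout`, `file`, `round_robin`, and `assign_files` are hypothetical names, statistics are reduced to a single `i64` min/max pair per file, files whose ranges merely touch are treated as non-overlapping, and round-robin placement in step 2 is just one possible reading of "place them as ordered".

```rust
/// Hypothetical per-file metadata: a path plus optional (min, max)
/// statistics for the sort column.
#[derive(Debug, Clone, PartialEq)]
struct FileMeta {
    path: String,
    range: Option<(i64, i64)>,
}

/// Which of the three tiers produced the layout.
#[derive(Debug, PartialEq)]
enum Layout {
    NonOverlapping,
    Ordered,
    ByPath,
}

/// Small constructor to keep examples short.
fn file(path: &str, range: Option<(i64, i64)>) -> FileMeta {
    FileMeta { path: path.to_string(), range }
}

/// Deal files out to `n` partitions in their current order.
fn round_robin(files: Vec<FileMeta>, n: usize) -> Vec<Vec<FileMeta>> {
    let mut parts: Vec<Vec<FileMeta>> = vec![Vec::new(); n];
    for (i, f) in files.into_iter().enumerate() {
        parts[i % n].push(f);
    }
    parts
}

fn assign_files(mut files: Vec<FileMeta>, n: usize) -> (Layout, Vec<Vec<FileMeta>>) {
    assert!(n > 0, "at least one partition is required");

    // Tier 3: without statistics for every file, fall back to a
    // deterministic order (file path).
    if files.iter().any(|f| f.range.is_none()) {
        files.sort_by(|a, b| a.path.cmp(&b.path));
        return (Layout::ByPath, round_robin(files, n));
    }

    // Sort by (min, max), using the path as a deterministic tie-breaker.
    files.sort_by(|a, b| a.range.cmp(&b.range).then_with(|| a.path.cmp(&b.path)));

    // Tier 1: greedy interval partitioning. Open partitions until there
    // are `n`, then append each file to the first partition whose last
    // file ends at or before this file's minimum.
    let mut parts: Vec<Vec<FileMeta>> = Vec::new();
    let mut fits = true;
    for f in &files {
        let (min, _) = f.range.expect("checked above");
        if parts.len() < n {
            parts.push(vec![f.clone()]);
            continue;
        }
        let slot = parts.iter().position(|p| match p.last().and_then(|l| l.range) {
            Some((_, last_max)) => last_max <= min,
            None => true,
        });
        match slot {
            Some(i) => parts[i].push(f.clone()),
            None => {
                fits = false;
                break;
            }
        }
    }
    if fits {
        parts.resize(n, Vec::new());
        return (Layout::NonOverlapping, parts);
    }

    // Tier 2: overlap is unavoidable, so only keep the files ordered by
    // their minimum within each partition.
    (Layout::Ordered, round_robin(files, n))
}
```

Under these assumptions, three files with ranges (0, 10), (11, 20), and (21, 30) split into two partitions land in tier 1, while ranges (0, 10), (5, 15), and (8, 20) cannot be separated into two non-overlapping partitions and fall through to tier 2.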
   
   I've now implemented this, but the diff is much larger. I propose that, if we want to move forward, this top-level PR gets some review and, if we agree on the big picture, I'll split it up into:
   
   1. Refactor `scan` -> `scan_with_args`
   2. Implement the sort pushdown rule and add the field to `TableScan`
   3. Update `ListingTable` to use this preferred sort order
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

