adriangb commented on PR #17273:
URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3228672310

   > There are multiple things going on in this PR:
   > 
   > 1. Update TableProvider API
   > 2. Pushdown sorts
   > 3. Rework how filter pushdown works during physical planning
   > 
   > It might help to split them up into separate PRs
   
   Yep agreed, that's why I laid out a plan in https://github.com/apache/datafusion/pull/17273#issuecomment-3218814835:
   
   
   > 1. Refactor `scan` -> `scan_with_args`
   > 2. Implement the sort pushdown rule and add the field to `TableScan`
   > 3. Update ListingTable to use this preferred sort order
   > 4. Read the sort order of the files at the same time as we read stats. May need to refactor some traits from `collect_stats` to `collect_file_properties` or something like that. Should be free for Parquet (order is in the metadata which we have to pull down to get stats from). (This work is not in this PR)
   
   > Also in terms of pushdown sort, I feel like it would really help to add an example of how to use it, or even better find some way to use that information to improve one of the built-in features of DataFusion (so more users could benefit)
   
   Agreed as well, that's why I've kept this "larger" PR so we can see the e2e 
change. My hope is that this e2e change can demonstrate a performance 
improvement using `datafusion-cli` as the "example".
   
   It sounds like you're generally on board with the vision here, so I'll start 
the work of splitting this PR up.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

