adriangb commented on PR #17273: URL: https://github.com/apache/datafusion/pull/17273#issuecomment-3228672310
> There are multiple things going on in this PR:
>
> 1. Update TableProvider API
> 2. Pushdown sorts
> 3. Rework how filter pushdown works during physical planning
>
> It might help to split them up into separate PRs

Yep, agreed. That's why I laid out a plan in https://github.com/apache/datafusion/pull/17273#issuecomment-3218814835:

> 1. Refactor `scan` -> `scan_with_args`
> 2. Implement the sort pushdown rule and add the field to `TableScan`
> 3. Update ListingTable to use this preferred sort order
> 4. Read the sort order of the files at the same time as we read stats. May need to refactor some traits from `collect_stats` to `collect_file_properties` or something like that. Should be free for parquet (order is in the metadata which we have to pull down to get stats from). (This work is not in this PR)

> Also in terms of pushdown sort, I feel like it would really help to add an example of how to use it, or even better find some way to use that information to improve one of the built-in features of DataFusion (so more users could benefit)

Agreed as well; that's why I've kept this "larger" PR, so we can see the e2e change. My hope is that this e2e change can demonstrate a performance improvement using `datafusion-cli` as the "example".

It sounds like you're generally on board with the vision here, so I'll start the work of splitting this PR up.
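As a rough illustration of step 1 in the plan (`scan` -> `scan_with_args`), here is a minimal, self-contained Rust sketch of the args-struct pattern. Everything in it (`SortKey`, `ScanArgs`, `ScanPlan`, `TableProviderSketch`, `LegacyTable`, `SortedFilesTable`) is hypothetical and does not use DataFusion; the real `TableProvider::scan` is async, also takes a session, and uses `Expr` filters. The sketch assumes a provider that already knows its files' sort order (what step 4 proposes reading alongside stats) can report whether it satisfies a pushed-down preferred ordering.

```rust
/// A hypothetical sort key: a column name plus a direction.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

/// Hypothetical bundle of scan parameters. Adding a field here (such as
/// `preferred_ordering`) does not change any trait method signature.
#[derive(Debug, Clone, Default)]
pub struct ScanArgs {
    pub projection: Option<Vec<usize>>,
    pub filters: Vec<String>,
    pub limit: Option<usize>,
    pub preferred_ordering: Option<Vec<SortKey>>,
}

impl ScanArgs {
    pub fn with_projection(mut self, projection: Vec<usize>) -> Self {
        self.projection = Some(projection);
        self
    }
    pub fn with_limit(mut self, limit: usize) -> Self {
        self.limit = Some(limit);
        self
    }
    pub fn with_preferred_ordering(mut self, ordering: Vec<SortKey>) -> Self {
        self.preferred_ordering = Some(ordering);
        self
    }
}

/// Hypothetical scan result. `ordered` is true only when a preferred
/// ordering was requested and the provider reports that it satisfies it.
#[derive(Debug, PartialEq)]
pub struct ScanPlan {
    pub description: String,
    pub ordered: bool,
}

pub trait TableProviderSketch {
    /// The existing positional-argument entry point.
    fn scan(&self, projection: Option<&Vec<usize>>, filters: &[String], limit: Option<usize>) -> ScanPlan;

    /// The new entry point. The default forwards to `scan` and drops fields
    /// `scan` cannot express, so existing implementors keep compiling.
    fn scan_with_args(&self, args: &ScanArgs) -> ScanPlan {
        self.scan(args.projection.as_ref(), &args.filters, args.limit)
    }
}

/// Implements only `scan` and inherits the default `scan_with_args`.
pub struct LegacyTable;

impl TableProviderSketch for LegacyTable {
    fn scan(&self, projection: Option<&Vec<usize>>, _filters: &[String], limit: Option<usize>) -> ScanPlan {
        ScanPlan {
            description: format!("legacy scan projection={:?} limit={:?}", projection, limit),
            ordered: false,
        }
    }
}

/// Knows the order its files were written in and overrides `scan_with_args`.
pub struct SortedFilesTable {
    pub file_order: Vec<SortKey>,
}

impl TableProviderSketch for SortedFilesTable {
    fn scan(&self, projection: Option<&Vec<usize>>, filters: &[String], limit: Option<usize>) -> ScanPlan {
        let args = ScanArgs { projection: projection.cloned(), filters: filters.to_vec(), limit, preferred_ordering: None };
        self.scan_with_args(&args)
    }

    fn scan_with_args(&self, args: &ScanArgs) -> ScanPlan {
        // A requested ordering is satisfied if it is a prefix of the file order,
        // e.g. files sorted by (ts, id) also satisfy a request for (ts).
        let ordered = match &args.preferred_ordering {
            Some(preferred) => self.file_order.starts_with(preferred),
            None => false,
        };
        ScanPlan { description: format!("sorted-files scan limit={:?}", args.limit), ordered }
    }
}

fn main() {
    let ts = SortKey { column: "ts".into(), descending: false };
    let id = SortKey { column: "id".into(), descending: false };
    let args = ScanArgs::default()
        .with_projection(vec![0, 2])
        .with_limit(10)
        .with_preferred_ordering(vec![ts.clone()]);

    let legacy = LegacyTable.scan_with_args(&args);
    let sorted = SortedFilesTable { file_order: vec![ts, id] }.scan_with_args(&args);
    println!("{} ordered={}", legacy.description, legacy.ordered);
    println!("{} ordered={}", sorted.description, sorted.ordered);
}
```

The key design choice in this sketch is the default method: providers that never override `scan_with_args` keep working unchanged, and only providers that can exploit a preferred ordering need to opt in. The prefix check deliberately ignores how several sorted files would be combined into one ordered stream (for example by merging, or by relying on non-overlapping ranges).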