adriangb commented on code in PR #16014: URL: https://github.com/apache/datafusion/pull/16014#discussion_r2085377284
##########
datafusion/physical-optimizer/src/pruning.rs:
##########
@@ -995,6 +996,184 @@ fn build_statistics_record_batch<S: PruningStatistics>(
     })
 }
 
+/// Prune a set of containers represented by their statistics.

Review Comment:
   > Pruning on statistics during plan time would potentially be redundant with also trying to prune again during opening, but it would reduce the files earlier in the plan
   
   Yeah, I don't think it's redundant: a file is either pruned or it isn't. If we prune it earlier, it never makes it this far. If we don't, we may now be able to prune it. The only redundant case is when the filters haven't changed in between (i.e. there are no dynamic filters), but that sounds both hard to track and like a possible future optimization 😄
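
To make the argument concrete, here is a minimal sketch in plain Rust. It assumes a single predicate of the form `value > threshold` and per-file max statistics. `FileStats`, `prune`, and `names` are hypothetical names made up for this illustration; they are not the API added in this PR or anything else in DataFusion.

```rust
/// Hypothetical per-file statistics; only the max is needed for a `>` predicate.
#[derive(Debug, Clone, PartialEq)]
struct FileStats {
    name: &'static str,
    max: i64,
}

/// Keep only the files that might contain a row with `value > threshold`.
/// A file is dropped when even its maximum value fails the predicate.
fn prune(files: &[FileStats], threshold: i64) -> Vec<FileStats> {
    files.iter().filter(|f| f.max > threshold).cloned().collect()
}

/// Helper for printing and comparing results by file name.
fn names(files: &[FileStats]) -> Vec<&'static str> {
    files.iter().map(|f| f.name).collect()
}

fn main() {
    let files = vec![
        FileStats { name: "a.parquet", max: 10 },
        FileStats { name: "b.parquet", max: 50 },
        FileStats { name: "c.parquet", max: 90 },
    ];

    // Plan time: only the static filter `value > 20` is known.
    let planned = prune(&files, 20);
    println!("after plan-time pruning: {:?}", names(&planned));

    // Open time, filter unchanged: the second pass removes nothing new.
    // This is the only case where pruning again is wasted work.
    let unchanged = prune(&planned, 20);
    println!("open time, same filter: {:?}", names(&unchanged));

    // Open time, a dynamic filter has tightened the bound to `value > 60`:
    // a file that survived planning can now be skipped.
    let tightened = prune(&planned, 60);
    println!("open time, tighter filter: {:?}", names(&tightened));
}
```

Files removed at plan time never reach the second pass, so the two stages do not duplicate each other's results. The second pass only repeats work when the threshold is the same at both stages.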