ctsk commented on PR #15981: URL: https://github.com/apache/datafusion/pull/15981#issuecomment-2859158487
Another tried-and-true strategy for this kind of problem is to partition in multiple stages: instead of a single "wide" fanout to, for instance, 256 partitions, it is preferable to chain two stages that each partition 16 ways. I am unsure how best to apply this idea to DataFusion: it could be a planning-level change, or it could be realized as a change in `partition_iter`, like the changes in this PR.
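
To make the multi-stage idea concrete, here is a minimal, standalone sketch of it on plain `u64` hash values. It is not DataFusion code and does not touch `partition_iter`; the names `FANOUT`, `partition_stage`, `two_stage_partition`, and `one_stage_partition` are hypothetical and exist only for illustration. The assumption is that the partition index is taken from the low 8 bits of the hash, so that two chained 16-way passes (high nibble first, then low nibble) yield the same 256 partitions as a single 256-way pass.

```rust
/// Fanout per stage (assumed value): two stages give 16 * 16 = 256 partitions.
const FANOUT: usize = 16;

/// Scatter `hashes` into `FANOUT` buckets, using the 4 bits of each hash
/// that start at bit position `shift` as the bucket index.
fn partition_stage(hashes: &[u64], shift: u32) -> Vec<Vec<u64>> {
    let mut buckets: Vec<Vec<u64>> = vec![Vec::new(); FANOUT];
    for &h in hashes {
        let idx = ((h >> shift) as usize) & (FANOUT - 1);
        buckets[idx].push(h);
    }
    buckets
}

/// Two chained 16-way stages. Stage one splits on bits 4..8, stage two
/// splits each coarse bucket on bits 0..4, so the final partition index
/// is `coarse * 16 + fine`, which equals `hash & 0xFF`.
fn two_stage_partition(hashes: &[u64]) -> Vec<Vec<u64>> {
    let mut out = Vec::with_capacity(FANOUT * FANOUT);
    for coarse in partition_stage(hashes, 4) {
        out.extend(partition_stage(&coarse, 0));
    }
    out
}

/// Single-stage 256-way partitioning on the low 8 bits, for comparison.
fn one_stage_partition(hashes: &[u64]) -> Vec<Vec<u64>> {
    let mut buckets: Vec<Vec<u64>> = vec![Vec::new(); FANOUT * FANOUT];
    for &h in hashes {
        buckets[(h & 0xFF) as usize].push(h);
    }
    buckets
}
```

Calling `two_stage_partition(&hashes)` and `one_stage_partition(&hashes)` on the same input should produce identical partitions in this sketch, since both passes preserve input order within a bucket. The difference is that each pass of the two-stage version writes to only 16 output buffers at a time, which is the usual motivation for chaining narrow stages. Whether that translates into a gain for Arrow record batches in DataFusion would need to be measured.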