berkaysynnada commented on issue #15601: URL: https://github.com/apache/datafusion/issues/15601#issuecomment-2781412638
> The round robin repartitioning is added to increase parallelism (by increasing the number of partitions). Hash repartitioning also increases the number of partitions, but the hash partitioning itself also benefits from being done in parallel, which is why both are present in the plan. If we (somehow) integrated the parallelism inside `RepartitionExec` or changed how we plan tasks altogether, this might be redundant, but this is how it works now.

This is right. This duo comes up very often in plans, but the only alternative is implementing an operator capable of parallel hashing. That would trim the plans by one operator and might let us get rid of some channel passes, but designing it efficiently isn't trivial.
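To make the two-step pattern concrete, here is a minimal sketch in plain Python. It is not DataFusion code and does not use `RepartitionExec`; the helper names (`round_robin`, `hash_split`, `repartition`) are hypothetical and exist only for illustration. Rows are first dealt round robin to several workers so that the hashing step can run concurrently, and each worker then hashes its own slice into the final output partitions.

```python
from concurrent.futures import ThreadPoolExecutor


def round_robin(rows, n):
    """Deal rows into n partitions in turn; no hashing involved."""
    parts = [[] for _ in range(n)]
    for i, row in enumerate(rows):
        parts[i % n].append(row)
    return parts


def hash_split(rows, m, key):
    """Hash one input partition into m buckets (the costly step)."""
    buckets = [[] for _ in range(m)]
    for row in rows:
        buckets[hash(key(row)) % m].append(row)
    return buckets


def repartition(rows, n_workers, m_outputs, key):
    # Step 1: round robin, so that n_workers tasks can hash concurrently.
    inputs = round_robin(rows, n_workers)
    # Step 2: each worker hashes its own slice independently.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        per_worker = list(pool.map(lambda p: hash_split(p, m_outputs, key), inputs))
    # Step 3: merge bucket j from every worker into output partition j.
    return [
        [row for buckets in per_worker for row in buckets[j]]
        for j in range(m_outputs)
    ]


# Hypothetical data: 20 rows keyed by a small integer.
rows = [(k, f"v{k}") for k in range(20)]
out = repartition(rows, n_workers=4, m_outputs=3, key=lambda r: r[0])
```

Python threads only illustrate the structure here and do not give real CPU parallelism. An operator capable of parallel hashing, as discussed above, would amount to folding steps 1 and 2 into a single operator, which is where the design difficulty lies.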