berkaysynnada commented on issue #15601:
URL: https://github.com/apache/datafusion/issues/15601#issuecomment-2781412638

   > The round robin repartitioning is added to increase parallelism (by 
increasing number of partitions). Hash repartitioning does also increase the 
number of partitions, but the hash partitioning itself also benefits of doing 
the repartitioning in parallel, which is why both are present in the plan. If 
we would (somehow) integrate the parallelism inside `RepartitionExec` or change 
how we plan tasks altogether this might be redundant, but this is how it works 
now.
   
   This is right. This duo comes up very often in plans, but the only 
alternative is implementing an operator capable of parallel hashing. That would 
trim the plans by one operator, and maybe we could get rid of some channel 
passes, but designing that in an efficient way is far from trivial.
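
To make the two-step pattern concrete, here is a rough, language-agnostic sketch in Python. It is not DataFusion code: `round_robin`, `hash_split`, and `repartition` are hypothetical names, rows are plain dicts, and a thread pool stands in for the per-partition tasks (Python threads will not give real CPU parallelism, so this only illustrates the task structure).

```python
from concurrent.futures import ThreadPoolExecutor


def round_robin(batches, n):
    """Deal input batches across n partitions in turn (no hashing involved)."""
    parts = [[] for _ in range(n)]
    for i, batch in enumerate(batches):
        parts[i % n].append(batch)
    return parts


def hash_split(partition, key, m):
    """Hash every row of one input partition into m output buckets."""
    buckets = [[] for _ in range(m)]
    for batch in partition:
        for row in batch:
            buckets[hash(row[key]) % m].append(row)
    return buckets


def repartition(batches, key, n, m):
    # Step 1: round robin. Cheap, and only raises the partition count to n.
    parts = round_robin(batches, n)
    # Step 2: hash each round-robin partition as its own task, so the
    # hashing work itself is spread over n workers.
    with ThreadPoolExecutor(max_workers=n) as pool:
        per_input = list(pool.map(lambda p: hash_split(p, key, m), parts))
    # Merge bucket j of every input partition into output partition j.
    return [
        [row for buckets in per_input for row in buckets[j]]
        for j in range(m)
    ]
```

An operator capable of parallel hashing would, in effect, fold step 1 into step 2 instead of exposing them as two separate plan nodes.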
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

