Rachelint commented on issue #11680:
URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2276152860

   > I think this data is very interesting and we should look more deeply into 
why is the single group mode faster than doing a repartition / aggregate.
   > 
   > It seems like the only differences are:
   > 
   >     1. There is a `RepartitionExec` and `CoalesceBatchesExec`
   > 
   >     2. The final `AggregateExec` happens in parallel  (but on distinct 
subsets of the group)
   > 
   > 
   > I would expect doing the final aggregate in parallel on distinct subsets 
to be about as fast
   > 
   > So one reasonable conclusion conclusion that the overhead of 
`RepartitionExec` and `CoalesceBatchesExec` accounts for the difference 🤔 and 
this if we reduced the Repartition overhead we could see similar improvements 
as the group by single mode
   > 
   > This is the idea behind exploring #11647 -- I think we could avoid a copy 
at the output of CoalesceBatchesExec which would help to reduce the overhead
   
   It seems the cpu cost about `RepartitionExec` and `CoalesceBatchesExec` is 
not the bottleleck for the q32 in clickbench according to the flamegraph.
   
   One possibility is that it may not be the problem CPU cost, and it is the 
problem about schedule (tokio).
   
   
![xxx](https://raw.githubusercontent.com/Rachelint/drawio-store/main/dperf.08085.svg)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to