Rachelint commented on issue #11680: URL: https://github.com/apache/datafusion/issues/11680#issuecomment-2276152860
> I think this data is very interesting and we should look more deeply into why is the single group mode faster than doing a repartition / aggregate. > > It seems like the only differences are: > > 1. There is a `RepartitionExec` and `CoalesceBatchesExec` > > 2. The final `AggregateExec` happens in parallel (but on distinct subsets of the group) > > > I would expect doing the final aggregate in parallel on distinct subsets to be about as fast > > So one reasonable conclusion conclusion that the overhead of `RepartitionExec` and `CoalesceBatchesExec` accounts for the difference 🤔 and this if we reduced the Repartition overhead we could see similar improvements as the group by single mode > > This is the idea behind exploring #11647 -- I think we could avoid a copy at the output of CoalesceBatchesExec which would help to reduce the overhead It seems the cpu cost about `RepartitionExec` and `CoalesceBatchesExec` is not the bottleleck for the q32 in clickbench according to the flamegraph. One possibility is that it may not be the problem CPU cost, and it is the problem about schedule (tokio).  -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
