yjshen commented on issue #12596: URL: https://github.com/apache/datafusion/issues/12596#issuecomment-2372599393
> Introduce the partitioned hashtable in partial aggregation, and we partition the datafusion before inserting them into hashtable.
> And we push them into final aggregation partition by partition after, rather than split them again in repartition, and merge them again in coalesce.

I'm not clear on how this proposal works. Could you please explain why it provides performance benefits compared to partial aggregation, exchange, and final aggregation? Is the proposal aimed explicitly at accelerating high cardinality aggregation, or is it intended to enhance aggregation performance in general?
