korowa commented on PR #15324: URL: https://github.com/apache/datafusion/pull/15324#issuecomment-2739393073
Thank you @waynexia, I'm planning to check it out at most tomorrow. I have a question in advance before reviewing -- have you been considering to implement groups accumulator for [specialized](https://github.com/apache/datafusion/tree/main/datafusion/functions-aggregate-common/src/aggregate/count_distinct) cases of DistinctCountAccumulator (primitive/native types and bytes)? I'm asking because, as for me, it looks a bit odd (though I haven't rechecked performance results, and perhaps GroupsAccumulatorAdapter introduces some insane overhead), that switching from native Rust types to ScalarValue still gives x5 faster execution, while groups accumulator in this case, if I'm not mistaken, does basically the same as the GroupsAccumulatorAdapter -- storing separate states (hashsets) in the vector is already [implemented](https://github.com/apache/datafusion/blob/97548a2584614acb3211b862e1ebca8349b6a832/datafusion/functions-aggregate-common/src/aggregate/groups_accumulator.rs#L96) in the adapter. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org