korowa commented on code in PR #11627:
URL: https://github.com/apache/datafusion/pull/11627#discussion_r1698976831
##########
datafusion/functions-aggregate/src/count.rs:
##########
@@ -433,6 +433,49 @@ impl GroupsAccumulator for CountGroupsAccumulator {
Ok(vec![Arc::new(counts) as ArrayRef])
}
+ fn convert_to_state(
+ &self,
+ values: &[ArrayRef],
+ opt_filter: Option<&BooleanArray>,
+ ) -> Result<Vec<ArrayRef>> {
+ let values = &values[0];
+
+ let state_array = match (values.logical_nulls(), opt_filter) {
+ (Some(nulls), None) => {
+ let mut builder = Int64Builder::with_capacity(values.len());
+ nulls
+ .into_iter()
+ .for_each(|is_valid| builder.append_value(is_valid as i64));
+ builder.finish()
+ }
+ (Some(nulls), Some(filter)) => {
+ let mut builder = Int64Builder::with_capacity(values.len());
+ nulls.into_iter().zip(filter.iter()).for_each(
+ |(is_valid, filter_value)| {
+ builder.append_value(
Review Comment:
That was the missing link for me (thank you!): we can operate directly on the
underlying buffers.
I've rewritten the state conversion for count to use bitand on the buffers,
followed by a cast to Int64 at the end. According to the benchmarks from the
commit, it got 20-25% faster.
Just a suggestion: wouldn't it be better to use BooleanBuffer + & (the bitand
operator) instead of NullBuffer + union? NullBuffer is a bit confusing, so I've
"pulled" the logic from union right into the state conversion function.
Additionally, I plan to prepare benches and minimize ArrayBuilder usage for
min / max / sum tomorrow.
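
To make the idea concrete, here is a rough std-only sketch of "bitand on buffers, then cast to Int64". It is not the code from the commit and does not use the arrow crate: `Bitmap` is a hypothetical stand-in for a packed bit buffer such as BooleanBuffer, and `count_state` is an illustrative name. It also assumes the filter itself contains no nulls.

```rust
/// Hypothetical stand-in for a packed bit buffer (least significant bit first
/// within each u64 word). This is NOT Arrow's BooleanBuffer; it only models
/// the same idea using the standard library.
#[derive(Debug, Clone, PartialEq)]
struct Bitmap {
    words: Vec<u64>,
    len: usize,
}

impl Bitmap {
    /// Pack a slice of bools into 64-bit words.
    fn from_bools(bits: &[bool]) -> Self {
        let mut words = vec![0u64; (bits.len() + 63) / 64];
        for (i, &bit) in bits.iter().enumerate() {
            if bit {
                words[i / 64] |= 1u64 << (i % 64);
            }
        }
        Bitmap { words, len: bits.len() }
    }

    /// Read the bit for row `i`.
    fn get(&self, i: usize) -> bool {
        (self.words[i / 64] >> (i % 64)) & 1 == 1
    }

    /// Word-wise AND: combines 64 rows per operation instead of one row
    /// per builder call.
    fn and(&self, other: &Bitmap) -> Bitmap {
        assert_eq!(self.len, other.len, "bitmaps must cover the same rows");
        let words = self
            .words
            .iter()
            .zip(&other.words)
            .map(|(a, b)| a & b)
            .collect();
        Bitmap { words, len: self.len }
    }
}

/// Per-row count state: 1 if the row is valid and passes the filter, else 0.
/// `validity` is None when the input has no nulls; `filter` is None when no
/// filter is applied (assumption: the filter has no nulls of its own).
fn count_state(len: usize, validity: Option<&Bitmap>, filter: Option<&Bitmap>) -> Vec<i64> {
    let mask = match (validity, filter) {
        (Some(v), Some(f)) => Some(v.and(f)),
        (Some(m), None) | (None, Some(m)) => Some(m.clone()),
        (None, None) => None,
    };
    match mask {
        // The "cast to Int64 at the end": each surviving bit becomes 1_i64.
        Some(m) => (0..len).map(|i| m.get(i) as i64).collect(),
        // No nulls and no filter: every row counts once.
        None => vec![1; len],
    }
}
```

With validity bits `[1, 0, 1, 1]` and filter bits `[1, 1, 0, 1]`, the combined mask is `[1, 0, 0, 1]`, so only the first and last rows contribute to the count state.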
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]