jayzhan211 commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1823718252
##########
datafusion/physical-plan/src/aggregates/group_values/column.rs:
##########
@@ -196,6 +570,324 @@ impl GroupValues for GroupValuesColumn {
let b =
ByteViewGroupValueBuilder::<BinaryViewType>::new();
v.push(Box::new(b) as _)
}
+ dt => {
+ return not_impl_err!(
+ "{dt} not supported in VectorizedGroupValuesColumn"
+ )
+ }
+ }
+ }
+ self.group_values = v;
+ }
+
+ // tracks to which group each of the input rows belongs
+ groups.clear();
+ groups.resize(n_rows, usize::MAX);
+
+ let mut batch_hashes = mem::take(&mut self.hashes_buffer);
+ batch_hashes.clear();
+ batch_hashes.resize(n_rows, 0);
+ create_hashes(cols, &self.random_state, &mut batch_hashes)?;
+
+ // General steps for one round `vectorized equal_to & append`:
+ // 1. Collect vectorized context by checking hash values of `cols` in `map`,
+ // mainly fill `vectorized_append_row_indices`, `vectorized_equal_to_row_indices`
+ // and `vectorized_equal_to_group_indices`
+ //
+ // 2. Perform `vectorized_append` for `vectorized_append_row_indices`.
+ // `vectorized_append` must be performed before `vectorized_equal_to`,
+ // because some `group indices` in `vectorized_equal_to_group_indices`
+ // may be actually placeholders, and still point to no actual values in
Review Comment:
I'm still unclear on what "placeholders" means here, but I now understand why
we need to append values first: rows that require an equality check may need
to be compared against groups newly added in the same batch.
In `collect_vectorized_process_context`, a new group only gets its hash entry
in the hash table; its values are not appended yet. Once the hashes have been
checked, the remaining rows are assigned to the groups they must be checked
against for equality. Therefore, we need to append the new groups first so that
the equality checks can include them.
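
A minimal sketch of that ordering, assuming a toy single-column model rather
than the PR's actual code. `ToyGroups`, `toy_hash` and `intern` are
hypothetical names, and a plain `HashMap` stands in for the real hash table:

```rust
use std::collections::HashMap;

/// Toy model of one "vectorized equal_to & append" round (hypothetical).
struct ToyGroups {
    /// hash -> group index; the index may be reserved before its value exists
    map: HashMap<u64, usize>,
    /// group values, indexed by group index
    values: Vec<i64>,
}

/// Assumed stand-in hash, deliberately simple.
fn toy_hash(v: i64) -> u64 {
    v.rem_euclid(8) as u64
}

impl ToyGroups {
    fn intern(&mut self, col: &[i64]) -> Vec<usize> {
        let mut groups = vec![usize::MAX; col.len()];
        let mut append_rows = Vec::new();
        let mut equal_to_rows = Vec::new();
        let mut equal_to_groups = Vec::new();
        let mut next_group = self.values.len();

        // Step 1: probe the map. A new hash only reserves a group index;
        // the value itself is not stored yet.
        for (row, &v) in col.iter().enumerate() {
            let h = toy_hash(v);
            match self.map.get(&h).copied() {
                Some(g) => {
                    equal_to_rows.push(row);
                    equal_to_groups.push(g);
                }
                None => {
                    self.map.insert(h, next_group);
                    groups[row] = next_group;
                    append_rows.push(row);
                    next_group += 1;
                }
            }
        }

        // Step 2: append the values of the newly reserved groups.
        for &row in &append_rows {
            self.values.push(col[row]);
        }

        // Step 3: equality checks. Some indices in `equal_to_groups` were
        // reserved in step 1 of this same call, so `self.values[g]` is only
        // valid because step 2 already ran.
        for (&row, &g) in equal_to_rows.iter().zip(&equal_to_groups) {
            if self.values[g] == col[row] {
                groups[row] = g;
            }
            // Rows that fail the check (hash collisions) stay at usize::MAX;
            // handling them is out of scope for this sketch.
        }
        groups
    }
}
```

With an empty `ToyGroups` and the input `[10, 10, 20]`, the second row hits the
hash entry reserved by the first row. If step 3 ran before step 2 in this
sketch, `self.values[g]` would index past the end of `values`.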
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]