2010YOUY01 commented on code in PR #11943:
URL: https://github.com/apache/datafusion/pull/11943#discussion_r1726804168
##########
datafusion/physical-plan/src/aggregates/group_values/row.rs:
##########
@@ -121,16 +135,31 @@ impl GroupValues for GroupValuesRows {
         create_hashes(cols, &self.random_state, batch_hashes)?;
 
         for (row, &target_hash) in batch_hashes.iter().enumerate() {
+            let mode = self.mode;
             let entry = self.map.get_mut(target_hash, |(exist_hash, group_idx)| {
                 // Somewhat surprisingly, this closure can be called even if the
                 // hash doesn't match, so check the hash first with an integer
                 // comparison to avoid the more expensive comparison with the
                 // group value. https://github.com/apache/datafusion/pull/11718
-                target_hash == *exist_hash
-                    // verify that the group that we are inserting with hash is
-                    // actually the same key value as the group in
-                    // existing_idx (aka group_values @ row)
-                    && group_rows.row(row) == group_values.row(*group_idx)
+                if target_hash != *exist_hash {
+                    return false;
+                }
+
+                // verify that the group that we are inserting with hash is
+                // actually the same key value as the group in
+                // existing_idx (aka group_values @ row)
+                match mode {
+                    GroupStatesMode::Flat => {
+                        group_rows.row(row)
+                            == group_values.current().unwrap().row(*group_idx)
+                    }
+                    GroupStatesMode::Blocked(_) => {
+                        let blocked_index = BlockedGroupIndex::new(*group_idx);
+                        group_rows.row(row)
+                            == group_values[blocked_index.block_id]
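
For context on the Blocked arm above (the diff is truncated here): a plausible shape for BlockedGroupIndex is a packed index whose high bits select a block and whose low bits select a row within that block. A minimal sketch, assuming a 32/32-bit split of the index; the PR's actual layout may differ:

// Hypothetical sketch of a packed blocked group index. The field names
// mirror the diff above, but the 32/32-bit split is an assumption.
struct BlockedGroupIndex {
    block_id: usize,     // which block the group's row lives in
    block_offset: usize, // row offset within that block
}

impl BlockedGroupIndex {
    fn new(group_idx: usize) -> Self {
        let raw = group_idx as u64;
        Self {
            // High 32 bits: block id.
            block_id: (raw >> 32) as usize,
            // Low 32 bits: offset inside the block.
            block_offset: (raw & 0xFFFF_FFFF) as usize,
        }
    }
}

In Flat mode the whole index is a row number into a single buffer; splitting it this way would let Blocked mode spread group values across multiple fixed-size blocks instead of one ever-growing buffer.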
Review Comment:
My basic understanding of a vectorized HT is that we build a hash table that
can operate on a batch of values, with each implementation step also processing
a batch at a time. This is going to be efficient because the CPU can run very
fast in simple loops (thanks to deep instruction pipelining and caching).
A regular vs. vectorized hash table lookup looks like this:
Regular:    hash 1 key -> find initial slot for 1 key -> linear probing for 1 key -> ...
Vectorized: hash 8192 keys -> find initial slots for 8192 keys -> linear probing for 8192 keys -> ...
The regular hash table approach inevitably causes branch mispredictions and
cache misses within each single-key operation, which slows things down. The
vectorized version's execution consists of several tight loops (each iterating
thousands of times), so the CPU can run at full speed inside each loop; see the
sketch below.
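
A minimal sketch of the two loop structures over a toy open-addressed table.
All names here (Table, hash_one, initial_slot, linear_probe) are hypothetical,
not DataFusion APIs:

// Toy open-addressed table: None = empty slot, Some(key) = occupied.
struct Table {
    slots: Vec<Option<u64>>,
}

fn hash_one(key: u64) -> u64 {
    // Toy multiplicative hash; a real table would use a proper hasher.
    key.wrapping_mul(0x9E37_79B9_7F4A_7C15)
}

fn initial_slot(table: &Table, hash: u64) -> usize {
    (hash as usize) % table.slots.len()
}

// Walk forward from the initial slot until the key or an empty slot is
// found. Assumes the table is never completely full.
fn linear_probe(table: &Table, mut slot: usize, key: u64) -> Option<usize> {
    loop {
        match table.slots[slot] {
            Some(k) if k == key => return Some(slot),
            None => return None,
            _ => slot = (slot + 1) % table.slots.len(),
        }
    }
}

// Regular: each key runs the whole pipeline before the next key starts,
// so branches and cache misses from different stages interleave.
fn probe_regular(table: &Table, keys: &[u64]) -> Vec<Option<usize>> {
    keys.iter()
        .map(|&key| {
            let hash = hash_one(key);             // hash 1 key
            let slot = initial_slot(table, hash); // initial slot for 1 key
            linear_probe(table, slot, key)        // linear probing for 1 key
        })
        .collect()
}

// Vectorized: each stage is its own tight loop over the whole batch
// (e.g. 8192 keys), keeping the instruction pipeline and caches hot.
fn probe_vectorized(table: &Table, keys: &[u64]) -> Vec<Option<usize>> {
    let hashes: Vec<u64> = keys.iter().map(|&k| hash_one(k)).collect();
    let slots: Vec<usize> =
        hashes.iter().map(|&h| initial_slot(table, h)).collect();
    keys.iter()
        .zip(&slots)
        .map(|(&k, &s)| linear_probe(table, s, k))
        .collect()
}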
The aggregation already vectorizes the first step -- hashing (create_hashes in
the diff above); the vectorized HT approach intends to continue vectorizing the
following steps. It would be a very complex algorithm, though, basically
writing a batched hash table from scratch, and I haven't thought through the
specific algorithm either.
#7095 quotes a paper section describing the high-level idea of Photon's
algorithm.