Re: [PR] POC: Vectorized hashtable for aggregation [datafusion]

via GitHub Mon, 28 Oct 2024 07:52:56 -0700


Dandandan commented on code in PR #12996:
URL: https://github.com/apache/datafusion/pull/12996#discussion_r1819216243



##########
datafusion/physical-plan/src/aggregates/group_values/group_column.rs:
##########
@@ -287,6 +469,63 @@ where
         };
     }
 
+    fn vectorized_equal_to(

Review Comment:
   Hmm... 🤔  I think what `filter` supports is a subset of what `take` 
supports: `take` can repeat values or use them out of order, while `filter` 
can only keep or drop each value in its original order.
   
   Generalizing them somehow sounds reasonable, as `filter` already uses the 
same strategy (converting the predicate to indices) whenever the predicate is 
sparse.
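   
   To illustrate the relationship, here is a minimal sketch over plain slices 
rather than Arrow arrays. The names `take`, `predicate_to_indices` and `filter` 
below are hypothetical helpers written for this example, not the arrow-rs 
kernels; they only show that a filter can be expressed as a take over the 
selected indices, while a take with repeated or reordered indices has no 
filter equivalent.

```rust
/// Gather `values[i]` for every `i` in `indices`.
/// Indices may repeat and may appear in any order.
fn take<T: Copy>(values: &[T], indices: &[usize]) -> Vec<T> {
    indices.iter().map(|&i| values[i]).collect()
}

/// Convert a boolean predicate into the ascending list of selected indices.
fn predicate_to_indices(predicate: &[bool]) -> Vec<usize> {
    predicate
        .iter()
        .enumerate()
        .filter_map(|(i, &keep)| keep.then_some(i))
        .collect()
}

/// `filter` expressed as `take` over the indices selected by the predicate.
/// The output always preserves input order and never repeats a value.
fn filter<T: Copy>(values: &[T], predicate: &[bool]) -> Vec<T> {
    take(values, &predicate_to_indices(predicate))
}
```

   With `values = [10, 20, 30, 40]`, `take(&values, &[3, 0, 0])` repeats and 
reorders values, which no boolean predicate passed to `filter` can reproduce.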
   
   Appending to an in-progress array seems like it could be very useful to 
aggregates, yeah. I think that would mostly remove the need for the 
`GroupColumn` code...
   
   For the equality check, however, we would need to take in two arrays of 
indices and/or a boolean predicate, followed by an equality operation (ideally 
in one go, without copying the input arrays).
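   
   As a rough sketch of what such a fused check could look like, again over 
plain slices: `indexed_equal_to` is a hypothetical name and signature invented 
for this example, not an existing arrow-rs or DataFusion API. It compares 
values through two index arrays directly, so neither input is materialized 
with a `take` first, and it uses the output buffer as the boolean predicate so 
that rows already known to differ are skipped.

```rust
/// For each position `k`, compare `lhs[lhs_idx[k]]` with `rhs[rhs_idx[k]]`
/// and store the result in `equal[k]`.
///
/// `equal` doubles as the input predicate: entries that are already `false`
/// are left untouched, so results can be AND-ed across several columns.
/// No intermediate "taken" copies of `lhs` or `rhs` are created.
fn indexed_equal_to<T: PartialEq>(
    lhs: &[T],
    lhs_idx: &[usize],
    rhs: &[T],
    rhs_idx: &[usize],
    equal: &mut [bool],
) {
    assert_eq!(lhs_idx.len(), rhs_idx.len());
    assert_eq!(lhs_idx.len(), equal.len());

    for ((&l, &r), eq) in lhs_idx.iter().zip(rhs_idx).zip(equal.iter_mut()) {
        // Only compare rows that are still candidates for equality.
        if *eq {
            *eq = lhs[l] == rhs[r];
        }
    }
}
```

   In the aggregation case, `lhs` would be the stored group values and `rhs` 
the incoming batch column, with one call per group-by column reusing the same 
`equal` buffer.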



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


