Rachelint commented on code in PR #16136: URL: https://github.com/apache/datafusion/pull/16136#discussion_r2101735859
########## datafusion/physical-plan/src/aggregates/group_values/single_group_by/primitive/mod.rs: ########## @@ -74,21 +77,21 @@ macro_rules! hash_float { hash_float!(f16, f32, f64); -/// A [`GroupValues`] storing a single column of primitive values +/// A [`GroupValues`] storing a single column of normal primitive values (bits <= 64) /// /// This specialization is significantly faster than using the more general /// purpose `Row`s format pub struct GroupValuesPrimitive<T: ArrowPrimitiveType> { /// The data type of the output array data_type: DataType, - /// Stores the `(group_index, hash)` based on the hash of its value + /// Stores the `(group_index, group_value)` /// - /// We also store `hash` is for reducing cost of rehashing. Such cost - /// is obvious in high cardinality group by situation. + /// We directly store copy of `group_value` for not only efficient + /// rehashing, but also efficient probing. /// More details can see: - /// <https://github.com/apache/datafusion/issues/15961> + /// <https://github.com/apache/datafusion/issues/16136> /// - map: HashTable<(usize, u64)>, + map: HashTable<(usize, T::Native)>, Review Comment: I want to try it, too. But I think it will lead to regression, becasue many extra random writes exist in `emit` if we do that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org