kosiew commented on code in PR #17280: URL: https://github.com/apache/datafusion/pull/17280#discussion_r2293418951
########## datafusion/physical-plan/src/joins/hash_join.rs: ########## @@ -258,7 +264,7 @@ impl SharedBoundsAccumulator { // Create a predicate for each partition let mut partition_predicates = Vec::with_capacity(bounds.len()); - for partition_bounds in bounds.iter() { + for partition_bounds in bounds.iter().sorted_by_key(|b| b.partition) { Review Comment: The accumulator sorts collected bounds with sorted_by_key, introducing extra allocations and O(n log n) overhead. Pre-indexing bounds by partition ID could remove the need for sorting and improve efficiency. ########## datafusion/physical-plan/src/joins/hash_join.rs: ########## @@ -258,7 +264,7 @@ impl SharedBoundsAccumulator { // Create a predicate for each partition let mut partition_predicates = Vec::with_capacity(bounds.len()); - for partition_bounds in bounds.iter() { + for partition_bounds in bounds.iter().sorted_by_key(|b| b.partition) { Review Comment: In other words, remove PartitionBounds abstraction and change a “vector of structs with IDs” into a “struct of vectors indexed by partition” -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org