kosiew commented on code in PR #17280:
URL: https://github.com/apache/datafusion/pull/17280#discussion_r2293418951


##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -258,7 +264,7 @@ impl SharedBoundsAccumulator {
         // Create a predicate for each partition
         let mut partition_predicates = Vec::with_capacity(bounds.len());
 
-        for partition_bounds in bounds.iter() {
+        for partition_bounds in bounds.iter().sorted_by_key(|b| b.partition) {

Review Comment:
   The accumulator sorts collected bounds with sorted_by_key, introducing extra 
allocations and O(n log n) overhead. Pre-indexing bounds by partition ID could 
remove the need for sorting and improve efficiency.



##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -258,7 +264,7 @@ impl SharedBoundsAccumulator {
         // Create a predicate for each partition
         let mut partition_predicates = Vec::with_capacity(bounds.len());
 
-        for partition_bounds in bounds.iter() {
+        for partition_bounds in bounds.iter().sorted_by_key(|b| b.partition) {

Review Comment:
   In other words, remove PartitionBounds abstraction and  change a “vector of 
structs with IDs” into a “struct of vectors indexed by partition”



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to