Kontinuation commented on code in PR #563:
URL: https://github.com/apache/sedona-db/pull/563#discussion_r2757128734


##########
rust/sedona-spatial-join/src/utils/join_utils.rs:
##########
@@ -93,6 +115,161 @@ pub(crate) fn get_final_indices_from_bit_map(
     (left_indices, right_indices)
 }
 
+pub(crate) fn adjust_indices_with_visited_info(
+    left_indices: UInt64Array,
+    right_indices: UInt32Array,
+    adjust_range: Range<usize>,
+    join_type: JoinType,
+    preserve_order_for_right: bool,
+    visited_info: Option<(&mut BooleanBufferBuilder, usize)>,
+    produce_unmatched_probe_rows: bool,
+) -> Result<(UInt64Array, UInt32Array)> {
+    let Some((bitmap, offset)) = visited_info else {
+        return adjust_indices_by_join_type(
+            left_indices,
+            right_indices,
+            adjust_range,
+            join_type,
+            preserve_order_for_right,
+        );
+    };
+
+    // Update the bitmap with the current matches first
+    for idx in right_indices.values() {
+        bitmap.set_bit(offset + (*idx as usize), true);
+    }
+
+    match join_type {
+        JoinType::Right | JoinType::Full => {
+            if !produce_unmatched_probe_rows {
+                Ok((left_indices, right_indices))
+            } else {
+                let unmatched_count = adjust_range
+                    .clone()
+                    .filter(|&i| !bitmap.get_bit(i + offset))
+                    .count();

Review Comment:
   Unfortunately, Arrow's BooleanBufferBuilder does not provide optimized 
methods for iterating over bit ranges. Other join_utils code inherited from 
DataFusion does the same per-bit traversal, so I stuck with 
BooleanBufferBuilder for the visited bitset to stay consistent with the rest 
of the code.
   
   It has not shown up as a performance bottleneck when running outer joins 
so far, perhaps because the other parts of the join are far more heavyweight 
than the bitmap traversal.
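
For reference, here is a minimal sketch of what a word-at-a-time count of unvisited bits could look like if it ever does become a bottleneck. It assumes the bitmap is available as packed `u64` words, which is an assumption made for illustration and not how the PR accesses the BooleanBufferBuilder. Both `count_unset_in_range` and `count_unset_per_bit` are hypothetical helpers, not Arrow or DataFusion APIs.

```rust
use std::ops::Range;

/// Hypothetical helper (not an Arrow API): counts the unset bits in `range`
/// of a bitmap stored as packed u64 words, where bit `i` lives in
/// `words[i / 64]` at position `i % 64`.
fn count_unset_in_range(words: &[u64], range: Range<usize>) -> usize {
    if range.start >= range.end {
        return 0;
    }
    let first_word = range.start / 64;
    let last_word = (range.end - 1) / 64;
    let mut set = 0usize;
    for w in first_word..=last_word {
        let mut word = words[w];
        if w == first_word {
            // Clear the bits below the start of the range.
            word &= u64::MAX << (range.start % 64);
        }
        if w == last_word {
            // Clear the bits at or above the end of the range.
            let end_bit = range.end - w * 64; // always in 1..=64
            if end_bit < 64 {
                word &= (1u64 << end_bit) - 1;
            }
        }
        set += word.count_ones() as usize;
    }
    (range.end - range.start) - set
}

/// Per-bit reference version, mirroring the `filter(..).count()` shape
/// used in the diff above.
fn count_unset_per_bit(words: &[u64], range: Range<usize>) -> usize {
    range
        .filter(|&i| (words[i / 64] >> (i % 64)) & 1 == 0)
        .count()
}
```

The two functions return the same count. The first touches one word per 64 bits instead of one bit at a time, which only matters if bitmap traversal starts to dominate the join cost.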



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
