comphead commented on code in PR #10304:
URL: https://github.com/apache/datafusion/pull/10304#discussion_r1599204082


##########
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##########
@@ -1363,6 +1380,57 @@ fn get_filter_column(
     filter_columns
 }
 
+// Get buffered data slice by specific batch index and for specified column indices only
+#[inline(always)]
+fn get_buffered_columns(
+    buffered_data: &BufferedData,
+    buffered_batch_idx: usize,
+    buffered_indices: &UInt64Array,
+) -> Result<Vec<ArrayRef>, ArrowError> {
+    buffered_data.batches[buffered_batch_idx]
+        .batch
+        .columns()
+        .iter()
+        .map(|column| take(column, &buffered_indices, None))
+        .collect::<Result<Vec<_>, ArrowError>>()
+}
+
+// Calculate join filter bit mask considering join type specifics
+fn get_filtered_join_mask(
+    join_type: JoinType,
+    streamed_indices: UInt64Array,
+    mask: &BooleanArray,
+) -> Option<BooleanArray> {
+    // for LeftSemi Join the filter mask should be calculated in its own way:
+    // if we find at least one matching row for specific streaming index
+    // we don't need to check any others for the same index
+    if matches!(join_type, JoinType::LeftSemi) {

Review Comment:
   I will add more tests for RightSemi as a follow-up.
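The LeftSemi rule in the diff comment above says that once a streamed row has one filter match, any later matches for that streamed index are ignored. Below is a small std-only Rust sketch of that rule. It is not the PR's implementation. It uses plain slices instead of Arrow's `UInt64Array`/`BooleanArray`, the function name `left_semi_filter_mask` is hypothetical, and it assumes that rows sharing a streamed index are adjacent in `streamed_indices`.

```rust
/// Hypothetical, std-only illustration of the LeftSemi filter-mask rule:
/// for each streamed row index, keep only the first candidate that passed
/// the join filter and mark every later candidate for that index as false.
/// Assumes rows with the same streamed index are adjacent in the input.
fn left_semi_filter_mask(streamed_indices: &[u64], mask: &[bool]) -> Vec<bool> {
    assert_eq!(
        streamed_indices.len(),
        mask.len(),
        "streamed_indices and mask must have the same length"
    );

    let mut corrected = Vec::with_capacity(mask.len());
    // Streamed index of the previous row, if any.
    let mut prev_index: Option<u64> = None;
    // Whether a passing candidate was already kept for the current streamed index.
    let mut seen_match = false;

    for (&idx, &passed) in streamed_indices.iter().zip(mask) {
        // Moving to a new streamed index resets the "already matched" flag.
        if prev_index != Some(idx) {
            seen_match = false;
            prev_index = Some(idx);
        }
        if passed && !seen_match {
            // First passing candidate for this streamed row: keep it.
            seen_match = true;
            corrected.push(true);
        } else {
            // Either the filter failed or this row already has a match.
            corrected.push(false);
        }
    }
    corrected
}

fn main() {
    // Streamed row 0 has three candidate matches, row 1 has two, row 2 has one.
    let streamed = [0u64, 0, 0, 1, 1, 2];
    let filter = [false, true, true, true, true, false];
    let mask = left_semi_filter_mask(&streamed, &filter);
    // Row 0 keeps its second candidate, row 1 its first, row 2 none.
    println!("{:?}", mask);
}
```

Comparing each index with the previous one resets the flag at group boundaries without a look-ahead check. An Arrow-based version would more likely collect the result into a `BooleanArray` (for example via `BooleanBuilder`) instead of a `Vec<bool>`.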



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.