UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2177770988


##########
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##########
@@ -729,10 +716,26 @@ struct NestedLoopJoinStream<T> {
     right_side_ordered: bool,
     /// Current state of the stream
     state: NestedLoopJoinStreamState,
+    #[allow(dead_code)]
+    // TODO: remove this field ??
     /// Transforms the output batch before returning.
     batch_transformer: T,
     /// Result of the left data future
     left_data: Option<Arc<JoinLeftData>>,
+
+    // Tracks progress when building join result batches incrementally
+    // Contains (build_indices, probe_indices, processed_count) where:
+    // - build_indices: row indices from build-side table (left table)
+    // - probe_indices: row indices from probe-side table (right table)
+    // - processed_count: number of index pairs already processed into output batches
+    // We have completed join result for indices [0..processed_count)
+    join_result_status: Option<(

Review Comment:
   In a hash join, `ProcessProbeBatch` is solely responsible for tracking the 
join progress on the probe side. In contrast, `join_result_status` serves a 
broader purpose: it tracks progress for both the probe side and for the 
**unmatched rows** from the build side.
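
To make the distinction concrete, here is a minimal sketch of that kind of progress tracking. It is not the PR's code: the field's actual types are cut off in the diff above, so `JoinResultProgress`, `next_chunk`, and the plain `Vec` index columns are hypothetical stand-ins. A probe index of `None` is assumed to mark an unmatched build-side row, which is how one cursor can cover both cases.

```rust
/// Hypothetical stand-in for the `join_result_status` tuple in the diff.
/// Plain `Vec`s replace Arrow index arrays to keep the sketch dependency-free.
#[derive(Debug)]
struct JoinResultProgress {
    /// Row indices from the build side (left table).
    build_indices: Vec<u64>,
    /// Row indices from the probe side (right table).
    /// Assumption: `None` marks an unmatched build-side row.
    probe_indices: Vec<Option<u32>>,
    /// Index pairs in [0..processed_count) are already in output batches.
    processed_count: usize,
}

impl JoinResultProgress {
    fn new(build_indices: Vec<u64>, probe_indices: Vec<Option<u32>>) -> Self {
        assert_eq!(build_indices.len(), probe_indices.len());
        Self {
            build_indices,
            probe_indices,
            processed_count: 0,
        }
    }

    /// Returns the next chunk of at most `batch_size` index pairs and
    /// advances the cursor, or `None` once every pair has been emitted.
    fn next_chunk(&mut self, batch_size: usize) -> Option<(&[u64], &[Option<u32>])> {
        assert!(batch_size > 0, "batch_size must be positive");
        let start = self.processed_count;
        if start >= self.build_indices.len() {
            return None;
        }
        let end = (start + batch_size).min(self.build_indices.len());
        self.processed_count = end;
        Some((
            &self.build_indices[start..end],
            &self.probe_indices[start..end],
        ))
    }
}
```

In this sketch the same cursor drives output whether the pairs came from a probe batch or from the final pass over unmatched build rows, so the stream only needs to check `processed_count` against the number of pairs to know whether more output is pending.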



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org