UBarney commented on code in PR #16443:
URL: https://github.com/apache/datafusion/pull/16443#discussion_r2177770988

##########
datafusion/physical-plan/src/joins/nested_loop_join.rs:
##########
@@ -729,10 +716,26 @@ struct NestedLoopJoinStream<T> {
     right_side_ordered: bool,
     /// Current state of the stream
     state: NestedLoopJoinStreamState,
+    #[allow(dead_code)]
+    // TODO: remove this field ??
     /// Transforms the output batch before returning.
     batch_transformer: T,
     /// Result of the left data future
     left_data: Option<Arc<JoinLeftData>>,
+
+    // Tracks progress when building join result batches incrementally
+    // Contains (build_indices, probe_indices, processed_count) where:
+    // - build_indices: row indices from build-side table (left table)
+    // - probe_indices: row indices from probe-side table (right table)
+    // - processed_count: number of index pairs already processed into output batches
+    // We have completed join result for indices [0..processed_count)
+    join_result_status: Option<(

Review Comment:
   In a hash join, `ProcessProbeBatch` is solely responsible for tracking join progress on the probe side. In contrast, `join_result_status` serves a broader purpose: it tracks progress both for the probe side and for the **unmatched rows** from the build side.
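
To make the distinction concrete, here is a minimal sketch of the kind of progress tracking the diff comment describes. It is not the PR's code: `JoinResultProgress` and `next_chunk` are hypothetical names, the index types (`u64` for build, `u32` for probe) are assumed, and modelling an unmatched build row as a `None` probe index is an assumption made for illustration. The point is that one cursor (`processed_count`) can drive incremental output for both matched pairs and unmatched build-side rows.

```rust
/// Hypothetical stand-in for the `(build_indices, probe_indices, processed_count)`
/// tuple described in the diff. Not the actual DataFusion type.
struct JoinResultProgress {
    /// Row indices into the build-side (left) table.
    build_indices: Vec<u64>,
    /// Row indices into the probe-side (right) table.
    /// Assumption: `None` marks an unmatched build row with no probe partner.
    probe_indices: Vec<Option<u32>>,
    /// Number of index pairs already emitted; pairs in [0..processed_count) are done.
    processed_count: usize,
}

impl JoinResultProgress {
    fn new(build_indices: Vec<u64>, probe_indices: Vec<Option<u32>>) -> Self {
        // Both vectors describe the same list of pairs, so lengths must agree.
        assert_eq!(build_indices.len(), probe_indices.len());
        Self {
            build_indices,
            probe_indices,
            processed_count: 0,
        }
    }

    /// Returns the next run of at most `batch_size` index pairs and advances
    /// the cursor, or `None` once every pair has been handed out.
    fn next_chunk(&mut self, batch_size: usize) -> Option<(&[u64], &[Option<u32>])> {
        let total = self.build_indices.len();
        if batch_size == 0 || self.processed_count >= total {
            return None;
        }
        let start = self.processed_count;
        let end = (start + batch_size).min(total);
        self.processed_count = end;
        Some((
            &self.build_indices[start..end],
            &self.probe_indices[start..end],
        ))
    }
}
```

Under these assumptions, a caller would invoke `next_chunk(batch_size)` repeatedly and build one output batch per returned slice. The same loop covers both phases: chunks whose probe entries are `Some(_)` come from matching a probe batch, while chunks with `None` entries come from the final pass over unmatched build rows.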