viirya commented on code in PR #10304:
URL: https://github.com/apache/datafusion/pull/10304#discussion_r1599088624
##########
datafusion/physical-plan/src/joins/sort_merge_join.rs:
##########
@@ -991,6 +992,9 @@ impl SMJStream {
Ordering::Equal => {
if matches!(self.join_type, JoinType::LeftSemi) {
join_streamed = !self.streamed_joined;
+ // if the join filter specified there can be references to buffered columns
+ // so its needed to join them
+ join_buffered = self.filter.is_some();
Review Comment:
I think the test above doesn't hit the case I mean.
If the previous join loop already joined the streamed row with buffered rows and
the output size reached the batch size, the loop exits and the joined pairs are
output.
After that, the next join loop starts.
Now `current_ordering` is `Equal` and `self.streamed_joined` is true, so
`join_streamed` is false while `join_buffered` is true (a filter is present),
and the loop joins nulls with the buffered rows. Because the left side is all
nulls, the join filter doesn't work.
This is not caught by the test you posted because, I think, all joined pairs are
output in a single batch.
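
To make the scenario concrete, here is a minimal toy model of the flag logic, not the real `SMJStream` code. `ToyState`, `left_semi_flags`, `Emitted`, and `emitted` are hypothetical names invented for illustration; only the two assignments in the `Ordering::Equal` branch are taken from the diff above, and the mapping from flags to emitted rows is an assumption based on the behavior described in this comment.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for the relevant `SMJStream` state (illustration only).
struct ToyState {
    /// True once the current streamed row has already been joined.
    streamed_joined: bool,
    /// Stands in for `self.filter.is_some()`.
    has_filter: bool,
}

/// Mirrors the LeftSemi / `Ordering::Equal` branch from the diff.
/// Returns `(join_streamed, join_buffered)`.
fn left_semi_flags(state: &ToyState, ordering: Ordering) -> (bool, bool) {
    match ordering {
        Ordering::Equal => {
            let join_streamed = !state.streamed_joined;
            let join_buffered = state.has_filter;
            (join_streamed, join_buffered)
        }
        // Other orderings are outside the scope of this sketch.
        _ => (false, false),
    }
}

/// What the toy join loop pairs up for a given flag combination (assumed).
#[derive(Debug, PartialEq)]
enum Emitted {
    StreamedWithBuffered,
    NullsWithBuffered,
    StreamedOnly,
    Nothing,
}

fn emitted(join_streamed: bool, join_buffered: bool) -> Emitted {
    match (join_streamed, join_buffered) {
        (true, true) => Emitted::StreamedWithBuffered,
        // The problematic case: the left side is all nulls,
        // so a join filter has nothing meaningful to evaluate.
        (false, true) => Emitted::NullsWithBuffered,
        (true, false) => Emitted::StreamedOnly,
        (false, false) => Emitted::Nothing,
    }
}
```

In this model, the first pass (`streamed_joined == false`, filter present) yields `StreamedWithBuffered`. If the loop stops at the batch-size limit and resumes with `streamed_joined == true`, the same branch yields `NullsWithBuffered`, which is the case a test needs to force by making the joined pairs span more than one output batch.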