comphead commented on code in PR #20482:
URL: https://github.com/apache/datafusion/pull/20482#discussion_r2842711380


##########
datafusion/physical-plan/src/joins/sort_merge_join/stream.rs:
##########
@@ -62,14 +62,16 @@ use datafusion_physical_expr_common::physical_expr::PhysicalExprRef;
 use futures::{Stream, StreamExt};
 
 /// State of SMJ stream
-#[derive(Debug, PartialEq, Eq)]
+#[derive(Debug, PartialEq, Eq, Clone)]
 pub(super) enum SortMergeJoinState {
     /// Init joining with a new streamed row or a new buffered batches
     Init,
     /// Polling one streamed row or one buffered batch, or both
     Polling,
     /// Joining polled data and making output
     JoinOutput,
+    /// Emit ready data if have any
+    EmitReady { next_state: Box<SortMergeJoinState> },

Review Comment:
   `EmitReady` somewhat duplicates `JoinOutput`.
   `JoinOutput` is slightly tricky because it covers two sub-states internally: a partial join, then output.
   
   
   By the way, judging from this comment
   ```
                           // For non-filtered joins, only output if we have a completed batch
                           // (opportunistic output when target batch size is reached)
   ```
   it looks like the original idea was right: release batches for non-filtered SMJ as soon as they are matched. Somehow that did not work.
   
   Introducing a new state would make an already complicated mechanism even more complicated; however, the memory leak is important to address. @rluvaton do you think we can reuse the `JoinOutput` state and find out what is preventing the batches from being released?
   
   If we can do that quickly, great; if not, we can go with this PR and refactor later.
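
   To make the trade-off concrete, here is a minimal, self-contained sketch of how a state like `EmitReady { next_state }` can drain completed batches before resuming the state it interrupted. It is illustrative only and is not the code from this PR: `ToyStream`, `ready`, and `step` are hypothetical names, and the transitions are simplified assumptions rather than the real SMJ logic.

```rust
use std::collections::VecDeque;

/// Simplified mirror of the enum in the diff (assumption: no other variants matter here).
#[derive(Debug, PartialEq, Eq, Clone)]
enum State {
    Init,
    Polling,
    JoinOutput,
    /// Emit a completed batch if one is queued, then resume `next_state`.
    EmitReady { next_state: Box<State> },
}

/// Hypothetical stand-in for the join stream; `ready` holds completed output batches.
struct ToyStream {
    state: State,
    ready: VecDeque<Vec<i32>>,
}

impl ToyStream {
    /// Advance the state machine by one step; returns a batch when one is emitted.
    fn step(&mut self) -> Option<Vec<i32>> {
        // Take ownership of the current state so the boxed `next_state` can be moved out.
        match std::mem::replace(&mut self.state, State::Init) {
            State::Init => {
                self.state = State::Polling;
                None
            }
            State::Polling => {
                self.state = State::JoinOutput;
                None
            }
            State::JoinOutput => {
                // After joining, detour through EmitReady before returning to Init.
                self.state = State::EmitReady {
                    next_state: Box::new(State::Init),
                };
                None
            }
            State::EmitReady { next_state } => match self.ready.pop_front() {
                Some(batch) => {
                    // Stay in EmitReady until the queue of completed batches is drained.
                    self.state = State::EmitReady { next_state };
                    Some(batch)
                }
                None => {
                    // Nothing left to emit: resume the state that was deferred.
                    self.state = *next_state;
                    None
                }
            },
        }
    }
}
```

   The sketch shows why the extra variant adds complexity: every transition now has to decide whether to route through `EmitReady` and which state to resume afterwards. Reusing `JoinOutput` would avoid that bookkeeping, provided the cause of the retained batches can be found.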



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

