kosiew commented on code in PR #21962:
URL: https://github.com/apache/datafusion/pull/21962#discussion_r3193919876


##########
datafusion/physical-plan/src/joins/sort_merge_join/materializing_stream.rs:
##########
@@ -978,6 +996,20 @@ impl MaterializingSortMergeJoinStream {
                             .unwrap(); // Operation only return None if no 
batches are spilled, here we ensure that at least one batch is spilled
 
                         buffered_batch.batch = 
BufferedBatchState::Spilled(spill_file);
+
+                        // Track remaining in-memory data (join key arrays) 
that
+                        // stay in memory even after the batch is spilled. 
This is
+                        // much smaller than the full batch, so try_grow should
+                        // usually succeed. If it fails, reserved_amount stays 
0 -
+                        // best-effort tracking, free_reservation will safely 
be a no-op.
+                        let join_arrays_mem = buffered_batch.join_arrays_mem();
+                        if self.reservation.try_grow(join_arrays_mem).is_ok() {

Review Comment:
   I think the spill path can still leave retained join key arrays invisible to 
the memory pool.
   
   Right now, if the full batch `try_grow(size_estimation)` fails because the 
pool is full, and the follow-up `try_grow(join_arrays_mem)` also fails, we 
spill the IPC batch but still push `buffered_batch` with `reserved_amount = 0`.
   
   At that point the operator is still holding the retained `join_arrays`, but 
the pool is no longer aware of them when making spill decisions for other 
operators. This seems like the same invariant violation we were trying to avoid.
   
   I think this can still happen with concurrent reservations or when the 
memory limit is below a single join-array allocation, and in those cases many 
skewed spilled batches could accumulate untracked memory.
   
   Can we make the retained in-memory portion accounted deterministically here? 
For example, by growing or resizing the reservation after the physical memory 
is retained, or by returning an error instead of continuing untracked.
   
   It would also be great to add a regression test that covers the no-headroom 
path where `try_grow(join_arrays_mem)` fails, since the current test only 
exercises the successful reservation case.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to