Re: [PR] fix: track join_arrays memory in reservation after SMJ spill [datafusion]

via GitHub Wed, 06 May 2026 11:31:29 -0700


mbutrovich commented on PR #21962:
URL: https://github.com/apache/datafusion/pull/21962#issuecomment-4390861669


   Thanks for turning this around quickly, the force-grow version is easier to 
reason about than the conditional one.
   
   A few things I wanted to ask about before signing off:
   
   **grow() can push over the pool limit**
   
   `reservation.grow()` is infallible, so in the no-headroom case (which 
`spill_join_arrays_no_headroom` exercises) we'll end up with `pool.reserved() > 
memory_limit`. I think that's the right call since the memory is physically 
there, but it does mean the next operator to call `try_grow` sees a pool that's 
already over limit and fails. Could we add a one-line comment at the `grow()` 
site noting this is intentional? Future readers running into a "why is the pool 
over limit" question will appreciate the pointer.
   
   **Recomputing join_arrays memory**
   
   `BufferedBatch::new` already sums `get_array_memory_size()` over 
`join_arrays` as part of `size_estimation` (line 262-265). `join_arrays_mem()` 
walks the same arrays again on the spill path. Cheap, but would you be open to 
caching the sum as a field during construction? One less place for the two size 
calculations to drift if someone touches the estimation formula later.
   
   **try_shrink vs shrink on free**
   
   Given the invariant that we only shrink by what we grew, would `shrink()` 
read more symmetrically with the force-grow on the alloc side? The `?` on 
`try_shrink(reserved_amount)` is propagating a `Result` that shouldn't be 
reachable. Not a blocker, just noticed the asymmetry.
   
   **peak_mem assertion strength**
   
   The two new tests check `peak_mem > 0`, which catches the "never tracked at 
all" regression nicely. Would `peak_mem >= join_arrays_mem` be worth tightening 
to? That would also catch a future bug where we track something but not the 
right amount. Happy to leave as-is if you think the current form is enough.
   
   Perf-wise I looked at this against #20729 and I don't think the concerns 
there apply here. The extra `grow()` only fires on the spill branch, and 
spilling already pays for disk I/O, so one more pool call on a custom backend 
is in the noise. Non-spill workloads see the same pool-call count as before.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] fix: track join_arrays memory in reservation after SMJ spill [datafusion]

Reply via email to