ding-young commented on PR #16192:
URL: https://github.com/apache/datafusion/pull/16192#issuecomment-2936483760

   @2010YOUY01 
   Hi, I’ve been struggling a bit with tracking peak memory in the SPM step, 
and I was wondering if I could ask for some help.
   
   ### 1. Can we add the memory for converted (row) batches to the previous 
`peak_mem_used`?
   Since `ExternalSorter` creates a `SortPreservingMergeStream` for the second 
step (SPM), I tried updating the peak memory metric inside `maybe_poll_stream` 
in `SortPreservingMergeStream` (which internally calls `poll_next`, where 
`convert_batch` is done, and pushes batches into a `BatchBuilder`). 
   But here’s my concern: if we keep adding the new reservation from this 
second step to the previous peak memory value, we might be overestimating. 
That’s because by the time the second step runs, some batches from the first 
step might have already been dropped. So, summing them might inflate the 
reported peak memory.
   I tried printing the total reserved size from the global memory pool 
manually (with tons of `println`) during execution. The reserved size did 
differ between the first and second steps, but the difference did not seem as 
large as the total size of all converted batches combined.
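
   To make the overestimation concern concrete, here is a minimal sketch. It 
is not DataFusion code: `PeakTracker`, `compare_peak_estimates`, and the byte 
sizes are all hypothetical and only illustrate the arithmetic. It compares 
"sum of the per-step peaks" with "peak of the combined reservation" when some 
first-step batches are dropped before the second step reserves memory.

```rust
/// Hypothetical helper, not part of DataFusion: tracks the bytes currently
/// reserved and the highest value observed so far.
#[derive(Debug, Default)]
struct PeakTracker {
    current: usize,
    peak: usize,
}

impl PeakTracker {
    /// Reserve `bytes` and update the peak if the new total exceeds it.
    fn grow(&mut self, bytes: usize) {
        self.current += bytes;
        self.peak = self.peak.max(self.current);
    }

    /// Release `bytes`; the recorded peak is left unchanged.
    fn shrink(&mut self, bytes: usize) {
        self.current = self.current.saturating_sub(bytes);
    }
}

/// Replays a made-up two-step sort and returns
/// (sum of per-step peaks, peak of the combined reservation).
fn compare_peak_estimates() -> (usize, usize) {
    let mut step1 = PeakTracker::default();
    let mut step2 = PeakTracker::default();
    let mut combined = PeakTracker::default();

    // Step 1: buffer three 100-byte batches (assumed sizes).
    for _ in 0..3 {
        step1.grow(100);
        combined.grow(100);
    }

    // Before the merge starts, two of those batches are dropped.
    step1.shrink(200);
    combined.shrink(200);

    // Step 2: the merge reserves 150 bytes for converted row batches.
    step2.grow(150);
    combined.grow(150);

    (step1.peak + step2.peak, combined.peak)
}
```

   With these assumed numbers, summing the two peaks reports 450 bytes, while 
the combined reservation never exceeds 300 bytes, which matches the kind of 
gap I saw when printing the pool's reserved size.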
   
   ### 2. Parent Operator's memory reservation 
   Also, when the parent operator (e.g., `SortPreservingMergeExec`) executes, 
the reservation created by the earlier `SortExec` is not yet released. In this 
case, should `SortPreservingMergeExec` only track the peak memory of its own 
reservation? 
   
   And please let me know if I’ve misunderstood when the reservation is 
supposed to be dropped. Maybe that’s where my confusion is coming from.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

