Re: [I] [DISCUSSION] Memory accounting model discussion [datafusion]

via GitHub Tue, 25 Nov 2025 09:20:56 -0800


LiaCastaneda commented on issue #16841:
URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3576710820


   Makes sense, that seems like the ideal final solution. That said, apart from 
redefining the MemoryPool model as it currently stands, I imagine this would require 
multiple PRs across Arrow and DataFusion, so maybe it's worth breaking it into steps 
or middle-ground solutions? For instance, for now we're probably OK not accounting 
for every single Array used in DataFusion, since we don't do that in the current 
model anyway: only memory in specific operators is reported to the DF 
MemoryPool.
   
   I think a middle-ground solution could be to use something like an 
ArrowMemoryPool integration (proposed in #18928) in operators that already 
track memory (like GroupedHashAggregateStream, HashJoinExec, etc.) to fix the 
buffer over-accounting problem we are seeing, at least in the short/medium term.
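   One way buffer over-accounting can arise is when several arrays (for example, 
slices of the same batch) share one underlying buffer and each array's size is 
summed independently. The minimal Rust sketch below illustrates this, assuming, as 
a simplification, that an array is just a list of reference-counted buffers. 
`Buffer`, `Array`, `naive_size` and `DedupTracker` are hypothetical names for 
illustration only; they are not the arrow-rs or DataFusion API, nor the design 
proposed in #18928.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an Arrow buffer: shared, reference-counted bytes.
type Buffer = Arc<Vec<u8>>;

/// Hypothetical stand-in for an Arrow array: a view over one or more shared buffers.
struct Array {
    buffers: Vec<Buffer>,
}

/// Naive accounting: sum the length of every buffer referenced by every array.
/// A buffer shared by several arrays is counted once per array that references it.
fn naive_size(arrays: &[Array]) -> usize {
    arrays
        .iter()
        .flat_map(|a| a.buffers.iter())
        .map(|b| b.len())
        .sum()
}

/// Deduplicating accounting (hypothetical): each distinct allocation is counted
/// once, keyed by the address of the shared allocation. This assumes tracked
/// buffers stay alive while tracked; a real pool would also have to release the
/// memory when the last reference to a buffer is dropped.
#[derive(Default)]
struct DedupTracker {
    seen: HashSet<usize>,
    total: usize,
}

impl DedupTracker {
    fn track(&mut self, array: &Array) {
        for buf in &array.buffers {
            let key = Arc::as_ptr(buf) as usize;
            // `insert` returns true only the first time this allocation is seen.
            if self.seen.insert(key) {
                self.total += buf.len();
            }
        }
    }
}

fn main() {
    // Two arrays that both reference the same 1024-byte allocation.
    let shared: Buffer = Arc::new(vec![0u8; 1024]);
    let arrays = [
        Array { buffers: vec![Arc::clone(&shared)] },
        Array { buffers: vec![Arc::clone(&shared)] },
    ];

    let mut tracker = DedupTracker::default();
    for arr in &arrays {
        tracker.track(arr);
    }

    // Prints "naive: 2048, deduplicated: 1024"
    println!("naive: {}, deduplicated: {}", naive_size(&arrays), tracker.total);
}
```

The naive sum reports twice the memory actually allocated, which is the kind of 
gap a pool that tracks buffers by identity (rather than per array) is meant to close.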


-- 
This is an automated message from the Apache Git Service.
