LiaCastaneda commented on issue #16841: URL: https://github.com/apache/datafusion/issues/16841#issuecomment-3576710820
Makes sense, that seems like the ideal final solution. However, besides redefining the current MemoryPool model, I imagine it would require multiple PRs across Arrow and DataFusion, so maybe it's worth breaking it into steps or finding a middle-ground solution? For now we're probably OK not accounting for every single Array used in DataFusion, since the current model doesn't do that anyway: only memory in specific operators is reported to the DF MemoryPool. I think a middle-ground solution could be to use something like an ArrowMemoryPool integration (proposed in #18928) in operators that already track memory (like GroupedHashAggregateStream, HashJoinExec, etc.) to fix the buffer overaccounting problem we are seeing in the short/medium term.
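
To make the overaccounting concrete, here is a rough std-only sketch. It assumes the problem comes from several arrays referencing the same underlying buffer and each being charged for the whole allocation. `SliceView`, `naive_bytes`, `deduplicated_bytes`, and `views_over_shared_buffer` are hypothetical names for illustration only; none of them are Arrow or DataFusion APIs, and the actual integration proposed in #18928 may work differently.

```rust
use std::collections::HashSet;
use std::sync::Arc;

/// Hypothetical stand-in for an array slice: a view into a shared allocation.
/// (Not an Arrow type; Arrow arrays reference buffers in a similar spirit.)
pub struct SliceView {
    pub buffer: Arc<Vec<u8>>,
    pub offset: usize,
    pub len: usize,
}

/// Naive accounting: every view is charged for the full buffer it references,
/// so a buffer shared by several views is counted several times.
pub fn naive_bytes(views: &[SliceView]) -> usize {
    views.iter().map(|v| v.buffer.len()).sum()
}

/// Pool-style accounting: each distinct allocation is charged once,
/// identified here by the address of the shared allocation.
pub fn deduplicated_bytes(views: &[SliceView]) -> usize {
    let mut seen = HashSet::new();
    let mut total = 0;
    for v in views {
        // `insert` returns false if this allocation was already counted.
        if seen.insert(Arc::as_ptr(&v.buffer)) {
            total += v.buffer.len();
        }
    }
    total
}

/// Build `n` equally sized views over a single shared buffer.
pub fn views_over_shared_buffer(buffer_len: usize, n: usize) -> Vec<SliceView> {
    let buffer = Arc::new(vec![0u8; buffer_len]);
    // Avoid dividing by zero when no views are requested.
    let chunk = buffer_len.checked_div(n).unwrap_or(0);
    (0..n)
        .map(|i| SliceView {
            buffer: Arc::clone(&buffer),
            offset: i * chunk,
            len: chunk,
        })
        .collect()
}
```

With four views over one 1024-byte buffer, the naive sum reports four times the real allocation, while the deduplicated count reports it once. An operator that already tracks memory could, in principle, report the second number to the DF MemoryPool.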
