alamb commented on issue #19216: URL: https://github.com/apache/datafusion/issues/19216#issuecomment-3636719135
> For the above URL query, we have a ~4 GB record batch in GroupByHashAggregate which gets counted for each record batch that got added to TopK

Ah, that makes sense.

> I think @Dandandan https://github.com/apache/datafusion/issues/9417#issuecomment-2431943283 forces compaction when reaching the memory limit, should we try that?

Yes, I think that would be a much better approach than trying to spill the entire 4 GB batch (since the TopK operator keeps only a small number of rows -- 10 here -- spilling 4 GB just to read it back is far from ideal).

> https://github.com/apache/datafusion/pull/15591 could help as well, or do we have any newer issues that could help here?

The issue that was tracking that is:
- https://github.com/apache/datafusion/issues/7065

However, that issue focuses more on performance than on the memory pressure coming from single large allocations. It is probably a good idea to file a new issue that focuses on the memory pressure angle rather than performance.
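For illustration, here is a minimal sketch of the compact-instead-of-spill idea (this is not DataFusion's actual TopK code; `compact_topk` and the variable names are assumptions, and it assumes a recent arrow-rs that provides `take_record_batch`): when the heap retains only a few rows that still reference a huge input batch, copying those rows out lets the large allocation be dropped without any spill:

```rust
use std::sync::Arc;

use arrow::array::{Int64Array, RecordBatch, UInt32Array};
use arrow::compute::take_record_batch;
use arrow::error::ArrowError;

/// Copy only `retained_rows` out of `batch` into a fresh, small batch.
/// Once the caller drops `batch`, its large buffers can be freed while
/// the top-K rows survive in a batch sized to K.
fn compact_topk(batch: &RecordBatch, retained_rows: &[u32]) -> Result<RecordBatch, ArrowError> {
    let indices = UInt32Array::from(retained_rows.to_vec());
    take_record_batch(batch, &indices)
}

fn main() -> Result<(), ArrowError> {
    // Stand-in for the large aggregate output batch.
    let big = RecordBatch::try_from_iter([(
        "v",
        Arc::new(Int64Array::from_iter_values(0..1_000_000)) as _,
    )])?;

    // Pretend the TopK heap kept rows 0, 42, and 99 of this batch.
    let small = compact_topk(&big, &[0, 42, 99])?;
    drop(big); // the large buffers are released; `small` owns its own data

    assert_eq!(small.num_rows(), 3);
    Ok(())
}
```

The key point is that `take` copies the selected rows into fresh, K-sized buffers, so after the original batch is dropped the large reservation can be released with no spill-to-disk round trip.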
