alamb commented on issue #19216: URL: https://github.com/apache/datafusion/issues/19216#issuecomment-3636719135
> For the above URL query, we have a ~4 GB record batch in GroupByHashAggregate which gets counted for each record batch that got added to TopK

Ah, that makes sense.

> I think @Dandandan https://github.com/apache/datafusion/issues/9417#issuecomment-2431943283 forces compaction when reaching the memory limit, should we try that?

Yes, I think that would be a much better approach than trying to spill the entire 4 GB batch (since the TopK operator keeps only a small number of rows -- 10 here -- spilling 4 GB just to read it back is far from ideal).

> https://github.com/apache/datafusion/pull/15591 could help as well, or do we have any newer issues that could help here?

The issue that was tracking that is:
- https://github.com/apache/datafusion/issues/7065

However, that issue focuses more on performance than on the memory pressure coming from single large allocations. It is probably a good idea to file a new issue that focuses on the memory pressure angle rather than performance.
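For illustration, here is a minimal sketch of the compact-instead-of-spill idea (this is not DataFusion's actual TopK code; `compact_topk` and the variable names are assumptions, and it assumes a recent arrow-rs that provides `take_record_batch`): when the heap retains only a few rows that still reference a huge input batch, copying those rows out lets the large allocation be dropped without any spill:

```rust
use std::sync::Arc;

use arrow::array::{Int64Array, RecordBatch, UInt32Array};
use arrow::compute::take_record_batch;
use arrow::error::ArrowError;

/// Copy only `retained_rows` out of `batch` into a fresh, small batch.
/// Once the caller drops `batch`, its large buffers can be freed while
/// the top-K rows survive in a batch sized to K.
fn compact_topk(batch: &RecordBatch, retained_rows: &[u32]) -> Result<RecordBatch, ArrowError> {
    let indices = UInt32Array::from(retained_rows.to_vec());
    take_record_batch(batch, &indices)
}

fn main() -> Result<(), ArrowError> {
    // Stand-in for the large aggregate output batch.
    let big = RecordBatch::try_from_iter([(
        "v",
        Arc::new(Int64Array::from_iter_values(0..1_000_000)) as _,
    )])?;

    // Pretend the TopK heap kept rows 0, 42, and 99 of this batch.
    let small = compact_topk(&big, &[0, 42, 99])?;
    drop(big); // the large buffers are released; `small` owns its own data

    assert_eq!(small.num_rows(), 3);
    Ok(())
}
```

The key point is that `take` copies the selected rows into fresh, K-sized buffers, so after the original batch is dropped the large reservation can be released with no spill-to-disk round trip.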
