bharath-techie commented on issue #19216:
URL: https://github.com/apache/datafusion/issues/19216#issuecomment-3633472687

   Hi @alamb @zhuqi-lucas,
   We are running similar experiments: ClickBench queries with DataFusion on low-memory instances.
   
   I'm not sure whether we have an epic that tracks all of these issues in one place.
   
   We noticed that the TopK operator doesn't spill, and hence all ClickBench `GROUP BY` queries with `ORDER BY` + `LIMIT`, even with a single target partition, such as
   Q13
   ```
   SELECT "SearchPhrase", COUNT(DISTINCT "UserID") AS u
   FROM hits
   WHERE "SearchPhrase" <> ''
   GROUP BY "SearchPhrase"
   ORDER BY u DESC
   LIMIT 10;
   ```
   
   Q33
   ```
   SELECT "URL", COUNT(*) AS c
   FROM hits
   GROUP BY "URL"
   ORDER BY c DESC
   LIMIT 10;
   ```
   also fail with an out-of-memory error when datafusion-cli is allocated less than 8 GB of RAM.
   (github.com/apache/datafusion/issues/9417 might be a relevant issue.)
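
For readers less familiar with the operator: a top-K step for something like `ORDER BY c DESC LIMIT 10` conceptually only needs to retain the K best rows seen so far. Below is a minimal standalone sketch of that idea in Rust. It is not DataFusion's `TopK` implementation and says nothing about how spilling is done there; the function name `top_k_by_count` and the `(String, u64)` row shape are hypothetical and chosen purely for illustration.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Hypothetical helper (not a DataFusion API): return the `k` rows with the
/// largest counts, sorted by count descending, while keeping at most
/// `k + 1` rows in the heap at any time.
pub fn top_k_by_count<I>(rows: I, k: usize) -> Vec<(String, u64)>
where
    I: IntoIterator<Item = (String, u64)>,
{
    // `Reverse` turns the max-heap into a min-heap, so the smallest
    // retained (count, key) pair is always the one on top.
    let mut heap: BinaryHeap<Reverse<(u64, String)>> = BinaryHeap::new();

    for (key, count) in rows {
        heap.push(Reverse((count, key)));
        if heap.len() > k {
            // Evict the current minimum so only the k largest remain.
            heap.pop();
        }
    }

    // Ascending order of `Reverse<T>` is descending order of `T`,
    // so the row with the largest count comes first.
    heap.into_sorted_vec()
        .into_iter()
        .map(|Reverse((count, key))| (key, count))
        .collect()
}
```

Calling `top_k_by_count(rows, 10)` on an iterator of `(key, count)` pairs mirrors the `ORDER BY c DESC LIMIT 10` step of Q33 on already-aggregated rows; the `GROUP BY` aggregation that produces those rows is out of scope for this sketch.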
   
   @alchemist51 and I have been looking into improving queries in this area.
   
   @alchemist51 is looking into reviving https://github.com/apache/datafusion/pull/15591, and I was able to get spilling working for the `topK` operator in my fork: https://github.com/bharath-techie/datafusion/tree/spilltest
   
   Could you please share your views or suggestions on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

