2010YOUY01 opened a new issue, #15538:
URL: https://github.com/apache/datafusion/issues/15538

   ### Is your feature request related to a problem or challenge?
   
   When running sort queries with large `LIMIT` and small memory limit, it can 
go out of memory
   ```
   select *
   from tbl
   order by c1
   limit 1000000000
   ```
   
   It's possible to enable spilling capability inside `TopK` data structure to 
let such queries complete.
   
   ### Describe the solution you'd like
   
   ### Approach 1
   Add back `fetch` field back to `ExternalSorter` to support external TopK 
queries, which is removed in https://github.com/apache/datafusion/pull/15525 
(because it's unused now)
   This approach can be slightly faster, but requires to add a configuration 
option to switch to `ExternalSorter` path for large `LIMIT`, instead of the 
default `TopK` path. (or let optimizer figure out when to switch automatically, 
though it's also tricky)
   
   ### Approach 2
   Add spilling capability inside 
[TopK](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/topk/mod.rs)
 executor. When the memory limit is reached, it can fallback to out-of-core 
execution without introducing a new configuration.
   
   
   
   ### Describe alternatives you've considered
   
   _No response_
   
   ### Additional context
   
   The sort + limit query is usually run with a small `LIMIT` count, so it's 
mostly memory-efficient.
   @alamb is referring to this issue as a sort of exploratory idea, so perhaps 
someone with real usage knows better how to get it implemented 🤔 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to