2010YOUY01 opened a new issue, #15538: URL: https://github.com/apache/datafusion/issues/15538
### Is your feature request related to a problem or challenge? When running sort queries with large `LIMIT` and small memory limit, it can go out of memory ``` select * from tbl order by c1 limit 1000000000 ``` It's possible to enable spilling capability inside `TopK` data structure to let such queries complete. ### Describe the solution you'd like ### Approach 1 Add back `fetch` field back to `ExternalSorter` to support external TopK queries, which is removed in https://github.com/apache/datafusion/pull/15525 (because it's unused now) This approach can be slightly faster, but requires to add a configuration option to switch to `ExternalSorter` path for large `LIMIT`, instead of the default `TopK` path. (or let optimizer figure out when to switch automatically, though it's also tricky) ### Approach 2 Add spilling capability inside [TopK](https://github.com/apache/datafusion/blob/main/datafusion/physical-plan/src/topk/mod.rs) executor. When the memory limit is reached, it can fallback to out-of-core execution without introducing a new configuration. ### Describe alternatives you've considered _No response_ ### Additional context The sort + limit query is usually run with a small `LIMIT` count, so it's mostly memory-efficient. @alamb is referring to this issue as a sort of exploratory idea, so perhaps someone with real usage knows better how to get it implemented 🤔 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org For additional commands, e-mail: github-h...@datafusion.apache.org