rluvaton opened a new pull request, #15700: URL: https://github.com/apache/datafusion/pull/15700
## Which issue does this PR close?

- Closes #14692.

## Rationale for this change

We need a merge sort that does not fail with out-of-memory errors.

## What changes are included in this PR?

Implemented a multi-level merge sort on top of `SortPreservingMergeStream` that spills intermediate results when there is not enough memory.

**How does it work:**

When using `MultiLevelMerge`, you provide in-memory streams and spill files. Each spill file carries the memory size of its record batch with the largest memory consumption.

**Why is this important?**

`SortPreservingMergeStream` uses [`BatchBuilder`](https://github.com/apache/datafusion/blob/172cf8d8700dfcb62015f567e56e0bff27926812/datafusion/physical-plan/src/sorts/builder.rs), which grows and shrinks its memory based on the record batches it receives. However, if there is not enough memory, it simply fails.

This solution reserves, up front for each spill file, the worst-case record batch size, so it cannot run out of memory in the middle of sorting.

When there is not enough memory, it also tries to reduce the buffer size and the number of streams to the minimum, and it only fails if there is not enough memory to hold 2 record batches with no buffering in the stream.

It can also easily be adjusted to merge streams within a predefined maximum memory budget.

## Are these changes tested?

Existing tests.

## Are there any user-facing changes?

Not really.

------

Related to #15610
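To make the reservation strategy in the PR description concrete, here is a minimal, self-contained Rust sketch. It is not the PR's code: `SpillFile`, `MergePlan`, `plan_merge_pass`, and `count_merge_passes` are hypothetical names, and memory is modeled as a plain byte count rather than DataFusion's `MemoryReservation`. It assumes that (1) each stream holds its current batch plus `buffer_len` buffered batches, (2) the buffer is shrunk before the number of streams, (3) each pass merges a prefix of the remaining files, and (4) a merged output's largest batch is no larger than the largest input batch.

```rust
/// Hypothetical metadata kept per spill file: the memory size of the
/// largest record batch written to it (the worst case when reading it back).
#[derive(Debug, Clone, Copy)]
struct SpillFile {
    max_batch_memory: usize,
}

/// Hypothetical plan for one merge pass.
#[derive(Debug, PartialEq)]
struct MergePlan {
    /// Number of spill files merged in this pass.
    streams: usize,
    /// Extra batches buffered per stream on top of the current one.
    buffer_len: usize,
    /// Bytes reserved up front for the whole pass.
    reserved: usize,
}

/// Worst-case memory for merging `files`: every stream may hold its current
/// batch plus `buffer_len` buffered batches, each as large as its biggest batch.
fn worst_case(files: &[SpillFile], buffer_len: usize) -> usize {
    files.iter().map(|f| f.max_batch_memory * (1 + buffer_len)).sum()
}

/// Plans one merge pass over a prefix of `files` within `available` bytes.
/// Shrinks the buffer first, then the number of streams, and fails only when
/// even two streams with no buffering do not fit.
fn plan_merge_pass(
    files: &[SpillFile],
    available: usize,
    preferred_buffer_len: usize,
) -> Result<MergePlan, String> {
    let min_streams = files.len().min(2);
    for streams in (min_streams..=files.len()).rev() {
        for buffer_len in (0..=preferred_buffer_len).rev() {
            let reserved = worst_case(&files[..streams], buffer_len);
            if reserved <= available {
                return Ok(MergePlan { streams, buffer_len, reserved });
            }
        }
    }
    let need = worst_case(&files[..min_streams], 0);
    Err(format!(
        "not enough memory: need {need} bytes for {min_streams} unbuffered batches, have {available}"
    ))
}

/// Simulates the multi-level merge: each pass replaces the merged files with
/// one new spill file until a single sorted file remains. Returns the pass count.
fn count_merge_passes(
    mut files: Vec<SpillFile>,
    available: usize,
    preferred_buffer_len: usize,
) -> Result<usize, String> {
    let mut passes = 0;
    while files.len() > 1 {
        let plan = plan_merge_pass(&files, available, preferred_buffer_len)?;
        let merged: Vec<SpillFile> = files.drain(..plan.streams).collect();
        // Simplifying assumption: the merged output's largest batch is no
        // larger than the largest input batch.
        let max_batch_memory = merged.iter().map(|f| f.max_batch_memory).max().unwrap_or(0);
        files.push(SpillFile { max_batch_memory });
        passes += 1;
    }
    Ok(passes)
}

fn main() {
    // Four spill files whose largest batches take 10, 20, 30 and 40 bytes.
    let files: Vec<SpillFile> = [10, 20, 30, 40]
        .iter()
        .map(|&m| SpillFile { max_batch_memory: m })
        .collect();

    println!("{:?}", plan_merge_pass(&files, 200, 1)); // all 4 streams, 1 buffered batch each
    println!("{:?}", plan_merge_pass(&files, 60, 1)); // 3 streams, no buffering
    println!("{:?}", plan_merge_pass(&files, 29, 1)); // error: 2 unbuffered batches need 30
    println!("{:?}", count_merge_passes(files, 70, 1)); // 2 passes
}
```

With a 70-byte budget, the first pass merges the first three files without buffering (60 bytes reserved) and the second pass merges the remaining two (70 bytes reserved), so the merge completes in two levels instead of failing. Reserving the worst case up front is what rules out a mid-merge allocation failure; the trade-off is that the reservation may overestimate the memory actually used.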