2010YOUY01 commented on PR #15610: URL: https://github.com/apache/datafusion/pull/15610#issuecomment-2791500223
Thank you all for the review!

@qstommyshu I agree with the implementation-level feedback and will address it in the refactor.

@alamb Regarding parallel merging: my thinking was that if `max_spill_merge_degree` is configured to 10, then memory is limited such that each partition can only hold 10 batches at the same time, so parallel merging is not possible in that case. However, @rluvaton's PR made me realize that an operator may be able to hold, say, 100 batches under the memory limit at the same time, yet we might still want to merge them 10 at a time for performance.

I think the next steps are:
1. Contribute benchmarks for external sort.
2. Refactor this PR to avoid always re-spilling, and to do parallel merging when possible.
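To make the "100 batches, merged 10 at a time" idea concrete, here is a minimal sketch of a multi-pass merge with a bounded fan-in. It is not the PR's implementation: `merge_group`, `multi_pass_merge`, and the `max_merge_degree` parameter are hypothetical names, and plain `Vec<i64>` runs stand in for spilled sorted batches.

```rust
use std::cmp::Reverse;
use std::collections::BinaryHeap;

/// Merge one group of already-sorted runs into a single sorted run
/// using a min-heap keyed on (value, run index, position in run).
fn merge_group(runs: &[Vec<i64>]) -> Vec<i64> {
    let mut heap = BinaryHeap::new();
    for (run_idx, run) in runs.iter().enumerate() {
        if let Some(&first) = run.first() {
            heap.push(Reverse((first, run_idx, 0usize)));
        }
    }
    let mut out = Vec::with_capacity(runs.iter().map(Vec::len).sum());
    while let Some(Reverse((value, run_idx, pos))) = heap.pop() {
        out.push(value);
        // Refill the heap from the run we just consumed from.
        if let Some(&next) = runs[run_idx].get(pos + 1) {
            heap.push(Reverse((next, run_idx, pos + 1)));
        }
    }
    out
}

/// Hypothetical helper: repeatedly merge at most `max_merge_degree` runs
/// at a time until one run remains. Returns the merged run and the
/// number of passes that were needed.
fn multi_pass_merge(mut runs: Vec<Vec<i64>>, max_merge_degree: usize) -> (Vec<i64>, usize) {
    assert!(max_merge_degree >= 2, "fan-in must be at least 2");
    let mut passes = 0;
    while runs.len() > 1 {
        // Each chunk is an independent merge; in this sketch the chunks
        // are processed sequentially.
        runs = runs.chunks(max_merge_degree).map(merge_group).collect();
        passes += 1;
    }
    (runs.pop().unwrap_or_default(), passes)
}
```

With 100 sorted runs and a fan-in of 10, the first pass produces 10 intermediate runs and the second pass produces the final one. The groups within a pass do not depend on each other, which is where parallel merging could apply when the memory limit allows more than one group to be resident at once.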