alamb commented on PR #15355: URL: https://github.com/apache/datafusion/pull/15355#issuecomment-2749165158
> > > 3. After we have collected 1MB of merged batches, one spill will be triggered. This 1MB of space will then be cleared, and the merging can continue.
> > >
> > > **Inefficiency:** `ExternalSorter` will now create a new spill file for those 1MB merged batches. After all intermediates have been spilled, all spilled files are merged at once, so there are too many files to merge.
> > >
> > > **Ideal case:** All batches in a single sorted run can be incrementally appended to a single file.
> >
> > It seems to be a regression introduced by #14823.
>
> That's true, so I feel obligated to fix it.

@2010YOUY01 is this something that should be tracked with a follow-on ticket?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org