Lordworms commented on PR #14149: URL: https://github.com/apache/datafusion/pull/14149#issuecomment-2597120718
> I believe this feature is important to external sorting's performance, thank you. Left some suggestions.
>
> > I have two follow-up PRs to implement SortPreservingMergeStream in Row format and change the logic in SortExec
>
> Perhaps first let `SPM`'s input and output both support the `Rows` format? This seems easier to do because only one operator needs to be changed. And a larger sort query includes two levels of `SPM`, so we can get some performance improvement from it

I think we would have to change both GroupHashExec and SortExec as well, since these two operators use the column format right now.

Also, since we keep the column format for single-column sorts, I'm not sure whether changing SortPreservingMergeStream is a better choice than adding a RowformatMergeStream. It's kind of hard to choose here.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org