Lordworms commented on PR #14149:
URL: https://github.com/apache/datafusion/pull/14149#issuecomment-2597120718

   > I believe this feature is important to external sorting's performance, 
thank you. Left some suggestions.
   > 
   > > I have two follow-up PRs to implement SortPreservingMergeStream in Row 
format and change the logic in SortExec
   > 
   > Perhaps first let `SPM`'s input and output both support `Rows` format? 
This seems easier to do because only one operator needs to be changed. And 
a larger sort query includes two levels of `SPM`, so we can get some 
performance improvement from it
   
   I think we would have to change both GroupHashExec and SortExec as well, 
since both of these operators use the column format right now.
   
   Also, since we keep the column format for single-column sorts, I'm not sure 
whether changing SortPreservingMergeStream is a better choice than adding a 
separate RowformatMergeStream. It's kind of hard to choose here.
   
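   
   For context on the trade-off, here is a minimal std-only sketch of why a 
row format tends to help multi-column sorts but not single-column ones. It is 
not DataFusion's or arrow's actual row encoding; `Columns`, `encode_row` and 
the other names are hypothetical and exist only for illustration. The idea is 
that a byte-comparable row needs one comparison per pair, while the column 
format visits each sort column in turn:

```rust
use std::cmp::Ordering;

/// Hypothetical two-column sort key, stored column-wise (one Vec per column).
struct Columns {
    a: Vec<u32>,
    b: Vec<u16>,
}

/// Column-format comparison: visit each column in turn until one differs.
fn compare_columns(cols: &Columns, i: usize, j: usize) -> Ordering {
    cols.a[i].cmp(&cols.a[j]).then_with(|| cols.b[i].cmp(&cols.b[j]))
}

/// Row-format encoding (illustrative only): concatenate the big-endian bytes
/// of every column so that a plain byte-wise comparison of two rows matches
/// the multi-column ordering.
fn encode_row(cols: &Columns, i: usize) -> [u8; 6] {
    let mut row = [0u8; 6];
    row[..4].copy_from_slice(&cols.a[i].to_be_bytes());
    row[4..].copy_from_slice(&cols.b[i].to_be_bytes());
    row
}

/// Sort row indices by comparing column by column.
fn sorted_indices_by_columns(cols: &Columns) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..cols.a.len()).collect();
    idx.sort_by(|&i, &j| compare_columns(cols, i, j));
    idx
}

/// Sort row indices by encoding each row once, then comparing raw bytes.
fn sorted_indices_by_rows(cols: &Columns) -> Vec<usize> {
    let rows: Vec<[u8; 6]> = (0..cols.a.len()).map(|i| encode_row(cols, i)).collect();
    let mut idx: Vec<usize> = (0..rows.len()).collect();
    idx.sort_by(|&i, &j| rows[i].cmp(&rows[j]));
    idx
}

fn main() {
    let cols = Columns { a: vec![2, 1, 2, 1], b: vec![7, 300, 3, 5] };
    // Both strategies must agree on the ordering.
    assert_eq!(sorted_indices_by_columns(&cols), sorted_indices_by_rows(&cols));
    println!("{:?}", sorted_indices_by_rows(&cols));
}
```

   With a single sort column the encoding step is pure overhead, since the 
column comparison is already one primitive compare, which is the reason the 
column format is kept for that case.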
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

