zhuqi-lucas commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2804362823

   > I think `concat` followed by `sort` is slower in some cases because
   > 
   > * Concat involves copying the entire batch (rather than only the keys to be sorted)
   > * `sort_batch_stream` can be slower because `lexsort_to_indices` is slower than the Row Format in cases with many columns
   > 
   > I think for `ExternalSorter` we don't want any additional parallelism as the sort is already executed per partition (so additional parallelism is likely to hurt rather than help).
   > 
   > The core improvements that I think are important:
   > 
   > * Minimizing copying of the input batches to one (only once for the output)
   > * Sorting once on the input batches rather than sort followed by merge
   > * A good heuristic on when to switch from `lexsort_to_indices`-like sorting to RowConverter + sorting.
   
   Good explanation.
   > I think for ExternalSorter we don't want any additional parallelism as the sort is already executed per partition (so additional parallelism is likely to hurt rather than help).
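
To make the trade-off in the quoted points concrete, here is a minimal, std-only sketch. It is not DataFusion or arrow-rs code: `Keys`, `sort_indices_columnar`, `sort_indices_rows`, `use_row_format`, and `take` are hypothetical names, and the threshold parameter is an assumption rather than a measured value. The idea is to sort only the keys to get a permutation, then copy the payload once for the output.

```rust
use std::cmp::Ordering;

/// Two key columns of equal length, standing in for the sort keys of a batch.
/// (Hypothetical type, used only for this illustration.)
pub struct Keys {
    pub a: Vec<i32>,
    pub b: Vec<u16>,
}

/// Column-wise comparison, similar in spirit to `lexsort_to_indices`:
/// each comparison walks the key columns one at a time.
pub fn sort_indices_columnar(k: &Keys) -> Vec<usize> {
    let mut idx: Vec<usize> = (0..k.a.len()).collect();
    idx.sort_by(|&x, &y| match k.a[x].cmp(&k.a[y]) {
        Ordering::Equal => k.b[x].cmp(&k.b[y]),
        other => other,
    });
    idx
}

/// Row-style comparison: encode every row's keys into one fixed-width byte
/// string whose byte-wise order matches the logical order, then compare bytes.
pub fn sort_indices_rows(k: &Keys) -> Vec<usize> {
    let rows: Vec<[u8; 6]> = (0..k.a.len())
        .map(|i| {
            let mut row = [0u8; 6];
            // Flip the sign bit so negative i32 values sort before
            // non-negative ones when compared as big-endian bytes.
            let a = (k.a[i] as u32) ^ 0x8000_0000;
            row[..4].copy_from_slice(&a.to_be_bytes());
            row[4..].copy_from_slice(&k.b[i].to_be_bytes());
            row
        })
        .collect();
    let mut idx: Vec<usize> = (0..rows.len()).collect();
    idx.sort_by(|&x, &y| rows[x].cmp(&rows[y]));
    idx
}

/// Hypothetical heuristic: prefer the row encoding once the number of sort
/// columns reaches `threshold`. The right threshold is an open question.
pub fn use_row_format(num_sort_columns: usize, threshold: usize) -> bool {
    num_sort_columns >= threshold
}

/// Copy the payload exactly once, in sorted order.
pub fn take<T: Clone>(values: &[T], idx: &[usize]) -> Vec<T> {
    idx.iter().map(|&i| values[i].clone()).collect()
}
```

Both index functions return the same permutation. The row variant pays an up-front encoding cost in exchange for cheaper comparisons, which is why a heuristic based on the number of sort columns could be useful.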


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
