Dandandan commented on PR #15380:
URL: https://github.com/apache/datafusion/pull/15380#issuecomment-2993451425

   Thank you @zhuqi-lucas for experimenting on this. Maybe it's a good idea to 
do some profiling to see the hot spots?
   
   For example, this is the profile I get from the sort-tpch benchmark.
   
   <img width="1728" alt="image" 
src="https://github.com/user-attachments/assets/88a72c7b-472e-438f-964b-ee43101df958"
 />
   
   * You can see here that most of the work is concentrated in 
`SortPreservingMergeExec` rather than in the sorts, so in this case making 
`SortExec` faster probably won't help much with total performance. Maybe we 
can use `target_partitions=1` to concentrate more work in `SortExec` so we can 
have a look. 
   
   * I made a change in https://github.com/apache/arrow-rs/pull/7695 that will 
probably help quite a bit with the performance of `SortPreservingMergeExec` 
and `SortExec`. Maybe we can look at where the next hotspots are after this 
change. I think probably a lot is in converting to `Row`, doing comparisons on 
byte slices and doing allocations. But some parts also seem related to us not 
handling views as efficiently as possible.
   
   * One example I see is that we call `.gc()`, which currently has a slow 
implementation.
   
   <img width="1179" alt="image" 
src="https://github.com/user-attachments/assets/07e3de93-9b3d-4f63-8d08-c328b8e39f73"
 />
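
   To make the `.gc()` point concrete, here is a minimal, std-only sketch of 
what compacting a view-style array involves. It is a simplified model and not 
the arrow-rs implementation: the `View` struct and the `compact` function are 
hypothetical names, and real string views also inline short values and can 
reference several buffers.

```rust
/// Simplified stand-in for a string view: a slice of one shared buffer.
/// (Hypothetical type, not the arrow-rs view layout.)
#[derive(Debug, Clone, Copy, PartialEq)]
struct View {
    offset: usize,
    len: usize,
}

/// Copy only the bytes still referenced by `views` into a fresh buffer
/// and rewrite each offset to point into that new buffer.
fn compact(buffer: &[u8], views: &[View]) -> (Vec<u8>, Vec<View>) {
    // Size the new buffer once up front to avoid repeated reallocation.
    let needed: usize = views.iter().map(|v| v.len).sum();
    let mut new_buffer = Vec::with_capacity(needed);
    let mut new_views = Vec::with_capacity(views.len());
    for v in views {
        let offset = new_buffer.len();
        new_buffer.extend_from_slice(&buffer[v.offset..v.offset + v.len]);
        new_views.push(View { offset, len: v.len });
    }
    (new_buffer, new_views)
}
```

   Every call copies all surviving bytes, so the cost grows with the amount of 
retained data. That is why a slow compaction step can show up in a sort 
profile when it runs on many batches.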
   
   
   * Another one is `compare_unchecked`:
   
   <img width="1070" alt="image" 
src="https://github.com/user-attachments/assets/fdddbf69-c176-4adc-9a05-c8e44c23ad3d"
 />
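
   The row conversion and byte-slice comparison mentioned above can be 
sketched as follows. This is only an illustration of the idea, using the 
standard library, and not the actual arrow-rs row format (which also has to 
deal with nulls, sort options and many more types). The names `encode_row` and 
`sort_indices` are hypothetical, and the string encoding assumes values 
contain no `0x00` byte.

```rust
/// Encode an (i32, &str) sort key so that plain byte comparison of the
/// encodings matches the ordering of the original tuples.
fn encode_row(id: i32, name: &str) -> Vec<u8> {
    let mut row = Vec::with_capacity(4 + name.len() + 1);
    // Flip the sign bit and store big-endian so that negative numbers
    // sort before positive ones when compared byte by byte.
    row.extend_from_slice(&((id as u32) ^ 0x8000_0000).to_be_bytes());
    // Assumption: `name` has no 0x00 byte, so a 0x00 terminator makes a
    // string sort before any longer string it is a prefix of.
    row.extend_from_slice(name.as_bytes());
    row.push(0);
    row
}

/// Return the indices that would sort `keys`, comparing encoded rows only.
fn sort_indices(keys: &[(i32, &str)]) -> Vec<usize> {
    // One allocation per row: conversion cost of this kind is part of
    // what a profile of a row-based sort will show.
    let rows: Vec<Vec<u8>> = keys
        .iter()
        .map(|(id, name)| encode_row(*id, name))
        .collect();
    let mut indices: Vec<usize> = (0..keys.len()).collect();
    // Lexicographic comparison on byte slices, no per-column dispatch.
    indices.sort_by(|&a, &b| rows[a].cmp(&rows[b]));
    indices
}
```

   The sketch shows where the time goes in this style of sort: encoding and 
allocating the rows up front, then many byte-slice comparisons during sorting 
and merging.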
   
   
   
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

