zhuqi-lucas commented on PR #15348: URL: https://github.com/apache/datafusion/pull/15348#issuecomment-2743243018
> Thank you for the work on better Utf8View support. I tried one sort benchmark with sort-preserving merging on a single `Utf8View` column, but it gets slower:
>
> Reproducer
>
> ```
> cargo run --profile release-nonlto --bin dfbench -- sort-tpch -p /Users/yongting/Code/datafusion/benchmarks/data/tpch_sf10 -q 3
> ```
>
> main: 8s
> pr: 10s
>
> According to the flamegraph, an extra overhead of `libsystem_platform.dylib_platform_memcmp` showed up inside `SortPreservingMergeStream`. It's not obvious why, I'll try to help figure it out later.
>
> [flamegraphs.zip](https://github.com/user-attachments/files/19388551/flamegraphs.zip)

Thank you @2010YOUY01 for the review. I think I know what is happening with the reproducer above:

1. The q3 sort benchmark is a special case. It sorts by `l_comment`, which is always a long string (more than 12 bytes), and many of its values share the same prefix. That means the 4-byte prefixes stored in the views are also equal, so the comparison falls through to its last step and compares the full strings in the data buffer. That is what causes the regression.

2. You can try sorting a more typical column, where the strings are mostly shorter than 12 bytes. Even for strings longer than 12 bytes, the comparison can usually be decided from the 4-byte prefix in the view. For example, change q3 to the following SQL, which orders by an ordinary string column:

   ```sql
   SELECT l_shipmode, l_comment, l_partkey
   FROM lineitem
   ORDER BY l_shipmode;
   ```

   It shows a performance improvement.

Finally, I think we should create a follow-up ticket to investigate and improve the regression case. It would be valuable for us to fix it.

Thanks!
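To make the two cases concrete, here is a minimal sketch of a prefix-first comparison. It is not the code in this PR or in arrow-rs: `View`, `make_view`, `prefix_of`, `full_bytes`, and `compare` are hypothetical names, and the layout is simplified (a real Arrow view is 16 bytes: a 4-byte length followed by either up to 12 inline bytes, or a 4-byte prefix plus a buffer index and offset). It assumes a single data buffer and returns a flag that says whether the buffer comparison was needed.

```rust
use std::cmp::Ordering;

/// Strings up to this many bytes are stored inline in the view.
const MAX_INLINE: usize = 12;

/// Simplified, hypothetical string view (not the real Arrow layout).
#[derive(Clone, Copy)]
enum View {
    Inline { len: u8, bytes: [u8; 12] },
    Long { len: u32, prefix: [u8; 4], offset: usize },
}

/// Build a view for `s`, appending long strings to the shared `buffer`.
fn make_view(s: &str, buffer: &mut Vec<u8>) -> View {
    let b = s.as_bytes();
    if b.len() <= MAX_INLINE {
        let mut bytes = [0u8; 12];
        bytes[..b.len()].copy_from_slice(b);
        View::Inline { len: b.len() as u8, bytes }
    } else {
        let mut prefix = [0u8; 4];
        prefix.copy_from_slice(&b[..4]);
        let offset = buffer.len();
        buffer.extend_from_slice(b);
        View::Long { len: b.len() as u32, prefix, offset }
    }
}

/// First four bytes of the string, zero-padded for very short strings.
fn prefix_of(v: &View) -> [u8; 4] {
    match v {
        View::Inline { bytes, .. } => [bytes[0], bytes[1], bytes[2], bytes[3]],
        View::Long { prefix, .. } => *prefix,
    }
}

/// Full string bytes, read from the view itself or from the data buffer.
fn full_bytes<'a>(v: &'a View, buffer: &'a [u8]) -> &'a [u8] {
    match v {
        View::Inline { len, bytes } => &bytes[..*len as usize],
        View::Long { len, offset, .. } => &buffer[*offset..*offset + *len as usize],
    }
}

/// Compare two views. The bool is true when the prefixes were equal and
/// the full bytes had to be compared (the slow path hit by `l_comment`).
fn compare(a: &View, b: &View, buffer: &[u8]) -> (Ordering, bool) {
    let (pa, pb) = (prefix_of(a), prefix_of(b));
    if pa != pb {
        // Fast path: decided from the 4-byte prefix alone.
        return (pa.cmp(&pb), false);
    }
    // Slow path: equal prefixes, so compare the complete strings.
    (full_bytes(a, buffer).cmp(full_bytes(b, buffer)), true)
}
```

With values like `"MAIL"` and `"SHIP"` the prefixes differ and the buffer is never touched. With two long comments that both start with `"care"`, the prefix check is extra work on top of the full byte comparison, which is consistent with the extra `memcmp` time seen in the flamegraph.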