Mryange opened a new pull request, #63784:
URL: https://github.com/apache/doris/pull/63784

   ### What problem does this PR solve?
   
   Issue Number: None
   
   Related PR: None
   
   Problem Summary:
   
   This PR optimizes two hot BE string function families that spend significant 
time on repeated buffer growth and high-overhead character lookup.
   
   1. `repeat`
      - Replace the old per-row append loop with a shared two-pass execution 
path for both vector repeat counts and constant repeat counts.
      - Precompute `res_offsets` and total output size before writing results.
      - Write directly into `ColumnString::Chars` with 
`StringOP::fast_repeat()`.
   
   2. `trim_in` / `ltrim_in` / `rtrim_in`
      - Reuse SIMD-assisted ASCII symbol search for small trim sets.
      - Add reverse symbol search helpers for right trim.
      - Use a fixed-size UTF-8 small-set lookup path for common cases while 
preserving fallback paths for larger trim strings.
   
   3. Coverage and observability
      - Add a larger `repeat('a', 256)` correctness case.
      - Add `find_symbols` unit coverage for empty ranges, embedded NUL bytes, 
boundary lengths, and cross-check cases.
      - Add old/new microbenchmarks for both repeat and trim paths.
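
The two-pass `repeat` path from item 1 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `StringColumn`, `repeat_into`, and `repeat_two_pass` are hypothetical names, `repeat_into` is only a stand-in for `StringOP::fast_repeat()` (whose actual implementation is not shown here), and the doubling copy is an assumed technique. The sketch also assumes the total output fits in 32-bit offsets; a real implementation would need a size check.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a string column: concatenated bytes plus end offsets.
struct StringColumn {
    std::vector<char> chars;
    std::vector<uint32_t> offsets;  // offsets[i] = end of row i inside chars
};

// Illustrative helper (assumption, not StringOP::fast_repeat()): copy src once,
// then keep doubling the already-written prefix until len * times bytes exist.
inline void repeat_into(char* dst, const char* src, size_t len, size_t times) {
    if (len == 0 || times == 0) return;
    std::memcpy(dst, src, len);
    size_t filled = len;
    const size_t total = len * times;
    while (filled < total) {
        // n <= filled, so the source and destination ranges never overlap.
        const size_t n = std::min(filled, total - filled);
        std::memcpy(dst + filled, dst, n);
        filled += n;
    }
}

StringColumn repeat_two_pass(const std::vector<std::string>& rows,
                             const std::vector<int32_t>& counts) {
    assert(rows.size() == counts.size());
    StringColumn out;
    out.offsets.resize(rows.size());

    // Pass 1: compute every row's end offset and the total output size.
    uint64_t total = 0;
    for (size_t i = 0; i < rows.size(); ++i) {
        const uint64_t times = counts[i] > 0 ? static_cast<uint64_t>(counts[i]) : 0;
        total += rows[i].size() * times;
        out.offsets[i] = static_cast<uint32_t>(total);  // assumes total fits in 32 bits
    }

    // Single allocation: no buffer growth while writing.
    out.chars.resize(total);

    // Pass 2: write each row directly at its precomputed position.
    size_t begin = 0;
    for (size_t i = 0; i < rows.size(); ++i) {
        const size_t times = counts[i] > 0 ? static_cast<size_t>(counts[i]) : 0;
        repeat_into(out.chars.data() + begin, rows[i].data(), rows[i].size(), times);
        begin = out.offsets[i];
    }
    return out;
}
```

A constant repeat count would follow the same two passes with `counts[i]` replaced by a single value. Rows with an empty input or a non-positive count contribute no bytes and are skipped in the write pass.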
   
   ### Benchmark results
   
   Local Release benchmark results on this machine:
   
   #### repeat
   
   | Case | Old | New | Speedup |
   | --- | ---: | ---: | ---: |
   | RepeatVector 4096/16/8 | 86184 ns | 42878 ns | 2.01x |
   | RepeatVector 4096/16/64 | 411330 ns | 146498 ns | 2.81x |
   | RepeatVector 1024/128/16 | 84985 ns | 39620 ns | 2.15x |
   | RepeatVector 4096/0/64 | 222362 ns | 7382 ns | 30.12x |
   | RepeatConst 4096/16/8 | 127848 ns | 59495 ns | 2.15x |
   | RepeatConst 4096/16/64 | 764912 ns | 205294 ns | 3.73x |
   | RepeatConst 1024/128/16 | 158499 ns | 78090 ns | 2.03x |
   | RepeatConst 4096/0/64 | 395137 ns | 7107 ns | 55.60x |
   
   #### trim
   
   | Case | Old | New | Speedup |
   | --- | ---: | ---: | ---: |
   | ASCII, 4 trim chars, 65536 rows | 2398.526 us | 997.839 us | 2.40x |
   | ASCII, 8 trim chars, 65536 rows | 2452.269 us | 872.865 us | 2.81x |
   | ASCII, no match, 65536 rows | 318.894 us | 309.079 us | 1.03x |
   | UTF-8, 2 trim chars, 65536 rows | 7832.364 us | 6799.569 us | 1.15x |
   
   Note:
   The medium and large trim cases improved as expected on this machine. 
One small-input case (`BM_TrimInAscii/1024`) was slightly slower (0.81x), 
while the 4096-row and 65536-row cases remained clearly faster.
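
For readers unfamiliar with what the ASCII trim benchmarks exercise, the forward and reverse symbol searches behind `trim_in` can be approximated by the scalar sketch below. It is an assumption-laden illustration: `ByteSet`, `find_first_not_in`, `find_last_not_in`, and `trim_in_ascii` are hypothetical names, the byte-table lookup stands in for the SIMD-assisted search (which is not reproduced here), and the sketch is only valid for single-byte (ASCII) trim sets. UTF-8 trim sets need character-level matching.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical 256-entry membership table for a small set of trim bytes.
struct ByteSet {
    std::array<bool, 256> member{};  // value-initialized: all false
    explicit ByteSet(std::string_view chars) {
        for (unsigned char c : chars) member[c] = true;
    }
    bool contains(char c) const { return member[static_cast<unsigned char>(c)]; }
};

// Forward search: index of the first byte that is NOT in the set
// (s.size() if every byte is in the set).
inline size_t find_first_not_in(std::string_view s, const ByteSet& set) {
    size_t i = 0;
    while (i < s.size() && set.contains(s[i])) ++i;
    return i;
}

// Reverse search over [begin, s.size()): one past the last byte that is
// NOT in the set (begin if every byte in that range is in the set).
inline size_t find_last_not_in(std::string_view s, size_t begin, const ByteSet& set) {
    size_t end = s.size();
    while (end > begin && set.contains(s[end - 1])) --end;
    return end;
}

// Trim bytes from either end; `left` / `right` select ltrim_in / rtrim_in behavior.
inline std::string_view trim_in_ascii(std::string_view s, std::string_view trim_chars,
                                      bool left = true, bool right = true) {
    const ByteSet set(trim_chars);
    const size_t begin = left ? find_first_not_in(s, set) : 0;
    const size_t end = right ? find_last_not_in(s, begin, set) : s.size();
    return s.substr(begin, end - begin);
}
```

Because lengths are carried explicitly through `std::string_view`, embedded NUL bytes are handled like any other byte, which is one of the cases the new `find_symbols` unit tests are described as covering.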
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [ ] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [ ] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [ ] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   



