Re: [PR] Reduce frequencies buffer size when they are not needed [lucene]

via GitHub Wed, 20 Dec 2023 08:08:59 -0800


easyice commented on PR #12954:
URL: https://github.com/apache/lucene/pull/12954#issuecomment-1864749851


   
   I spent several hours confirming the results. The benchmark shows the PR is 
faster, which exceeded my expectations. We think the speedup comes from removing 
the loop in `reset()` that initializes `freqBuffer` to 1, shown below:
   
   ```
         if (indexHasFreq == false || needsFreq == false) {
           for (int i = 0; i < ForUtil.BLOCK_SIZE; ++i) {
             freqBuffer[i] = 1;
           }
         }
   ```
   
   The benchmark still shows a speedup when this PR always allocates the 
128-entry `freqBuffer`, so the performance improvement is not related to 
reducing memory allocation. Maybe we can consider another approach: avoid the 
for-loop in `reset()` when the instance can be reused. Thanks to @gf2121 for 
the suggestions while I was investigating the cause of the speedup.
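
A minimal sketch of that idea, not the actual Lucene code: the class `FreqBufferSketch`, the `allOnes` flag, and the `fillCount` counter are hypothetical names used only for illustration, and `BLOCK_SIZE = 128` is assumed to mirror `ForUtil.BLOCK_SIZE`. The flag records whether the buffer is already filled with ones, so a reused instance can skip the fill in `reset()`.

```java
import java.util.Arrays;

/** Hypothetical stand-in for a reusable postings enum; not Lucene's real class. */
final class FreqBufferSketch {
  static final int BLOCK_SIZE = 128; // assumption: mirrors ForUtil.BLOCK_SIZE

  final long[] freqBuffer = new long[BLOCK_SIZE];
  private boolean allOnes = false; // true while every slot is known to hold 1
  int fillCount = 0; // counts how often the fill really ran (illustration only)

  /** Called every time the instance is (re)used for a new term. */
  void reset(boolean indexHasFreq, boolean needsFreq) {
    if (indexHasFreq == false || needsFreq == false) {
      // Frequencies are not read from the index: every freq must be 1.
      if (allOnes == false) {
        Arrays.fill(freqBuffer, 1L);
        allOnes = true;
        fillCount++;
      }
    } else {
      // Real frequencies will be decoded into the buffer,
      // so the all-ones guarantee no longer holds.
      allOnes = false;
    }
  }

  public static void main(String[] args) {
    FreqBufferSketch e = new FreqBufferSketch();
    e.reset(true, false);  // first use: fills the buffer
    e.reset(true, false);  // reuse: fill is skipped
    e.reset(true, true);   // freqs needed: guarantee dropped
    e.reset(false, false); // must fill again
    System.out.println("fills = " + e.fillCount); // prints "fills = 2"
  }
}
```

Whether this is a win in practice would depend on how often instances are reused with the same `needsFreq` setting, which this sketch does not measure.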
   
   
   Benchmark output for the PR (using `wikimediumall`):
   
   ```
                    Task  QPS baseline      StdDev  QPS my_modified_version      StdDev        Pct diff  p-value
        HighSloppyPhrase        0.38      (5.3%)        0.37      (4.6%)   -1.6% ( -10% -    8%) 0.323
                 MedTerm      226.03      (4.8%)      223.08      (5.2%)   -1.3% ( -10% -    9%) 0.409
    HighIntervalsOrdered        2.19      (6.4%)        2.17      (6.5%)   -0.9% ( -12% -   12%) 0.676
         MedSloppyPhrase       18.52      (2.7%)       18.39      (2.5%)   -0.7% (  -5% -    4%) 0.402
                  Fuzzy2       36.10      (1.8%)       35.86      (1.6%)   -0.7% (  -3% -    2%) 0.219
                  Fuzzy1       43.40      (1.6%)       43.16      (1.7%)   -0.6% (  -3% -    2%) 0.276
                 Respell       21.69      (1.8%)       21.58      (1.8%)   -0.5% (  -4% -    3%) 0.375
                 LowTerm      232.03      (3.0%)      231.08      (2.9%)   -0.4% (  -6% -    5%) 0.659
         LowSloppyPhrase       18.26      (2.0%)       18.20      (2.1%)   -0.3% (  -4% -    3%) 0.660
                HighTerm      267.11      (5.2%)      266.50      (5.6%)   -0.2% ( -10% -   11%) 0.893
            HighSpanNear        1.85      (5.7%)        1.84      (6.7%)   -0.2% ( -11% -   12%) 0.935
            OrHighNotLow      167.52      (5.7%)      167.26      (5.6%)   -0.2% ( -10% -   11%) 0.931
    HighTermTitleBDVSort        1.90      (3.7%)        1.90      (4.5%)   -0.1% (  -7% -    8%) 0.915
     MedIntervalsOrdered        7.07      (3.4%)        7.06      (3.8%)   -0.1% (  -7% -    7%) 0.910
             MedSpanNear       24.97      (2.1%)       24.94      (2.7%)   -0.1% (  -4% -    4%) 0.874
              HighPhrase       10.67      (6.0%)       10.66      (5.7%)   -0.1% ( -11% -   12%) 0.950
               LowPhrase        4.70      (4.0%)        4.70      (3.8%)   -0.0% (  -7% -    8%) 0.979
            OrHighNotMed      130.98      (6.2%)      131.01      (6.1%)    0.0% ( -11% -   13%) 0.989
           OrNotHighHigh      171.61      (5.4%)      171.67      (5.3%)    0.0% ( -10% -   11%) 0.984
     LowIntervalsOrdered       28.65      (4.3%)       28.68      (4.3%)    0.1% (  -8% -    9%) 0.947
              OrHighHigh       18.94      (2.9%)       19.00      (3.6%)    0.3% (  -5% -    6%) 0.766
           OrHighNotHigh      125.97      (5.5%)      126.41      (6.0%)    0.3% ( -10% -   12%) 0.848
            OrNotHighMed      181.48      (4.0%)      182.38      (3.6%)    0.5% (  -6% -    8%) 0.679
             LowSpanNear        6.89      (2.5%)        6.93      (3.2%)    0.6% (  -5% -    6%) 0.516
               MedPhrase      110.79      (2.8%)      111.45      (2.9%)    0.6% (  -5% -    6%) 0.515
               OrHighMed       38.51      (2.4%)       38.79      (2.1%)    0.7% (  -3% -    5%) 0.311
              AndHighMed       40.73      (2.4%)       41.06      (2.5%)    0.8% (  -4% -    5%) 0.304
              TermDTSort       74.72      (4.1%)       75.32      (2.6%)    0.8% (  -5% -    7%) 0.460
             AndHighHigh       10.24      (5.6%)       10.33      (3.9%)    0.8% (  -8% -   11%) 0.600
       HighTermMonthSort     1071.18      (2.9%)     1079.84      (4.5%)    0.8% (  -6% -    8%) 0.499
              AndHighLow      167.91      (5.2%)      170.10      (5.6%)    1.3% (  -9% -   12%) 0.446
                  IntNRQ       13.84      (4.0%)       14.05      (3.3%)    1.5% (  -5% -    9%) 0.208
            OrNotHighLow      241.06      (4.5%)      244.91      (4.8%)    1.6% (  -7% -   11%) 0.276
               OrHighLow      175.82      (4.0%)      178.70      (3.5%)    1.6% (  -5% -    9%) 0.174
   HighTermDayOfYearSort      188.33      (3.3%)      191.56      (3.7%)    1.7% (  -5% -    8%) 0.121
                 Prefix3      391.46      (4.0%)      404.16      (3.5%)    3.2% (  -4% -   11%) 0.006
       HighTermTitleSort       87.08      (3.9%)       91.84      (4.1%)    5.5% (  -2% -   14%) 0.000
                Wildcard       33.16      (3.3%)       37.56      (3.3%)   13.3% (   6% -   20%) 0.000
                PKLookup       92.19      (1.8%)      104.90      (2.8%)   13.8% (   9% -   18%) 0.000
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

