Re: [PR] Reduce duplication in taxonomy facets; always do counts [lucene]

via GitHub Sat, 13 Jan 2024 03:15:34 -0800


stefanvodita commented on PR #12966:
URL: https://github.com/apache/lucene/pull/12966#issuecomment-1890420978


   I've also run the benchmarks
(`python3 src/python/localrun.py -source wikimediumall`). There is a
measurable regression in the `BrowseRandomLabelTaxoFacets` task, but not in
the other taxonomy tasks. The benchmarker also reports improvements in
`PKLookup`, `Wildcard`, `Respell`, `Fuzzy2`, and `Fuzzy1`.
   
   The profiler explains the regression in the taxo task: boxing is not cheap.
   `11.24%        10402M        java.lang.Integer#valueOf()`
   
   @mikecan (thank you for the review!) - how should I interpret the other 
tasks which show a significant change? Are they just noisy?
   
   ```
                               Task  QPS baseline    StdDev  QPS my_modified_version    StdDev  Pct diff  p-value
        BrowseRandomLabelTaxoFacets        3.75      (1.8%)        3.53      (1.6%)   -6.0% (  -9% -   -2%) 0.000
             OrHighMedDayTaxoFacets        1.35      (7.4%)        1.31      (9.2%)   -2.7% ( -17% -   15%) 0.308
                             IntNRQ       21.64      (7.0%)       21.35      (7.4%)   -1.3% ( -14% -   14%) 0.561
                         AndHighLow      366.49     (11.2%)      362.21     (10.3%)   -1.2% ( -20% -   22%) 0.731
                       OrHighNotLow      271.40      (5.3%)      269.03      (4.5%)   -0.9% ( -10% -    9%) 0.573
                            LowTerm      604.77      (5.9%)      599.96      (4.8%)   -0.8% ( -10% -   10%) 0.640
                         TermDTSort      140.65      (2.3%)      139.58      (1.4%)   -0.8% (  -4% -    3%) 0.210
                        LowSpanNear        5.00      (2.8%)        4.96      (4.1%)   -0.7% (  -7% -    6%) 0.522
                       HighSpanNear        4.77      (3.0%)        4.74      (3.6%)   -0.7% (  -7% -    6%) 0.522
                        MedSpanNear       11.24      (2.1%)       11.18      (2.5%)   -0.6% (  -5% -    4%) 0.432
                          MedPhrase      242.61      (2.2%)      241.23      (2.0%)   -0.6% (  -4% -    3%) 0.386
                         HighPhrase       83.17      (2.1%)       82.75      (2.9%)   -0.5% (  -5% -    4%) 0.538
                      OrHighNotHigh      160.48      (4.5%)      159.81      (3.5%)   -0.4% (  -8% -    7%) 0.744
              HighTermDayOfYearSort      215.60      (2.2%)      214.81      (2.0%)   -0.4% (  -4% -    3%) 0.576
                    MedSloppyPhrase       14.07      (2.0%)       14.03      (2.4%)   -0.3% (  -4% -    4%) 0.655
                          LowPhrase       21.15      (1.3%)       21.09      (1.5%)   -0.3% (  -3% -    2%) 0.508
           AndHighHighDayTaxoFacets       10.49      (1.2%)       10.46      (1.6%)   -0.3% (  -3% -    2%) 0.547
                   HighSloppyPhrase       13.80      (3.0%)       13.77      (3.1%)   -0.3% (  -6% -    5%) 0.791
                            MedTerm      479.88      (5.1%)      478.82      (4.8%)   -0.2% (  -9% -   10%) 0.887
                       OrHighNotMed      329.08      (4.5%)      328.39      (3.5%)   -0.2% (  -7% -    8%) 0.870
                           HighTerm      264.78      (5.3%)      264.27      (5.2%)   -0.2% ( -10% -   10%) 0.908
                  HighTermMonthSort     1930.74      (4.4%)     1928.03      (5.2%)   -0.1% (  -9% -    9%) 0.926
                       OrNotHighMed      217.72      (2.9%)      217.51      (2.2%)   -0.1% (  -5% -    5%) 0.905
               MedTermDayTaxoFacets       16.72      (2.1%)       16.71      (1.7%)   -0.1% (  -3% -    3%) 0.892
          BrowseDayOfYearSSDVFacets        4.12      (2.7%)        4.11      (2.9%)   -0.1% (  -5% -    5%) 0.931
               BrowseDateTaxoFacets        4.68      (5.1%)        4.67      (4.6%)   -0.1% (  -9% -   10%) 0.970
                      OrNotHighHigh      231.09      (4.5%)      230.99      (3.5%)   -0.0% (  -7% -    8%) 0.975
            AndHighMedDayTaxoFacets       16.88      (1.1%)       16.88      (1.5%)   -0.0% (  -2% -    2%) 0.963
          BrowseDayOfYearTaxoFacets        4.76      (5.2%)        4.76      (4.6%)    0.0% (  -9% -   10%) 1.000
                       OrNotHighLow      464.54      (2.6%)      464.56      (2.3%)    0.0% (  -4% -    5%) 0.995
               HighIntervalsOrdered        1.81      (4.6%)        1.81      (5.0%)    0.0% (  -9% -   10%) 0.990
               HighTermTitleBDVSort        5.39      (4.8%)        5.40      (4.4%)    0.1% (  -8% -    9%) 0.968
              BrowseMonthSSDVFacets        4.40      (2.6%)        4.40      (2.6%)    0.1% (  -4% -    5%) 0.873
                MedIntervalsOrdered        1.84      (5.5%)        1.84      (5.8%)    0.2% ( -10% -   12%) 0.918
                LowIntervalsOrdered       32.12      (5.4%)       32.18      (5.6%)    0.2% ( -10% -   11%) 0.913
                          OrHighMed       67.77      (3.1%)       67.97      (3.4%)    0.3% (  -5% -    6%) 0.779
        BrowseRandomLabelSSDVFacets        2.89      (2.0%)        2.90      (1.4%)    0.3% (  -3% -    3%) 0.569
              BrowseMonthTaxoFacets        9.36     (10.9%)        9.40     (10.4%)    0.4% ( -18% -   24%) 0.896
                  HighTermTitleSort      132.89      (1.9%)      133.56      (3.9%)    0.5% (  -5% -    6%) 0.600
                         OrHighHigh       20.24      (3.5%)       20.37      (3.9%)    0.6% (  -6% -    8%) 0.608
                         AndHighMed       81.65      (8.6%)       82.65      (9.8%)    1.2% ( -15% -   21%) 0.676
                    LowSloppyPhrase        4.92      (5.9%)        5.01      (6.4%)    1.6% ( -10% -   14%) 0.397
               BrowseDateSSDVFacets        1.20     (11.5%)        1.22      (9.1%)    2.1% ( -16% -   25%) 0.529
                            Prefix3      138.46      (4.9%)      141.54      (4.5%)    2.2% (  -6% -   12%) 0.138
                          OrHighLow      167.60      (7.5%)      171.65      (4.2%)    2.4% (  -8% -   15%) 0.211
                           PKLookup      169.39      (4.5%)      174.22      (4.5%)    2.9% (  -5% -   12%) 0.043
                        AndHighHigh       31.23      (9.5%)       32.15     (12.4%)    2.9% ( -17% -   27%) 0.399
                           Wildcard       66.79      (3.4%)       69.28      (3.6%)    3.7% (  -3% -   11%) 0.001
                            Respell       48.03      (2.0%)       50.35      (2.3%)    4.8% (   0% -    9%) 0.000
                             Fuzzy2       68.13      (1.3%)       71.67      (1.4%)    5.2% (   2% -    7%) 0.000
                             Fuzzy1       74.70      (1.5%)       79.47      (1.8%)    6.4% (   3% -    9%) 0.000
   ```
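
One rough way to approach the noise question is to remember that the table reports 51 tasks, so at a per-task level of 0.05 about 2.5 of them would be expected to look significant by chance alone. The sketch below applies a Bonferroni correction to the six rows with p < 0.05. This is an illustration only: the helper names are hypothetical and not part of the benchmark tooling, the 0.05 level is an assumption, and the p-values are the rounded figures from the table, so a reported 0.000 only means "below 0.0005".

```python
# Hypothetical helpers for illustration; not part of the benchmark tooling.

# (task, reported p-value) for every row in the table with p < 0.05.
# Values are rounded to three decimals in the report.
REPORTED = [
    ("BrowseRandomLabelTaxoFacets", 0.000),
    ("PKLookup", 0.043),
    ("Wildcard", 0.001),
    ("Respell", 0.000),
    ("Fuzzy2", 0.000),
    ("Fuzzy1", 0.000),
]

NUM_TASKS = 51  # number of rows in the table
ALPHA = 0.05    # assumed per-task significance level


def bonferroni_threshold(alpha, num_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / num_tests


def survivors(rows, alpha, num_tests):
    """Tasks whose reported p-value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, num_tests)
    return [task for task, p in rows if p < threshold]


if __name__ == "__main__":
    print(f"expected chance hits at alpha={ALPHA}: {ALPHA * NUM_TASKS:.2f}")
    print(f"corrected threshold: {bonferroni_threshold(ALPHA, NUM_TASKS):.5f}")
    for task in survivors(REPORTED, ALPHA, NUM_TASKS):
        print("still significant:", task)
```

Bonferroni is deliberately conservative, and with rounded inputs `Wildcard` sits right at the corrected threshold, so this only suggests which rows deserve a re-run; it does not say that the surviving changes are caused by this PR.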
   
   

