Re: [PR] Add support for intra-segment search concurrency [lucene]

via GitHub Thu, 05 Sep 2024 09:04:12 -0700


javanna commented on PR #13542:
URL: https://github.com/apache/lucene/pull/13542#issuecomment-2332114836


   Hey all, I have done some benchmarking with two main goals: 
   
   1) ensure there are no regressions introduced by the proposed change 
   2) ensure there is some performance gain when intra-segment concurrency is enabled, even with the basic support included in this initial step.
   
   
   I ran `wikimediumall` benchmarks with the default parameters and manually added count queries to the executed tasks. The default search concurrency is automatic, meaning the benchmark creates an executor based on the number of available CPUs. The index is not force-merged, so it has multiple segments.
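   For readers less familiar with the setup: Lucene's `IndexSearcher` can be given an `Executor` through its `IndexSearcher(IndexReader, Executor)` constructor, and each slice then runs as a task on that executor. The sketch below assumes the "automatic" mode boils down to a fixed pool with one thread per available CPU (the exact luceneutil sizing logic is not shown in this comment and may differ). Real slice searches are replaced by a placeholder that just returns each slice's doc count, and the class and method names are hypothetical.

   ```java
   import java.util.ArrayList;
   import java.util.List;
   import java.util.concurrent.ExecutionException;
   import java.util.concurrent.ExecutorService;
   import java.util.concurrent.Executors;
   import java.util.concurrent.Future;

   /** Hypothetical sketch: one pool thread per available CPU, one task per slice. */
   public class CpuSizedExecutorSketch {

     /** Submits one task per slice and sums the per-slice results. */
     static long searchAllSlices(List<Integer> sliceDocCounts) {
       int threads = Runtime.getRuntime().availableProcessors();
       ExecutorService executor = Executors.newFixedThreadPool(threads);
       try {
         List<Future<Long>> futures = new ArrayList<>();
         for (int docs : sliceDocCounts) {
           // Placeholder for "search this slice": the task only reports the slice's doc count.
           futures.add(executor.submit(() -> (long) docs));
         }
         long total = 0;
         for (Future<Long> future : futures) {
           total += future.get(); // waits for the slice task to finish
         }
         return total;
       } catch (InterruptedException e) {
         Thread.currentThread().interrupt();
         throw new IllegalStateException(e);
       } catch (ExecutionException e) {
         throw new IllegalStateException(e.getCause());
       } finally {
         executor.shutdown();
       }
     }

     public static void main(String[] args) {
       System.out.println(searchAllSlices(List.of(250_000, 120_000, 30_000)));
     }
   }
   ```

   A fixed pool keeps the number of concurrently searched slices bounded by the core count, which is why the benchmark numbers depend on the machine they run on.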
   
   The first run is main (baseline) against my current branch 
(my_modified_version):
   
   ```
                             Task  QPS baseline   StdDev QPS my_modified_version   StdDev                Pct diff p-value
                        CountTerm       4252.98   (6.3%)                 3928.20   (5.2%)   -7.6% ( -18% -    4%)   0.000
             HighIntervalsOrdered         18.28   (5.7%)                   17.80   (5.4%)   -2.7% ( -12% -    8%)   0.129
                   CountOrHighMed        319.35   (5.3%)                  311.55   (4.5%)   -2.4% ( -11% -    7%)   0.118
                HighTermMonthSort       1225.80   (3.8%)                 1200.00   (5.3%)   -2.1% ( -10% -    7%)   0.148
                      LowSpanNear         17.17   (5.3%)                   16.89   (4.4%)   -1.7% ( -10% -    8%)   0.280
              MedIntervalsOrdered         62.72   (4.4%)                   61.69   (5.7%)   -1.6% ( -11% -    8%)   0.310
                           Fuzzy1         65.29   (6.7%)                   64.36   (4.6%)   -1.4% ( -11% -   10%)   0.431
             MedTermDayTaxoFacets         17.35   (6.9%)                   17.11   (7.0%)   -1.4% ( -14% -   13%)   0.537
             HighTermTitleBDVSort         19.73   (5.3%)                   19.50   (4.1%)   -1.2% ( -10% -    8%)   0.441
                       AndHighLow        954.69   (4.2%)                  943.83   (4.1%)   -1.1% (  -9% -    7%)   0.384
        BrowseDayOfYearTaxoFacets          3.87   (8.4%)                    3.83   (4.9%)   -1.1% ( -13% -   13%)   0.603
             BrowseDateTaxoFacets          3.83   (8.8%)                    3.79   (5.9%)   -1.1% ( -14% -   14%)   0.641
         AndHighHighDayTaxoFacets         11.99   (4.9%)                   11.86   (6.5%)   -1.1% ( -11% -   10%)   0.556
                        LowPhrase        176.29   (3.3%)                  174.85   (3.7%)   -0.8% (  -7% -    6%)   0.460
      BrowseRandomLabelTaxoFacets          3.20   (3.9%)                    3.18   (4.3%)   -0.8% (  -8% -    7%)   0.547
                HighTermTitleSort         97.01   (5.0%)                   96.30   (4.6%)   -0.7% (  -9% -    9%)   0.628
                         HighTerm        410.73   (5.6%)                  407.91   (7.1%)   -0.7% ( -12% -   12%)   0.736
                       HighPhrase         68.29   (4.3%)                   67.87   (4.0%)   -0.6% (  -8% -    7%)   0.641
                  CountOrHighHigh         34.41  (25.3%)                   34.22  (20.2%)   -0.5% ( -36% -   60%)   0.942
                    OrHighNotHigh        326.55   (6.4%)                  324.92   (5.6%)   -0.5% ( -11% -   12%)   0.793
                       AndHighMed        232.64   (3.5%)                  231.61   (4.8%)   -0.4% (  -8% -    8%)   0.739
                         PKLookup        163.46   (8.6%)                  162.75   (6.9%)   -0.4% ( -14% -   16%)   0.860
                     OrNotHighLow       1042.34   (3.9%)                 1039.01   (4.2%)   -0.3% (  -8% -    8%)   0.803
                 HighSloppyPhrase         19.17   (4.4%)                   19.12   (5.7%)   -0.3% (  -9% -   10%)   0.855
            BrowseMonthTaxoFacets          4.08   (5.0%)                    4.07   (7.4%)   -0.2% ( -11% -   12%)   0.908
                     HighSpanNear         17.99   (5.9%)                   18.00   (6.0%)    0.0% ( -11% -   12%)   0.984
           OrHighMedDayTaxoFacets          2.49   (7.3%)                    2.50   (6.7%)    0.2% ( -12% -   15%)   0.936
                     OrNotHighMed        296.13   (5.0%)                  297.20   (5.9%)    0.4% (  -9% -   11%)   0.833
                        OrHighMed        350.64   (4.2%)                  352.22   (4.6%)    0.5% (  -8% -    9%)   0.748
                        MedPhrase         60.88   (3.8%)                   61.18   (4.4%)    0.5% (  -7% -    8%)   0.695
                  CountAndHighMed        272.12   (3.4%)                  273.55   (4.8%)    0.5% (  -7% -    9%)   0.691
                          Respell         35.73   (4.9%)                   35.93   (6.7%)    0.6% ( -10% -   12%)   0.763
                  MedSloppyPhrase         19.80   (6.6%)                   19.92   (7.3%)    0.6% ( -12% -   15%)   0.778
                  LowSloppyPhrase         17.75   (4.3%)                   17.86   (3.7%)    0.6% (  -7% -    9%)   0.622
                          Prefix3       1050.17   (3.9%)                 1057.01   (4.8%)    0.7% (  -7% -    9%)   0.636
                         Wildcard        143.63   (4.1%)                  144.58   (4.3%)    0.7% (  -7% -    9%)   0.618
                      CountPhrase         13.59   (4.2%)                   13.69   (4.3%)    0.7% (  -7% -    9%)   0.608
                       OrHighHigh         60.33   (8.5%)                   60.77  (10.6%)    0.7% ( -16% -   21%)   0.810
                    OrNotHighHigh        311.92   (5.7%)                  314.20   (5.2%)    0.7% (  -9% -   12%)   0.672
                     OrHighNotMed        389.61   (5.9%)                  394.22   (5.1%)    1.2% (  -9% -   12%)   0.497
                        OrHighLow        640.22   (3.9%)                  648.85   (5.0%)    1.3% (  -7% -   10%)   0.343
                          MedTerm        796.68   (6.3%)                  807.58   (7.8%)    1.4% ( -11% -   16%)   0.542
                          LowTerm        779.26   (4.8%)                  790.22   (4.2%)    1.4% (  -7% -   10%)   0.324
                 CountAndHighHigh         73.71   (5.1%)                   74.75   (6.7%)    1.4% (  -9% -   13%)   0.453
              LowIntervalsOrdered         76.85   (4.9%)                   77.98   (5.9%)    1.5% (  -8% -   12%)   0.395
             BrowseDateSSDVFacets          1.01   (8.6%)                    1.03   (9.7%)    1.6% ( -15% -   21%)   0.584
          AndHighMedDayTaxoFacets         17.64   (5.3%)                   17.96   (4.7%)    1.8% (  -7% -   12%)   0.268
            HighTermDayOfYearSort        474.18   (6.2%)                  482.93   (5.1%)    1.8% (  -8% -   13%)   0.303
                      AndHighHigh        120.18   (5.4%)                  122.60   (5.0%)    2.0% (  -7% -   13%)   0.219
                      MedSpanNear         94.97   (4.1%)                   97.02   (4.6%)    2.2% (  -6% -   11%)   0.117
            BrowseMonthSSDVFacets          5.99  (17.5%)                    6.12  (11.0%)    2.3% ( -22% -   37%)   0.618
                     OrHighNotLow        611.75   (4.3%)                  628.18   (6.9%)    2.7% (  -8% -   14%)   0.143
      BrowseRandomLabelSSDVFacets          4.02   (7.4%)                    4.17   (7.4%)    3.6% ( -10% -   19%)   0.129
                       TermDTSort        173.73   (6.3%)                  180.06   (7.2%)    3.6% (  -9% -   18%)   0.087
        BrowseDayOfYearSSDVFacets          5.83  (14.1%)                    6.07  (15.6%)    4.1% ( -22% -   39%)   0.382
                           IntNRQ        595.22  (10.5%)                  621.42   (7.2%)    4.4% ( -12% -   24%)   0.122
                           Fuzzy2         50.46   (6.7%)                   52.94   (9.2%)    4.9% ( -10% -   22%)   0.054
   ```
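   To read the "Pct diff" column: it appears to be the relative change of the branch's mean QPS over the baseline's mean QPS. The helper below recomputes it under that assumption (the names are hypothetical, not luceneutil's). Because the QPS values printed in the tables are already rounded, a recomputed figure can occasionally differ from the printed one in the last digit.

   ```java
   /** Hypothetical helper: recomputes a "Pct diff" value from two mean QPS figures. */
   public class PctDiffSketch {

     /** Relative change of the modified QPS over the baseline QPS, in percent. */
     static double pctDiff(double baselineQps, double modifiedQps) {
       return (modifiedQps - baselineQps) / baselineQps * 100.0;
     }

     /** Rounds to one decimal place, the precision used in the tables. */
     static double round1(double value) {
       return Math.round(value * 10.0) / 10.0;
     }

     public static void main(String[] args) {
       // CountTerm in the first run: 4252.98 QPS on main vs. 3928.20 QPS on the branch.
       System.out.println(round1(pctDiff(4252.98, 3928.20))); // prints -7.6, as in the table
     }
   }
   ```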
   
   
   
   This is expected: the only place where we should see some overhead is `TotalHitCountCollectorManager`, which now returns a customized `TotalHitCountCollector` to support intra-segment concurrency. Consistent with that, the only statistically significant change is on `CountTerm` (-7.6%, p-value 0.000); no other task has a p-value below 0.05.
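   Why would a partition-aware collector cost anything extra? Our reading, which is an assumption inferred from the class name and the `earlyTerminatedMap` argument in the diff of the second run, not from the PR code, is that a count collector may take a per-segment shortcut and add a whole segment's hit count at once. If a segment is split into several partitions, taking that shortcut once per partition would count the segment several times, so some state shared across collectors is needed to take it only once per segment. A toy model of that, with hypothetical names:

   ```java
   import java.util.List;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;

   /** Toy model (not Lucene code) of counting hits when segments may be split into partitions. */
   public class PartitionCountSketch {

     /** One unit of work: a partition of a segment whose total hit count is known up front. */
     record Partition(String segment, int segmentHitCount) {}

     /** Naive: adds the segment-level count for every partition, over-counting split segments. */
     static int naiveCount(List<Partition> partitions) {
       int total = 0;
       for (Partition p : partitions) {
         total += p.segmentHitCount();
       }
       return total;
     }

     /** Partition-aware: a shared set ensures each segment's count is added only once. */
     static int partitionAwareCount(List<Partition> partitions) {
       // Concurrent so that collectors running on different threads could share it.
       Set<String> counted = ConcurrentHashMap.newKeySet();
       int total = 0;
       for (Partition p : partitions) {
         // This extra lookup per partition is the kind of bookkeeping that costs a little.
         if (counted.add(p.segment())) {
           total += p.segmentHitCount();
         }
       }
       return total;
     }

     public static void main(String[] args) {
       // Segment _0 is split into two partitions; segment _1 is searched whole.
       List<Partition> parts =
           List.of(new Partition("_0", 10), new Partition("_0", 10), new Partition("_1", 5));
       System.out.println(naiveCount(parts) + " vs " + partitionAwareCount(parts)); // prints "25 vs 15"
     }
   }
   ```

   When no segment is split, both methods return the same total, so the shared state is pure overhead in that case.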
   
   
   
   
   The second run is main (baseline) against my current branch 
(my_modified_version) with the overhead in `TotalHitCountCollectorManager` 
removed:
   
   ```
   diff --git a/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java b/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java
   index 50956725cb1..d95cef8c05d 100644
   --- a/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java
   +++ b/lucene/core/src/java/org/apache/lucene/search/TotalHitCountCollectorManager.java
   @@ -51,7 +51,8 @@ public class TotalHitCountCollectorManager
    
      @Override
      public TotalHitCountCollector newCollector() throws IOException {
   -    return new LeafPartitionAwareTotalHitCountCollector(earlyTerminatedMap);
   +    return new TotalHitCountCollector();
   +    //return new LeafPartitionAwareTotalHitCountCollector(earlyTerminatedMap);
      }
   ```
   
   
   ```
                             Task  QPS baseline   StdDev QPS my_modified_version   StdDev                Pct diff p-value
                       AndHighLow       1215.06   (5.0%)                 1168.26   (4.3%)   -3.9% ( -12% -    5%)   0.009
                     OrNotHighLow        934.09   (3.8%)                  914.26   (3.9%)   -2.1% (  -9% -    5%)   0.078
            BrowseMonthTaxoFacets          4.09   (5.0%)                    4.04   (6.8%)   -1.2% ( -12% -   11%)   0.511
                        CountTerm       4521.12   (8.9%)                 4467.11   (7.2%)   -1.2% ( -15% -   16%)   0.642
                       TermDTSort         83.56   (9.3%)                   82.71  (10.7%)   -1.0% ( -19% -   20%)   0.750
            BrowseMonthSSDVFacets          5.89  (12.4%)                    5.86  (13.4%)   -0.5% ( -23% -   28%)   0.907
                         HighTerm        560.23   (7.3%)                  558.11   (5.8%)   -0.4% ( -12% -   13%)   0.856
                         PKLookup        160.43   (6.9%)                  159.88   (6.0%)   -0.3% ( -12% -   13%)   0.868
                HighTermMonthSort       1183.43   (4.5%)                 1180.78   (3.6%)   -0.2% (  -7% -    8%)   0.862
                       AndHighMed        480.37   (4.8%)                  479.74   (2.8%)   -0.1% (  -7% -    7%)   0.916
                       HighPhrase         93.71   (4.5%)                   93.75   (4.3%)    0.0% (  -8% -    9%)   0.972
              MedIntervalsOrdered         46.65   (5.2%)                   46.69   (5.1%)    0.1% (  -9% -   10%)   0.968
            HighTermDayOfYearSort        309.90   (5.0%)                  310.31   (4.3%)    0.1% (  -8% -    9%)   0.928
        BrowseDayOfYearSSDVFacets          5.37  (10.4%)                    5.38  (10.8%)    0.2% ( -19% -   23%)   0.957
             BrowseDateSSDVFacets          1.01   (8.7%)                    1.01   (9.3%)    0.3% ( -16% -   20%)   0.919
                         Wildcard        153.80   (5.2%)                  154.33   (5.0%)    0.3% (  -9% -   11%)   0.833
           OrHighMedDayTaxoFacets          4.12   (6.8%)                    4.14   (6.7%)    0.4% ( -12% -   14%)   0.865
                        OrHighMed        299.87   (3.8%)                  301.12   (3.3%)    0.4% (  -6% -    7%)   0.711
                    OrNotHighHigh        392.91   (5.3%)                  394.61   (5.2%)    0.4% (  -9% -   11%)   0.793
                  LowSloppyPhrase        109.01   (4.1%)                  109.53   (4.6%)    0.5% (  -7% -    9%)   0.731
                  CountAndHighMed         94.97   (4.0%)                   95.42   (6.4%)    0.5% (  -9% -   11%)   0.777
                        LowPhrase        437.29   (3.4%)                  439.41   (5.3%)    0.5% (  -8% -    9%)   0.733
                           Fuzzy1         69.22   (4.5%)                   69.59   (6.1%)    0.5% (  -9% -   11%)   0.755
                          LowTerm        691.97   (4.4%)                  695.79   (3.2%)    0.6% (  -6% -    8%)   0.649
                      MedSpanNear          9.42   (3.3%)                    9.48   (2.8%)    0.6% (  -5% -    7%)   0.514
                        MedPhrase         91.74   (4.8%)                   92.34   (5.4%)    0.7% (  -9% -   11%)   0.687
                      LowSpanNear         15.79   (4.6%)                   15.91   (4.8%)    0.8% (  -8% -   10%)   0.612
                          Respell         38.77   (8.4%)                   39.08   (7.3%)    0.8% ( -13% -   17%)   0.754
                 CountAndHighHigh         60.15   (4.8%)                   60.64   (4.6%)    0.8% (  -8% -   10%)   0.582
         AndHighHighDayTaxoFacets          7.59   (4.2%)                    7.65   (5.3%)    0.8% (  -8% -   10%)   0.574
                HighTermTitleSort         95.06   (4.0%)                   95.88   (3.2%)    0.9% (  -6% -    8%)   0.448
                      CountPhrase         12.09   (3.3%)                   12.21   (2.5%)    1.0% (  -4% -    7%)   0.277
                           Fuzzy2         46.03   (5.4%)                   46.50   (5.8%)    1.0% (  -9% -   12%)   0.567
                           IntNRQ         42.07   (3.8%)                   42.54   (3.6%)    1.1% (  -5% -    8%)   0.333
                     HighSpanNear         25.89   (4.0%)                   26.23   (3.8%)    1.3% (  -6% -    9%)   0.289
                        OrHighLow        665.60   (3.8%)                  674.34   (2.8%)    1.3% (  -5% -    8%)   0.218
      BrowseRandomLabelTaxoFacets          3.14   (3.4%)                    3.18   (5.4%)    1.5% (  -6% -   10%)   0.283
      BrowseRandomLabelSSDVFacets          4.00   (8.3%)                    4.06   (9.8%)    1.5% ( -15% -   21%)   0.591
                  MedSloppyPhrase         75.80   (4.8%)                   76.96   (3.8%)    1.5% (  -6% -   10%)   0.262
        BrowseDayOfYearTaxoFacets          3.85   (9.0%)                    3.91   (8.2%)    1.6% ( -14% -   20%)   0.564
             HighIntervalsOrdered          7.43   (6.3%)                    7.55   (4.3%)    1.6% (  -8% -   12%)   0.338
                     OrNotHighMed        451.36   (6.9%)                  458.85   (7.2%)    1.7% ( -11% -   16%)   0.456
                 HighSloppyPhrase         24.07   (7.6%)                   24.52   (5.4%)    1.9% ( -10% -   16%)   0.362
             BrowseDateTaxoFacets          3.88   (8.3%)                    3.96   (8.4%)    2.0% ( -13% -   20%)   0.443
          AndHighMedDayTaxoFacets         32.87   (6.1%)                   33.56   (6.6%)    2.1% (  -9% -   15%)   0.293
                      AndHighHigh         80.31   (7.0%)                   82.00   (6.6%)    2.1% ( -10% -   16%)   0.329
             HighTermTitleBDVSort         32.34   (3.9%)                   33.08   (2.6%)    2.3% (  -4% -    9%)   0.028
                          Prefix3        131.67   (4.4%)                  134.74   (3.9%)    2.3% (  -5% -   11%)   0.075
                     OrHighNotMed        370.91   (7.0%)                  380.92   (5.7%)    2.7% (  -9% -   16%)   0.180
                     OrHighNotLow        445.51   (7.9%)                  458.42   (5.9%)    2.9% ( -10% -   18%)   0.189
                    OrHighNotHigh        396.82   (4.8%)                  410.46   (5.5%)    3.4% (  -6% -   14%)   0.035
                          MedTerm        528.83   (6.4%)                  548.26   (6.2%)    3.7% (  -8% -   17%)   0.066
                       OrHighHigh        103.91   (6.2%)                  108.10   (6.6%)    4.0% (  -8% -   17%)   0.046
             MedTermDayTaxoFacets         11.93   (4.8%)                   12.49   (6.5%)    4.7% (  -6% -   16%)   0.010
              LowIntervalsOrdered         86.39   (8.9%)                   90.77   (7.5%)    5.1% ( -10% -   23%)   0.052
                  CountOrHighHigh         53.24  (25.6%)                   60.63  (28.9%)   13.9% ( -32% -   91%)   0.108
                   CountOrHighMed         46.21  (25.8%)                   53.82  (31.0%)   16.5% ( -32% -   98%)   0.068
   ```
   
   
   The conclusion is that the small overhead on count queries is caused by the `TotalHitCountCollectorManager` changes. One option is to make the new version of the manager opt-in, especially as its additional overhead is only needed for intra-segment concurrency. That complicates the API a bit, hence it's not entirely clear which direction we should go. In short: add an option to the manager to signal that you rely on intra-segment slicing, when needed. It would be great to make this automatic (it could be done in `IndexSearcher#count`), but there is no direct link between a collector manager and the searcher it is used with, so the manager does not know whether segment partitions are being searched.
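   One possible shape for the opt-in, sketched with invented names (this is not Lucene's API): the manager takes a flag and only allocates and consults the shared per-segment state when the flag is set, while a caller that does know the slices, such as a searcher-level count method, derives the flag from whether any segment was split. This assumes the caller can see the slices; as noted above, a collector manager on its own cannot.

   ```java
   import java.util.List;
   import java.util.Set;
   import java.util.concurrent.ConcurrentHashMap;

   /** Hypothetical opt-in count manager; all names are invented for illustration. */
   public class OptInCountManagerSketch {

     /** Part of a slice: a segment, searched either whole or as one of several partitions. */
     record LeafPart(String segment, boolean wholeSegment) {}

     /** Credits the hits of one leaf part, given its segment-level hit count. */
     interface SegmentCounter {
       int credit(LeafPart part, int segmentHitCount);
     }

     static final class CountManager {
       private final boolean intraSegment;
       // Shared across counters; only allocated when the caller opted in.
       private final Set<String> countedSegments;

       CountManager(boolean intraSegment) {
         this.intraSegment = intraSegment;
         this.countedSegments = intraSegment ? ConcurrentHashMap.newKeySet() : null;
       }

       boolean intraSegment() {
         return intraSegment;
       }

       SegmentCounter newCounter() {
         if (!intraSegment) {
           // Default path: no shared state, so no per-segment bookkeeping overhead.
           return (part, hits) -> hits;
         }
         // Opt-in path: credit a segment's count only the first time any counter sees it.
         return (part, hits) -> countedSegments.add(part.segment()) ? hits : 0;
       }
     }

     /** What a searcher-level count() could do: opt in only if some segment was split. */
     static CountManager managerFor(List<List<LeafPart>> slices) {
       boolean anySplit =
           slices.stream().flatMap(List::stream).anyMatch(part -> !part.wholeSegment());
       return new CountManager(anySplit);
     }

     public static void main(String[] args) {
       // Two slices, each holding one partition of segment _0.
       List<List<LeafPart>> slices =
           List.of(List.of(new LeafPart("_0", false)), List.of(new LeafPart("_0", false)));
       CountManager manager = managerFor(slices);
       int total = 0;
       for (List<LeafPart> slice : slices) {
         SegmentCounter counter = manager.newCounter(); // one counter per slice
         for (LeafPart part : slice) {
           total += counter.credit(part, 10);
         }
       }
       System.out.println(manager.intraSegment() + " " + total); // prints "true 10"
     }
   }
   ```

   The design choice here is that the default path stays as cheap as the original collector, and only callers that actually search partitions pay for the shared set.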
   
   
   The third and last run is main (baseline) against my current branch 
(my_modified_version) but enabling intra-segment concurrency as follows:
   
   ```
   diff --git a/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java b/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
   index 1c0bff93aa6..5f05177faca 100644
   --- a/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
   +++ b/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
   @@ -331,7 +331,7 @@ public class IndexSearcher {
       * MAX_DOCS_PER_SLICE will get their own thread
       */
      protected LeafSlice[] slices(List<LeafReaderContext> leaves) {
   -    return slices(leaves, MAX_DOCS_PER_SLICE, MAX_SEGMENTS_PER_SLICE);
   +    return slicesWithPartitions(leaves, MAX_DOCS_PER_SLICE, MAX_SEGMENTS_PER_SLICE);
      }
   ```
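   The diff only swaps `slices(...)` for `slicesWithPartitions(...)`; the PR's actual partitioning logic is not shown here. As a rough, hypothetical illustration of the idea, assume a segment bigger than the per-slice document cap (250_000 in these runs) is cut into contiguous, roughly equal doc ID ranges, each of which can then be searched by its own task:

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical sketch of cutting large segments into doc ID range partitions. */
   public class PartitionSlicingSketch {

     /** Doc ID range [minDoc, maxDoc) of the segment with ordinal segmentOrd. */
     record Partition(int segmentOrd, int minDoc, int maxDoc) {
       int size() {
         return maxDoc - minDoc;
       }
     }

     static List<Partition> partition(int[] segmentMaxDocs, int maxDocsPerSlice) {
       List<Partition> partitions = new ArrayList<>();
       for (int ord = 0; ord < segmentMaxDocs.length; ord++) {
         int maxDoc = segmentMaxDocs[ord];
         // Ceiling division: the fewest ranges such that none exceeds maxDocsPerSlice.
         int numPartitions =
             (int) Math.max(1, (maxDoc + (long) maxDocsPerSlice - 1) / maxDocsPerSlice);
         for (int i = 0; i < numPartitions; i++) {
           // Even split of [0, maxDoc); long arithmetic avoids int overflow.
           int from = (int) ((long) maxDoc * i / numPartitions);
           int to = (int) ((long) maxDoc * (i + 1) / numPartitions);
           partitions.add(new Partition(ord, from, to));
         }
       }
       return partitions;
     }

     public static void main(String[] args) {
       // A 600k-doc segment and a 100k-doc segment, with a 250_000 cap.
       for (Partition p : partition(new int[] {600_000, 100_000}, 250_000)) {
         System.out.println(p);
       }
     }
   }
   ```

   Here the 600k-doc segment becomes three 200k-doc partitions and the small segment stays whole. Grouping several small segments into one slice, which the existing `slices(...)` method handles, is left out of the sketch.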
   
   ```
                             Task  QPS baseline   StdDev QPS my_modified_version   StdDev                Pct diff p-value
                           IntNRQ         53.67  (30.0%)                   22.99  (13.1%)  -57.2% ( -77% -  -20%)   0.000
                HighTermMonthSort       1152.98   (4.0%)                  521.96   (2.8%)  -54.7% ( -59% -  -49%)   0.000
                         Wildcard        191.62   (4.0%)                   89.97   (2.8%)  -53.0% ( -57% -  -48%)   0.000
                          Prefix3        583.57   (4.1%)                  296.86   (3.4%)  -49.1% ( -54% -  -43%)   0.000
            HighTermDayOfYearSort        376.24   (6.5%)                  198.80   (2.6%)  -47.2% ( -52% -  -40%)   0.000
                HighTermTitleSort        106.06   (2.3%)                   63.12   (2.4%)  -40.5% ( -44% -  -36%)   0.000
                         HighTerm        600.13   (4.0%)                  428.53   (3.3%)  -28.6% ( -34% -  -22%)   0.000
                     OrHighNotLow        504.45   (4.8%)                  369.99   (3.7%)  -26.7% ( -33% -  -19%)   0.000
                        CountTerm       4185.03   (8.0%)                 3094.54   (5.7%)  -26.1% ( -36% -  -13%)   0.000
                          LowTerm        738.23   (3.9%)                  564.97   (2.9%)  -23.5% ( -29% -  -17%)   0.000
                     OrNotHighLow        873.49   (4.0%)                  686.63   (2.8%)  -21.4% ( -27% -  -15%)   0.000
                          MedTerm        591.39   (6.0%)                  476.60   (4.6%)  -19.4% ( -28% -   -9%)   0.000
             BrowseDateSSDVFacets          1.01   (9.3%)                    0.86   (6.5%)  -15.0% ( -28% -    0%)   0.000
                       AndHighLow       1216.35   (3.7%)                 1036.91   (4.4%)  -14.8% ( -21% -   -6%)   0.000
      BrowseRandomLabelSSDVFacets          3.99   (9.0%)                    3.52   (9.2%)  -11.7% ( -27% -    7%)   0.000
                     OrNotHighMed        392.27   (5.2%)                  347.95   (2.7%)  -11.3% ( -18% -   -3%)   0.000
        BrowseDayOfYearSSDVFacets          5.36  (16.0%)                    4.76   (8.5%)  -11.2% ( -30% -   15%)   0.006
                        OrHighLow        552.26   (7.5%)                  490.96   (3.1%)  -11.1% ( -20% -    0%)   0.000
            BrowseMonthTaxoFacets          4.01   (6.8%)                    3.59   (4.7%)  -10.6% ( -20% -    1%)   0.000
                     OrHighNotMed        314.53   (6.1%)                  285.65   (4.7%)   -9.2% ( -18% -    1%)   0.000
           OrHighMedDayTaxoFacets          4.91   (5.5%)                    4.49   (6.9%)   -8.6% ( -19% -    3%)   0.000
            BrowseMonthSSDVFacets          5.68  (15.8%)                    5.19  (11.7%)   -8.6% ( -31% -   22%)   0.051
                           Fuzzy1         83.62   (5.1%)                   76.44   (6.5%)   -8.6% ( -19% -    3%)   0.000
             BrowseDateTaxoFacets          3.75   (6.6%)                    3.50  (12.8%)   -6.8% ( -24% -   13%)   0.034
                         PKLookup        163.08   (4.8%)                  153.32   (4.5%)   -6.0% ( -14% -    3%)   0.000
        BrowseDayOfYearTaxoFacets          3.72   (6.0%)                    3.51  (13.6%)   -5.5% ( -23% -   14%)   0.095
      BrowseRandomLabelTaxoFacets          3.17   (3.3%)                    3.02  (13.7%)   -4.9% ( -21% -   12%)   0.118
          AndHighMedDayTaxoFacets         58.87   (6.9%)                   56.23   (7.0%)   -4.5% ( -17% -   10%)   0.041
                          Respell         32.63   (6.5%)                   31.20   (5.0%)   -4.4% ( -14% -    7%)   0.017
         AndHighHighDayTaxoFacets         10.71   (3.2%)                   10.26   (4.2%)   -4.2% ( -11% -    3%)   0.000
                       TermDTSort        139.91   (6.5%)                  134.73   (4.1%)   -3.7% ( -13% -    7%)   0.031
             MedTermDayTaxoFacets         20.89   (5.6%)                   20.69   (5.0%)   -1.0% ( -10% -   10%)   0.566
                           Fuzzy2         49.43   (5.4%)                   50.64   (5.8%)    2.4% (  -8% -   14%)   0.167
                    OrNotHighHigh        256.44   (5.6%)                  262.89   (4.5%)    2.5% (  -7% -   13%)   0.118
                    OrHighNotHigh        299.79   (5.5%)                  332.13   (6.2%)   10.8% (   0% -   23%)   0.000
                        OrHighMed        283.53   (4.2%)                  354.04   (5.1%)   24.9% (  14% -   35%)   0.000
                       HighPhrase        150.49   (3.7%)                  198.04   (8.7%)   31.6% (  18% -   45%)   0.000
                  MedSloppyPhrase        135.03   (4.9%)                  182.50   (6.2%)   35.2% (  23% -   48%)   0.000
                   CountOrHighMed        313.83   (3.8%)                  435.20  (10.5%)   38.7% (  23% -   55%)   0.000
                       AndHighMed        283.90   (2.8%)                  422.91   (4.9%)   49.0% (  40% -   58%)   0.000
                       OrHighHigh        114.64   (5.4%)                  178.83  (10.1%)   56.0% (  38% -   75%)   0.000
                  CountOrHighHigh         74.36  (24.4%)                  125.75  (40.5%)   69.1% (   3% -  177%)   0.000
                  CountAndHighMed        269.50   (5.1%)                  472.96   (7.9%)   75.5% (  59% -   93%)   0.000
                        LowPhrase         34.91   (4.2%)                   63.27   (8.0%)   81.2% (  66% -   97%)   0.000
                      AndHighHigh         34.49   (7.4%)                   63.06   (9.6%)   82.9% (  61% -  107%)   0.000
             HighTermTitleBDVSort         43.49   (5.3%)                   79.60   (6.9%)   83.1% (  67% -  100%)   0.000
                        MedPhrase         88.33   (4.0%)                  167.26   (6.9%)   89.4% (  75% -  104%)   0.000
                      MedSpanNear         24.23   (2.9%)                   46.89   (5.7%)   93.5% (  82% -  105%)   0.000
                 CountAndHighHigh         59.20   (4.2%)                  117.07  (12.4%)   97.8% (  77% -  119%)   0.000
                      CountPhrase         30.21   (4.2%)                   60.12   (9.8%)   99.0% (  81% -  118%)   0.000
                 HighSloppyPhrase         33.61   (5.4%)                   67.40  (10.6%)  100.5% (  80% -  123%)   0.000
                     HighSpanNear         15.22   (3.1%)                   30.96   (9.3%)  103.4% (  88% -  119%)   0.000
             HighIntervalsOrdered         31.78  (10.9%)                   66.81  (15.1%)  110.2% (  75% -  152%)   0.000
              LowIntervalsOrdered         97.00   (7.8%)                  209.73  (10.8%)  116.2% (  90% -  146%)   0.000
                      LowSpanNear         13.38   (3.1%)                   29.03   (8.5%)  117.0% ( 102% -  132%)   0.000
                  LowSloppyPhrase         44.38   (4.8%)                  102.93  (11.7%)  131.9% ( 110% -  155%)   0.000
              MedIntervalsOrdered          5.80   (6.0%)                   14.09  (12.5%)  142.7% ( 117% -  171%)   0.000
   ```
   
   
   I think the regressions (the largest being `IntNRQ` at -57.2%, `HighTermMonthSort` at -54.7%, `Wildcard` at -53.0% and `Prefix3` at -49.1%) make sense: they come from queries that require computation ahead of time at the segment level, which gets duplicated across segment partitions. In that case parallelizing makes things worse, and addressing that will need additional work. For other queries there is quite a bit of value already: many phrase, span and interval queries gain 80% or more, up to +142.7% for `MedIntervalsOrdered`. I'd say this is quite promising, given that the current slicing approach is pretty basic, I have not put any effort into optimizing it for better benchmark results, and this index is not force-merged. The max number of documents per slice is set to 250_000.
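   A toy cost model (an assumption for illustration, not a measurement of these runs) makes the trade-off concrete: say a query pays a fixed segment-level setup cost before matching, plus scan work proportional to the documents it visits. Splitting a segment into P partitions divides the scan across P tasks but, per the explanation above, repeats the setup in every partition. Names and numbers are hypothetical:

   ```java
   /** Toy cost model, for illustration only: fixed setup per task plus a divisible scan. */
   public class PartitionCostSketch {

     /** Estimated wall-clock latency and total CPU for one query over one segment. */
     record Cost(double latency, double cpu) {}

     /**
      * Splits the scan across {@code partitions} tasks, assuming enough idle cores to run them all
      * at once. Each task repeats the segment-level setup before scanning its share.
      */
     static Cost estimate(double setup, double scan, int partitions) {
       double latency = setup + scan / partitions;
       double cpu = partitions * setup + scan;
       return new Cost(latency, cpu);
     }

     public static void main(String[] args) {
       // Setup-heavy query: most of the work happens before matching starts.
       System.out.println("setup-heavy: " + estimate(100, 10, 1) + " -> " + estimate(100, 10, 4));
       // Scan-heavy query: most of the work is matching documents.
       System.out.println("scan-heavy:  " + estimate(1, 100, 1) + " -> " + estimate(1, 100, 4));
     }
   }
   ```

   In this model the setup-heavy query barely gets faster (latency 110 to 102.5) while its CPU cost nearly quadruples (110 to 410), which competes with other work on a busy machine; the scan-heavy query drops from 101 to 26 for almost no extra CPU.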


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

