[jira] [Comment Edited] (LUCENE-8796) Use exponential search in IntArrayDocIdSet advance method

Luca Cavanna (JIRA) Fri, 31 May 2019 03:09:35 -0700


    [ 
https://issues.apache.org/jira/browse/LUCENE-8796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16852876#comment-16852876
 ]


Luca Cavanna edited comment on LUCENE-8796 at 5/31/19 10:08 AM:
----------------------------------------------------------------

I updated the PR and addressed all the comments. Here are the latest benchmark 
results (with the bitset optimization disabled on both ends):
{noformat}
Report after iter 19:
                    Task  QPS baseline      StdDev  QPS my_modified_version      StdDev   Pct diff
                 MedTerm     1510.74      (6.8%)     1457.20      (8.4%)   -3.5% ( -17% -   12%)
                  Fuzzy1       70.49      (8.5%)       68.11      (9.8%)   -3.4% ( -19% -   16%)
            OrHighNotMed      650.57      (5.8%)      629.81      (6.0%)   -3.2% ( -14% -    9%)
               OrHighLow      447.13      (4.2%)      433.05      (4.5%)   -3.2% ( -11% -    5%)
            OrNotHighMed      623.22      (6.3%)      605.19      (6.1%)   -2.9% ( -14% -   10%)
            OrHighNotLow      720.89      (7.0%)      701.26      (7.9%)   -2.7% ( -16% -   13%)
           OrNotHighHigh      558.43      (6.3%)      544.82      (4.9%)   -2.4% ( -12% -    9%)
                 LowTerm     1279.34      (4.9%)     1248.60      (5.2%)   -2.4% ( -11% -    8%)
              AndHighLow      690.75      (4.0%)      675.22      (5.3%)   -2.2% ( -11% -    7%)
               LowPhrase      358.90      (2.3%)      351.28      (4.0%)   -2.1% (  -8% -    4%)
                PKLookup      139.97      (3.0%)      137.32      (3.5%)   -1.9% (  -8% -    4%)
            OrNotHighLow      728.48      (6.8%)      714.79      (6.5%)   -1.9% ( -14% -   12%)
                HighTerm     1222.38      (6.3%)     1199.77      (7.1%)   -1.8% ( -14% -   12%)
             AndHighHigh       58.93      (6.2%)       58.01      (5.8%)   -1.6% ( -12% -   11%)
                 Prefix3      152.21      (4.5%)      150.00      (5.0%)   -1.5% ( -10% -    8%)
       IntNRQConjMedTerm       79.15     (10.7%)       78.06     (10.5%)   -1.4% ( -20% -   22%)
   HighTermDayOfYearSort       95.28      (5.1%)       94.10      (7.8%)   -1.2% ( -13% -   12%)
                Wildcard       64.23      (2.3%)       63.45      (2.3%)   -1.2% (  -5% -    3%)
             MedSpanNear       81.15      (2.2%)       80.19      (2.8%)   -1.2% (  -6% -    3%)
            HighSpanNear       10.20      (3.9%)       10.08      (4.2%)   -1.2% (  -8% -    7%)
    HighIntervalsOrdered        4.07      (1.8%)        4.03      (2.2%)   -1.1% (  -4% -    2%)
             LowSpanNear       41.62      (3.1%)       41.20      (3.6%)   -1.0% (  -7% -    5%)
       IntNRQConjLowTerm       20.36      (4.1%)       20.15      (4.5%)   -1.0% (  -9% -    7%)
      IntNRQConjHighTerm       64.84      (9.6%)       64.21      (9.4%)   -1.0% ( -18% -   19%)
              AndHighMed      229.08      (2.8%)      227.00      (2.5%)   -0.9% (  -6% -    4%)
               MedPhrase       18.73      (1.5%)       18.57      (2.3%)   -0.8% (  -4% -    2%)
         LowSloppyPhrase      124.52      (2.3%)      123.48      (2.6%)   -0.8% (  -5% -    4%)
                 Respell       69.26      (3.0%)       68.68      (2.9%)   -0.8% (  -6% -    5%)
              HighPhrase       12.98      (1.6%)       12.88      (2.2%)   -0.7% (  -4% -    3%)
       PrefixConjLowTerm       42.11      (2.6%)       41.81      (3.0%)   -0.7% (  -6% -    5%)
           OrHighNotHigh      680.34      (6.1%)      676.16      (7.6%)   -0.6% ( -13% -   13%)
         MedSloppyPhrase       34.06      (4.9%)       33.89      (4.5%)   -0.5% (  -9% -    9%)
                  IntNRQ       89.97     (12.4%)       89.62     (12.0%)   -0.4% ( -22% -   27%)
        HighSloppyPhrase        8.28      (4.0%)        8.25      (3.9%)   -0.3% (  -7% -    7%)
     WildcardConjLowTerm       36.35      (2.7%)       36.26      (2.7%)   -0.3% (  -5% -    5%)
              OrHighHigh       27.89      (2.6%)       27.85      (3.1%)   -0.1% (  -5% -    5%)
                  Fuzzy2       44.19      (3.8%)       44.17      (3.1%)   -0.1% (  -6% -    7%)
               OrHighMed       90.42      (2.8%)       90.57      (2.8%)    0.2% (  -5% -    6%)
       PrefixConjMedTerm       45.56      (2.8%)       45.79      (2.9%)    0.5% (  -5% -    6%)
    WildcardConjHighTerm       33.08      (2.6%)       33.47      (3.0%)    1.2% (  -4% -    6%)
      PrefixConjHighTerm       83.65      (2.6%)       86.23      (3.7%)    3.1% (  -3% -    9%)
       HighTermMonthSort      130.35     (15.8%)      135.08     (12.1%)    3.6% ( -20% -   37%)
     WildcardConjMedTerm       99.19      (3.6%)      103.37      (4.1%)    4.2% (  -3% -   12%)
{noformat}


> Use exponential search in IntArrayDocIdSet advance method
> ---------------------------------------------------------
>
>                 Key: LUCENE-8796
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8796
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Luca Cavanna
>            Priority: Minor
>
> In a chat, [~jpountz] suggested improving IntArrayDocIdSet by making 
> its advance method use exponential search instead of binary search. This 
> should help the performance of queries that include conjunctions: because 
> ConjunctionDISI uses leapfrog, it advances through doc IDs in small steps, 
> so on average exponential search should advance faster than binary 
> search.
>  
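
For readers unfamiliar with the technique named in the issue description, here is a minimal sketch of exponential (galloping) search over a sorted int array. It is not the code from the PR: the class name ExponentialAdvance, the method advance, and its parameters are hypothetical, and it assumes the doc IDs in docs[0, length) are strictly increasing.

```java
import java.util.Arrays;

// Illustrative sketch only; not the actual IntArrayDocIdSet implementation.
final class ExponentialAdvance {

    /**
     * Returns the index of the first element >= target in docs[from, length),
     * or length if there is no such element.
     * Assumes docs[0, length) is sorted and contains no duplicates.
     */
    static int advance(int[] docs, int length, int from, int target) {
        // Gallop: probe from+1, from+2, from+4, ... until the probed value
        // is >= target or the probe runs past the end of the array.
        // A long is used so that doubling cannot overflow.
        long bound = 1;
        while (from + bound < length && docs[(int) (from + bound)] < target) {
            bound <<= 1;
        }
        // The answer now lies between the last probe known to be < target
        // (or from itself) and the probe that stopped the loop, inclusive.
        int lo = (int) (from + bound / 2);
        int hi = (int) Math.min(from + bound + 1, length); // exclusive
        int idx = Arrays.binarySearch(docs, lo, hi, target);
        // On a miss, binarySearch returns -(insertionPoint) - 1.
        return idx >= 0 ? idx : -1 - idx;
    }
}
```

When the target sits k slots ahead of from, the gallop needs roughly log2(k) probes and the final binary search covers a window of about k elements, so short hops cost far less than a binary search over the whole remaining array. That is the case the description expects to be common under leapfrog conjunctions.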



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
