[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Fri, 04 Jul 2014 20:28:58 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch

This patch is based on the git mirror commit 
7f66461aea7bc2cb6f31a993cba77734e5e0f9d9.

In this patch, bucketTable is implemented as an array rather than a hash table.
Its performance seems better than that of the earlier patches in most cases.
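
For readers who do not have the patch open, here is a rough sketch of what an array-backed bucket table could look like. It is not taken from LUCENE-4396.patch: the class name, fields, and methods are hypothetical, and it assumes required docs arrive in increasing doc order so the table stays sorted without any hashing.

```java
/** Hypothetical sketch: required (MUST) docs kept in a dense, doc-ordered array. */
class ArrayBucketTable {
    final int[] docs;      // doc IDs of required hits, in increasing order
    final float[] scores;  // accumulated score per slot
    int size;              // number of filled slots

    ArrayBucketTable(int capacity) {
        docs = new int[capacity];
        scores = new float[capacity];
    }

    /** Append a required doc. Assumes increasing doc order and enough capacity. */
    void add(int doc, float score) {
        docs[size] = doc;
        scores[size] = score;
        size++;
    }

    /**
     * Fold optional (SHOULD) hits into the table with a linear two-pointer scan.
     * Optional docs that are not in the table are dropped, because they do not
     * match the required clause.
     */
    void addOptional(int[] optDocs, float[] optScores) {
        int i = 0;
        for (int j = 0; j < optDocs.length && i < size; j++) {
            while (i < size && docs[i] < optDocs[j]) {
                i++;
            }
            if (i < size && docs[i] == optDocs[j]) {
                scores[i] += optScores[j];
            }
        }
    }

    public static void main(String[] args) {
        ArrayBucketTable table = new ArrayBucketTable(8);
        table.add(3, 1.0f);
        table.add(7, 1.0f);
        table.add(12, 1.0f);
        table.addOptional(new int[] {7, 9, 12}, new float[] {2.0f, 2.0f, 0.5f});
        for (int i = 0; i < table.size; i++) {
            System.out.println(table.docs[i] + " -> " + table.scores[i]);
        }
    }
}
```

Compared with a hash table, every slot here is in doc order, so both the required and the optional side can be walked with plain index increments.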

As you know, after putting the required docs into bucketTable, I have to scan both 
the table and the optional docs. Here I have tried skipping ahead while scanning 
the bucketTable to improve performance. The results are as follows.


{code}
No skip
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
       HighAndTonsLowNot          6.56    (3.1%)            2.59    (1.0%)    -60.5% ( -62% -  -58%)
        HighAndTonsLowOr          6.43    (3.3%)            2.58    (0.8%)    -59.9% ( -61% -  -57%)
        HighAndSomeLowOr          8.49    (8.5%)            4.05    (1.8%)    -52.3% ( -57% -  -45%)
       HighAndSomeLowNot          6.17    (8.6%)            3.16    (2.1%)    -48.8% ( -54% -  -41%)
         LowAndSomeLowOr        250.58    (2.0%)          194.86    (1.6%)    -22.2% ( -25% -  -18%)
        LowAndSomeLowNot        178.66    (1.6%)          147.67    (2.2%)    -17.3% ( -20% -  -13%)
        LowAndSomeHighOr         40.71    (2.8%)           41.50    (1.8%)      2.0% (  -2% -    6%)
                PKLookup         97.59    (3.0%)           99.52    (4.6%)      2.0% (  -5% -    9%)
       LowAndSomeHighNot         20.76    (3.0%)           21.54    (2.3%)      3.7% (  -1% -    9%)
      HighAndSomeHighNot          2.22    (1.7%)            2.67    (4.4%)     20.3% (  13% -   26%)
       LowAndTonsHighNot          3.81    (2.3%)            4.60    (2.1%)     20.8% (  15% -   25%)
        LowAndTonsHighOr          2.87    (2.3%)            3.48    (2.6%)     21.2% (  15% -   26%)
       HighAndSomeHighOr          1.74    (2.1%)            2.16    (3.5%)     24.0% (  18% -   30%)
         LowAndTonsLowOr         18.66    (1.3%)           23.68    (1.9%)     26.9% (  23% -   30%)
        LowAndTonsLowNot         16.01    (1.4%)           22.16    (2.8%)     38.4% (  33% -   43%)
       HighAndTonsHighOr          0.04    (0.9%)            0.11    (9.8%)    158.2% ( 146% -  170%)
      HighAndTonsHighNot          0.06    (1.1%)            0.15   (13.5%)    166.2% ( 149% -  182%)
      
---------------------------------------------------
Binary search skip
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
       HighAndTonsLowNot          6.22    (3.8%)            2.45    (0.9%)    -60.6% ( -62% -  -58%)
        HighAndSomeLowOr          8.29   (11.2%)            4.40    (3.0%)    -46.9% ( -54% -  -36%)
       HighAndSomeLowNot         12.34    (7.1%)            6.65    (2.6%)    -46.1% ( -52% -  -39%)
         LowAndSomeLowOr        232.38    (2.9%)          165.05    (1.8%)    -29.0% ( -32% -  -24%)
        HighAndTonsLowOr          5.17    (6.2%)            3.75    (3.0%)    -27.4% ( -34% -  -19%)
        LowAndSomeLowNot        227.71    (2.6%)          171.13    (3.2%)    -24.8% ( -29% -  -19%)
       HighAndSomeHighOr          1.35    (3.9%)            1.14    (3.5%)    -16.1% ( -22% -   -9%)
        LowAndSomeHighOr         50.17    (3.6%)           48.84    (3.7%)     -2.7% (  -9% -    4%)
       LowAndSomeHighNot         52.71    (3.0%)           51.55    (3.8%)     -2.2% (  -8% -    4%)
                PKLookup         90.17    (3.5%)           91.38    (3.3%)      1.3% (  -5% -    8%)
      HighAndSomeHighNot          1.69    (2.9%)            2.00    (6.3%)     18.5% (   8% -   28%)
         LowAndTonsLowOr         15.61    (1.9%)           18.59    (2.8%)     19.0% (  14% -   24%)
        LowAndTonsHighOr          1.82    (2.7%)            2.20    (4.6%)     20.7% (  13% -   28%)
        LowAndTonsLowNot         15.51    (1.7%)           20.14    (3.8%)     29.8% (  23% -   35%)
       LowAndTonsHighNot          1.01    (2.9%)            1.34    (6.5%)     31.7% (  21% -   42%)
       HighAndTonsHighOr          0.07    (0.9%)            0.12    (6.9%)     77.7% (  69% -   86%)
      HighAndTonsHighNot          0.07    (1.4%)            0.19   (11.9%)    162.4% ( 146% -  178%)
      
---------------------------------------------------
8 steps skip
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
       HighAndTonsLowNot          5.45    (3.3%)            1.69    (1.3%)    -69.0% ( -71% -  -66%)
        HighAndSomeLowOr          5.46   (11.0%)            2.76    (4.4%)    -49.5% ( -58% -  -38%)
       HighAndSomeLowNot         17.94    (5.7%)           10.40    (3.8%)    -42.1% ( -48% -  -34%)
         LowAndSomeLowOr        306.62    (1.7%)          231.45    (1.5%)    -24.5% ( -27% -  -21%)
        LowAndSomeLowNot        286.30    (1.7%)          218.13    (2.0%)    -23.8% ( -27% -  -20%)
        HighAndTonsLowOr          6.34    (3.5%)            5.31    (4.5%)    -16.3% ( -23% -   -8%)
        LowAndSomeHighOr         33.53    (2.1%)           33.85    (2.2%)      1.0% (  -3% -    5%)
                PKLookup         97.39    (1.9%)           98.40    (3.9%)      1.0% (  -4% -    6%)
       LowAndSomeHighNot         42.16    (2.0%)           42.73    (2.1%)      1.4% (  -2% -    5%)
       HighAndSomeHighOr          2.43    (2.4%)            2.76    (4.8%)     13.4% (   6% -   21%)
      HighAndSomeHighNot          2.74    (1.4%)            3.17    (4.6%)     15.7% (   9% -   21%)
        LowAndTonsHighOr          3.45    (1.8%)            4.21    (3.2%)     22.0% (  16% -   27%)
       LowAndTonsHighNot          2.37    (1.8%)            2.95    (3.0%)     24.6% (  19% -   30%)
         LowAndTonsLowOr         17.21    (1.1%)           22.50    (2.6%)     30.7% (  26% -   34%)
        LowAndTonsLowNot         13.60    (1.4%)           19.97    (2.4%)     46.8% (  42% -   51%)
       HighAndTonsHighOr          0.08    (0.5%)            0.19    (9.9%)    140.3% ( 129% -  151%)
      HighAndTonsHighNot          0.06    (1.7%)            0.15   (12.0%)    163.9% ( 147% -  180%)

---------------------------------------------------
16 steps skip
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
       HighAndTonsLowNot          6.69    (2.0%)            2.71    (0.8%)    -59.5% ( -61% -  -57%)
        HighAndTonsLowOr          1.69   (10.1%)            0.89    (2.1%)    -47.1% ( -53% -  -38%)
        HighAndSomeLowOr          7.28   (11.5%)            3.96    (1.9%)    -45.6% ( -52% -  -36%)
       HighAndSomeLowNot         14.38    (5.2%)            8.09    (1.5%)    -43.7% ( -47% -  -39%)
         LowAndSomeLowOr        295.60    (2.3%)          223.80    (2.0%)    -24.3% ( -27% -  -20%)
        LowAndSomeLowNot        171.52    (1.7%)          140.82    (1.5%)    -17.9% ( -20% -  -14%)
        LowAndSomeHighOr         40.12    (2.1%)           41.32    (3.2%)      3.0% (  -2% -    8%)
                PKLookup         96.15    (2.4%)           99.15    (6.0%)      3.1% (  -5% -   11%)
       LowAndSomeHighNot         31.53    (2.3%)           32.64    (2.9%)      3.5% (  -1% -    8%)
      HighAndSomeHighNot          2.67    (1.3%)            3.04    (3.4%)     13.9% (   9% -   18%)
       HighAndSomeHighOr          2.11    (2.1%)            2.58    (3.3%)     22.5% (  16% -   28%)
        LowAndTonsHighOr          2.17    (1.8%)            2.67    (3.2%)     23.1% (  17% -   28%)
       LowAndTonsHighNot          2.53    (1.6%)            3.16    (3.3%)     25.2% (  20% -   30%)
        LowAndTonsLowNot         14.68    (0.9%)           20.97    (3.6%)     42.8% (  38% -   47%)
         LowAndTonsLowOr         14.04    (1.1%)           20.09    (2.6%)     43.0% (  38% -   47%)
       HighAndTonsHighOr          0.06    (0.7%)            0.15    (9.4%)    152.0% ( 141% -  163%)
      HighAndTonsHighNot          0.05    (0.8%)            0.14   (12.1%)    167.3% ( 153% -  181%)
      
---------------------------------------------------
32 steps skip
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
       HighAndTonsLowNot          6.50    (2.6%)            3.24    (1.1%)    -50.1% ( -52% -  -47%)
       HighAndSomeLowNot          9.51    (6.4%)            4.87    (3.2%)    -48.8% ( -54% -  -41%)
        HighAndSomeLowOr         14.87   (11.6%)            8.81    (3.6%)    -40.8% ( -50% -  -28%)
         LowAndSomeLowOr        311.27    (2.6%)          241.43    (1.6%)    -22.4% ( -25% -  -18%)
        LowAndSomeLowNot        231.96    (2.4%)          181.95    (2.0%)    -21.6% ( -25% -  -17%)
        HighAndTonsLowOr          5.60    (5.7%)            4.45    (3.7%)    -20.5% ( -28% -  -11%)
       LowAndSomeHighNot         62.10    (2.6%)           60.59    (2.5%)     -2.4% (  -7% -    2%)
        LowAndSomeHighOr         49.36    (3.0%)           48.87    (3.1%)     -1.0% (  -6% -    5%)
                PKLookup         96.38    (2.0%)           95.91    (2.5%)     -0.5% (  -4% -    4%)
      HighAndSomeHighNot          2.08    (1.6%)            2.34    (5.2%)     12.7% (   5% -   19%)
       HighAndSomeHighOr          2.30    (2.6%)            2.63    (5.7%)     14.2% (   5% -   23%)
        LowAndTonsHighOr          1.88    (2.5%)            2.35    (4.2%)     25.5% (  18% -   33%)
       LowAndTonsHighNot          1.10    (2.5%)            1.45    (5.0%)     31.1% (  23% -   39%)
         LowAndTonsLowOr         14.38    (1.0%)           20.24    (3.2%)     40.8% (  36% -   45%)
        LowAndTonsLowNot         12.98    (1.0%)           18.82    (2.9%)     45.0% (  40% -   49%)
       HighAndTonsHighOr          0.08    (0.8%)            0.18   (12.3%)    138.0% ( 123% -  152%)
      HighAndTonsHighNot          0.08    (1.1%)            0.21   (12.5%)    157.6% ( 142% -  172%)
{code}
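
The skip variants benchmarked above could be sketched roughly as follows. This is only an illustration, not code from the patch: the class and method names are hypothetical, the table is assumed to be a doc-ordered int array, and reading "N steps skip" as "jump N slots at a time, then finish linearly" is my assumption about what the labels mean.

```java
/** Hypothetical sketch of three ways to find the first slot whose doc is >= target. */
final class BucketSkip {

    /** "No skip": move one slot at a time. */
    static int linear(int[] docs, int size, int from, int target) {
        int i = from;
        while (i < size && docs[i] < target) {
            i++;
        }
        return i;
    }

    /** "Binary search skip": binary search over docs[from, size). */
    static int binary(int[] docs, int size, int from, int target) {
        int lo = from;
        int hi = size;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (docs[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /**
     * "N steps skip" (assumed meaning): jump 'step' slots while the landing
     * slot is still below target, then finish with a linear scan.
     * Requires step >= 1.
     */
    static int stepped(int[] docs, int size, int from, int target, int step) {
        int i = from;
        while (i + step < size && docs[i + step] < target) {
            i += step;
        }
        while (i < size && docs[i] < target) {
            i++;
        }
        return i;
    }

    public static void main(String[] args) {
        int[] docs = {2, 5, 9, 14, 20, 33, 41, 58, 60, 77};
        System.out.println(linear(docs, docs.length, 0, 34));
        System.out.println(binary(docs, docs.length, 0, 34));
        System.out.println(stepped(docs, docs.length, 0, 34, 8));
    }
}
```

All three return the same slot for the same input; they differ only in how many entries they touch, which is one plausible reason the step size shifts the numbers between task types.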

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
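
The .advance() idea in the description above could look roughly like the sketch below. It deliberately does not use Lucene's classes: DocIterator and ArrayDocIterator are hypothetical stand-ins for a postings iterator, and the ADVANCE_GAP threshold is an invented value, not something proposed in the issue.

```java
/** Hypothetical minimal iterator, loosely modeled on a postings iterator; not Lucene's API. */
interface DocIterator {
    int NO_MORE_DOCS = Integer.MAX_VALUE;
    int docID();
    int nextDoc();
    int advance(int target);
}

/** Array-backed iterator used only to make the sketch runnable. */
final class ArrayDocIterator implements DocIterator {
    private final int[] docs;
    private int idx = -1;
    int advanceCalls = 0; // counts advance() calls for demonstration

    ArrayDocIterator(int... docs) {
        this.docs = docs;
    }

    public int docID() {
        if (idx < 0) return -1;
        return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
    }

    public int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }

    public int advance(int target) {
        advanceCalls++;
        int doc = docID();
        while (doc < target) {
            doc = nextDoc(); // a real postings list would use skip data here
        }
        return doc;
    }
}

final class OptionalCatchUp {
    /** Assumed threshold: gaps larger than this use advance() instead of nextDoc(). */
    static final int ADVANCE_GAP = 1024;

    /** Move a SHOULD iterator to the first doc >= requiredDoc and return that doc. */
    static int catchUp(DocIterator sub, int requiredDoc) {
        int doc = sub.docID();
        if (doc >= requiredDoc) {
            return doc;
        }
        if ((long) requiredDoc - doc > ADVANCE_GAP) {
            return sub.advance(requiredDoc); // the MUST clause jumped far ahead
        }
        while (doc < requiredDoc) {
            doc = sub.nextDoc(); // small gap: stepping is likely cheaper
        }
        return doc;
    }
}
```

For example, with a SHOULD iterator over docs 3, 10, 2000000, 2000005, catching up to doc 5 steps with nextDoc() and lands on 10, while catching up to doc 2000001 goes through advance() and lands on 2000005.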



--
This message was sent by Atlassian JIRA
(v6.2#6252)

