[jira] [Comment Edited] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Thu, 24 Jul 2014 01:56:23 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072693#comment-14072693 ]


Da Huang edited comment on LUCENE-4396 at 7/24/14 8:53 AM:
-----------------------------------------------------------

This patch is based on git mirror commit
ce7d0578b30981d15687bf76aec595274efccbad.
This is a first attempt at merging the scorers, to get better performance for
boolean retrieval.

I created a new class named "BooleanMixedScorerDecider" to choose the best
scorer.
The rules for choosing still need improvement; I am working on an elegant way
to define them.
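The actual rules in the patch are not shown in this comment. As a purely hypothetical sketch of what a decider of this shape might look like, the class below estimates each clause's hit count from its first matching docID (the proxy the issue description suggests) and prefers BooleanScorer2 only when the rarest MUST clause is much rarer than the SHOULD clauses. The class name `ScorerChoiceSketch`, the `Choice` enum, the uniform-spread estimator and the ratio of 10 are all assumptions for illustration, not the patch's `BooleanMixedScorerDecider`.

```java
/**
 * Hypothetical sketch of a scorer-choice rule. NOT the patch's BooleanMixedScorerDecider:
 * the names, the estimator and the threshold are assumptions for illustration.
 */
public class ScorerChoiceSketch {
    enum Choice { BOOLEAN_SCORER, BOOLEAN_SCORER2 }

    /** Assumed threshold: how much rarer the rarest MUST clause must be before BooleanScorer2 wins. */
    static final long RARE_MUST_RATIO = 10;

    /**
     * Rough proxy: if a clause's hits are spread uniformly over [0, maxDoc), a first hit at
     * docID f suggests on the order of maxDoc / (f + 1) hits. The +1 avoids dividing by zero.
     * An exhausted clause (firstDocID >= maxDoc, e.g. NO_MORE_DOCS) is treated as having no hits.
     */
    static long estimateHits(int firstDocID, int maxDoc) {
        if (firstDocID < 0 || firstDocID >= maxDoc) return 0;
        return maxDoc / (firstDocID + 1L);
    }

    /** Picks a scorer from the first docIDs of the MUST and SHOULD clauses. */
    static Choice choose(int[] mustFirstDocs, int[] shouldFirstDocs, int maxDoc) {
        // Current rule per the issue description: no MUST clauses -> BooleanScorer.
        if (mustFirstDocs.length == 0) return Choice.BOOLEAN_SCORER;
        // Assumption: a pure conjunction stays on BooleanScorer2.
        if (shouldFirstDocs.length == 0) return Choice.BOOLEAN_SCORER2;
        long rarestMust = Long.MAX_VALUE;
        for (int d : mustFirstDocs) rarestMust = Math.min(rarestMust, estimateHits(d, maxDoc));
        long totalShould = 0;
        for (int d : shouldFirstDocs) totalShould += estimateHits(d, maxDoc);
        // A very rare MUST clause lets BooleanScorer2 skip most of the index with advance();
        // otherwise try the bucket-based BooleanScorer.
        return totalShould >= RARE_MUST_RATIO * rarestMust
                ? Choice.BOOLEAN_SCORER2
                : Choice.BOOLEAN_SCORER;
    }

    public static void main(String[] args) {
        int maxDoc = 1_000_000;
        // Rare MUST clause (first hit at 99_999, about 10 hits) vs. two common SHOULD clauses.
        System.out.println(choose(new int[] {99_999}, new int[] {9, 19}, maxDoc));
        // MUST and SHOULD clauses of similar frequency.
        System.out.println(choose(new int[] {9}, new int[] {9}, maxDoc));
    }
}
```

With these inputs the first call prints `BOOLEAN_SCORER2` and the second `BOOLEAN_SCORER`. A single first docID is a noisy estimate (one hit near doc 0 looks like a huge clause), so a real rule would likely need more signals.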
{code}
                    Task   QPS baseline      StdDev   QPS my_version      StdDev                 Pct diff
       HighAndSomeLowNot          11.53      (7.3%)            10.75     (10.1%)    -6.8% ( -22% -   11%)
       HighAndTonsLowNot           4.87      (4.0%)             4.64      (6.0%)    -4.9% ( -14% -    5%)
         LowAndSomeLowOr         306.20      (2.2%)           299.06      (2.8%)    -2.3% (  -7% -    2%)
        HighAndSomeLowOr          13.67      (9.4%)            13.38      (2.7%)    -2.1% ( -13% -   11%)
        HighAndTonsLowOr           4.04      (6.4%)             3.96      (1.9%)    -1.9% (  -9% -    6%)
        LowAndSomeLowNot         215.18      (1.9%)           211.14      (2.2%)    -1.9% (  -5% -    2%)
                PKLookup          96.26      (2.3%)            94.56      (2.8%)    -1.8% (  -6% -    3%)
      HighAndTonsHighNot           0.06      (2.3%)             0.06      (2.6%)    -1.0% (  -5% -    4%)
       HighAndTonsHighOr           0.06      (0.6%)             0.06      (1.3%)     0.9% (   0% -    2%)
      HighAndSomeHighNot           1.59      (2.2%)             1.62      (2.9%)     1.7% (  -3% -    6%)
       LowAndSomeHighNot          66.33      (2.1%)            68.77      (2.1%)     3.7% (   0% -    8%)
        LowAndSomeHighOr          53.75      (1.6%)            56.86      (2.1%)     5.8% (   1% -    9%)
        LowAndTonsLowNot          14.00      (1.7%)            14.84      (1.5%)     6.1% (   2% -    9%)
       HighAndSomeHighOr           2.39      (2.2%)             2.68      (3.5%)    12.4% (   6% -   18%)
         LowAndTonsLowOr          17.69      (0.9%)            21.64      (1.7%)    22.3% (  19% -   25%)
        LowAndTonsHighOr           1.83      (1.3%)             2.33      (2.4%)    27.2% (  23% -   31%)
       LowAndTonsHighNot           1.15      (1.5%)             1.51      (3.1%)    30.9% (  25% -   36%)
{code}
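The "Pct diff" column appears consistent with the relative change in mean QPS, (my_version − baseline) / baseline × 100. The small helper below recomputes it for two rows as a sanity check; the class and method names are illustrative only, and the bracketed range is not reproduced here.

```java
import java.util.Locale;

/** Recomputes the "Pct diff" column from the two QPS columns of the benchmark table. */
public class PctDiff {
    /** Relative change of the candidate's mean QPS vs. the baseline's, in percent. */
    static double pctDiff(double baselineQps, double candidateQps) {
        return (candidateQps - baselineQps) / baselineQps * 100.0;
    }

    public static void main(String[] args) {
        // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's default locale.
        System.out.printf(Locale.ROOT, "HighAndSomeLowNot: %.1f%%%n", pctDiff(11.53, 10.75));
        System.out.printf(Locale.ROOT, "LowAndTonsLowOr:   %.1f%%%n", pctDiff(17.69, 21.64));
    }
}
```

This prints -6.8% and 22.3%, matching the table. Because the table shows QPS rounded to two decimals, recomputing low-QPS rows can drift by a few tenths of a percent (LowAndTonsHighNot: 1.15 to 1.51 gives 31.3%, versus the reported 30.9%).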



> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, stat.cpp, stat.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2. BooleanScorer still has some vestiges from when it used to
> handle MUST, so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if the MUST clause suddenly skips 1000000 docs, then you
> want to .advance() all the SHOULD clauses.
> I won't have near-term time to work on this, so feel free to take it if you
> are inspired!
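The .advance() idea in the issue description can be illustrated with a simplified, self-contained simulation of window-at-a-time, bucket-style scoring where one MUST clause drives the windows. Everything here is an assumption for illustration, not BooleanScorer's code: postings are plain sorted `int[]` arrays standing in for Lucene's iterators, and the `Cursor` class, the 2048-doc window and the count-based score (1 plus the number of matching SHOULD clauses) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical simulation of window-at-a-time MUST + SHOULD scoring that uses advance(). */
public class WindowedMustShould {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    static final int WINDOW = 2048; // assumed window size

    /** Stand-in for a postings iterator over a sorted int[] of docIDs. */
    static final class Cursor {
        private final int[] docs;
        private int i = -1;

        Cursor(int[] docs) { this.docs = docs; }

        int docID() { return i < 0 ? -1 : (i < docs.length ? docs[i] : NO_MORE_DOCS); }

        int nextDoc() { i++; return docID(); }

        /** Moves to the first doc >= target after the current one (binary search stands in for skip lists). */
        int advance(int target) {
            int lo = i + 1, hi = docs.length;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (docs[mid] < target) lo = mid + 1; else hi = mid;
            }
            i = lo;
            return docID();
        }
    }

    /** Returns "doc:score" for every MUST doc; score = 1 + number of SHOULD clauses that also match. */
    static List<String> score(int[] must, int[][] should) {
        Cursor m = new Cursor(must);
        Cursor[] subs = new Cursor[should.length];
        for (int k = 0; k < should.length; k++) {
            subs[k] = new Cursor(should[k]);
            subs[k].nextDoc();
        }
        boolean[] hit = new boolean[WINDOW];
        int[] counts = new int[WINDOW];
        List<String> out = new ArrayList<>();
        int doc = m.nextDoc();
        while (doc != NO_MORE_DOCS) {
            // Jump straight to the window containing the MUST clause's next doc,
            // so windows without any MUST hit are never visited.
            int base = doc - doc % WINDOW;
            int end = base + WINDOW;
            Arrays.fill(hit, false);
            Arrays.fill(counts, 0);
            while (doc < end) { // mark the MUST hits in this window
                hit[doc - base] = true;
                counts[doc - base] = 1;
                doc = m.nextDoc();
            }
            for (Cursor c : subs) {
                // If the window jumped ahead, advance() past the skipped docs instead of visiting each one.
                int d = c.docID() < base ? c.advance(base) : c.docID();
                for (; d < end; d = c.nextDoc()) {
                    if (hit[d - base]) counts[d - base]++;
                }
            }
            for (int j = 0; j < WINDOW; j++) {
                if (hit[j]) out.add((base + j) + ":" + counts[j]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        int[] must = {3, 5_000, 1_000_000};
        int[][] should = {
            {3, 4, 5_000, 6_000, 7_000, 500_000, 999_999, 1_000_000},
            {5_000, 1_000_000, 2_000_000},
        };
        System.out.println(score(must, should)); // [3:2, 5000:3, 1000000:3]
    }
}
```

In the example, the MUST clause jumps from doc 5000 to doc 1000000, so the first SHOULD clause is advanced from 7000 directly to 999999 and never visits 500000. Aligning windows to multiples of the window size keeps the bucket index a simple subtraction.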



--
This message was sent by Atlassian JIRA
(v6.2#6252)
