[jira] [Comment Edited] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Sun, 25 May 2014 01:45:20 -0700

    [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14008299#comment-14008299 ]


Da Huang edited comment on LUCENE-4396 at 5/25/14 8:44 AM:
-----------------------------------------------------------

The patch is based on commit cfb408ff6788e6fea8215098a785d72fb4e95c5b of the
Lucene GitHub mirror.

The following things have been done:

1. Renamed TestBooleanNovelScorer to TestBooleanUnevenly. This test suite
tests both BNS and BS when the hit documents are unevenly distributed.

2. Following Robert's advice, I now sum scores into a double and cast to float
in ConjunctionScorer. However, it seems to have little effect: the score
difference problem still remains.

3. Added a comment to luceneutil about score differences within tolerance.

4. Made a new tasks file that can test the "AndSomeOR" cases.

5. Ran luceneutil for "BNS vs BS2" and "BS vs BS2". The results are shown
below.

P.S. BS has the same score difference problem as BNS.
Although there is no BS2 anymore since the architecture has changed, I still
call it BS2 here for convenience.

{code}

BNS vs BS2

                Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff
    HighAndTonsLowOr           10.95    (3.5%)                       1.52    (0.3%)    -86.1% ( -86% -  -85%)
    HighAndSomeLowOr           29.98    (6.7%)                      11.84    (2.9%)    -60.5% ( -65% -  -54%)
     LowAndSomeLowOr          756.81    (1.4%)                     503.21    (2.8%)    -33.5% ( -37% -  -29%)
    LowAndSomeHighOr           54.25    (2.1%)                      53.26    (2.1%)     -1.8% (  -5% -    2%)
            PKLookup          241.74    (2.8%)                     241.96    (2.3%)      0.1% (  -4% -    5%)
     LowAndTonsLowOr           40.23    (1.2%)                      43.19    (7.2%)      7.4% (   0% -   15%)
    LowAndTonsHighOr            2.63    (2.1%)                       2.99    (2.3%)     13.8% (   9% -   18%)
   HighAndSomeHighOr            4.99    (1.8%)                       5.86    (4.7%)     17.4% (  10% -   24%)
   HighAndTonsHighOr            0.09    (1.5%)                       0.22    (8.1%)    145.4% ( 133% -  157%)


BS vs BS2

                Task    QPS baseline    StdDev    QPS my_modified_version    StdDev    Pct diff
    HighAndTonsLowOr           16.54    (2.4%)                       3.70    (0.2%)    -77.6% ( -78% -  -76%)
    HighAndSomeLowOr           11.95    (8.5%)                       4.29    (0.8%)    -64.1% ( -67% -  -59%)
     LowAndSomeLowOr          839.11    (1.9%)                     540.83    (2.5%)    -35.5% ( -39% -  -31%)
    LowAndSomeHighOr          149.50    (2.6%)                     136.71    (3.4%)     -8.6% ( -14% -   -2%)
   HighAndSomeHighOr            3.72    (1.7%)                       3.51    (1.7%)     -5.6% (  -8% -   -2%)
            PKLookup          240.32    (2.8%)                     238.87    (2.8%)     -0.6% (  -6% -    5%)
    LowAndTonsHighOr            4.96    (2.3%)                       5.35    (3.8%)      7.8% (   1% -   14%)
     LowAndTonsLowOr           35.28    (1.2%)                      39.00    (5.2%)     10.6% (   4% -   17%)
   HighAndTonsHighOr            0.16    (1.1%)                       0.36    (4.0%)    122.6% ( 116% -  129%)
{code}
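
To illustrate item 2 above, here is a minimal Java sketch (not the patch's
ConjunctionScorer code) of summing clause scores in a float versus summing
them in a double and casting once at the end. The class name, the method
names, and the withinTolerance helper with its relative-tolerance parameter
are all hypothetical. The sketch only shows why two scorers that add the same
clause scores in a different order can report slightly different totals, and
what a tolerance-based comparison like the one mentioned in item 3 might look
like.

```java
public class ScoreSumSketch {

    // Accumulate in float: every addition is rounded to float precision,
    // so the result depends on the order in which scores are added.
    public static float sumInFloat(float[] scores) {
        float sum = 0f;
        for (float s : scores) {
            sum += s;
        }
        return sum;
    }

    // Accumulate in double and narrow to float once at the end.
    // This reduces, but does not eliminate, order-dependent rounding.
    public static float sumInDouble(float[] scores) {
        double sum = 0.0;
        for (float s : scores) {
            sum += s;
        }
        return (float) sum;
    }

    // Hypothetical check: treat two scores as equal when they differ by at
    // most relTol relative to the larger magnitude.
    public static boolean withinTolerance(float a, float b, float relTol) {
        float scale = Math.max(Math.abs(a), Math.abs(b));
        return Math.abs(a - b) <= relTol * scale;
    }

    public static void main(String[] args) {
        // Deliberately extreme values to make the rounding visible.
        float[] scores = {1e8f, 1f, -1e8f};
        System.out.println(sumInFloat(scores));   // prints 0.0 (the 1 is lost)
        System.out.println(sumInDouble(scores));  // prints 1.0
        System.out.println(withinTolerance(1.0f, 1.0000001f, 1e-5f)); // prints true
    }
}
```

Real clause scores are far closer in magnitude than these values, so the
differences are much smaller, which is consistent with the double accumulation
having little visible effect in practice.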


> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
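
As a rough illustration of the heuristic idea in the description above, the
following Java sketch picks a scorer from estimated hit counts. Everything in
it is hypothetical: the class, the enum, the method, the hit-count estimates,
and the minRatio threshold are not Lucene APIs and are not taken from any
patch on this issue. It only encodes the stated intuition that the
doc-at-a-time scorer is preferred when the MUST clauses match very few
documents compared to the other clauses.

```java
public class ScorerChoiceSketch {

    public enum Choice { BULK_BOOLEAN_SCORER, DOC_AT_A_TIME_SCORER }

    // Hypothetical rule: if the MUST clauses are estimated to match only a
    // tiny fraction of what the SHOULD clauses match, skipping with
    // advance() should pay off, so prefer the doc-at-a-time scorer.
    // Otherwise prefer the bulk scorer. minRatio is an untuned assumption.
    public static Choice choose(long mustHits, long shouldHits, double minRatio) {
        if (shouldHits <= 0) {
            // No SHOULD work to amortize: nothing for the bulk scorer to win.
            return Choice.DOC_AT_A_TIME_SCORER;
        }
        double ratio = (double) mustHits / (double) shouldHits;
        return ratio >= minRatio
            ? Choice.BULK_BOOLEAN_SCORER
            : Choice.DOC_AT_A_TIME_SCORER;
    }

    public static void main(String[] args) {
        System.out.println(choose(10L, 1_000_000L, 0.01));      // DOC_AT_A_TIME_SCORER
        System.out.println(choose(500_000L, 1_000_000L, 0.01)); // BULK_BOOLEAN_SCORER
    }
}
```

A real heuristic would need to be tuned against benchmark results such as the
luceneutil runs in this thread, and the description suggests firstDocID as a
possible proxy for the hit counts that this sketch takes as given.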



--
This message was sent by Atlassian JIRA
(v6.2#6252)
