[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Sun, 25 May 2014 01:42:20 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch
                luceneutil-score-equal.patch
                AndOr.tasks

The patch is based on lucene github mirror commit 
cfb408ff6788e6fea8215098a785d72fb4e95c5b.

The following things have been done:

1. Rename TestBooleanNovelScorer to TestBooleanUnevenly; this test suite tests 
both BNS and BS when the hit documents are unevenly distributed.

2. Following Robert's advice, I sum scores into a double and cast to float in 
ConjunctionScorer. However, it seems to have little effect: the score 
difference problem still remains.

3. Add a comment on score differences within tolerance in luceneutil.

4. Make a new tasks file that can test the "AndSomeOR" cases.

5. Run luceneutil for "BNS vs BS2" and "BS vs BS2". The results are shown 
below.


{code}

BNS vs BS2

             Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff
 HighAndTonsLowOr         10.95  (3.5%)                     1.52  (0.3%)  -86.1% ( -86% -  -85%)
 HighAndSomeLowOr         29.98  (6.7%)                    11.84  (2.9%)  -60.5% ( -65% -  -54%)
  LowAndSomeLowOr        756.81  (1.4%)                   503.21  (2.8%)  -33.5% ( -37% -  -29%)
 LowAndSomeHighOr         54.25  (2.1%)                    53.26  (2.1%)   -1.8% (  -5% -    2%)
         PKLookup        241.74  (2.8%)                   241.96  (2.3%)    0.1% (  -4% -    5%)
  LowAndTonsLowOr         40.23  (1.2%)                    43.19  (7.2%)    7.4% (   0% -   15%)
 LowAndTonsHighOr          2.63  (2.1%)                     2.99  (2.3%)   13.8% (   9% -   18%)
HighAndSomeHighOr          4.99  (1.8%)                     5.86  (4.7%)   17.4% (  10% -   24%)
HighAndTonsHighOr          0.09  (1.5%)                     0.22  (8.1%)  145.4% ( 133% -  157%)


BS vs BS2

             Task  QPS baseline  StdDev  QPS my_modified_version  StdDev  Pct diff
 HighAndTonsLowOr         16.54  (2.4%)                     3.70  (0.2%)  -77.6% ( -78% -  -76%)
 HighAndSomeLowOr         11.95  (8.5%)                     4.29  (0.8%)  -64.1% ( -67% -  -59%)
  LowAndSomeLowOr        839.11  (1.9%)                   540.83  (2.5%)  -35.5% ( -39% -  -31%)
 LowAndSomeHighOr        149.50  (2.6%)                   136.71  (3.4%)   -8.6% ( -14% -   -2%)
HighAndSomeHighOr          3.72  (1.7%)                     3.51  (1.7%)   -5.6% (  -8% -   -2%)
         PKLookup        240.32  (2.8%)                   238.87  (2.8%)   -0.6% (  -6% -    5%)
 LowAndTonsHighOr          4.96  (2.3%)                     5.35  (3.8%)    7.8% (   1% -   14%)
  LowAndTonsLowOr         35.28  (1.2%)                    39.00  (5.2%)   10.6% (   4% -   17%)
HighAndTonsHighOr          0.16  (1.1%)                     0.36  (4.0%)  122.6% ( 116% -  129%)
{code}
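
For reference, the double accumulation described in item 2 can be sketched in 
isolation. This is only an illustration, not the code in the patch: the class 
and method names are hypothetical, and the idea that summation order 
contributes to the score differences is an assumption rather than something 
confirmed above. The sketch shows that a float accumulator can give different 
totals for the same sub-scores added in a different order, while a double 
accumulator cast to float once at the end agrees for these inputs.

```java
// Hypothetical sketch; not taken from LUCENE-4396.patch.
final class ScoreSumSketch {

    // Accumulate in float: every addition is rounded to float precision.
    static float sumInFloat(float[] subScores) {
        float sum = 0f;
        for (float s : subScores) {
            sum += s;
        }
        return sum;
    }

    // Accumulate in double and cast to float once, as described in item 2.
    static float sumInDouble(float[] subScores) {
        double sum = 0d;
        for (float s : subScores) {
            sum += s;
        }
        return (float) sum;
    }

    public static void main(String[] args) {
        // Same three values, two different orders (artificial values chosen
        // to make the rounding visible; real scores are far less extreme).
        float[] orderA = {1.0e8f, 1.0f, -1.0e8f};
        float[] orderB = {1.0e8f, -1.0e8f, 1.0f};

        System.out.println(sumInFloat(orderA) + " vs " + sumInFloat(orderB));
        System.out.println(sumInDouble(orderA) + " vs " + sumInDouble(orderB));
    }
}
```

A double accumulator narrows such differences but cannot remove them in 
general, which would be consistent with the small remaining differences 
reported in item 2.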

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
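
The .advance() idea in the quoted description can be sketched roughly as 
follows. This is a minimal illustration under stated assumptions, not Lucene 
code: IntArrayIterator is a hypothetical stand-in for a postings iterator over 
sorted doc IDs, catchUp is a hypothetical helper, and the gap threshold is an 
arbitrary placeholder, not a tuned heuristic.

```java
import java.util.Arrays;

// Hypothetical sketch; none of these names come from Lucene or the patch.
final class AdvanceSketch {

    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Stand-in for a SHOULD clause iterator over sorted, unique doc IDs. */
    static final class IntArrayIterator {
        private final int[] docs;
        private int idx = -1;

        IntArrayIterator(int[] sortedDocs) {
            this.docs = sortedDocs;
        }

        int docID() {
            if (idx < 0) {
                return -1; // not positioned yet
            }
            return idx < docs.length ? docs[idx] : NO_MORE_DOCS;
        }

        /** Step to the next doc. */
        int nextDoc() {
            idx++;
            return docID();
        }

        /** Jump to the first doc after the current one with ID >= target. */
        int advance(int target) {
            int from = idx + 1;
            if (from >= docs.length) {
                idx = docs.length;
                return NO_MORE_DOCS;
            }
            int pos = Arrays.binarySearch(docs, from, docs.length, target);
            idx = pos >= 0 ? pos : -pos - 1; // insertion point if not found
            return docID();
        }
    }

    /**
     * Bring a SHOULD iterator up to the doc the MUST clause landed on.
     * Small gaps are walked with nextDoc(); large gaps use advance().
     */
    static int catchUp(IntArrayIterator should, int mustDoc, int gapThreshold) {
        int doc = should.docID();
        if (doc >= mustDoc) {
            return doc;
        }
        if ((long) mustDoc - doc > gapThreshold) {
            return should.advance(mustDoc);
        }
        while (doc < mustDoc) {
            doc = should.nextDoc();
        }
        return doc;
    }

    public static void main(String[] args) {
        IntArrayIterator should =
            new IntArrayIterator(new int[] {1, 5, 9, 1_000_050, 2_000_000});
        System.out.println(catchUp(should, 5, 1000));         // small gap
        System.out.println(catchUp(should, 1_000_010, 1000)); // large gap
    }
}
```

Either path ends on the same document; the threshold only decides how the 
iterator gets there.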



