[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Fri, 20 Jun 2014 18:27:06 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch

This patch is based on git mirror commit 
8f9b823db1d6fba2cc7ec61b0596970f3c8bbe85.
It does the following.

1. Completely resolves the score difference between pure DAAT and BS. (By 
"pure DAAT" I mean what used to be BS2; since BS2 no longer exists, I think 
"pure DAAT" is the better name.)

2. Adds a new Scorer named BooleanScorerInOrder (BSIO), which collects docs 
using only a bitset rather than a linked list.
I created a new Scorer instead of changing the old BS because I think BS may 
still be more useful in some cases.
For now, BSIO does not support queries without any MUST clause, because the 
procedure for such queries is totally different from the one for queries with 
a MUST clause.
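
To illustrate the idea in item 2, here is a minimal, self-contained sketch of window-by-window collection with a bitset for MUST clauses. It is not the code in LUCENE-4396.patch: the class name WindowedBitsetAnd, the WINDOW size, and the use of plain sorted int[] arrays in place of Lucene postings are all assumptions made for illustration. Each window ANDs the per-clause bitsets and then walks the surviving bits from low to high, so docs come out in increasing doc ID order without any linked list of buckets.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of bitset-only, in-order collection; not a Lucene class. */
final class WindowedBitsetAnd {
    /** Assumed number of docs handled per pass; chosen only for illustration. */
    static final int WINDOW = 2048;

    /**
     * Returns, in ascending order, every doc ID below maxDoc that appears in
     * all of the given postings lists. Each list must be sorted ascending and
     * contain non-negative doc IDs.
     */
    static List<Integer> intersect(int[][] mustPostings, int maxDoc) {
        List<Integer> hits = new ArrayList<>();
        if (mustPostings.length == 0) {
            return hits; // the no-MUST case is deliberately not handled here
        }
        int[] pos = new int[mustPostings.length]; // read position per clause
        long[] matched = new long[WINDOW / 64];
        long[] seen = new long[WINDOW / 64];
        for (int base = 0; base < maxDoc; base += WINDOW) {
            int end = Math.min(base + WINDOW, maxDoc);
            // The first MUST clause seeds the window's bitset.
            Arrays.fill(matched, 0L);
            pos[0] = mark(mustPostings[0], pos[0], base, end, matched);
            // Every further MUST clause is ANDed in.
            for (int c = 1; c < mustPostings.length; c++) {
                Arrays.fill(seen, 0L);
                pos[c] = mark(mustPostings[c], pos[c], base, end, seen);
                for (int w = 0; w < matched.length; w++) {
                    matched[w] &= seen[w];
                }
            }
            // Walk set bits from low to high, which yields docs in order.
            for (int w = 0; w < matched.length; w++) {
                long bits = matched[w];
                while (bits != 0L) {
                    int bit = Long.numberOfTrailingZeros(bits);
                    hits.add(base + (w << 6) + bit);
                    bits &= bits - 1; // clear the lowest set bit
                }
            }
        }
        return hits;
    }

    /** Sets a bit for each posting in [base, end) and returns the new read position. */
    private static int mark(int[] postings, int from, int base, int end, long[] bits) {
        int i = from;
        while (i < postings.length && postings[i] < end) {
            int slot = postings[i] - base;
            bits[slot >>> 6] |= 1L << (slot & 63);
            i++;
        }
        return i;
    }
}
```

For example, intersecting {1, 5, 2050, 4000}, {5, 2050, 3000, 4000} and {0, 5, 2050, 4000, 4001} with maxDoc 5000 gives 5, 2050, 4000, in that order. Scoring is left out of the sketch entirely.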

The perf. of BSIO is as follows.
{code}
              Task    QPS baseline    StdDev    QPS my_version    StdDev    Pct diff
   LowAndSomeLowOr          259.82    (2.3%)            102.70    (2.8%)    -60.5% ( -64% -  -56%)
  LowAndSomeLowNot          184.38    (2.8%)             80.26    (2.3%)    -56.5% ( -59% -  -52%)
 HighAndSomeLowNot           10.44    (7.2%)              4.70    (4.3%)    -55.0% ( -61% -  -46%)
  HighAndSomeLowOr           18.11    (8.0%)              8.83    (4.0%)    -51.2% ( -58% -  -42%)
 HighAndTonsLowNot            3.03    (5.4%)              1.62    (4.7%)    -46.8% ( -53% -  -38%)
  LowAndTonsLowNot           14.59    (1.2%)              8.86    (2.0%)    -39.3% ( -41% -  -36%)
   LowAndTonsLowOr           14.11    (1.1%)              8.74    (3.0%)    -38.1% ( -41% -  -34%)
  HighAndTonsLowOr            5.52    (4.3%)              3.85    (5.2%)    -30.2% ( -38% -  -21%)
  LowAndSomeHighOr           24.97    (3.5%)             21.10    (3.2%)    -15.5% ( -21% -   -9%)
 LowAndSomeHighNot           25.51    (3.3%)             23.22    (3.4%)     -9.0% ( -15% -   -2%)
  LowAndTonsHighOr            1.66    (2.6%)              1.64    (2.8%)     -1.1% (  -6% -    4%)
          PKLookup           95.22    (5.5%)             96.64    (6.1%)      1.5% (  -9% -   13%)
HighAndSomeHighNot            2.37    (2.0%)              2.55    (6.9%)      7.4% (  -1% -   16%)
 HighAndSomeHighOr            2.25    (2.7%)              2.43    (6.0%)      7.8% (   0% -   16%)
 LowAndTonsHighNot            2.72    (2.3%)              5.94    (5.8%)    118.4% ( 107% -  129%)
 HighAndTonsHighOr            0.05    (0.8%)              0.12   (17.0%)    162.4% ( 143% -  181%)
HighAndTonsHighNot            0.08    (1.3%)              0.48   (23.4%)    507.0% ( 476% -  538%)
{code}

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, AndOr.tasks, AndOr.tasks, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT 
> clauses. If there are one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count 
> compared to the other clauses, BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST, so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, e.g. if the MUST clause suddenly skips 1,000,000 docs then you 
> want to .advance() all the SHOULD clauses.
> I won't have near-term time to work on this, so feel free to take it if you 
> are inspired!
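
The .advance() idea in the description above can be sketched as follows. This is an illustration only, not code from any attached patch: DocCursor and MustDrivenScoring are hypothetical classes over plain sorted int arrays rather than Lucene scorers, and SKIP_THRESHOLD is an assumed cut-off, not a tuned heuristic. When the next MUST doc is close, a SHOULD cursor steps linearly with nextDoc(); when the gap is large, it jumps with advance().

```java
import java.util.Arrays;

/** Hypothetical cursor over a sorted list of doc IDs; stands in for a sub-scorer. */
final class DocCursor {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;
    private final int[] docs;
    private int idx = -1;

    DocCursor(int... sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Current doc, -1 before the first call, NO_MORE_DOCS when exhausted. */
    int docID() {
        if (idx < 0) return -1;
        return idx >= docs.length ? NO_MORE_DOCS : docs[idx];
    }

    /** Steps to the next doc. */
    int nextDoc() {
        if (idx < docs.length) idx++;
        return docID();
    }

    /** Jumps to the first doc beyond the current one that is >= target. */
    int advance(int target) {
        int from = idx + 1;
        if (from >= docs.length) {
            idx = docs.length;
            return NO_MORE_DOCS;
        }
        int found = Arrays.binarySearch(docs, from, docs.length, target);
        idx = found >= 0 ? found : -found - 1; // insertion point when absent
        return docID();
    }
}

/** Hypothetical driver: MUST docs lead, SHOULD cursors catch up. */
final class MustDrivenScoring {
    /** Assumed gap above which advance() is used instead of nextDoc(). */
    static final int SKIP_THRESHOLD = 1000;

    /** Moves the cursor to the first doc >= target and returns that doc. */
    static int catchUp(DocCursor cursor, int target) {
        int doc = cursor.docID();
        if (doc >= target) {
            return doc;
        }
        if ((long) target - doc > SKIP_THRESHOLD) {
            return cursor.advance(target); // large gap: skip
        }
        while (doc < target) {
            doc = cursor.nextDoc(); // small gap: step linearly
        }
        return doc;
    }

    /** For each MUST doc, counts how many SHOULD cursors also contain it. */
    static int[] countShouldMatches(int[] mustDocs, DocCursor[] shoulds) {
        int[] counts = new int[mustDocs.length];
        for (int i = 0; i < mustDocs.length; i++) {
            for (DocCursor should : shoulds) {
                if (catchUp(should, mustDocs[i]) == mustDocs[i]) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }
}
```

The heuristic question raised in the description (when skipping actually pays off) is not answered by this sketch; the fixed threshold is only a placeholder for it.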


