[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Sun, 10 Aug 2014 18:14:27 -0700

     [ https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: LUCENE-4396.patch

This patch is based on git mirror commit d707f783ab068b70752a3f9cfdc0dabb7f4fbadf.

In this patch, I tried to fix the .getChildren() problem in BAS and BLS.

I tried to make .bulkScorer() choose DAAT when scoreDocsInOrder is true.
However, I found that I also had to copy the scorer-selection logic into
.scoresDocsOutOfOrder() for this to work correctly.
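
One way to avoid keeping two copies of the selection logic in sync would be to put the decision in a single helper that both methods consult. The following is only a minimal, self-contained sketch of that idea, not the code in the patch: ScorerChoice, Strategy, choose and the taatLooksCheaper flag are hypothetical names, and the real Lucene Weight methods take different arguments.

```java
/** Hypothetical sketch: one place decides between TAAT and DAAT. */
final class ScorerChoice {
    /** Term-at-a-time vs. document-at-a-time scoring. */
    enum Strategy { TAAT, DAAT }

    /**
     * Single source of truth for the decision.
     * scoreDocsInOrder mirrors the flag mentioned above; taatLooksCheaper stands in
     * for whatever cost heuristic is used (an assumption, not the patch's logic).
     */
    static Strategy choose(boolean scoreDocsInOrder, boolean taatLooksCheaper) {
        if (scoreDocsInOrder) {
            return Strategy.DAAT; // in-order collection was requested, so use DAAT
        }
        return taatLooksCheaper ? Strategy.TAAT : Strategy.DAAT;
    }

    /** Stand-in for the out-of-order query: true only when TAAT would be picked. */
    static boolean scoresDocsOutOfOrder(boolean taatLooksCheaper) {
        // Ask the same helper what it would choose if out-of-order scoring were allowed.
        return choose(false, taatLooksCheaper) == Strategy.TAAT;
    }
}
```

With this shape, both entry points answer from the same function, so they cannot disagree about which scorer will be used.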

I also tried to implement .getChildren() for BAS and BLS,
but the TAAT strategy exhausts the sub-scorers at the very beginning.

In the end, I simply throw UnsupportedOperationException from BAS.getChildren()
and BLS.getChildren().
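
To illustrate why the sub-scorers are of no use to a caller under TAAT, here is a toy model (plain Java iterators, not Lucene's Scorer API; ToyTaatScorer and its methods are hypothetical). For simplicity it drains each clause completely before moving to the next, which is an assumption about how the exhaustion arises.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Hypothetical toy model of a TAAT scorer; not Lucene's API. */
final class ToyTaatScorer {
    private final List<Iterator<Integer>> subs;                  // one doc ID stream per clause
    private final Map<Integer, Integer> hits = new HashMap<>();  // doc ID -> matching clause count

    ToyTaatScorer(List<Iterator<Integer>> subs) {
        this.subs = subs;
    }

    /** Term at a time: drain each clause completely before touching the next one. */
    Map<Integer, Integer> scoreAll() {
        for (Iterator<Integer> sub : subs) {
            while (sub.hasNext()) {
                hits.merge(sub.next(), 1, Integer::sum);
            }
        }
        return hits;
    }

    /** By the time any hit is reported, every sub-iterator is exhausted, so refuse. */
    Collection<Iterator<Integer>> getChildren() {
        throw new UnsupportedOperationException("sub-scorers are exhausted under TAAT");
    }
}
```

After scoreAll() returns, none of the clause iterators is positioned on the document being reported, so any caller inspecting the children would see nothing useful.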


Besides, I have run more tests to make sure everything is right.
As you can see, the performance of the HighAnd.*Low.* cases shown in merge.png
is not good.
Therefore, I reran the HighAnd.*Low.* cases using luceneutil's pattern filter;
the results are as follows.
{code}
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
           HighAnd6LowOr          9.44    (6.4%)            9.19    (4.8%)    -2.6% ( -12% -    9%)
           HighAnd5LowOr          9.00    (8.8%)            8.85    (7.4%)    -1.6% ( -16% -   16%)
           HighAnd3LowOr         11.89    (8.9%)           11.71    (7.8%)    -1.6% ( -16% -   16%)
           HighAnd4LowOr         10.78    (7.4%)           10.61    (6.3%)    -1.5% ( -14% -   13%)
           HighAnd7LowOr          9.08    (7.2%)            8.94    (5.8%)    -1.5% ( -13% -   12%)
           HighAnd8LowOr          6.32    (8.6%)            6.23    (6.9%)    -1.4% ( -15% -   15%)
           HighAnd9LowOr          5.71    (5.7%)            5.65    (4.5%)    -1.1% ( -10% -    9%)
                PKLookup         98.95    (4.5%)           98.38    (2.4%)    -0.6% (  -7% -    6%)
          HighAnd9LowNot          7.49    (3.7%)            7.46    (3.2%)    -0.4% (  -7% -    6%)
          HighAnd4LowNot         10.33    (6.4%)           10.31    (6.1%)    -0.2% ( -11% -   13%)
          HighAnd8LowNot          6.69    (5.3%)            6.70    (4.9%)     0.1% (  -9% -   10%)
          HighAnd7LowNot          6.82    (5.1%)            6.84    (5.0%)     0.3% (  -9% -   10%)
          HighAnd6LowNot          9.45    (5.5%)            9.48    (4.7%)     0.3% (  -9% -   11%)
          HighAnd3LowNot         10.80    (6.7%)           10.87    (6.1%)     0.6% ( -11% -   14%)
          HighAnd5LowNot          4.28    (7.4%)            4.32    (7.1%)     1.0% ( -12% -   16%)
{code}
Everything looks right: every Pct diff range spans zero, so the differences
appear to be within run-to-run noise.
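
For readers checking the numbers, the Pct diff column can be reproduced from the other four columns. The sketch below is an assumption about how the range is derived (one standard deviation in each direction, truncated toward zero); it matches the rows shown here but is not taken from luceneutil's source, and PctDiff, mean and range are hypothetical names.

```java
/** Hypothetical helper reproducing the "Pct diff" column; not luceneutil's code. */
final class PctDiff {
    /** Mean relative change of the candidate QPS over the baseline QPS, in percent. */
    static double mean(double baseQps, double cmpQps) {
        return 100.0 * (cmpQps - baseQps) / baseQps;
    }

    /**
     * Assumed range: the worst case pairs (candidate - 1 stddev) with (baseline + 1 stddev),
     * the best case is the reverse; both are truncated toward zero.
     * StdDev arguments are fractions of the mean QPS, e.g. 0.057 for "(5.7%)".
     */
    static int[] range(double baseQps, double baseSd, double cmpQps, double cmpSd) {
        double baseBest = baseQps * (1 + baseSd);
        double baseWorst = baseQps * (1 - baseSd);
        double cmpBest = cmpQps * (1 + cmpSd);
        double cmpWorst = cmpQps * (1 - cmpSd);
        int worst = (int) (100.0 * (cmpWorst - baseBest) / baseBest);
        int best = (int) (100.0 * (cmpBest - baseWorst) / baseWorst);
        return new int[] { worst, best };
    }

    public static void main(String[] args) {
        // HighAnd9LowOr row: 5.71 (5.7%) baseline vs. 5.65 (4.5%) my_version
        System.out.printf("%.1f%%%n", mean(5.71, 5.65));
        int[] r = range(5.71, 0.057, 5.65, 0.045);
        System.out.println(r[0] + "% to " + r[1] + "%");
    }
}
```

For the HighAnd9LowOr row this gives a mean of about -1.1% and a range of -10% to 9%, agreeing with the table. A range that includes zero means the two runs overlap within one standard deviation.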

I have also run tests for more complicated tasks.
{code}
                    Task  QPS baseline    StdDev  QPS my_version    StdDev  Pct diff
     LowAnd6LowOr6LowNot         31.59    (1.0%)           28.52    (2.4%)    -9.7% ( -12% -   -6%)
    HighAnd6LowOr6LowNot          6.10    (2.7%)            5.76    (4.0%)    -5.6% ( -11% -    1%)
     MedAnd6LowOr6LowNot          7.33    (2.3%)            7.03    (3.1%)    -4.0% (  -9% -    1%)
    HighAnd6MedOr6LowNot          3.51    (1.5%)            3.49    (2.6%)    -0.6% (  -4% -    3%)
                PKLookup         95.99    (5.1%)           95.48    (4.9%)    -0.5% ( -10% -    9%)
    HighAnd6MedOr6MedNot          1.96    (1.3%)            1.97    (2.5%)     0.4% (  -3% -    4%)
     MedAnd6MedOr6MedNot          2.34    (1.2%)            2.35    (2.3%)     0.5% (  -2% -    4%)
   HighAnd6LowOr6HighNot          1.31    (1.1%)            1.33    (2.4%)     0.9% (  -2% -    4%)
    HighAnd6LowOr6MedNot          3.08    (1.5%)            3.12    (2.7%)     1.2% (  -2% -    5%)
     MedAnd6LowOr6MedNot          3.72    (1.4%)            3.89    (2.6%)     4.8% (   0% -    8%)
   HighAnd6MedOr6HighNot          1.40    (1.0%)            1.53    (2.4%)     9.3% (   5% -   12%)
     LowAnd6LowOr6MedNot          9.23    (2.1%)           10.19    (2.7%)    10.4% (   5% -   15%)
    LowAnd6LowOr6HighNot          6.04    (2.5%)            6.74    (2.9%)    11.6% (   6% -   17%)
   LowAnd6HighOr6HighNot          4.15    (3.4%)            4.72    (4.2%)    13.8% (   5% -   22%)
    MedAnd6MedOr6HighNot          1.65    (1.2%)            1.91    (2.2%)    15.7% (  12% -   19%)
    MedAnd6LowOr6HighNot          2.42    (1.7%)            2.80    (2.7%)    16.0% (  11% -   20%)
    LowAnd6HighOr6LowNot          4.69    (2.9%)            5.45    (3.7%)    16.1% (   9% -   23%)
     MedAnd6MedOr6LowNot          3.45    (1.2%)            4.04    (2.1%)    17.1% (  13% -   20%)
     LowAnd6MedOr6LowNot          8.77    (1.6%)           10.38    (2.4%)    18.4% (  14% -   22%)
     LowAnd6MedOr6MedNot          6.36    (2.6%)            7.55    (3.5%)    18.6% (  12% -   25%)
    LowAnd6MedOr6HighNot          5.48    (3.1%)            6.51    (3.9%)    18.8% (  11% -   26%)
    LowAnd6HighOr6MedNot          5.77    (3.1%)            6.86    (4.3%)    18.9% (  11% -   27%)
   MedAnd6HighOr6HighNot          1.22    (1.0%)            1.46    (2.0%)    19.8% (  16% -   23%)
   HighAnd6HighOr6MedNot          1.32    (1.1%)            1.59    (2.0%)    20.7% (  17% -   24%)
    MedAnd6HighOr6MedNot          1.72    (1.5%)            2.09    (2.2%)    21.3% (  17% -   25%)
  HighAnd6HighOr6HighNot          1.26    (1.2%)            1.56    (2.1%)    24.0% (  20% -   27%)
   HighAnd6HighOr6LowNot          1.54    (1.3%)            1.92    (2.0%)    24.7% (  21% -   28%)
    MedAnd6HighOr6LowNot          2.26    (1.5%)            2.85    (1.9%)    26.3% (  22% -   30%)
{code}
All look good.

If there are no other problems, I will start cleaning up the unused logic in
the code, such as BLS, and refining the javadoc.

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, SIZE.perf, all.perf, 
> luceneutil-score-equal.patch, luceneutil-score-equal.patch, merge.perf, 
> merge.png, perf.png, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that unless the MUST clauses have very low hit count compared 
> to the other clauses, that BooleanScorer would perform better than 
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to 
> handle MUST so it shouldn't be hard to bring back this capability ... I think 
> the challenging part might be the heuristics on when to use which (likely we 
> would have to use firstDocID as proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs 
> in this case, eg if suddenly the MUST clause skips 1000000 docs then you want 
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
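
The .advance() idea in the description above can be sketched with toy doc ID iterators. This is an illustration only, under the assumption that the MUST clause drives iteration; DocIds and MustDrivenSketch are hypothetical classes, not Lucene's DocIdSetIterator, and the linear advance() stands in for a real skip-list lookup.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical toy model of MUST-driven scoring with advance(); not Lucene's API. */
final class MustDrivenSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Minimal doc ID iterator over a sorted array. */
    static final class DocIds {
        private final int[] docs;
        private int pos = -1;

        DocIds(int... sortedDocs) {
            this.docs = sortedDocs;
        }

        int docID() {
            if (pos < 0) return -1;
            return pos < docs.length ? docs[pos] : NO_MORE_DOCS;
        }

        int nextDoc() {
            pos++;
            return docID();
        }

        /** Move to the first doc >= target (linear here; a real index would use skip data). */
        int advance(int target) {
            while (docID() < target) pos++;
            return docID();
        }
    }

    /** Returns {docID, number of matching SHOULD clauses} for every doc matching the MUST clause. */
    static List<int[]> score(DocIds must, DocIds... shoulds) {
        List<int[]> out = new ArrayList<>();
        for (int doc = must.nextDoc(); doc != NO_MORE_DOCS; doc = must.nextDoc()) {
            int matches = 0;
            for (DocIds s : shoulds) {
                // Jump straight to the MUST doc instead of stepping through the gap.
                if (s.docID() < doc) s.advance(doc);
                if (s.docID() == doc) matches++;
            }
            out.add(new int[] { doc, matches });
        }
        return out;
    }
}
```

When the MUST clause jumps from doc 5 to doc 1000000, each SHOULD iterator is advanced once to the new target rather than being scored for every document in between.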



