[jira] [Updated] (LUCENE-4396) BooleanScorer should sometimes be used for MUST clauses

Da Huang (JIRA) Sun, 03 Aug 2014 22:39:27 -0700

     [ 
https://issues.apache.org/jira/browse/LUCENE-4396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Da Huang updated LUCENE-4396:
-----------------------------

    Attachment: tasks.cpp
                LUCENE-4396.patch
                And.tasks

This patch is based on git mirror commit 67d17eb81b754fa242bb91e1b91070fd8b38ecd9.

In this patch, I removed the unused classes, encapsulated some functions, and
fixed some bugs.

In addition, the cases in the tasks file I used before were heavily correlated
with each other, which I don't think is good, so I generated a new tasks file.

And.tasks is the new tasks file, and tasks.cpp is the program that generates it.
You can generate the tasks file by running:
{code}
g++ tasks.cpp -std=c++0x -o tasks
./tasks < wikimedium.10M.nostopwords.tasks > And.tasks
{code}

The performance on the new tasks file is as follows:
{code}
                    Task    QPS baseline      StdDev    QPS my_version      StdDev                  Pct diff
          HighAnd5LowNot            5.40      (5.1%)              4.88      (4.2%)     -9.6% ( -18% -    0%)
           HighAnd5LowOr            7.05     (10.2%)              6.87      (3.8%)     -2.6% ( -15% -   12%)
           LowAnd5LowNot           27.17      (2.1%)             26.47      (2.6%)     -2.6% (  -7% -    2%)
          HighAnd5HighOr            1.13      (3.8%)              1.11      (2.2%)     -1.8% (  -7% -    4%)
            LowAnd5LowOr           31.82      (2.6%)             31.35      (2.3%)     -1.5% (  -6% -    3%)
                PKLookup           98.80      (5.2%)            102.02      (6.3%)      3.3% (  -7% -   15%)
         HighAnd5HighNot            1.95      (1.0%)              2.04      (2.1%)      4.7% (   1% -    7%)
          LowAnd5HighNot            9.46      (2.9%)             10.32      (2.7%)      9.0% (   3% -   15%)
           LowAnd5HighOr            7.56      (2.8%)              8.42      (2.8%)     11.4% (   5% -   17%)
          LowAnd60HighOr            0.51      (2.5%)              0.82      (4.8%)     58.7% (  50% -   67%)
          LowAnd60LowNot            2.61      (1.0%)              4.64      (3.4%)     78.0% (  72% -   83%)
         HighAnd60LowNot            1.30      (1.2%)              2.36      (3.7%)     81.1% (  75% -   87%)
          HighAnd60LowOr            1.18      (1.3%)              2.15      (3.7%)     82.0% (  76% -   88%)
           LowAnd60LowOr            2.25      (0.6%)              4.61      (4.2%)    104.7% (  99% -  110%)
         HighAnd60HighOr            0.10      (0.7%)              0.26      (4.8%)    151.2% ( 144% -  157%)
         LowAnd60HighNot            0.53      (2.5%)              1.62      (8.0%)    204.0% ( 188% -  220%)
        HighAnd60HighNot            0.14      (0.9%)              0.59      (8.9%)    328.4% ( 315% -  341%)
{code}
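As a sanity check on the table, the Pct diff column appears to be the relative QPS change, (my_version - baseline) / baseline * 100; for example, HighAnd5LowNot gives (4.88 - 5.40) / 5.40, about -9.6%. The small C++ sketch here (pctDiff and pctDiffRoundingRange are hypothetical helper names, not part of luceneutil) recomputes it from the printed values. For the slowest tasks the recomputed figure can differ noticeably from the table, presumably because the printed QPS values are rounded to two decimals: HighAnd60HighNot recomputes to about 321%, but rounding alone allows anything from about 303% to 341%, which includes the reported 328.4%.

```cpp
#include <utility>

// Relative QPS change in percent, as in the "Pct diff" column:
// (candidate - baseline) / baseline * 100.
double pctDiff(double baselineQps, double candidateQps) {
    return (candidateQps - baselineQps) / baselineQps * 100.0;
}

// Each printed QPS value is rounded to two decimals, so the true value lies
// within +/- 0.005 of it. Returns the smallest and largest pct diff consistent
// with that rounding (assumes baselineQps > 0.005).
std::pair<double, double> pctDiffRoundingRange(double baselineQps, double candidateQps) {
    const double half = 0.005;
    // pctDiff grows with the candidate and shrinks as the baseline grows,
    // so the extremes come from pushing the two values in opposite directions.
    return {pctDiff(baselineQps + half, candidateQps - half),
            pctDiff(baselineQps - half, candidateQps + half)};
}
```

For rows with large QPS values (e.g. PKLookup, LowAnd5LowNot) the recomputed and reported figures agree to the printed precision, so only the sub-1 QPS rows are sensitive to rounding.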

My next step is to run more tests to refine the rules for choosing between the
scorers and to make sure the results are correct. I think this can be finished
by this Friday.

As the suggested pencils-down date is approaching, I will then begin to clean up
the code, improve the comments, and write the final documentation.

> BooleanScorer should sometimes be used for MUST clauses
> -------------------------------------------------------
>
>                 Key: LUCENE-4396
>                 URL: https://issues.apache.org/jira/browse/LUCENE-4396
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Michael McCandless
>         Attachments: And.tasks, And.tasks, AndOr.tasks, AndOr.tasks, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, LUCENE-4396.patch, 
> LUCENE-4396.patch, SIZE.perf, all.perf, luceneutil-score-equal.patch, 
> luceneutil-score-equal.patch, stat.cpp, stat.cpp, tasks.cpp
>
>
> Today we only use BooleanScorer if the query consists of SHOULD and MUST_NOT.
> If there is one or more MUST clauses we always use BooleanScorer2.
> But I suspect that, unless the MUST clauses have a very low hit count compared
> to the other clauses, BooleanScorer would perform better than
> BooleanScorer2.  BooleanScorer still has some vestiges from when it used to
> handle MUST, so it shouldn't be hard to bring back this capability ... I think
> the challenging part might be the heuristics on when to use which (likely we
> would have to use firstDocID as a proxy for total hit count).
> Likely we should also have BooleanScorer sometimes use .advance() on the subs
> in this case, e.g. if suddenly the MUST clause skips 1000000 docs then you want
> to .advance() all the SHOULD clauses.
> I won't have near term time to work on this so feel free to take it if you 
> are inspired!
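The two ideas in the description (choosing a scorer from a firstDocID-based hit-count estimate, and advance()-ing the SHOULD clauses when the MUST clause skips ahead) can be sketched in C++ as follows. This is a minimal, hypothetical illustration, not Lucene's API and not the attached patch: Postings, estimateHits, preferBooleanScorer and scoreMustShould are made-up names, postings are in-memory sorted vectors, the estimate assumes hits are spread roughly uniformly over the index, the factor 16 is an arbitrary placeholder, and each window starts at the MUST clause's next doc instead of at BooleanScorer's fixed, aligned window boundaries.

```cpp
#include <algorithm>
#include <cstddef>
#include <limits>
#include <utility>
#include <vector>

constexpr int NO_MORE_DOCS = std::numeric_limits<int>::max();

// Hypothetical in-memory stand-in for a postings iterator (not Lucene's API).
struct Postings {
    std::vector<int> docs;  // sorted doc IDs
    std::size_t pos = 0;

    explicit Postings(std::vector<int> d) : docs(std::move(d)) {}

    int docID() const { return pos < docs.size() ? docs[pos] : NO_MORE_DOCS; }
    int nextDoc() {
        if (pos < docs.size()) ++pos;
        return docID();
    }
    // Moves to the first doc >= target; binary search stands in for skip lists.
    int advance(int target) {
        if (pos < docs.size()) {
            auto it = std::lower_bound(docs.begin() + static_cast<std::ptrdiff_t>(pos),
                                       docs.end(), target);
            pos = static_cast<std::size_t>(it - docs.begin());
        }
        return docID();
    }
};

// Hypothetical estimate: if hits are spread uniformly over maxDoc docs, a clause
// whose first hit is firstDocID has roughly maxDoc / (firstDocID + 1) hits.
long long estimateHits(long long maxDoc, int firstDocID) {
    return maxDoc / (static_cast<long long>(firstDocID) + 1);
}

// Prefer the windowed BooleanScorer unless the MUST clause looks much rarer
// than the SHOULD clause; the factor 16 is a placeholder, not a tuned value.
bool preferBooleanScorer(long long maxDoc, int mustFirstDoc, int shouldFirstDoc) {
    const long long kRarityFactor = 16;
    return estimateHits(maxDoc, mustFirstDoc) * kRarityFactor >=
           estimateHits(maxDoc, shouldFirstDoc);
}

// Windowed scoring of one MUST clause plus SHOULD clauses, in the spirit of
// BooleanScorer's bucket table. Each window starts at the MUST clause's next
// doc, so when the MUST clause skips a large gap the SHOULD clauses are
// advance()d across it instead of being scanned doc by doc. Returns
// (doc, score) pairs with score = 1 + number of matching SHOULD clauses.
std::vector<std::pair<int, int>> scoreMustShould(Postings must, std::vector<Postings> shoulds,
                                                 int windowSize) {
    std::vector<std::pair<int, int>> hits;
    std::vector<char> mustHit(windowSize);
    std::vector<int> shouldHits(windowSize);
    while (must.docID() != NO_MORE_DOCS) {
        const int base = must.docID();
        const long long end = static_cast<long long>(base) + windowSize;
        std::fill(mustHit.begin(), mustHit.end(), 0);
        std::fill(shouldHits.begin(), shouldHits.end(), 0);
        // Collect the MUST clause's hits in [base, end).
        for (int d = base; d < end; d = must.nextDoc()) mustHit[d - base] = 1;
        // Jump each SHOULD clause straight to base, then collect its hits in the window.
        for (Postings& s : shoulds)
            for (int d = s.advance(base); d < end; d = s.nextDoc()) ++shouldHits[d - base];
        // Only docs matched by the MUST clause are emitted.
        for (int i = 0; i < windowSize; ++i)
            if (mustHit[i]) hits.emplace_back(base + i, 1 + shouldHits[i]);
    }
    return hits;
}
```

For example, with MUST postings {1, 5, 1000000, 1000003}, SHOULD postings {1, 2, 3, 1000003} and {5, 999999, 1000003}, and an 8-doc window, the second window starts directly at doc 1000000, both SHOULD clauses are advanced there in one step, and the result is (1, 2), (5, 2), (1000000, 1), (1000003, 3).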



--
This message was sent by Atlassian JIRA
(v6.2#6252)
