[jira] [Updated] (LUCENE-6184) BooleanScorer should better deal with sparse clauses

Adrien Grand (JIRA) Mon, 19 Jan 2015 02:48:02 -0800

     [ https://issues.apache.org/jira/browse/LUCENE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Adrien Grand updated LUCENE-6184:
---------------------------------
    Attachment: LUCENE-6184.patch

Same patch, with the suggested API added so that BulkScorer can skip. Results 
of the luceneutil benchmark still look similar:

{code}
                    Task    Base QPS      StdDev   Patch QPS      StdDev   Pct diff (range)
              AndHighLow      883.42      (3.5%)      872.51      (3.3%)   -1.2% (  -7% -    5%)
            OrNotHighLow     1052.93      (4.4%)     1048.44      (4.5%)   -0.4% (  -8% -    8%)
                PKLookup      277.07      (2.0%)      276.65      (2.1%)   -0.2% (  -4% -    4%)
              AndHighMed      137.40      (1.9%)      137.30      (2.4%)   -0.1% (  -4% -    4%)
            HighSpanNear       34.67      (3.1%)       34.65      (3.0%)   -0.0% (  -5% -    6%)
         LowSloppyPhrase      215.69      (2.5%)      215.61      (2.5%)   -0.0% (  -4% -    5%)
         MedSloppyPhrase      183.08      (2.5%)      183.11      (2.0%)    0.0% (  -4% -    4%)
              HighPhrase       26.33      (6.8%)       26.34      (6.8%)    0.0% ( -12% -   14%)
             AndHighHigh       51.61      (1.8%)       51.64      (2.0%)    0.0% (  -3% -    3%)
               LowPhrase       74.61      (1.3%)       74.68      (1.4%)    0.1% (  -2% -    2%)
        HighSloppyPhrase       14.94      (5.7%)       14.97      (5.0%)    0.2% (  -9% -   11%)
               MedPhrase       31.42      (1.1%)       31.47      (1.1%)    0.2% (  -1% -    2%)
             LowSpanNear       55.89      (2.5%)       56.00      (2.5%)    0.2% (  -4% -    5%)
                 Respell       73.38      (2.4%)       73.54      (2.2%)    0.2% (  -4% -    4%)
            OrNotHighMed      118.20      (1.6%)      118.66      (1.7%)    0.4% (  -2% -    3%)
             MedSpanNear       78.17      (3.2%)       78.62      (3.5%)    0.6% (  -5% -    7%)
           OrHighNotHigh       31.47      (1.8%)       31.66      (1.9%)    0.6% (  -2% -    4%)
           OrNotHighHigh       50.29      (1.6%)       50.63      (2.0%)    0.7% (  -2% -    4%)
            OrHighNotMed       82.27      (2.3%)       83.17      (2.3%)    1.1% (  -3% -    5%)
                 VeryLow     6149.21      (4.7%)     6223.22      (5.4%)    1.2% (  -8% -   11%)
            OrHighNotLow       55.30      (3.2%)       56.25      (2.5%)    1.7% (  -3% -    7%)
                 LowTerm      808.21      (7.3%)      824.32      (4.5%)    2.0% (  -9% -   14%)
                HighTerm      106.18      (4.3%)      108.63      (3.0%)    2.3% (  -4% -   10%)
                 MedTerm      296.65      (4.2%)      304.42      (2.7%)    2.6% (  -4% -   10%)
                Wildcard       20.85      (7.5%)       21.50      (5.3%)    3.1% (  -8% -   17%)
                 Prefix3       95.63      (6.2%)       98.81      (5.3%)    3.3% (  -7% -   15%)
                  Fuzzy2       62.12      (9.0%)       64.44     (10.2%)    3.7% ( -14% -   25%)
                  IntNRQ        8.85      (8.9%)        9.21      (6.7%)    4.1% ( -10% -   21%)
                  Fuzzy1      105.42     (11.2%)      116.28      (4.8%)   10.3% (  -5% -   29%)
               OrHighLow       51.75      (8.2%)       59.92      (8.2%)   15.8% (   0% -   35%)
              OrHighHigh       32.34      (8.5%)       37.53      (8.5%)   16.0% (   0% -   36%)
               OrHighMed       16.79      (8.7%)       19.62      (8.8%)   16.8% (   0% -   37%)
          VeryLowVeryLow     2053.12      (2.3%)     2399.38      (3.2%)   16.9% (  11% -   22%)
{code}
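The comment does not show the skipping API itself. As a rough illustration only, the following Java sketch assumes a hypothetical `SkippingBulkScorer` interface whose `score(collector, min, max)` collects the matches in `[min, max)` and returns the next doc that may match, so that the caller can jump over empty ranges instead of stepping through every window. All names here (`SkippingBulkScorer`, `DocCollector`, `ArrayBulkScorer`, `scoreAll`) are made up for this sketch and are not Lucene's actual API.

```java
import java.util.ArrayList;
import java.util.List;

public class SkippingBulkScorerSketch {

  /** Hypothetical collector: receives matching doc IDs in order. */
  interface DocCollector {
    void collect(int doc);
  }

  /** Hypothetical skip-capable bulk scorer (not Lucene's BulkScorer). */
  interface SkippingBulkScorer {
    int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Collects matches in [min, max) and returns the next doc that may match, or NO_MORE_DOCS. */
    int score(DocCollector collector, int min, int max);
  }

  /** Toy scorer over one sorted array of doc IDs, standing in for a sparse clause. */
  static final class ArrayBulkScorer implements SkippingBulkScorer {
    private final int[] docs;
    private int upto = 0;

    ArrayBulkScorer(int... docs) {
      this.docs = docs;
    }

    @Override
    public int score(DocCollector collector, int min, int max) {
      while (upto < docs.length && docs[upto] < min) {
        upto++; // docs the caller chose to skip
      }
      while (upto < docs.length && docs[upto] < max) {
        collector.collect(docs[upto++]);
      }
      return upto < docs.length ? docs[upto] : NO_MORE_DOCS;
    }
  }

  /** Scores windows of at most windowSize docs, resuming at the returned doc instead of at max. */
  static int scoreAll(SkippingBulkScorer scorer, DocCollector collector, int maxDoc, int windowSize) {
    int calls = 0;
    int min = 0;
    while (min < maxDoc) {
      int max = (int) Math.min((long) min + windowSize, maxDoc);
      int next = scorer.score(collector, min, max);
      calls++;
      min = Math.max(max, next); // skip the empty range [max, next)
    }
    return calls;
  }

  public static void main(String[] args) {
    List<Integer> hits = new ArrayList<>();
    // Three matches spread over 1,000,000 docs.
    int calls = scoreAll(new ArrayBulkScorer(5, 500_000, 999_999), hits::add, 1_000_000, 2048);
    System.out.println(hits + " collected in " + calls + " calls");
  }
}
```

With three matches spread over 1,000,000 docs and 2,048-doc windows, this driver makes 3 calls, whereas stepping through every window would take 489.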

> BooleanScorer should better deal with sparse clauses
> ----------------------------------------------------
>
>                 Key: LUCENE-6184
>                 URL: https://issues.apache.org/jira/browse/LUCENE-6184
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Adrien Grand
>            Assignee: Adrien Grand
>            Priority: Minor
>             Fix For: Trunk, 5.1
>
>         Attachments: LUCENE-6184.patch, LUCENE-6184.patch, LUCENE-6184.patch
>
>
> The way that BooleanScorer works looks like this:
> {code}
> for each (window of 2048 docs) {
>   for each (optional scorer) {
>     scorer.score(window)
>   }
> }
> {code}
> This is not efficient for very sparse clauses (doc freq much lower than 
> maxDoc/2048, the number of windows), since most windows then contain no 
> match and we still score each of them. BooleanScorer2 currently performs 
> better in those cases.
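To make the cost described above concrete, here is a simplified, hypothetical Java sketch of the window loop: each clause is a sorted array of doc IDs, and each window only counts how many clauses match each doc (no scoring). With `skipEmptyWindows` set, it jumps after each window to the window containing the smallest next doc over all clauses. That is one possible way to avoid empty windows, not necessarily what the patch does; the class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;

public class WindowedOrSketch {
  static final int WINDOW = 2048; // window size from the issue description (a power of two)

  /**
   * Simplified disjunction over sorted doc-ID arrays (assumed to lie in [0, maxDoc)).
   * Adds {doc, matchingClauses} pairs to out in doc order and returns the number of windows visited.
   */
  static int score(int[][] clauses, int maxDoc, boolean skipEmptyWindows, List<int[]> out) {
    int[] upto = new int[clauses.length]; // per-clause cursor into its doc-ID array
    int[] counts = new int[WINDOW];       // matching clauses per doc of the current window
    int windows = 0;
    int windowBase = 0;
    while (windowBase < maxDoc) {
      int windowMax = (int) Math.min((long) windowBase + WINDOW, maxDoc);
      windows++;
      // "for each (optional scorer) { scorer.score(window) }"
      for (int c = 0; c < clauses.length; c++) {
        int[] docs = clauses[c];
        while (upto[c] < docs.length && docs[upto[c]] < windowMax) {
          counts[docs[upto[c]] - windowBase]++;
          upto[c]++;
        }
      }
      // Emit this window's matches in doc order and clear the buckets.
      for (int i = 0; i < windowMax - windowBase; i++) {
        if (counts[i] > 0) {
          out.add(new int[] {windowBase + i, counts[i]});
          counts[i] = 0;
        }
      }
      if (skipEmptyWindows) {
        // Jump to the window holding the smallest next doc over all clauses.
        int next = Integer.MAX_VALUE;
        for (int c = 0; c < clauses.length; c++) {
          if (upto[c] < clauses[c].length) {
            next = Math.min(next, clauses[c][upto[c]]);
          }
        }
        if (next == Integer.MAX_VALUE) {
          break; // every clause is exhausted
        }
        windowBase = next & ~(WINDOW - 1); // round down to a multiple of WINDOW
      } else {
        windowBase = windowMax; // naive: always move to the adjacent window
      }
    }
    return windows;
  }

  public static void main(String[] args) {
    int maxDoc = 1 << 20; // 1,048,576 docs, i.e. 512 windows
    int[][] clauses = {
      {10, 300_000},
      {10, 700_000, 1_000_000},
    };
    List<int[]> naive = new ArrayList<>();
    List<int[]> skipping = new ArrayList<>();
    System.out.println("naive windows:    " + score(clauses, maxDoc, false, naive));
    System.out.println("skipping windows: " + score(clauses, maxDoc, true, skipping));
    System.out.println("matches: " + naive.size() + " vs " + skipping.size());
  }
}
```

With maxDoc = 1,048,576 and only five postings in total, the naive loop visits all 512 windows while the skipping variant visits 4; both report the same four matching docs.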



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

