[ https://issues.apache.org/jira/browse/LUCENE-6184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Adrien Grand updated LUCENE-6184:
---------------------------------
Attachment: LUCENE-6184.patch
Same patch, now with the suggested API added so that BulkScorer can skip.
Results of the luceneutil benchmark still look similar:
{code}
Task                QPS baseline   StdDev   QPS patch   StdDev Pct diff
AndHighLow                883.42   (3.5%)      872.51   (3.3%)    -1.2% ( -7% - 5%)
OrNotHighLow             1052.93   (4.4%)     1048.44   (4.5%)    -0.4% ( -8% - 8%)
PKLookup                  277.07   (2.0%)      276.65   (2.1%)    -0.2% ( -4% - 4%)
AndHighMed                137.40   (1.9%)      137.30   (2.4%)    -0.1% ( -4% - 4%)
HighSpanNear               34.67   (3.1%)       34.65   (3.0%)    -0.0% ( -5% - 6%)
LowSloppyPhrase           215.69   (2.5%)      215.61   (2.5%)    -0.0% ( -4% - 5%)
MedSloppyPhrase           183.08   (2.5%)      183.11   (2.0%)     0.0% ( -4% - 4%)
HighPhrase                 26.33   (6.8%)       26.34   (6.8%)     0.0% ( -12% - 14%)
AndHighHigh                51.61   (1.8%)       51.64   (2.0%)     0.0% ( -3% - 3%)
LowPhrase                  74.61   (1.3%)       74.68   (1.4%)     0.1% ( -2% - 2%)
HighSloppyPhrase           14.94   (5.7%)       14.97   (5.0%)     0.2% ( -9% - 11%)
MedPhrase                  31.42   (1.1%)       31.47   (1.1%)     0.2% ( -1% - 2%)
LowSpanNear                55.89   (2.5%)       56.00   (2.5%)     0.2% ( -4% - 5%)
Respell                    73.38   (2.4%)       73.54   (2.2%)     0.2% ( -4% - 4%)
OrNotHighMed              118.20   (1.6%)      118.66   (1.7%)     0.4% ( -2% - 3%)
MedSpanNear                78.17   (3.2%)       78.62   (3.5%)     0.6% ( -5% - 7%)
OrHighNotHigh              31.47   (1.8%)       31.66   (1.9%)     0.6% ( -2% - 4%)
OrNotHighHigh              50.29   (1.6%)       50.63   (2.0%)     0.7% ( -2% - 4%)
OrHighNotMed               82.27   (2.3%)       83.17   (2.3%)     1.1% ( -3% - 5%)
VeryLow                  6149.21   (4.7%)     6223.22   (5.4%)     1.2% ( -8% - 11%)
OrHighNotLow               55.30   (3.2%)       56.25   (2.5%)     1.7% ( -3% - 7%)
LowTerm                   808.21   (7.3%)      824.32   (4.5%)     2.0% ( -9% - 14%)
HighTerm                  106.18   (4.3%)      108.63   (3.0%)     2.3% ( -4% - 10%)
MedTerm                   296.65   (4.2%)      304.42   (2.7%)     2.6% ( -4% - 10%)
Wildcard                   20.85   (7.5%)       21.50   (5.3%)     3.1% ( -8% - 17%)
Prefix3                    95.63   (6.2%)       98.81   (5.3%)     3.3% ( -7% - 15%)
Fuzzy2                     62.12   (9.0%)       64.44  (10.2%)     3.7% ( -14% - 25%)
IntNRQ                      8.85   (8.9%)        9.21   (6.7%)     4.1% ( -10% - 21%)
Fuzzy1                    105.42  (11.2%)      116.28   (4.8%)    10.3% ( -5% - 29%)
OrHighLow                  51.75   (8.2%)       59.92   (8.2%)    15.8% ( 0% - 35%)
OrHighHigh                 32.34   (8.5%)       37.53   (8.5%)    16.0% ( 0% - 36%)
OrHighMed                  16.79   (8.7%)       19.62   (8.8%)    16.8% ( 0% - 37%)
VeryLowVeryLow           2053.12   (2.3%)     2399.38   (3.2%)    16.9% ( 11% - 22%)
{code}
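For readers who have not opened the patch, here is a rough sketch of what "able to skip" could look like. It is not the API from the attachment and not Lucene's actual BulkScorer signature: the `SkippableBulkScorer` interface, the `over` and `scoreAll` helpers, and the list used as a collector are all hypothetical names chosen for illustration. The only idea taken from the comment above is that a bulk scorer reports where its next match is, so the caller can jump straight to that window.

```java
import java.util.ArrayList;
import java.util.List;

final class SkippingSketch {
    static final int WINDOW_SIZE = 2048;                 // window width used by BooleanScorer
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;   // sentinel: nothing left to score

    /** Hypothetical stand-in for a bulk scorer that can say where its next match is. */
    interface SkippableBulkScorer {
        /** Collects matches in [min, max); returns the first match >= max, or NO_MORE_DOCS. */
        int score(List<Integer> collected, int min, int max);
    }

    /** Toy scorer over a sorted array of matching doc IDs (linear scan, for clarity only). */
    static SkippableBulkScorer over(int[] sortedDocs) {
        return (collected, min, max) -> {
            for (int doc : sortedDocs) {
                if (doc >= max) {
                    return doc;            // first match beyond the requested range
                }
                if (doc >= min) {
                    collected.add(doc);    // match inside [min, max)
                }
            }
            return NO_MORE_DOCS;
        };
    }

    /** Scores only the windows that hold a match; returns how many windows were visited. */
    static int scoreAll(SkippableBulkScorer scorer, int maxDoc, List<Integer> collected) {
        int windowsVisited = 0;
        int next = scorer.score(collected, 0, 0);   // empty range: only asks for the first match
        while (next < maxDoc) {
            int windowStart = next - (next % WINDOW_SIZE);
            int windowEnd = Math.min(windowStart + WINDOW_SIZE, maxDoc);
            next = scorer.score(collected, windowStart, windowEnd);
            windowsVisited++;
        }
        return windowsVisited;
    }

    public static void main(String[] args) {
        int[] sparse = {5, 4_000_000, 4_000_100, 9_999_999};   // made-up doc IDs
        List<Integer> hits = new ArrayList<>();
        int visited = scoreAll(over(sparse), 10_000_000, hits);
        System.out.println(visited + " windows visited, hits = " + hits);
    }
}
```

With the made-up doc IDs in `main`, the driver visits one window per cluster of matches rather than every window up to maxDoc.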
> BooleanScorer should better deal with sparse clauses
> ----------------------------------------------------
>
> Key: LUCENE-6184
> URL: https://issues.apache.org/jira/browse/LUCENE-6184
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Adrien Grand
> Assignee: Adrien Grand
> Priority: Minor
> Fix For: Trunk, 5.1
>
> Attachments: LUCENE-6184.patch, LUCENE-6184.patch, LUCENE-6184.patch
>
>
> The way that BooleanScorer works looks like this:
> {code}
> for each (window of 2048 docs) {
>   for each (optional scorer) {
>     scorer.score(window)
>   }
> }
> {code}
> This is not efficient for very sparse clauses (doc freq much lower than
> maxDoc/2048) since we keep on scoring windows of documents that do not match
> anything. BooleanScorer2 currently performs better in those cases.
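To put numbers on the cost described in the issue, the small sketch below compares how many 2048-doc windows the loop visits for one clause with how many of those windows actually contain a match. The class name, method names, index size and doc IDs are all made up for illustration. Only the window size and the "doc freq much lower than maxDoc/2048" condition come from the description.

```java
final class WindowCost {
    static final int WINDOW_SIZE = 2048; // window width from the issue description

    /** Number of windows visited when every window in [0, maxDoc) is scored. */
    static int windowsScoredNaively(int maxDoc) {
        return (maxDoc + WINDOW_SIZE - 1) / WINDOW_SIZE; // ceil(maxDoc / 2048)
    }

    /** Number of windows that contain at least one doc of a sorted posting list. */
    static int windowsWithMatches(int[] sortedDocs) {
        int count = 0;
        int lastWindow = -1;
        for (int doc : sortedDocs) {
            int window = doc / WINDOW_SIZE;
            if (window != lastWindow) {   // first match seen in this window
                count++;
                lastWindow = window;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;                                     // hypothetical index size
        int[] sparseClause = {5, 4_000_000, 4_000_100, 9_999_999};   // hypothetical matches
        System.out.println("windows scored:       " + windowsScoredNaively(maxDoc));
        System.out.println("windows with a match: " + windowsWithMatches(sparseClause));
    }
}
```

In this made-up case the clause has a doc freq of 4 against maxDoc/2048 of roughly 4883, so nearly every call to scorer.score(window) for that clause finds nothing.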