[ https://issues.apache.org/jira/browse/LUCENE-7462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15588818#comment-15588818 ]
Michael McCandless commented on LUCENE-7462:
--------------------------------------------
I also see good speedups to the otherwise "lightweight" queries:
{noformat}
Report after iter 19:
Task              QPS base    StdDev   QPS comp    StdDev   Pct diff
Prefix3              43.40    (5.2%)      42.48    (8.8%)   -2.1% ( -15% -  12%)
IntNRQ               10.05    (8.8%)       9.87   (10.5%)   -1.8% ( -19% -  19%)
HighSpanNear         19.38    (5.2%)      19.14    (6.6%)   -1.2% ( -12% -  11%)
LowPhrase            19.34    (1.9%)      19.21    (3.6%)   -0.7% (  -6% -   4%)
PKLookup            350.45    (1.3%)     348.51    (2.8%)   -0.6% (  -4% -   3%)
MedSpanNear          41.12    (4.5%)      40.98    (4.7%)   -0.4% (  -9% -   9%)
Fuzzy1              115.35    (2.3%)     115.06    (2.8%)   -0.2% (  -5% -   5%)
LowSpanNear          85.93    (2.1%)      85.78    (2.3%)   -0.2% (  -4% -   4%)
MedPhrase            77.08    (2.7%)      77.03    (2.9%)   -0.1% (  -5% -   5%)
Respell              62.22    (2.2%)      62.26    (1.4%)    0.1% (  -3% -   3%)
Wildcard             37.39    (4.4%)      37.43    (5.8%)    0.1% (  -9% -  10%)
Fuzzy2              100.18    (2.0%)     100.31    (1.6%)    0.1% (  -3% -   3%)
LowSloppyPhrase      14.75    (4.9%)      14.79    (4.2%)    0.2% (  -8% -   9%)
HighPhrase            3.81    (5.2%)       3.82    (6.2%)    0.4% ( -10% -  12%)
AndHighLow          912.50    (2.5%)     916.11    (3.8%)    0.4% (  -5% -   6%)
OrNotHighLow        957.24    (2.5%)     963.91    (2.7%)    0.7% (  -4% -   6%)
MedSloppyPhrase      48.46    (4.8%)      48.80    (4.3%)    0.7% (  -8% -  10%)
AndHighMed           46.40    (1.7%)      46.87    (1.6%)    1.0% (  -2% -   4%)
AndHighHigh          43.36    (1.9%)      43.80    (1.9%)    1.0% (  -2% -   4%)
LowTerm             449.83    (2.5%)     454.76    (5.1%)    1.1% (  -6% -   8%)
HighSloppyPhrase     16.13    (6.8%)      16.34    (6.3%)    1.3% ( -11% -  15%)
OrNotHighMed         98.19    (3.2%)      99.56    (3.1%)    1.4% (  -4% -   7%)
OrNotHighHigh        21.69    (4.5%)      22.16    (4.8%)    2.2% (  -6% -  12%)
OrHighNotHigh        18.16    (7.7%)      18.75    (8.0%)    3.2% ( -11% -  20%)
OrHighNotMed         61.81    (9.4%)      64.27    (9.5%)    4.0% ( -13% -  25%)
MedTerm             123.87    (4.5%)     129.22    (3.3%)    4.3% (  -3% -  12%)
OrHighNotLow         25.19   (11.2%)      26.28   (11.5%)    4.4% ( -16% -  30%)
OrHighHigh           12.29    (7.4%)      12.96    (8.7%)    5.5% (  -9% -  23%)
OrHighMed            12.36    (7.4%)      13.09    (8.5%)    5.9% (  -9% -  23%)
HighTerm             38.51    (5.7%)      40.80    (4.4%)    5.9% (  -3% -  17%)
OrHighLow            19.42    (8.6%)      20.66    (9.7%)    6.4% ( -10% -  26%)
{noformat}
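The point estimate in the Pct diff column appears to be the relative change in mean QPS, (QPS comp - QPS base) / QPS base. The sketch below checks that assumption against a few rows of the report. It recomputes only the point estimate; the range in parentheses is not reproduced here. {{PctDiff}} and {{pctDiff}} are illustrative names, not part of the benchmark tooling.

```java
import java.util.Locale;

public class PctDiff {

    /** Relative change in mean QPS from the baseline to the patched run, in percent. */
    static double pctDiff(double qpsBase, double qpsComp) {
        return (qpsComp - qpsBase) / qpsBase * 100.0;
    }

    public static void main(String[] args) {
        // Mean QPS values copied from a few rows of the report.
        String[] tasks = {"Prefix3", "MedTerm", "HighTerm", "OrHighLow"};
        double[] base = {43.40, 123.87, 38.51, 19.42};
        double[] comp = {42.48, 129.22, 40.80, 20.66};
        for (int i = 0; i < tasks.length; i++) {
            // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM locale.
            System.out.printf(Locale.ROOT, "%-10s %5.1f%%%n", tasks[i], pctDiff(base[i], comp[i]));
        }
    }
}
```

Running it prints -2.1%, 4.3%, 5.9% and 6.4%, which match the Pct diff column for those four tasks.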
> Faster search APIs for doc values
> ---------------------------------
>
> Key: LUCENE-7462
> URL: https://issues.apache.org/jira/browse/LUCENE-7462
> Project: Lucene - Core
> Issue Type: Improvement
> Affects Versions: master (7.0)
> Reporter: Adrien Grand
> Priority: Minor
> Attachments: LUCENE-7462-advanceExact.patch
>
>
> While the iterator API helps deal with sparse doc values more efficiently, it
> also makes search-time operations more costly. For instance, the old
> random-access API allowed computing facets on a given segment without any
> conditionals, by simply incrementing the counter at index {{ordinal+1}}, while
> the new API requires advancing the iterator if necessary and then checking
> whether it is positioned exactly on the right document.
> Since it is very common for fields to exist across most documents, I suspect
> codecs will keep an internal structure that is similar to the current codec
> in the dense case, by having a dense representation of the data and just
> making the iterator skip over the minority of documents that do not have a
> value.
> I suggest that we add APIs that make things cheaper at search time. For
> instance, in the case of SORTED doc values, it could look like
> {{LegacySortedDocValues}} with the additional restriction that documents can
> only be consumed in order. Codecs that can implement this API efficiently
> would hide it behind a {{SortedDocValues}} adapter, and then at search time
> facets and comparators (which were better served by the {{LegacySortedDocValues}}
> API) would either unwrap it or hide the {{SortedDocValues}} they received behind
> a more random-access API (which would only happen in the truly sparse case if
> the codec optimizes the dense case).
> One challenge is that we already use the same idea for hiding single-valued
> impls behind multi-valued impls, so we would need to enforce the order in
> which the wrapping happens. At first sight, it seems that it would be
> best to do the single-value-behind-multi-value-API wrapping above the
> random-access-behind-iterator-API wrapping. The complexity of
> wrapping/unwrapping in the right order could be contained in the
> {{DocValues}} helper class.
> I think this change would also simplify search-time consumption of doc
> values, which currently requires several lines of code to position the
> iterator every time it needs to read a value.
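The cost difference described in the issue (a branch-free counter increment with the random-access API versus advance-then-check with the iterator API), together with the idea of a dense representation behind an iterator that skips the few documents without a value, can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions: {{RandomAccessOrds}}, {{OrdIterator}}, {{getOrd}}, {{denseIterator}} and the counting methods are hypothetical, simplified stand-ins, not Lucene's actual {{SortedDocValues}} classes or method signatures. Following the description, a missing value is represented as ordinal -1, so it is counted in slot 0 of the {{ordinal+1}} array.

```java
import java.util.Arrays;

public class FacetCountingSketch {

    // Hypothetical, simplified stand-ins for the two API styles; not Lucene's classes.

    /** Random-access style: ordinal of any document, or -1 if it has no value. */
    interface RandomAccessOrds {
        int getOrd(int docID);
    }

    /** Iterator style: only documents that have a value are visited, in order. */
    interface OrdIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int docID();               // current document, -1 before the first advance
        int advance(int target);   // first document >= target that has a value
        int ordValue();            // ordinal of the current document
    }

    /** Old style: no conditionals; missing values (-1) land in counts[0]. */
    static int[] countRandomAccess(RandomAccessOrds values, int[] matchingDocs, int valueCount) {
        int[] counts = new int[valueCount + 1];
        for (int doc : matchingDocs) {
            counts[values.getOrd(doc) + 1]++;
        }
        return counts;
    }

    /** New style: advance if necessary, then check whether we landed exactly on doc. */
    static int[] countIterator(OrdIterator values, int[] matchingDocs, int valueCount) {
        int[] counts = new int[valueCount + 1];
        for (int doc : matchingDocs) {
            if (values.docID() < doc) {
                values.advance(doc);
            }
            if (values.docID() == doc) {
                counts[values.ordValue() + 1]++;
            } else {
                counts[0]++;  // doc has no value
            }
        }
        return counts;
    }

    /** Dense representation (one slot per doc) behind an iterator that skips missing docs. */
    static OrdIterator denseIterator(int[] ords) {
        return new OrdIterator() {
            int doc = -1;

            public int docID() { return doc; }

            public int advance(int target) {
                doc = target;
                while (doc < ords.length && ords[doc] == -1) {
                    doc++;  // skip the minority of docs without a value
                }
                if (doc >= ords.length) {
                    doc = NO_MORE_DOCS;
                }
                return doc;
            }

            public int ordValue() { return ords[doc]; }
        };
    }

    public static void main(String[] args) {
        int[] ords = {0, 2, -1, 1, 2, -1, 0};  // per-doc ordinals, -1 = missing
        int[] matching = {0, 1, 2, 4, 5, 6};   // sorted matching doc IDs
        int[] a = countRandomAccess(doc -> ords[doc], matching, 3);
        int[] b = countIterator(denseIterator(ords), matching, 3);
        System.out.println(Arrays.toString(a) + " " + Arrays.equals(a, b));
    }
}
```

Running {{main}} should print {{[2, 2, 0, 2] true}}: both loops produce the same counts, but the iterator loop pays for two extra comparisons per matching document.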
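The adapter proposal and the suggested wrapping order (the single-valued-behind-multi-valued layer sitting outside the random-access-behind-iterator layer, with the unwrapping logic kept in a helper in the spirit of {{DocValues}}) might look roughly like this. Every name here ({{RandomAccessOrds}}, {{OrdIterator}}, {{OrdSetIterator}}, {{RandomAccessAdapter}}, {{SingletonOrdSet}}, {{unwrapToRandomAccess}}) is hypothetical and only illustrates the idea. None of them is a Lucene API, and the -1 "no more ordinals" sentinel is an assumption of this sketch.

```java
public class WrappingOrderSketch {

    // Hypothetical, simplified types; names are illustrative, not Lucene's API.

    /** Random-access single-valued ordinals, -1 when a doc has no value. */
    interface RandomAccessOrds {
        int getOrd(int docID);
    }

    /** Single-valued iterator API. */
    interface OrdIterator {
        int NO_MORE_DOCS = Integer.MAX_VALUE;
        int docID();
        int advance(int target);
        int ordValue();
    }

    /** Multi-valued iterator API; nextOrd() returns -1 once the doc's ords are exhausted. */
    interface OrdSetIterator {
        int docID();
        int advance(int target);
        long nextOrd();
    }

    /** Inner wrapper: random-access values hidden behind the single-valued iterator API. */
    static final class RandomAccessAdapter implements OrdIterator {
        final RandomAccessOrds in;
        final int maxDoc;
        int doc = -1;

        RandomAccessAdapter(RandomAccessOrds in, int maxDoc) {
            this.in = in;
            this.maxDoc = maxDoc;
        }

        public int docID() { return doc; }

        public int advance(int target) {
            for (doc = target; doc < maxDoc; doc++) {
                if (in.getOrd(doc) != -1) {
                    return doc;  // skip docs without a value
                }
            }
            doc = NO_MORE_DOCS;
            return doc;
        }

        public int ordValue() { return in.getOrd(doc); }
    }

    /** Outer wrapper: single-valued values hidden behind the multi-valued API. */
    static final class SingletonOrdSet implements OrdSetIterator {
        final OrdIterator in;
        boolean consumed;

        SingletonOrdSet(OrdIterator in) { this.in = in; }

        public int docID() { return in.docID(); }

        public int advance(int target) {
            consumed = false;
            return in.advance(target);
        }

        public long nextOrd() {
            if (consumed) {
                return -1;
            }
            consumed = true;
            return in.ordValue();
        }
    }

    /** Helper: peel the outer (singleton) layer first, then the random-access adapter. */
    static RandomAccessOrds unwrapToRandomAccess(OrdSetIterator values) {
        if (values instanceof SingletonOrdSet) {
            OrdIterator single = ((SingletonOrdSet) values).in;
            if (single instanceof RandomAccessAdapter) {
                return ((RandomAccessAdapter) single).in;
            }
        }
        return null;  // truly multi-valued or truly sparse: keep using the iterators
    }

    public static void main(String[] args) {
        int[] ords = {0, 2, -1, 1};
        OrdSetIterator values =
            new SingletonOrdSet(new RandomAccessAdapter(doc -> ords[doc], ords.length));
        RandomAccessOrds ra = unwrapToRandomAccess(values);
        System.out.println(ra != null ? ra.getOrd(1) : "no random access");
    }
}
```

Running {{main}} prints 2, the ordinal of document 1 read through the unwrapped random-access view. Because the layers are peeled in the reverse order of wrapping, a consumer only needs one fixed sequence of checks, and anything that fails them falls back to the iterators.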
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)