[
https://issues.apache.org/jira/browse/LUCENE-8834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Underwood updated LUCENE-8834:
----------------------------------
Description:
While troubleshooting multi-valued facet performance problems in Solr, I
noticed that caching the SortedNumericDocValues.docValueCount() value when it
is used as a loop condition improves performance.
Specifically, going from something like this:
{code:java}
for (int i = 1; i < longs.docValueCount(); i++) {
...
}
{code}
to this:
{code:java}
final int docValueCount = longs.docValueCount();
for (int i = 1; i < docValueCount; i++) {
...
}
{code}
or this:
{code:java}
for (int i = 1, count = longs.docValueCount(); i < count; i++) {
...
}
{code}
provides a faceting performance improvement when trying to facet using doc
values on a multi-valued field with more than a few values per document.
This patch modifies most of the places in Lucene/Solr that were not already
using this pattern.
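
To make the difference concrete, here is a minimal, self-contained sketch of the two loop shapes. It does not use Lucene: MultiValues and ArrayValues are hypothetical stand-ins that mimic only docValueCount() and nextValue(), the real nextValue() also declares IOException, and the loops start at 0 rather than 1 to keep the example small. The call counter only shows how often the loop condition invokes the method; it says nothing about how expensive the real call is.

```java
public class CachedCountLoop {

    /** Hypothetical stand-in for the two SortedNumericDocValues methods used here. */
    interface MultiValues {
        int docValueCount();
        long nextValue();
    }

    /** Array-backed stand-in that records how often docValueCount() is called. */
    static final class ArrayValues implements MultiValues {
        private final long[] values;
        private int pos = 0;
        int countCalls = 0;

        ArrayValues(long... values) {
            this.values = values;
        }

        @Override
        public int docValueCount() {
            countCalls++;
            return values.length;
        }

        @Override
        public long nextValue() {
            return values[pos++];
        }
    }

    /** Calls docValueCount() every time the loop condition is evaluated. */
    static long sumUncached(MultiValues longs) {
        long sum = 0;
        for (int i = 0; i < longs.docValueCount(); i++) {
            sum += longs.nextValue();
        }
        return sum;
    }

    /** Calls docValueCount() once, in the loop initializer. */
    static long sumCached(MultiValues longs) {
        long sum = 0;
        for (int i = 0, count = longs.docValueCount(); i < count; i++) {
            sum += longs.nextValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        ArrayValues a = new ArrayValues(3, 5, 8, 13);
        ArrayValues b = new ArrayValues(3, 5, 8, 13);
        System.out.println("uncached: sum=" + sumUncached(a) + ", calls=" + a.countCalls);
        System.out.println("cached:   sum=" + sumCached(b) + ", calls=" + b.countCalls);
    }
}
```

For a document with n values, the uncached loop evaluates its condition n + 1 times and so calls docValueCount() n + 1 times, while the cached loop calls it once.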
h2. Unscientific Manual Benchmarks
I focused on the change to NumericFacets.java and
FacetFieldProcessorByHashDV.java since that is what I was specifically trying
to improve.
Details about my setup:
* Index was created using Lucene/Solr 7.6.0 (I'm in the process of upgrading to
8.1.1)
* Total Docs: 5,736,951
* I'm faceting on a single multi-valued field that has 63,070,176 total values
indexed (10.99 values per document on average)
* OpenJDK 11
h3. Lucene/Solr 7.6.0:
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1042ms|854ms|
|JSON Facets|823ms|783ms|
h3. Lucene/Solr 8.1.1 (using the 7.6.0 index):
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1043ms|777ms|
|JSON Facets|827ms|792ms|
The reported QTime is simply the lowest QTime I was able to get after repeating
the query a few dozen times. This is not very scientific, but the result was
repeatable (removing the patch increased the times; reapplying the patch
decreased them).
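
For readers who want the relative change rather than raw milliseconds, the following small sketch computes the percentage reduction in QTime for each row of the tables above. The class and method names are made up for illustration, and the inputs are the reported best-case QTimes, so the results inherit the same caveats as the measurements.

```java
import java.util.Locale;

public class QTimeImprovement {

    /** Percentage by which afterMs is lower than beforeMs. */
    static double percentReduction(long beforeMs, long afterMs) {
        return 100.0 * (beforeMs - afterMs) / beforeMs;
    }

    public static void main(String[] args) {
        String[] labels = {
            "7.6.0 Legacy Facets",
            "7.6.0 JSON Facets",
            "8.1.1 Legacy Facets",
            "8.1.1 JSON Facets"
        };
        // {QTime before patch, QTime after patch} in ms, copied from the tables above.
        long[][] qtimes = {
            {1042, 854},
            {823, 783},
            {1043, 777},
            {827, 792}
        };
        for (int i = 0; i < labels.length; i++) {
            double pct = percentReduction(qtimes[i][0], qtimes[i][1]);
            System.out.printf(Locale.ROOT, "%s: %.1f%% lower QTime%n", labels[i], pct);
        }
    }
}
```

On these numbers the legacy facets improve by roughly 18% (7.6.0) and 25.5% (8.1.1), while the JSON facets improve by roughly 4% to 5%.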
The patch touches both Lucene and Solr code, which is why I have filed this as
a LUCENE issue. I can reorganize it and break it apart if needed.
> Cache the SortedNumericDocValues.docValueCount() value whenever it is used in
> a loop
> ------------------------------------------------------------------------------------
>
> Key: LUCENE-8834
> URL: https://issues.apache.org/jira/browse/LUCENE-8834
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Tim Underwood
> Priority: Minor
> Time Spent: 1h
> Remaining Estimate: 0h