[ 
https://issues.apache.org/jira/browse/LUCENE-8834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Underwood updated LUCENE-8834:
----------------------------------
    Description: 
While troubleshooting some multi-valued facet performance problems in Solr, I 
noticed that caching the SortedNumericDocValues.docValueCount() value when it is 
used in a loop condition provides a performance improvement.

Specifically, going from something like this:
{code:java}
for (int i = 1; i < longs.docValueCount(); i++) {
  ...
}
{code}
to this:
{code:java}
final int docValueCount = longs.docValueCount();
for (int i = 1; i < docValueCount; i++) {
  ...
}
{code}

or this:

{code:java}
for (int i = 1, count = longs.docValueCount(); i < count; i++) {
  ...
}
{code}

provides a faceting performance improvement when trying to facet using doc 
values on a multi-valued field with more than a few values per document.

This patch modifies most of the places in Lucene/Solr that were not already 
using this pattern.
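
The difference between the two loop shapes can be made visible with a small stand-in class. This is only a sketch: FakeDocValues and its countCalls field are hypothetical and are not part of Lucene; they mimic just the docValueCount() and nextValue() calls of SortedNumericDocValues so that the number of docValueCount() invocations can be counted. One plausible reason for the speedup (an assumption, not something measured in this issue) is that the JIT cannot always prove the call is loop-invariant and hoist it on its own.

```java
public class CachedCountDemo {

  /** Hypothetical stand-in for a per-document multi-valued reader; NOT the Lucene class. */
  static final class FakeDocValues {
    private final long[] values;
    private int next = 0;
    int countCalls = 0; // how many times docValueCount() was invoked

    FakeDocValues(long... values) {
      this.values = values;
    }

    int docValueCount() {
      countCalls++;
      return values.length;
    }

    long nextValue() {
      return values[next++];
    }
  }

  /** Calls docValueCount() every time the loop condition is evaluated. */
  static long sumUncached(FakeDocValues longs) {
    long sum = 0;
    for (int i = 0; i < longs.docValueCount(); i++) {
      sum += longs.nextValue();
    }
    return sum;
  }

  /** Reads docValueCount() once and reuses the local copy. */
  static long sumCached(FakeDocValues longs) {
    long sum = 0;
    for (int i = 0, count = longs.docValueCount(); i < count; i++) {
      sum += longs.nextValue();
    }
    return sum;
  }

  public static void main(String[] args) {
    FakeDocValues a = new FakeDocValues(3, 5, 8, 13);
    FakeDocValues b = new FakeDocValues(3, 5, 8, 13);
    System.out.println(sumUncached(a) + " with " + a.countCalls + " calls"); // 29 with 5 calls
    System.out.println(sumCached(b) + " with " + b.countCalls + " calls");   // 29 with 1 calls
  }
}
```

Both methods return the same sum; the uncached form invokes docValueCount() once per condition check (number of values plus one), while the cached form invokes it exactly once per document.
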
h2. Unscientific Manual Benchmarks

I focused on the change to NumericFacets.java and 
FacetFieldProcessorByHashDV.java since that is what I was specifically trying 
to improve.

Details about my setup:

* Index was created using Lucene/Solr 7.6.0 (I'm in the process of upgrading to 
8.1.1)
* Total Docs: 5,736,951
* I'm faceting on a single multi-valued field that has 63,070,176 total values 
indexed (10.99 values per document on average).
* OpenJDK 11

h3. Lucene/Solr 7.6.0:
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1042ms|854ms|
|JSON Facets|823ms|783ms|
h3. Lucene/Solr 8.1.1 (using the 7.6.0 index):
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1043ms|777ms|
|JSON Facets|827ms|792ms|

The reported QTime is simply the lowest QTime I was able to get after repeating 
the query a few dozen times. This is not very scientific, but it was repeatable: 
removing the patch increased the times, and reapplying the patch decreased 
them.
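
For a sense of scale, the QTime pairs from the tables above can be converted into relative reductions. The helper below is a hypothetical sketch (the class and method names are made up for illustration); it uses only the numbers reported in this issue.

```java
import java.util.Locale;

public class QTimeImprovement {

  /** Percentage reduction in QTime going from beforeMs to afterMs. */
  static double percentReduction(long beforeMs, long afterMs) {
    return 100.0 * (beforeMs - afterMs) / beforeMs;
  }

  public static void main(String[] args) {
    String[] labels = {
      "7.6.0 Legacy Facets", "7.6.0 JSON Facets",
      "8.1.1 Legacy Facets", "8.1.1 JSON Facets"
    };
    // {before, after} in milliseconds, copied from the tables in this issue
    long[][] qtimes = {{1042, 854}, {823, 783}, {1043, 777}, {827, 792}};
    for (int i = 0; i < labels.length; i++) {
      double pct = percentReduction(qtimes[i][0], qtimes[i][1]);
      System.out.println(String.format(Locale.ROOT, "%s: %.1f%% faster", labels[i], pct));
    }
  }
}
```

This works out to roughly 18% (7.6.0) and 25% (8.1.1) for legacy facets, and about 4 to 5% for JSON facets.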

The patch touches both Lucene and Solr code, which is why I have filed this as 
a LUCENE issue. I can reorganize it and break it apart if needed.

  was:
While troubleshooting some multi-valued facet performance problems in Solr, I 
noticed that caching the SortedNumericDocValues.docValueCount() value when it is 
used in a loop condition provides a performance improvement.

Specifically, going from something like this:
{code:java}
for (int i = 1; i < longs.docValueCount(); i++) {
  ...
}
{code}
to this:
{code:java}
final int docValueCount = longs.docValueCount();
for (int i = 1; i < docValueCount; i++) {
  ...
}
{code}
provides a faceting performance improvement when trying to facet using doc 
values on a multi-valued field with more than a few values per document.

This patch modifies most of the places in Lucene/Solr that were not already 
using this pattern.
h2. Unscientific Manual Benchmarks

I focused on the change to NumericFacets.java and 
FacetFieldProcessorByHashDV.java since that is what I was specifically trying 
to improve.

Details about my setup:

* Index was created using Lucene/Solr 7.6.0 (I'm in the process of upgrading to 
8.1.1)
* Total Docs: 5,736,951
* I'm faceting on a single multi-valued field that has 63,070,176 total values 
indexed (10.99 values per document on average).
* OpenJDK 11

h3. Lucene/Solr 7.6.0:
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1042ms|854ms|
|JSON Facets|823ms|783ms|
h3. Lucene/Solr 8.1.1 (using the 7.6.0 index):
||Facet Type||QTime Before Patch||QTime After Patch||
|Legacy Facets|1043ms|777ms|
|JSON Facets|827ms|792ms|

The reported QTime is simply the lowest QTime I was able to get after repeating 
the query a few dozen times. This is not very scientific, but it was repeatable: 
removing the patch increased the times, and reapplying the patch decreased 
them.

The patch touches both Lucene and Solr code, which is why I have filed this as 
a LUCENE issue. I can reorganize it and break it apart if needed.


> Cache the SortedNumericDocValues.docValueCount() value whenever it is used in 
> a loop
> ------------------------------------------------------------------------------------
>
>                 Key: LUCENE-8834
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8834
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Tim Underwood
>            Priority: Minor
>          Time Spent: 1h
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
