[
https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
md updated SOLR-7036:
---------------------
Attachment: SOLR-7036.patch
Update the patch to work against trunk.
Group faceting was very slow for our data set; when the number of docs or
terms was high, latency spiked to multiple seconds per request. The
UninvertedField provides better overall performance.
In the original implementation, the DocValues are segment-based, so Solr reads
the facet VALUES from DISK in order to merge the terms across segments.
This performs badly for facets with high cardinality.
When we use a top-level structure (like UninvertedField), we don't need the
facet VALUES, only their ORDINAL indexes, which are stored in MEMORY.
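The difference described above can be illustrated with a minimal sketch. It is
not Solr code: the class, the method names, and the in-memory arrays are all
hypothetical, and real per-segment merging is more involved than this. It only
shows why a single top-level ordinal space lets counting skip the term values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; none of these names exist in Solr. */
class OrdinalCountingSketch {

    /**
     * Per-segment style: each segment has its own ordinal space, so ordinals
     * from different segments can only be lined up through the term value.
     */
    static Map<String, Integer> countBySegment(String[][] segmentTerms, int[][] segmentDocOrds) {
        Map<String, Integer> merged = new HashMap<>();
        for (int seg = 0; seg < segmentTerms.length; seg++) {
            for (int ord : segmentDocOrds[seg]) {
                // The term value is looked up here (a disk read in a real index).
                merged.merge(segmentTerms[seg][ord], 1, Integer::sum);
            }
        }
        return merged;
    }

    /** Top-level style: one ordinal space, so counting is plain array arithmetic. */
    static int[] countByGlobalOrd(int numTerms, int[] docGlobalOrds) {
        int[] counts = new int[numTerms];
        for (int ord : docGlobalOrds) {
            counts[ord]++; // no term value needed
        }
        return counts;
    }

    public static void main(String[] args) {
        String[][] segmentTerms = {{"blue", "red"}, {"green", "red"}};
        int[][] segmentDocOrds = {{0, 1, 1}, {1, 0}};
        System.out.println(countBySegment(segmentTerms, segmentDocOrds));

        // Same documents with assumed global ordinals: blue=0, green=1, red=2.
        System.out.println(Arrays.toString(countByGlobalOrd(3, new int[] {0, 2, 2, 2, 1})));
    }
}
```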
A new group facet method that leverages UninvertedField was added.
The two methods for group faceting are now:
* group.facet.method=uif to use the UninvertedField method
* group.facet.method=original to use the existing method (the default if not
specified)
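As a rough illustration of how a request might select the new method, the
sketch below assembles a query string. The helper class and the field names
(product_id, brand) are hypothetical; group.facet.method is the parameter
proposed by this patch, and the other parameters are the standard grouping and
faceting ones.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper; not part of Solr or SolrJ. */
class GroupFacetRequestSketch {

    /** Builds a URL-encoded query string for a grouped facet request. */
    static String buildQuery(String groupField, String facetField, String method) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("group", "true");
        params.put("group.field", groupField);
        params.put("group.facet", "true");
        params.put("facet", "true");
        params.put("facet.field", facetField);
        params.put("group.facet.method", method); // parameter added by this patch

        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8))
              .append('=')
              .append(URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Field names below are made up for the example.
        System.out.println(buildQuery("product_id", "brand", "uif"));
    }
}
```

Passing "original" instead of "uif", or leaving the parameter out, would select
the existing method.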
This patch uses the new JSON faceting API, as this is how SOLR-8466 (Add
support for UnInvertedField based faceting to FacetComponent) was implemented.
It can also be a first step toward implementing SOLR-8023 (Add Support for
group.facet in json facet API).
>>> Major code changes:
1. SimpleFacet - If group.facet.method=uif, call the JSON facet API
with the group.facet field.
2. FacetField - If the group.facet field is defined, create a
FacetFieldProcessorUIF (UIF is the only method that supports grouping in the
JSON API).
3. UninvertedField - FacetFieldProcessorUIF calls a new method,
getGroupedCounts, which returns the facet counts grouped by group.field (like
getCounts, but with grouping).
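To make the idea of grouped counts concrete, here is a minimal sketch. It is
not the getGroupedCounts code from the patch: the class, the signature, and the
array inputs are hypothetical. It assumes that a grouped facet count for a term
is the number of distinct groups that contain at least one matching doc with
that term.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Hypothetical illustration; not the implementation in the patch. */
class GroupedCountsSketch {

    /**
     * @param docTermOrds docTermOrds[doc] holds the facet term ordinals of that doc
     * @param docGroupOrd docGroupOrd[doc] holds the ordinal of the doc's group.field value
     * @return for each term ordinal, the number of distinct groups containing it
     */
    static int[] groupedCounts(int[][] docTermOrds, int[] docGroupOrd, int numTerms, int numGroups) {
        BitSet[] seenGroups = new BitSet[numTerms];
        int[] counts = new int[numTerms];
        for (int doc = 0; doc < docTermOrds.length; doc++) {
            int group = docGroupOrd[doc];
            for (int term : docTermOrds[doc]) {
                if (seenGroups[term] == null) {
                    seenGroups[term] = new BitSet(numGroups);
                }
                // Count a (term, group) pair only the first time it is seen.
                if (!seenGroups[term].get(group)) {
                    seenGroups[term].set(group);
                    counts[term]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[][] docTermOrds = {{0, 1}, {0}, {0}, {1}};
        int[] docGroupOrd = {0, 0, 1, 0};
        System.out.println(Arrays.toString(groupedCounts(docTermOrds, docGroupOrd, 2, 2)));
    }
}
```

In this example, term 0 appears in three docs but only two groups, so its
grouped count is 2 where an ungrouped count would be 3. Only ordinals are
touched; no term or group values are read.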
>>> Performance tests on our data set:
Index size: 6.3 million
Number of unique facets: 877,000
Number of unique groups: 5.5 million
Time in ms.:
+---------------+-----------------+-----------------+-----------------+-----------------+------------------+
|               | 50th percentile | 75th percentile | 95th percentile | 99th percentile | 100th percentile |
+---------------+-----------------+-----------------+-----------------+-----------------+------------------+
| without patch |           6,542 |           6,999 |           8,427 |          14,750 |           70,113 |
+---------------+-----------------+-----------------+-----------------+-----------------+------------------+
| with patch    |             298 |             563 |           1,495 |           2,901 |           19,202 |
+---------------+-----------------+-----------------+-----------------+-----------------+------------------+
>>> Unit tests:
Tests were added to two unit test classes:
SimpleFacetTest - comparing the results of the two methods
(original and uif)
GroupingSearchTest - grouping tests with multiple grouping
options (taken from the previous patch of this issue)
Comments are welcome
> Faster method for group.facet
> -----------------------------
>
> Key: SOLR-7036
> URL: https://issues.apache.org/jira/browse/SOLR-7036
> Project: Solr
> Issue Type: Improvement
> Components: faceting
> Affects Versions: 4.10.3
> Reporter: Jim Musil
> Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch,
> SOLR-7036.patch, performance.txt
>
>
> This is a patch that speeds up the performance of requests made with
> group.facet=true. The original code that collects and counts unique facet
> values for each group does not use the same improved field cache methods that
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which
> provides a much faster way to look up docs that contain a term. I've also
> added a simple grouping map so that when a term is found for a doc, it can
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set and when the number of docs or
> terms was high, the latency spiked to multiple second requests. This solution
> provides better overall performance -- from an average of 54ms to 32ms. It
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose to use this method or
> the original. Add group.facet.method=fc to use the improved method or
> group.facet.method=original which is the default if not specified.
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)