[ https://issues.apache.org/jira/browse/SOLR-7036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

md updated SOLR-7036:
---------------------
    Attachment: SOLR-7036.patch

Updated the patch to work against trunk.

Group faceting was very slow for our data set: when the number of docs or
terms was high, request latency spiked to several seconds. The
UninvertedField method provides better overall performance.

In the original implementation, DocValues are per-segment, so Solr reads the
facet VALUES from DISK in order to merge the terms across segments. This
approach performs poorly for facets with high cardinality.
With a top-level structure (like UninvertedField), we don't need the facet
VALUES, only their ORDINALS, which are stored in MEMORY.
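To make the difference concrete, here is a minimal, self-contained Java sketch. It does not use Lucene or Solr classes; as an assumption, plain arrays stand in for per-segment DocValues and for an UninvertedField-style top-level ordinal mapping. The segment-based path has to merge counts keyed by the term value, while the top-level path only increments an int array indexed by global ordinal.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: arrays stand in for per-segment DocValues
// and for a top-level (UninvertedField-style) ordinal mapping.
class OrdinalCountingSketch {

    // Segment-based path: each segment has its own term dictionary, so counts
    // must be merged across segments by resolving each ordinal to its VALUE.
    static Map<String, Integer> mergeBySegmentValues(String[][] segmentTerms, int[][] segmentDocOrds) {
        Map<String, Integer> merged = new HashMap<>();
        for (int seg = 0; seg < segmentTerms.length; seg++) {
            for (int ord : segmentDocOrds[seg]) {
                // This value lookup is what the original path pays per term.
                String value = segmentTerms[seg][ord];
                merged.merge(value, 1, Integer::sum);
            }
        }
        return merged;
    }

    // Top-level path: ordinals are global for the whole index, so counting is
    // a plain array increment; values are only needed for the final output.
    static int[] countByGlobalOrdinal(int numTerms, int[] docOrds) {
        int[] counts = new int[numTerms];
        for (int ord : docOrds) {
            counts[ord]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // Two segments with different local term dictionaries, one value per doc.
        String[][] segmentTerms = { {"blue", "red"}, {"green", "red"} };
        int[][] segmentDocOrds = { {0, 1, 1}, {0, 1} };
        System.out.println(mergeBySegmentValues(segmentTerms, segmentDocOrds));

        // The same five docs with a global dictionary {0=blue, 1=green, 2=red}.
        int[] globalDocOrds = {0, 2, 2, 1, 2};
        System.out.println(Arrays.toString(countByGlobalOrdinal(3, globalDocOrds))); // prints [1, 1, 3]
    }
}
```

Both paths produce the same counts (blue=1, green=1, red=3); the difference is that the second never touches term values while counting.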

A new group facet method that leverages UninvertedField was added.

The two methods for group faceting are now:
        * group.facet.method=uif to use the UninvertedField method
        * group.facet.method=original, the default if not specified.
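As a hedged illustration, this Java sketch builds the query string for a grouped facet request that selects the method. `group`, `group.field`, `group.facet`, `facet` and `facet.field` are standard Solr parameters; `group.facet.method` is the parameter proposed by this patch, and the field names `product_id` and `color` are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

class GroupFacetRequestSketch {

    // Builds /select parameters; field names are hypothetical examples.
    static String buildQuery(String method) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "*:*");
        params.put("group", "true");
        params.put("group.field", "product_id"); // hypothetical grouping field
        params.put("group.facet", "true");
        params.put("facet", "true");
        params.put("facet.field", "color");     // hypothetical facet field
        // Parameter proposed by this patch: "uif" or "original" (the default).
        params.put("group.facet.method", method);
        return params.entrySet().stream()
                .map(e -> encode(e.getKey()) + "=" + encode(e.getValue()))
                .collect(Collectors.joining("&"));
    }

    private static String encode(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("/select?" + buildQuery("uif"));
    }
}
```

Omitting `group.facet.method`, or passing `original`, keeps the existing behavior.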

This patch uses the new JSON Facet API, because that is how SOLR-8466 (Add
support for UnInvertedField based faceting to FacetComponent) was implemented.
It can also be a first step toward implementing SOLR-8023 (Add Support for
group.facet in json facet API).

>>> Major code changes:

        1. SimpleFacet - If group.facet.method=uif, call the JSON Facet API
           with the group.facet field.
        2. FacetField - If the group.facet field is defined, create a
           FacetFieldProcessorUIF (UIF is the only method that supports
           grouping in the JSON API).
        3. UninvertedField - FacetFieldProcessorUIF calls a new method,
           getGroupedCounts, that returns the facet counts grouped by
           group.field (like getCounts, but with grouping).
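The following Java sketch shows, under stated assumptions, what "facet counts grouped by group.field" means: for each facet ordinal, it counts the distinct groups that contain at least one matching document, not the number of documents. A doc-to-group array plays the role of the grouping map, and one lazily allocated BitSet per term tracks the groups seen. `groupedCounts` is a hypothetical name; this is not the patch's `getGroupedCounts` code, whose signature is not shown here.

```java
import java.util.Arrays;
import java.util.BitSet;

class GroupedCountsSketch {

    /**
     * Hypothetical sketch: for each facet term ordinal, counts the distinct
     * groups whose matching documents contain that term.
     *
     * @param docTermOrds  docTermOrds[doc] = global term ordinals of the doc
     * @param docToGroup   docToGroup[doc] = group ordinal of the doc
     * @param matchingDocs documents that match the query
     * @param numTerms     number of distinct facet terms
     */
    static int[] groupedCounts(int[][] docTermOrds, int[] docToGroup,
                               BitSet matchingDocs, int numTerms) {
        BitSet[] groupsPerTerm = new BitSet[numTerms]; // allocated lazily
        for (int doc = matchingDocs.nextSetBit(0); doc >= 0;
             doc = matchingDocs.nextSetBit(doc + 1)) {
            int group = docToGroup[doc];
            for (int ord : docTermOrds[doc]) {
                if (groupsPerTerm[ord] == null) {
                    groupsPerTerm[ord] = new BitSet();
                }
                groupsPerTerm[ord].set(group); // each group counts once per term
            }
        }
        int[] counts = new int[numTerms];
        for (int ord = 0; ord < numTerms; ord++) {
            counts[ord] = groupsPerTerm[ord] == null ? 0 : groupsPerTerm[ord].cardinality();
        }
        return counts;
    }

    public static void main(String[] args) {
        // Terms: 0=red, 1=blue. Docs 0, 1 and 3 are in group 0; doc 2 is in group 1.
        int[][] docTermOrds = { {0}, {0, 1}, {0}, {1} };
        int[] docToGroup = { 0, 0, 1, 0 };
        BitSet matching = new BitSet();
        matching.set(0, 4); // all four docs match
        // Plain doc counts would be red=3, blue=2; grouped counts are red=2, blue=1.
        System.out.println(Arrays.toString(groupedCounts(docTermOrds, docToGroup, matching, 2))); // prints [2, 1]
    }
}
```

A per-term BitSet keeps the sketch short; with hundreds of thousands of terms and millions of groups, a real implementation would need a more memory-conscious structure.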

>>> Performance tests on our data set:

        Index size: 6.3 million
        Number of unique facets: 877,000
        Number of unique groups: 5.5 million

        Time in ms:
        +---------------+-----------------+-----------------+-----------------+-----------------+------------------+
        |               | 50th percentile | 75th percentile | 95th percentile | 99th percentile | 100th percentile |
        +---------------+-----------------+-----------------+-----------------+-----------------+------------------+
        | without patch |      6,542      |      6,999      |      8,427      |     14,750      |      70,113      |
        +---------------+-----------------+-----------------+-----------------+-----------------+------------------+
        |   with patch  |       298       |       563       |      1,495      |      2,901      |      19,202      |
        +---------------+-----------------+-----------------+-----------------+-----------------+------------------+
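To read the table as relative gains, this small Java sketch divides each "without patch" latency by the matching "with patch" latency. The numbers are copied from the table above; nothing else is assumed.

```java
import java.util.Locale;

class SpeedupSketch {
    static final String[] PERCENTILES = {"50th", "75th", "95th", "99th", "100th"};
    // Latencies in ms, copied from the performance table.
    static final double[] WITHOUT_PATCH = {6542, 6999, 8427, 14750, 70113};
    static final double[] WITH_PATCH = {298, 563, 1495, 2901, 19202};

    // Speedup factor per percentile: old latency / new latency.
    static double[] speedups() {
        double[] result = new double[WITHOUT_PATCH.length];
        for (int i = 0; i < result.length; i++) {
            result[i] = WITHOUT_PATCH[i] / WITH_PATCH[i];
        }
        return result;
    }

    public static void main(String[] args) {
        double[] s = speedups();
        for (int i = 0; i < s.length; i++) {
            System.out.printf(Locale.ROOT, "%s percentile: %.1fx faster%n", PERCENTILES[i], s[i]);
        }
    }
}
```

At the median the patched version is roughly 22x faster, while at the 100th percentile the gain narrows to about 3.7x.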

>>> Unit tests:
        Tests were added to two test classes:
                SimpleFacetTest - compares the results of the two methods
                (original and uif)
                GroupingSearchTest - grouping tests with multiple grouping
                options (taken from the previous patch for this issue)

Comments are welcome.


> Faster method for group.facet
> -----------------------------
>
>                 Key: SOLR-7036
>                 URL: https://issues.apache.org/jira/browse/SOLR-7036
>             Project: Solr
>          Issue Type: Improvement
>          Components: faceting
>    Affects Versions: 4.10.3
>            Reporter: Jim Musil
>            Assignee: Erick Erickson
>             Fix For: 5.5, 6.0
>
>         Attachments: SOLR-7036.patch, SOLR-7036.patch, SOLR-7036.patch, 
> SOLR-7036.patch, performance.txt
>
>
> This is a patch that speeds up the performance of requests made with 
> group.facet=true. The original code that collects and counts unique facet 
> values for each group does not use the same improved field cache methods that 
> have been added for normal faceting in recent versions.
> Specifically, this approach leverages the UninvertedField class which 
> provides a much faster way to look up docs that contain a term. I've also 
> added a simple grouping map so that when a term is found for a doc, it can 
> quickly look up the group to which it belongs.
> Group faceting was very slow for our data set, and when the number of docs or
> terms was high, the latency spiked to multi-second requests. This solution
> provides better overall performance -- from an average of 54ms to 32ms. It 
> also dropped our slowest performing queries way down -- from 6012ms to 991ms.
> I also added a few tests.
> I added an additional parameter so that you can choose between this method and
> the original: set group.facet.method=fc to use the improved method, or
> group.facet.method=original (the default if not specified).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
