[
https://issues.apache.org/jira/browse/SOLR-6216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14048153#comment-14048153
]
Tomás Fernández Löbbe commented on SOLR-6216:
---------------------------------------------
I did some very basic performance testing to compare interval faceting vs facet
queries:
Dataset: Geonames.org dataset (added 4 times to make it 33M docs)
Query Set: 4960 boolean queries using terms from the dataset
1 document updated every second
autoSoftCommit every second.
HW: MacBook Pro Core i7, 2.7 GHz with 8 GB of RAM with spinning disk (5400 RPM)
All times are in milliseconds
I repeated the test with different numbers of intervals (on the “population”
field of the geonames dataset):
|| || Num Intervals || 1 || 2 || 3 || 4 || 5 || 10 ||
| Min | Intervals | 25 | 23 | 26 | 23 | 24 | 26 |
| | Facet Query | 2 | 2 | 3 | 4 | 4 | 6 |
| Max | Intervals | 1885 | 2254 | 2508 | 2800 | 2749 | 3031 |
| | Facet Query | 2199 | 2414 | 3957 | 2766 | 1869 | 5975 |
| Average | Intervals | 181 | 177 | 191 | 183 | 148 | 174 |
| | Facet Query | 156 | 277 | 359 | 299 | 216 | 408 |
| P10 | Intervals | 53 | 54 | 54 | 54 | 54 | 56 |
| | Facet Query | 26 | 30 | 33 | 31 | 29 | 35 |
| P50 | Intervals | 96 | 95 | 98 | 97 | 88 | 96 |
| | Facet Query | 54 | 211 | 293 | 188 | 58 | 74 |
| P90 | Intervals | 453 | 940 | 467 | 458 | 350 | 438 |
| | Facet Query | 432 | 656 | 794 | 749 | 660 | 1066 |
| P99 | Intervals | 809 | 884 | 968 | 877 | 857 | 897 |
| | Facet Query | 867 | 1041 | 1354 | 1219 | 1116 | 1784 |
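For reference, the two request styles compared in the table above could look
roughly like the sketch below. This is only an illustration: the parameter
names (facet.interval, f.<field>.facet.interval.set, facet.query) follow the
Solr documentation for released versions and may differ from the patch that was
tested, the interval bounds are made up, and the class and method names are
hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that builds the request parameters for both approaches. */
final class FacetParamsSketch {

    /** Interval faceting: one facet.interval field plus one "set" per interval. */
    static List<String> intervalParams(String field, String... intervals) {
        List<String> params = new ArrayList<>();
        params.add("facet=true");
        params.add("facet.interval=" + field);
        for (String interval : intervals) {
            params.add("f." + field + ".facet.interval.set=" + interval);
        }
        return params;
    }

    /** Facet queries: one range query per interval. */
    static List<String> facetQueryParams(String field, String... ranges) {
        List<String> params = new ArrayList<>();
        params.add("facet=true");
        for (String range : ranges) {
            params.add("facet.query=" + field + ":" + range);
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumed example bounds; values are not URL-encoded here for readability.
        System.out.println(String.join("&",
                intervalParams("population", "[0,1000]", "(1000,100000]")));
        System.out.println(String.join("&",
                facetQueryParams("population", "[0 TO 1000]", "{1000 TO 100000]")));
    }
}
```

Both requests describe the same two ranges; only the way Solr is asked to
count them differs.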
There is some variation between the tests with different numbers of intervals
(with the same method) that I don’t understand very well. For each test, I
restarted Jetty (the index files were probably cached between tests, though).
In general, what I see is that the average for intervals is similar to or lower
than for facet queries, the p10 and p50 are similar or higher (these are
probably the cases where the facet queries hit the cache), and the p90 and p99
are lower for the intervals implementation. This is probably because of facet
queries missing the cache.
“Max” varies a lot; I don’t think it’s a very representative number, I just
left it in for completeness. Min is very similar across the different numbers
of intervals. It’s clear that in the best case (all cache hits), facet query is
much faster than intervals.
I also did a quick test on an internal collection with around 100M docs in a
single shard, running around 6000 queries with around 40 intervals each. For
this test I got:
|Min |Intervals |122|
| |Facet Query |124|
|Max |Intervals |6626|
| |Facet Query |61009|
|Average |Intervals |238|
| |Facet Query |620|
|P10 |Intervals |155|
| |Facet Query |151|
|P50 |Intervals |201|
| |Facet Query |202|
|P90 |Intervals |324|
| |Facet Query |461|
|P99 |Intervals |836|
| |Facet Query |23662|
This domain has updates and soft commits.
I don’t have numbers for distributed tests, but from what I could see, the
results were even better on wide domains, because of the lower p90/p99, I
assume.
> Better faceting for multiple intervals on DV fields
> ---------------------------------------------------
>
> Key: SOLR-6216
> URL: https://issues.apache.org/jira/browse/SOLR-6216
> Project: Solr
> Issue Type: Improvement
> Reporter: Tomás Fernández Löbbe
> Attachments: SOLR-6216.patch
>
>
> There are two ways to have faceting on value ranges in Solr right now:
> “Range Faceting” and “Query Faceting” (doing range queries). They both end up
> doing something similar:
> {code:java}
> searcher.numDocs(rangeQ, docs)
> {code}
> The good thing about this implementation is that it can benefit from caching.
> The bad thing is that it may be slow with cold caches, and that there will be
> a query for each of the ranges.
> A different implementation would be one that works similarly to regular field
> faceting, using doc values and validating ranges for each value of the
> matching documents. This implementation would sometimes be faster than Range
> Faceting / Query Faceting, especially in cases where caches are not very
> effective, such as with a high update rate, or where ranges change frequently.
> Functionally, the result should be exactly the same as the one obtained by
> doing a facet query for every interval.
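
The doc-values approach described above (a single pass over the values of the
matching documents, checking each value against every interval) can be sketched
as follows. This is a minimal illustration under stated assumptions, not the
code in SOLR-6216.patch: a plain long[] stands in for the doc-values column,
an int[] for the matching doc ids, and the Interval type and all names are
hypothetical.

```java
import java.util.Arrays;

/** Hypothetical sketch: count matching documents per interval in one pass. */
final class IntervalCountSketch {

    /** A numeric interval with open or closed ends (illustrative type, not Solr's). */
    static final class Interval {
        final long start;
        final long end;
        final boolean startInclusive;
        final boolean endInclusive;

        Interval(long start, boolean startInclusive, long end, boolean endInclusive) {
            this.start = start;
            this.startInclusive = startInclusive;
            this.end = end;
            this.endInclusive = endInclusive;
        }

        boolean contains(long v) {
            boolean aboveStart = startInclusive ? v >= start : v > start;
            boolean belowEnd = endInclusive ? v <= end : v < end;
            return aboveStart && belowEnd;
        }
    }

    /**
     * @param values       stand-in for a doc-values column: values[docId] is the field value
     * @param matchingDocs doc ids matched by the main query
     * @param intervals    intervals to count; they may overlap
     * @return one count per interval, in the same order as {@code intervals}
     */
    static int[] countIntervals(long[] values, int[] matchingDocs, Interval[] intervals) {
        int[] counts = new int[intervals.length];
        for (int doc : matchingDocs) {
            long v = values[doc];
            // Every interval is checked because intervals are allowed to overlap.
            for (int i = 0; i < intervals.length; i++) {
                if (intervals[i].contains(v)) {
                    counts[i]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] population = {0, 500, 1_000, 25_000, 2_000_000};
        int[] matchingDocs = {1, 2, 3, 4}; // doc 0 does not match the query
        Interval[] intervals = {
            new Interval(0, true, 1_000, true),                 // [0,1000]
            new Interval(1_000, false, 100_000, true),          // (1000,100000]
            new Interval(100_000, false, Long.MAX_VALUE, true)  // (100000,max]
        };
        // prints [2, 1, 1]
        System.out.println(Arrays.toString(countIntervals(population, matchingDocs, intervals)));
    }
}
```

The cost of this loop depends on the number of matching documents and
intervals rather than on cache state, which is the trade-off against running
one cached query per range.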