[
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924348#comment-13924348
]
Gilad Barkai commented on LUCENE-5476:
--------------------------------------
{quote}
Btw. Is there an easy way to retrieve the total facet counts for an ordinal?
When correcting facet counts it would be a quick win to limit the number of
estimated documents to the actual number of documents in the index that match
that facet. (And maybe use the distribution as well, to make better estimates)
{quote}
That's a great idea!
The {{docFreq}} of the category drill-down term is an upper bound and could
be used as a limit.
It's cheap, but it might not be the exact number, as it also takes into account
deleted documents.
The limit should also take into account the total number of hits for the
query; otherwise the estimate, once multiplied by the sampling factor,
may yield a larger number than the actual number of results.
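A minimal sketch of the two limits described above, assuming the sampled count for an ordinal, the sampling ratio, the {{docFreq}} of the category's drill-down term, and the query's total hit count are already available. The class and method names are hypothetical and not part of the Lucene facet API. Because {{docFreq}} may still include deleted documents, the result is an upper-bounded estimate, not an exact count.

```java
/** Hypothetical helper (not a Lucene API): bounds a sampled facet count. */
final class SampledCountCorrector {

    private SampledCountCorrector() {}

    /**
     * @param sampledCount  count observed for the ordinal among the sampled hits
     * @param samplingRatio fraction of hits that were sampled, in (0, 1]
     * @param docFreq       docFreq of the category's drill-down term (may include deleted docs)
     * @param totalHits     total number of hits for the query
     */
    static int correctedCount(int sampledCount, double samplingRatio,
                              int docFreq, int totalHits) {
        if (samplingRatio <= 0.0 || samplingRatio > 1.0) {
            throw new IllegalArgumentException("samplingRatio must be in (0, 1]");
        }
        // Scale the sampled count back up to the full result set.
        long estimate = Math.round(sampledCount / samplingRatio);
        // A category cannot match more documents than its drill-down term does...
        estimate = Math.min(estimate, docFreq);
        // ...nor more documents than the query matched in total.
        estimate = Math.min(estimate, totalHits);
        return (int) estimate;
    }

    public static void main(String[] args) {
        // 40 sampled hits at a 1% ratio scale to 4000, but the term's docFreq is only 2500.
        System.out.println(correctedCount(40, 0.01, 2500, 100000)); // 2500
        // Same sample, but the query matched only 1800 documents.
        System.out.println(correctedCount(40, 0.01, 2500, 1800)); // 1800
    }
}
```

Applying both limits with {{Math.min}} keeps the correction cheap: no extra pass over the hits is needed, only one {{docFreq}} lookup per ordinal plus the hit count the query already has.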
> Facet sampling
> --------------
>
> Key: LUCENE-5476
> URL: https://issues.apache.org/jira/browse/LUCENE-5476
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Rob Audenaerde
> Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch,
> LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch,
> SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared.
> When trying to display facet counts on large datasets (>10M documents)
> counting facets is rather expensive, as all the hits are collected and
> processed.
> Sampling greatly reduced this and thus provided a nice speedup. Could it be
> brought back?
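The request above describes sampling only in outline: count facets over a subset of the hits instead of all of them, then scale the counts back up. A minimal sketch of that idea follows, assuming the matching doc IDs and a single facet label per document are already in memory. The class and method names are hypothetical, uniform random sampling with a fixed seed is an assumption chosen for simplicity, and this is not necessarily how the attached SamplingFacetsCollector.java works internally.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical illustration of sample-then-scale facet counting (not Lucene's implementation). */
final class SampledFacetCounting {

    private SampledFacetCounting() {}

    /** Keeps each hit with probability samplingRatio; a fixed seed makes the sample repeatable. */
    static List<Integer> sample(int[] hits, double samplingRatio, long seed) {
        Random random = new Random(seed);
        List<Integer> sampled = new ArrayList<>();
        for (int doc : hits) {
            // nextDouble() is in [0, 1), so a ratio of 1.0 keeps every hit.
            if (random.nextDouble() < samplingRatio) {
                sampled.add(doc);
            }
        }
        return sampled;
    }

    /** Counts one label per sampled doc, then scales each count by 1 / samplingRatio. */
    static Map<String, Long> estimateCounts(List<Integer> sampledDocs, String[] labelOfDoc,
                                            double samplingRatio) {
        Map<String, Long> counts = new HashMap<>();
        for (int doc : sampledDocs) {
            counts.merge(labelOfDoc[doc], 1L, Long::sum);
        }
        counts.replaceAll((label, count) -> Math.round(count / samplingRatio));
        return counts;
    }

    public static void main(String[] args) {
        int n = 100_000;
        int[] hits = new int[n];
        String[] labels = new String[n];
        for (int i = 0; i < n; i++) {
            hits[i] = i;
            labels[i] = (i % 2 == 0) ? "even" : "odd";
        }
        // Count only ~10% of the hits; the scaled estimates should land near 50,000 each.
        List<Integer> sampled = sample(hits, 0.1, 42L);
        System.out.println("sampled " + sampled.size() + " of " + n + " hits");
        System.out.println(estimateCounts(sampled, labels, 0.1));
    }
}
```

The saving comes from touching only the sampled hits when counting; the scaled counts are estimates, which is why bounding them (as discussed in the comment) is useful.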
--
This message was sent by Atlassian JIRA
(v6.2#6252)