[jira] [Commented] (LUCENE-5476) Facet sampling

Gilad Barkai (JIRA) Fri, 07 Mar 2014 13:07:36 -0800

    [ 
https://issues.apache.org/jira/browse/LUCENE-5476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13924348#comment-13924348
 ]


Gilad Barkai commented on LUCENE-5476:
--------------------------------------

{quote}
Btw, is there an easy way to retrieve the total facet counts for an ordinal? 
When correcting facet counts it would be a quick win to limit the number of 
estimated documents to the actual number of documents in the index that match 
that facet. (And maybe use the distribution as well, to make better estimates.)
{quote}

That's a great idea!

The {{docFreq}} of the category drill-down term is an upper bound on the 
category's count and could be used as a limit.
It's cheap to obtain, but it might not be the exact number, as it also counts 
deleted documents.

The limit should also take into account the total number of hits for the 
query; otherwise the sampled count multiplied by the sampling factor may 
yield a larger number than the actual number of results.
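
A minimal sketch of the capping described above, not taken from any of the 
attached patches. The class and method names are hypothetical, and the caller 
is assumed to supply the {{docFreq}} of the drill-down term and the total hit 
count for the query:

```java
/** Hypothetical helper; not part of Lucene or of the LUCENE-5476 patches. */
public final class SampledCountLimiter {

    private SampledCountLimiter() {}

    /**
     * Scales a count observed on a sample up to an estimate for the full
     * result set, then caps it by the two upper bounds discussed above.
     *
     * @param sampledCount count of the category among the sampled hits
     * @param samplingRate fraction of hits that were sampled, in (0, 1]
     * @param docFreq      docFreq of the category drill-down term; assumed to
     *                     be supplied by the caller (may include deleted docs)
     * @param totalHits    total number of hits for the query
     */
    public static long estimate(long sampledCount, double samplingRate,
                                long docFreq, long totalHits) {
        if (samplingRate <= 0.0 || samplingRate > 1.0) {
            throw new IllegalArgumentException("samplingRate must be in (0, 1]");
        }
        // Multiply by the sampling factor (1 / samplingRate).
        long scaled = Math.round(sampledCount / samplingRate);
        // A category cannot match more documents than carry its term,
        // nor more documents than the query matched in total.
        long upperBound = Math.min(docFreq, totalHits);
        return Math.min(scaled, upperBound);
    }
}
```

For example, with 50 sampled matches at a 1% sampling rate the raw estimate 
is 5000; a {{docFreq}} of 1200 would cap it at 1200.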

> Facet sampling
> --------------
>
>                 Key: LUCENE-5476
>                 URL: https://issues.apache.org/jira/browse/LUCENE-5476
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Rob Audenaerde
>         Attachments: LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, 
> LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, LUCENE-5476.patch, 
> SamplingComparison_SamplingFacetsCollector.java, SamplingFacetsCollector.java
>
>
> With LUCENE-5339 facet sampling disappeared. 
> When trying to display facet counts on large datasets (>10M documents) 
> counting facets is rather expensive, as all the hits are collected and 
> processed. 
> Sampling greatly reduced this and thus provided a nice speedup. Could it be 
> brought back?



--
This message was sent by Atlassian JIRA
(v6.2#6252)
