Thanks Shai, I'm trying your solution and it's working; I still need to check some numbers to test it. As I said, we are aware that we have big indexes, so I use facets only on subsets, but if that results in performance issues too, then I'll certainly take a look at facet sampling.
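
For illustration, the sampling idea quoted below (count facets on a sample of the result set, then show percentages instead of exact counts) can be sketched roughly in plain Java. This is only a sketch under assumptions: it does not use the Lucene facet API, the class FacetSamplingSketch, the method approximateTopK and its parameters are hypothetical names, and the percentage is assumed to be taken over the sampled hits.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical helper, NOT part of Lucene: approximates top-K facet shares from a sample. */
final class FacetSamplingSketch {

    /**
     * Keeps each hit with probability sampleRatio, counts categories on the kept
     * hits only, and returns the topK categories as a percentage of the sampled hits.
     * Each inner list holds the category paths attached to one matching document.
     */
    static List<Map.Entry<String, Double>> approximateTopK(
            List<List<String>> hitCategories, double sampleRatio, int topK, long seed) {
        Random random = new Random(seed);
        Map<String, Integer> counts = new HashMap<>();
        int sampledHits = 0;
        for (List<String> categories : hitCategories) {
            if (random.nextDouble() >= sampleRatio) {
                continue; // this hit is not part of the sample
            }
            sampledHits++;
            for (String category : categories) {
                counts.merge(category, 1, Integer::sum);
            }
        }

        // Convert sampled counts to percentages (assumption: share of sampled hits).
        List<Map.Entry<String, Double>> shares = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            double percent = 100.0 * e.getValue() / sampledHits;
            shares.add(new AbstractMap.SimpleImmutableEntry<>(e.getKey(), percent));
        }

        // Largest share first; ties broken by category name for stable output.
        shares.sort((a, b) -> {
            int byShare = Double.compare(b.getValue(), a.getValue());
            return byShare != 0 ? byShare : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(shares.subList(0, Math.min(topK, shares.size())));
    }
}
```

Calling FacetSamplingSketch.approximateTopK(hits, 0.1, 10, 42L) would count roughly 10% of the hits and return up to ten categories with their approximate shares. A sampleRatio of 1.0 degenerates to exact counting, which is handy for comparing the approximation against the full result.
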

Nicola.

On Wed, 2013-01-23 at 13:13 +0200, Shai Erera wrote:
> > I think we should open an issue to provide support for distributed
> > faceting?
>
> Opened https://issues.apache.org/jira/browse/LUCENE-4710.
>
> BTW Nicola, I remember you said something about TBs of indexes. I just
> wanted to point out that if you have really large indexes, with many
> documents, then you may want to look at facets sampling. I.e., instead of
> working hard to get exact counts, you can sample the result set and get an
> approximation to the top-K categories. You can then choose to either 'fully
> count the approximated top-K', or stick w/ their partial counts and display
> percentages (%) to the user.
>
> In fact, when the number of results is so big, think about the following
> result:
>
> A (456,873,234)
>   A/1 (143,548,034)
>   A/2 (137,323,452)
>
> These numbers are too big for a human to process the value behind them.
> Following the big numbers rule, these just denote "lots of results" to
> anyone.
> Rather, it may be better if it displayed A/1 (87%) and A/2 (85%).
> This is something you may want to consider too.
>
> Sampling improves the performance of faceted search, especially on large
> result sets.
> Displaying % counts clarifies the returned top-K categories better, IMO, to
> the common user.
>
> Shai
>
>
> On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
> > On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera <ser...@gmail.com> wrote:
> >
> > > (unfortunately, there's still no tool in Lucene to do that for you).
> >
> > I think we should open an issue to provide support for distributed
> > faceting?
> >
> > For example, we already provide support for distributed searching
> > (TopDocs.merge), and distributed grouping (TopGroups.merge) ... seems
> > like we should do the same for distributed faceting (even though it's
> > somewhat tricky)?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> > For additional commands, e-mail: java-user-h...@lucene.apache.org