> I think we should open an issue to provide support for distributed
> faceting?
Opened https://issues.apache.org/jira/browse/LUCENE-4710.

BTW Nicola, I remember you said something about TBs of indexes. I just wanted to point out that if you have really large indexes, with many documents, then you may want to look at facet sampling. That is, instead of working hard to get exact counts, you can sample the result set and get an approximation of the top-K categories. You can then choose either to fully count the approximated top-K, or to stick with their partial counts and display percentages (%) to the user.

In fact, when the number of results is so big, think about the following result:

A (456,873,234)
A/1 (143,548,034)
A/2 (137,323,452)

These numbers are too big for a human to grasp the value behind them. Following the big-numbers rule, to anyone they just mean "lots of results". It may be better to display A/1 (87%) and A/2 (85%) instead. This is something you may want to consider too.

Sampling improves the performance of faceted search, especially on large result sets. Displaying % counts also, IMO, conveys the returned top-K categories more clearly to the common user.

Shai

On Tue, Jan 22, 2013 at 4:57 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:

> On Mon, Jan 21, 2013 at 11:20 PM, Shai Erera <ser...@gmail.com> wrote:
>
> > (unfortunately, there's still no tool in Lucene to do that for you).
>
> I think we should open an issue to provide support for distributed
> faceting?
>
> For example, we already provide support for distributed searching
> (TopDocs.merge), and distributed grouping (TopGroups.merge) ... seems
> like we should do the same for distributed faceting (even though it's
> somewhat tricky)?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
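For anyone who wants to play with the sampling idea before reaching for Lucene's own facet classes, here is a minimal sketch in plain Java. It is not Lucene's API: `FacetSamplingSketch` and its methods (`sample`, `count`, `topK`, `exactCounts`, `asPercent`) are hypothetical names, each matching document is modeled as the list of category paths it belongs to, the sample is a simple Bernoulli sample, and the percentage is assumed to mean a category's share of the sampled documents (one possible reading of the % display). It shows both options: re-counting only the approximated top-K exactly, or keeping the sampled counts and showing them as percentages.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of facet sampling (not Lucene's API). Each matching
 * document is modeled as the list of category paths it belongs to.
 */
public class FacetSamplingSketch {

    /** Bernoulli sample: keeps each matching doc independently with probability ratio. */
    public static List<List<String>> sample(List<List<String>> docs, double ratio, Random rnd) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> doc : docs) {
            if (rnd.nextDouble() < ratio) {
                out.add(doc);
            }
        }
        return out;
    }

    /** Counts how many of the given docs fall in each category. */
    public static Map<String, Integer> count(List<List<String>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> doc : docs) {
            for (String cat : doc) {
                counts.merge(cat, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** The k categories with the highest counts; ties broken by category name. */
    public static List<String> topK(Map<String, Integer> counts, int k) {
        return counts.entrySet().stream()
                .sorted((a, b) -> {
                    int byCount = Integer.compare(b.getValue(), a.getValue());
                    return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
                })
                .limit(k)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    /** Option 1, "fully count the approximated top-K": exact counts for the chosen categories only. */
    public static Map<String, Integer> exactCounts(List<List<String>> docs, List<String> cats) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String cat : cats) {
            counts.put(cat, 0);
        }
        for (List<String> doc : docs) {
            for (String cat : doc) {
                counts.computeIfPresent(cat, (c, n) -> n + 1);
            }
        }
        return counts;
    }

    /** Option 2: keep the sampled count and show it as a % of the sampled docs. */
    public static String asPercent(int sampledCount, int sampleSize) {
        return Math.round(100.0 * sampledCount / sampleSize) + "%";
    }

    public static void main(String[] args) {
        // Synthetic result set: every doc is in A, 30% are in A/1, 10% are in A/2.
        List<List<String>> docs = new ArrayList<>();
        for (int i = 0; i < 10_000; i++) {
            List<String> cats = new ArrayList<>(List.of("A"));
            if (i % 10 < 3) cats.add("A/1");
            if (i % 10 == 3) cats.add("A/2");
            docs.add(cats);
        }
        List<List<String>> sampled = sample(docs, 0.1, new Random(42));
        Map<String, Integer> approx = count(sampled);
        List<String> top = topK(approx, 2);
        System.out.println("approximate top-2: " + top);
        System.out.println("exact counts:      " + exactCounts(docs, top));
        for (String cat : top) {
            System.out.println(cat + " (" + asPercent(approx.get(cat), sampled.size()) + ")");
        }
    }
}
```

The expensive full pass over every matching document is replaced by a pass over roughly 10% of them; the exact re-count, if you want it, only has to track K categories rather than all of them.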