Nicolas, Thank you for the reply! The problem that my categories are not static they are generated at runtime, they are added and removed all the time. For that reason I have no way to pre-generate the filters.
Dima On 3/18/07, Nicolas Lalevée <[EMAIL PROTECTED]> wrote:
Le dimanche 18 mars 2007 06:55, Dima May a écrit: > <java-user@lucene.apache.org> > > I have a Lucene related questions/problem. > > My search results can potentially get very large 200,000+. I want to > categorize my results. So for example if I have an indexed field "type" > that has such things as CDs, books, videos, power drills, or anything else > in the world, I would want to display how many results are found for each > unique "type". So I may get 200 CDs and 3000 books and so on. > > The brute-force way of doing that is traversing the returned > documents/results and counting each unique type. Unfortunately this > solution is too slow. I am having trouble figuring out a faster approach > and was hoping someone can suggest one. The solution I use is using filters. So before doing the search, you have to already compute a pool a filters, one filter for each category. Then use the search(Query, HitCollector), and use a collector which will dispatch in different queues the results, depending of which filter matched. The collect function of the collector should look like that : public void collect(int doc, float score) { total++; for (int i = 0; i < filters.length; i++) { if (filters[i].get(doc)) { DocScoreCategory docScoreCategory = new DocScoreCategory(doc, score, i); queue[i].insert(docScoreCategory); subTotals[i]++; } } } It is nearly as fast as a normal search but has the background that some filters have to be computed, and it can take some time. In our application updates are done not so often, so we can generate them in background. With an index permanently updated this will be very time consuming. Nicolas --------------------------------------------------------------------- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]