Nicolas,

Thank you for the reply! The problem that my categories are not static they
are generated at runtime, they are added and removed all the time. For that
reason I have no way to pre-generate the filters.

Dima

On 3/18/07, Nicolas Lalevée <[EMAIL PROTECTED]> wrote:

Le dimanche 18 mars 2007 06:55, Dima May a écrit:
>  <java-user@lucene.apache.org>
>
> I have a Lucene related questions/problem.
>
> My search results can potentially get very large 200,000+. I want to
> categorize my results. So for example if I have an indexed field "type"
> that has such things as CDs, books, videos, power drills, or anything
else
> in the world, I would want to display how many results are found for
each
> unique "type". So I may get 200 CDs and 3000 books and so on.
>
> The brute-force way of doing that is traversing the returned
> documents/results and counting each unique type. Unfortunately this
> solution is too slow. I am having trouble figuring out a faster approach
> and was hoping someone can suggest one.

The solution I use is using filters. So before doing the search, you have
to
already compute a pool a filters, one filter for each category.
Then use the search(Query, HitCollector), and use a collector which will
dispatch in different queues the results, depending of which filter
matched.
The collect function of the collector should look like that :

        public void collect(int doc, float score) {
            total++;
            for (int i = 0; i < filters.length; i++) {
                if (filters[i].get(doc)) {
                    DocScoreCategory docScoreCategory = new
DocScoreCategory(doc, score, i);
                    queue[i].insert(docScoreCategory);
                    subTotals[i]++;
                }
            }
        }

It is nearly as fast as a normal search but has the background that some
filters have to be computed, and it can take some time. In our application
updates are done not so often, so we can generate them in background. With
an
index permanently updated this will be very time consuming.

Nicolas

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Reply via email to