Ok, I'm not near any documentation right now, but I think throwing an exception is overkill. As I remember, all you have to do is return false from your collector and that will stop the search. But verify that.
Best
Erick

On Thu, Aug 7, 2008 at 12:00 PM, renou oki <[EMAIL PROTECTED]> wrote:
> Thanks a lot for your responses...
>
> I have tried the HitCollector approach and throw an exception when the
> limit of hits is reached... It works fine, and the search time is really
> reduced when there are a lot of docs matching the query...
>
> I did this:
>
> public class CountCollector extends HitCollector {
>     public int cpt;
>     private int _maxHit;
>
>     public CountCollector(int maxHit) {
>         cpt = 0;
>         _maxHit = maxHit;
>     }
>
>     public void collect(int doc, float score) {
>         cpt++;
>         if (cpt > _maxHit) {
>             throw new LimitIsReachedException();
>         }
>     }
> }
>
> With a simple try/catch I catch the exception and display "cpt" (the
> counter)...
>
> Best regards
>
>
> ----- Original Message ----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, August 7, 2008, 2:29:31 PM
> Subject: Re: Stop search process when a given number of hits is reached
>
> Doron Cohen wrote:
> > Nothing built in that I'm aware of will do this, but it can be done by
> > searching with your own HitCollector.
> > There is a related feature - stopping the search after a specified time -
> > using TimeLimitedCollector. It is not released yet; see issue LUCENE-997.
> > In short, the collector's collect() method is invoked in the search
> > process for each matching document. Once 500 docs have been collected,
> > your collector can cause the search to stop by throwing an exception.
> > Upon catching the exception you know that 500 docs were collected.
>
> Two additional comments:
>
> * the topN results from such an incomplete search may be way off if there
> were some high-scoring documents somewhere beyond the limit.
>
> * if you know that there are more important and less important documents
> in your corpus, and their relative weight is independent of the query
> (e.g. a PageRank-type score), then you can restructure your index so that
> postings belonging to high-scoring documents come first in the posting
> lists - this way you have a better chance of collecting highly relevant
> documents first, even though the search is incomplete. You can find an
> implementation of this concept in Nutch
> (org.apache.nutch.indexer.IndexSorter).
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
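
For anyone following the thread, here is a minimal sketch of the try/catch usage described above. It assumes the CountCollector class from the quoted message, that LimitIsReachedException is an application-defined RuntimeException (collect() declares no checked exceptions, so it has to be unchecked), and a hypothetical index path and field name; the search(Query, HitCollector) overload is the Lucene 2.x API.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class LimitedSearchExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical index location and field name, for illustration only.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new TermQuery(new Term("contents", "lucene"));

        // Stop once 500 matching docs have been seen.
        CountCollector collector = new CountCollector(500);
        try {
            // Lucene 2.x overload: search(Query, HitCollector); collect() is
            // called for every matching doc until the collector throws.
            searcher.search(query, collector);
        } catch (LimitIsReachedException e) {
            // Expected once the limit is exceeded; the search simply stops early.
        }

        System.out.println("Matches seen before stopping: " + collector.cpt);
        searcher.close();
    }
}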