Ok, I'm not near any documentation right now, but I think throwing an exception is overkill. As I remember, all you have to do is return false from your collector and that will stop the search. But verify that.
Best
Erick

On Thu, Aug 7, 2008 at 12:00 PM, renou oki <[EMAIL PROTECTED]> wrote:
> Thanks a lot for your responses...
>
> I have tried the HitCollector approach and throw an exception when the
> limit of hits is reached... It works fine, and the search time is really
> reduced when there are a lot of docs matching the query...
>
> I did this:
>
> public class CountCollector extends HitCollector {
>     public int cpt;
>     private int _maxHit;
>
>     public CountCollector(int maxHit) {
>         cpt = 0;
>         _maxHit = maxHit;
>     }
>
>     public void collect(int doc, float score) {
>         cpt++;
>         if (cpt > _maxHit) {
>             throw new LimitIsReachedException();
>         }
>     }
> }
>
> With a simple try/catch I catch the exception and display "cpt" (the
> counter)...
>
> Best regards
>
>
> ----- Original Message ----
> From: Andrzej Bialecki <[EMAIL PROTECTED]>
> To: java-user@lucene.apache.org
> Sent: Thursday, August 7, 2008, 2:29:31 PM
> Subject: Re: Stop search process when a given number of hits is reached
>
> Doron Cohen wrote:
> > Nothing built in that I'm aware of will do this, but it can be done by
> > searching with your own HitCollector.
> > There is a related feature - stopping the search after a specified time -
> > using TimeLimitedCollector. It is not released yet; see issue LUCENE-997.
> > In short, the collector's collect() method is invoked in the search
> > process for each matching document. Once 500 docs have been collected,
> > your collector can cause the search to stop by throwing an exception.
> > Upon catching the exception you know that 500 docs were collected.
>
> Two additional comments:
>
> * the topN results from such an incomplete search may be way off if there
> were some high-scoring documents somewhere beyond the limit.
>
> * if you know that there are more important and less important documents
> in your corpus, and their relative weight is independent of the query
> (e.g. a PageRank-type score), then you can restructure your index so that
> postings belonging to high-scoring documents come first in the posting
> lists - this way you have a better chance of collecting highly relevant
> documents first, even though the search is incomplete. You can find an
> implementation of this concept in Nutch
> (org.apache.nutch.indexer.IndexSorter).
>
> --
> Best regards,
> Andrzej Bialecki <><
> Information Retrieval, Semantic Web
> Embedded Unix, System Integration
> http://www.sigram.com  Contact: info at sigram dot com
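
For anyone following the thread, here is a minimal sketch of the try/catch usage described above. It assumes the CountCollector class from the quoted message, that LimitIsReachedException is an application-defined RuntimeException (collect() declares no checked exceptions, so it has to be unchecked), and a hypothetical index path and field name; the search(Query, HitCollector) overload is the Lucene 2.x API.

import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

public class LimitedSearchExample {

    public static void main(String[] args) throws Exception {
        // Hypothetical index location and field name, for illustration only.
        IndexSearcher searcher = new IndexSearcher("/path/to/index");
        Query query = new TermQuery(new Term("contents", "lucene"));

        // Stop once 500 matching docs have been seen.
        CountCollector collector = new CountCollector(500);
        try {
            // Lucene 2.x overload: search(Query, HitCollector); collect() is
            // called for every matching doc until the collector throws.
            searcher.search(query, collector);
        } catch (LimitIsReachedException e) {
            // Expected once the limit is exceeded; the search simply stops early.
        }

        System.out.println("Matches seen before stopping: " + collector.cpt);
        searcher.close();
    }
}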