Thanks a lot for your responses. I tried a custom HitCollector that throws an exception once the hit limit is reached. It works fine, and the search time is greatly reduced when many documents match the query.
Here is what I did:

    public class CountCollector extends HitCollector {
        public int cpt;          // number of hits seen so far
        private int _maxHit;     // limit after which the search is aborted

        public CountCollector(int maxHit) {
            cpt = 0;
            _maxHit = maxHit;
        }

        public void collect(int doc, float score) {
            cpt++;
            if (cpt > _maxHit) {
                // Must be an unchecked exception (extend RuntimeException),
                // because collect() declares no checked exceptions.
                throw new LimitIsReachedException();
            }
        }
    }

With a simple try/catch around the search call, I catch the exception and display "cpt" (the counter).

Best regards

----- Original Message -----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
To: java-user@lucene.apache.org
Sent: Thursday, 7 August 2008, 14:29:31
Subject: Re: Stop search process when a given number of hits is reached

Doron Cohen wrote:
> Nothing built in that I'm aware of will do this, but it can be done by
> searching with your own HitCollector.
> There is a related feature - stop search after a specified time - using
> TimeLimitedCollector.
> It is not released yet, see issue LUCENE-997.
> In short, the collector's collect() method is invoked in the search process
> for each matching document.
> Once 500 docs were collected, your collector can cause the search to stop by
> throwing an exception.
> Upon catching the exception you know that 500 docs were collected.

Two additional comments:

* the topN results from such an incomplete search may be way off if there are high-scoring documents somewhere beyond the limit.

* if you know that your corpus contains more important and less important documents, and their relative weight is independent of the query (e.g. a PageRank-type score), then you can restructure your index so that postings belonging to high-scoring documents come first on the posting lists. This way you have a better chance of collecting highly relevant documents first, even though the search is incomplete. You can find an implementation of this concept in Nutch (org.apache.nutch.indexer.IndexSorter).

--
Best regards,
Andrzej Bialecki
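For anyone finding this thread later, here is a minimal, self-contained sketch of the try/catch pattern and of the two comments above. It does not use Lucene: the HitCollector class below is only a stand-in that mirrors the collect(int, float) callback, and fakeSearch, searchWithLimit, EarlyStopDemo and the bestScore field are hypothetical names made up for the illustration. Unlike the collector posted above, this variant checks the limit before counting, so the counter ends exactly at the limit.

```java
// Stand-in for Lucene's HitCollector so the sketch compiles without Lucene (assumption).
abstract class HitCollector {
    public abstract void collect(int doc, float score);
}

// Unchecked, so it can propagate out of collect(), which declares no checked exceptions.
class LimitIsReachedException extends RuntimeException {}

class CountCollector extends HitCollector {
    int count = 0;                                  // hits accepted so far
    float bestScore = Float.NEGATIVE_INFINITY;      // best score among accepted hits
    private final int maxHits;

    CountCollector(int maxHits) { this.maxHits = maxHits; }

    @Override
    public void collect(int doc, float score) {
        if (count >= maxHits) {
            throw new LimitIsReachedException();    // abort the search
        }
        count++;
        bestScore = Math.max(bestScore, score);
    }
}

class EarlyStopDemo {
    // Hypothetical replacement for searcher.search(query, collector):
    // feeds every "matching document" to the collector in index order.
    static void fakeSearch(float[] scoresInIndexOrder, HitCollector collector) {
        for (int doc = 0; doc < scoresInIndexOrder.length; doc++) {
            collector.collect(doc, scoresInIndexOrder[doc]);
        }
    }

    // Runs the search, treats the limit exception as a normal stop signal,
    // and returns the collector so the caller can read the counter.
    static CountCollector searchWithLimit(float[] scores, int maxHits) {
        CountCollector collector = new CountCollector(maxHits);
        try {
            fakeSearch(scores, collector);
        } catch (LimitIsReachedException e) {
            // Expected: the limit was reached, the partial results are kept.
        }
        return collector;
    }

    public static void main(String[] args) {
        float[] arbitraryOrder = {0.2f, 0.1f, 0.3f, 0.9f, 0.4f};
        float[] sortedByScore  = {0.9f, 0.4f, 0.3f, 0.2f, 0.1f};

        CountCollector a = searchWithLimit(arbitraryOrder, 3);
        System.out.println("arbitrary order: count=" + a.count + " best=" + a.bestScore);

        CountCollector b = searchWithLimit(sortedByScore, 3);
        System.out.println("sorted order:    count=" + b.count + " best=" + b.bestScore);
    }
}
```

With the arbitrary order, the document scoring 0.9 sits beyond the limit of 3 and is never seen. With the score-sorted order it is collected first. In a real index the ordering would have to come from a query-independent score, as described above.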