If you only want to gather the docIDs, this approach is rather wasteful: it 1) computes the score of each hit, and 2) keeps a sorted queue of all these hits. It also pre-allocates a 1M-entry array, which is wasteful if your index doesn't have 1M docs.
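For contrast, here is a rough sketch of a docID-only collector that skips both costs: it never asks for a score and keeps no queue, it just sets one bit per hit. This is not code against Lucene itself. DocIdCallback is a hypothetical stand-in that mirrors only the two Collector callbacks that matter here (being told the docBase of the next segment, and being handed each segment-relative docID), and BitSetDocIdCollector, docIds() and count() are illustrative names. A real subclass of Collector has further methods to implement, which are left out.

```java
import java.util.BitSet;

/**
 * Hypothetical stand-in (NOT a Lucene class) for the two collector
 * callbacks this sketch needs. An assumption made for illustration only.
 */
abstract class DocIdCallback {
    /** Called when the search moves to a new segment; docBase is that segment's docID offset. */
    abstract void setNextReader(int docBase);

    /** Called once per matching document with a segment-relative docID. */
    abstract void collect(int doc);
}

/** Records matching docIDs in a bit set: no scoring, no sorted queue. */
final class BitSetDocIdCollector extends DocIdCallback {
    private final BitSet bits;
    private int docBase;

    /** maxDoc is only a sizing hint for the bit set (one bit per doc in the index). */
    BitSetDocIdCollector(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    @Override
    void setNextReader(int docBase) {
        this.docBase = docBase;
    }

    @Override
    void collect(int doc) {
        // Rebase the segment-relative docID to an index-wide docID.
        bits.set(docBase + doc);
    }

    /** All collected docIDs; iterate with nextSetBit(). */
    BitSet docIds() {
        return bits;
    }

    /** Number of distinct docIDs collected. */
    int count() {
        return bits.cardinality();
    }
}
```

A list-based variant would append docBase + doc to a list instead of setting a bit. Either way the memory used depends on the index size or the number of hits, not on a fixed 1M.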
It's better to create your own subclass of Collector that, e.g., appends each collected docID to a List or array, or sets them in a bit set (if you expect more than 1/8th of your index may be returned by the query).

Mike

On Fri, Nov 20, 2009 at 10:17 AM, it99 <deswiatlow...@syrres.com> wrote:
>
> Thanks, that helped a lot with the speed!!
> I am getting the same search results but with different docIds. Is this
> expected and OK? Are they just arbitrary numbers?
>
> If I changed from
>   Hits hits = mSearcher.search(query, filter);
>
> To the following
>   TopDocCollector collector = new TopDocCollector(1000000);
>   mSearcher.search(query, filter, collector);
>   hits = collector.topDocs().scoreDocs;
>
>
> Ian Lea wrote:
>>
>> First queries are often slow and subsequent ones faster. Search the
>> list for warming - I think there was something on it in the last
>> couple of days. Or read the "When measuring performance, disregard
>> the first query" bit of
>> http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
>>
>> A good number to pass to the Collector is however many docs you are
>> going to be interested in. If you are just going to display the first
>> 10, pass 10.
>>
>>
>> --
>> Ian.
>>
>>
>> On Thu, Nov 19, 2009 at 3:36 PM, it99 <deswiatlow...@syrres.com> wrote:
>>>
>>> What is the best way to iterate across all the documents in a search
>>> result?
>>> Previously I was using the deprecated Hits object but changed the
>>> implementation, as recommended in the javadocs, to ScoreDoc.
>>>
>>> I've tried the following but I've seen warnings about performance.
>>> It seems the first time I query something it takes a long time, and
>>> after that it is quick.
>>>
>>> for (int i = 0; i < mNumberOfHits; i++)
>>> {
>>>     int docId = hits[i].doc;
>>>     Document doc = searcher.doc(docId);
>>> }
>>>
>>> Here's the code for the search.
>>> What is a good number to pass into TopDocCollector?
>>>
>>> TopDocCollector collector = new TopDocCollector(1000000);
>>> searcher.search(query, collector);
>>> ScoreDoc[] hits = collector.topDocs().scoreDocs;
>>> --
>>> View this message in context:
>>> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26421373.html
>>> Sent from the Lucene - Java Users mailing list archive at Nabble.com.
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>> For additional commands, e-mail: java-user-h...@lucene.apache.org
>>>
>
> --
> View this message in context:
> http://old.nabble.com/best-way-to-iterate-through-all-docs-from-a-query-tp26421373p26442733.html