UNOFFICIAL

Hi Mike,

I ran it again and this time the two methods came out about the same: 168 - 288 ms to process 173,000 documents for the walking method and 160 - 205 ms for the MultiDocValues method. I don't know what was happening with my last test.

Here is my code:

    if (docs.totalHits > 0) {
        int currentContextIndex = 0;
        List<AtomicReaderContext> leaves = searcher.getIndexReader().leaves();
        AtomicReaderContext currentContext = leaves.get(currentContextIndex);
        NumericDocValues values = getNumericDocValues(currentContext, "responseTime");
        for (ScoreDoc scoreDoc : docs.scoreDocs) {
            while (scoreDoc.doc >= (currentContext.docBase + currentContext.reader().maxDoc())) {
                currentContext = leaves.get(++currentContextIndex);
                values = getNumericDocValues(currentContext, "responseTime");
            }
            int value = (int) values.get(scoreDoc.doc - currentContext.docBase);
            // do stuff
        }
    }

    private NumericDocValues getNumericDocValues(final AtomicReaderContext context, final String field) throws ProfileException {
        try {
            return context.reader().getNumericDocValues(field);
        } catch (IOException e) {
            throw new ProfileException("Unable to extract results from index for query 'read response times'.", e);
        }
    }

Thanks for the tip on using a custom Collector. This is in Lucene in Action (great book by the way).

Regards,
Steve

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Monday, 4 November 2013 10:30 AM
To: Lucene Users
Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

It's very strange that you see faster performance using MultiDocValues: that simply should not be the case. Can you share your per-segment code?

Also, it's rather inefficient to collect all hits by passing maxDoc as n to IndexSearcher.search; if you really just want the docIDs and you don't care about order it's better to make a custom Collector that simply appends the docID to an array/list. I believe Lucene in Action includes an example for this... (disclosure: I'm one of the authors).
Mike McCandless

http://blog.mikemccandless.com

On Sun, Nov 3, 2013 at 5:37 PM, Stephen GRAY <stephen.g...@immi.gov.au> wrote:
> UNOFFICIAL
>
> That's what I did. You just pass searcher.search a very large value for max
> docs so you get them all, then iterate through the ScoreDoc[] array - the
> docId is in scoreDoc.doc.
>
> Regards,
> Steve
>
> -----Original Message-----
> From: Kyle Judson [mailto:kvjud...@hotmail.com]
> Sent: Sunday, 3 November 2013 12:37 AM
> To: java-user@lucene.apache.org
> Subject: Re: splitting docIds from a search by segment
> [SEC=UNOFFICIAL]
>
> All,
>
> Is the best way to get the docIDs in a case like this to use
> IndexSearcher.search to get TopDocs and then get the ScoreDoc[] from
> TopDocs.scoreDocs?
>
> Thanks
>
> Kyle
>
>
> On 10/30/13 4:56 AM, "Michael McCandless" <luc...@mikemccandless.com>
> wrote:
>
>>You should try MultiDocValues first; it's trivial to use and may not
>>be horribly slow.
>>
>>It must do a binary-search for every docID lookup.
>>
>>And then if this is too slow, assuming you traverse the docIDs in
>>order, you can use IndexReader.leaves() to get the sub-readers. The
>>docIDs are just "appended" from these sub-readers, so you'd walk your
>>docIDs and also walk your sub-readers, moving to the next sub-reader
>>once you have a docID that's beyond its end. Each sub-reader spans
>>AtomicReaderContext.docBase to docBase +
>>AtomicReaderContext.reader().maxDoc().
>>
>>Mike McCandless
>>
>>http://blog.mikemccandless.com
>>
>>On Wed, Oct 30, 2013 at 2:21 AM, Stephen GRAY
>><stephen.g...@immi.gov.au>
>>wrote:
>>> UNOFFICIAL
>>> Hi everyone,
>>>
>>> I am trying to write an application that loops through 500,000 -
>>>1,000,000 documents returned by a search and calculates some
>>>statistics using the value in a stored field. Obviously this needs to
>>>be as fast as possible so I am using a NumericDocValues field to store the
>>>value.
>>>
>>> What I don't know is how to get the NumericDocValues value for each
>>>docId returned by the search. What I've been told to do in a previous
>>>thread was:
>>>
>>> 1. Split the docIds according to the segment they belong to
>>>
>>> 2. Get a per-segment NumericDocValues instance and use this to
>>>extract the values
>>>
>>> Can someone tell me how to do 1 and 2? I don't know how to discover
>>>what segment a given docId is in, or how to convert a segment into a
>>>NumericDocValues array.
>>>
>>> By the way it's also been suggested that I just use
>>>MultiDocValues.getNumericValues, but I gather that this will be much
>>>slower.
>>>
>>> I'd appreciate any help,
>>>
>>> Thanks,
>>> Steve
>>>
>>> UNOFFICIAL
>>>
>>>
>>> --------------------------------------------------------------------
>>> Important Notice: If you have received this email by mistake, please
>>>advise the sender and delete the message and attachments immediately.
>>>This email, including attachments, may contain confidential,
>>>sensitive, legally privileged and/or copyright information. Any
>>>review, retransmission, dissemination or other use of this
>>>information by persons or entities other than the intended recipient
>>>is prohibited. DIAC respects your privacy and has obligations under
>>>the Privacy Act 1988. The official departmental privacy policy can
>>>be viewed on the department's website at www.immi.gov.au.
>>>See:
>>> http://www.immi.gov.au/functional/privacy.htm
>>>
>>> --------------------------------------------------------------------
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>>For additional commands, e-mail: java-user-h...@lucene.apache.org
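
For anyone reading this thread later, here is a rough, Lucene-free sketch of the two lookup strategies discussed above: a binary search per docID (what the thread says MultiDocValues must do) and the "walk the docIDs and the sub-readers together" approach. It is only a model under stated assumptions, not code from the thread. The class SegmentLookup, its method names, and the docStarts array are all hypothetical: docStarts[i] stands in for AtomicReaderContext.docBase of segment i, with one extra trailing entry for the index-wide maxDoc, and perSegmentValues stands in for the per-segment NumericDocValues. Because hits from a relevance-sorted search are not generally in docID order, the sketch sorts the docIDs before walking.

```java
import java.util.Arrays;

/**
 * Lucene-free model of the two lookup strategies from this thread.
 * All names here are hypothetical. docStarts[i] plays the role of
 * AtomicReaderContext.docBase for segment i; the extra last entry is
 * the index-wide maxDoc.
 */
final class SegmentLookup {

    /** Binary search per docID: index of the last segment whose start is <= docId. */
    static int segmentByBinarySearch(int docId, int[] docStarts) {
        int lo = 0;
        int hi = docStarts.length - 2; // index of the last real segment
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1; // round up so the loop always makes progress
            if (docStarts[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    /**
     * Walk sorted docIDs and segments together, moving to the next segment
     * once a docID is beyond the current segment's end.
     * Precondition: docIDs are ascending and all smaller than the last docStarts entry.
     */
    static int[] segmentsByWalking(int[] sortedDocIds, int[] docStarts) {
        int[] result = new int[sortedDocIds.length];
        int seg = 0;
        for (int i = 0; i < sortedDocIds.length; i++) {
            while (sortedDocIds[i] >= docStarts[seg + 1]) {
                seg++;
            }
            result[i] = seg;
        }
        return result;
    }

    /** Sum the per-segment values for a set of hits using the walking approach. */
    static long sumValues(int[] docIds, int[] docStarts, long[][] perSegmentValues) {
        int[] sorted = docIds.clone();
        Arrays.sort(sorted); // walking only works when docIDs are visited in ascending order
        long sum = 0;
        int seg = 0;
        for (int docId : sorted) {
            while (docId >= docStarts[seg + 1]) {
                seg++;
            }
            // Segment-local ID = global docID minus the segment's doc base.
            sum += perSegmentValues[seg][docId - docStarts[seg]];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] docStarts = {0, 3, 5, 9}; // three segments: [0,3), [3,5), [5,9)
        long[][] values = {{10, 11, 12}, {20, 21}, {30, 31, 32, 33}};
        int[] hits = {7, 0, 4, 3, 8};   // deliberately not in docID order
        System.out.println(sumValues(hits, docStarts, values)); // prints 116
    }
}
```

The walking version does one forward pass over the segments for the whole result set, while the binary-search version pays a logarithmic lookup per docID. With only a handful of segments the difference can be small, which would be consistent with the similar timings reported at the top of this thread.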