UNOFFICIAL

Hi Kyle,
Actually the MultiDocValues method turned out to be about twice as fast as walking the docIds and sub-readers, which I wasn't expecting. I'm happy to post the code I used if you want to try both methods.

Regards,
Steve

Stephen Gray
Java Developer
Border Midrange Systems Support
Department of Immigration and Border Protection
Phone: (02) 6223 9207
Mobile: 0419 885 959

-----Original Message-----
From: Kyle Judson [mailto:kvjud...@hotmail.com]
Sent: Sunday, 3 November 2013 12:35 AM
To: java-user@lucene.apache.org
Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]

Hi Steve,

I'd appreciate knowing your results since I have a similar problem.

Thanks
Kyle

On 10/30/13 8:44 PM, "Stephen GRAY" <stephen.g...@immi.gov.au> wrote:

>UNOFFICIAL
>
>Hi Mike,
>
>Thanks for the helpful response. I'll try them both and see if any
>performance improvement I get from the more complicated method is worth
>the extra complexity.
>
>Thanks,
>Steve
>
>-----Original Message-----
>From: Michael McCandless [mailto:luc...@mikemccandless.com]
>Sent: Wednesday, 30 October 2013 9:57 PM
>To: Lucene Users
>Subject: Re: splitting docIds from a search by segment [SEC=UNOFFICIAL]
>
>You should try MultiDocValues first; it's trivial to use and may not be
>horribly slow.
>
>It must do a binary search for every docID lookup.
>
>And then if this is too slow, assuming you traverse the docIDs in
>order, you can use IndexReader.leaves() to get the sub-readers. The
>docIDs are just "appended" from these sub-readers, so you'd walk your
>docIDs and also walk your sub-readers, moving to the next sub-reader
>once you have a docID that's beyond its end. Each sub-reader spans
>AtomicReaderContext.docBase to docBase +
>AtomicReaderContext.reader().maxDoc().
>
>Mike McCandless
>
>http://blog.mikemccandless.com
>
>On Wed, Oct 30, 2013 at 2:21 AM, Stephen GRAY
><stephen.g...@immi.gov.au> wrote:
>> UNOFFICIAL
>> Hi everyone,
>>
>> I am trying to write an application that loops through 500,000 -
>> 1,000,000 documents returned by a search and calculates some
>> statistics using the value in a stored field. Obviously this needs to
>> be as fast as possible so I am using a NumericDocValues field to
>> store the value.
>>
>> What I don't know is how to get the NumericDocValues value for each
>> docId returned by the search. What I've been told to do in a previous
>> thread was:
>>
>> 1. Split the docIds according to the segment they belong to
>>
>> 2. Get a per-segment NumericDocValues instance and use this to
>>    extract the values
>>
>> Can someone tell me how to do 1 and 2? I don't know how to discover
>> what segment a given docId is in, or how to convert a segment into a
>> NumericDocValues array.
>>
>> By the way it's also been suggested that I just use
>> MultiDocValues.getNumericValues, but I gather that this will be much
>> slower.
>>
>> I'd appreciate any help,
>>
>> Thanks,
>> Steve
>>
>> UNOFFICIAL
>>
>>
>> --------------------------------------------------------------------
>> Important Notice: If you have received this email by mistake, please
>> advise the sender and delete the message and attachments immediately.
>> This email, including attachments, may contain confidential,
>> sensitive, legally privileged and/or copyright information. Any
>> review, retransmission, dissemination or other use of this information
>> by persons or entities other than the intended recipient is
>> prohibited. DIAC respects your privacy and has obligations under the
>> Privacy Act 1988. The official departmental privacy policy can be
>> viewed on the department's website at www.immi.gov.au.
>> See: http://www.immi.gov.au/functional/privacy.htm
>>
>> ---------------------------------------------------------------------

UNOFFICIAL

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
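
For anyone following the thread who wants to see the two lookup strategies side by side, here is a minimal sketch. It is not the code Steve used and it does not call Lucene at all: each segment is modelled only by its maxDoc, and the class and method names (SegmentSplit, docBases, splitBySegment, segmentOf) are hypothetical. It assumes the global docIDs arrive in ascending order, as Mike's description of the walk requires. splitBySegment does the single forward walk over sub-readers; segmentOf does a binary search per docID, which is the kind of per-lookup cost Mike attributes to MultiDocValues.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone model, no Lucene classes involved. In Lucene 4.x the
// equivalent numbers would come from IndexReader.leaves(), with each leaf's
// docBase and reader().maxDoc(); here they are plain int arrays.
final class SegmentSplit {

    /** docBase of each segment: the running sum of the preceding maxDoc values. */
    static int[] docBases(int[] maxDocs) {
        int[] bases = new int[maxDocs.length];
        int base = 0;
        for (int i = 0; i < maxDocs.length; i++) {
            bases[i] = base;
            base += maxDocs[i];
        }
        return bases;
    }

    /**
     * Single forward walk: splits ascending global docIDs into one list per
     * segment, converting each to a segment-local docID (global - docBase).
     */
    static List<List<Integer>> splitBySegment(int[] sortedDocIds, int[] maxDocs) {
        int[] bases = docBases(maxDocs);
        List<List<Integer>> perSegment = new ArrayList<>();
        for (int i = 0; i < maxDocs.length; i++) {
            perSegment.add(new ArrayList<>());
        }
        int seg = 0;
        for (int docId : sortedDocIds) {
            // Move to the next segment once the docID is beyond the current one's end.
            while (seg < maxDocs.length && docId >= bases[seg] + maxDocs[seg]) {
                seg++;
            }
            if (seg == maxDocs.length) {
                throw new IllegalArgumentException("docId out of range: " + docId);
            }
            perSegment.get(seg).add(docId - bases[seg]);
        }
        return perSegment;
    }

    /** Per-lookup alternative: binary search for the last segment whose docBase <= docId. */
    static int segmentOf(int docId, int[] bases) {
        int lo = 0;
        int hi = bases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (bases[mid] <= docId) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] maxDocs = {3, 5, 2};              // three segments holding 10 docs in total
        int[] hits = {0, 2, 3, 7, 8, 9};        // ascending global docIDs from a search
        System.out.println(splitBySegment(hits, maxDocs)); // prints [[0, 2], [0, 4], [0, 1]]
        System.out.println(segmentOf(7, docBases(maxDocs))); // prints 1
    }
}
```

With real readers, the per-segment lists would then be fed to each sub-reader's own NumericDocValues using the local docIDs. Which approach is faster is an empirical question: in Steve's measurements above, the MultiDocValues route came out ahead.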