Hi Mike,

We did use IndexWriter::setInfoStream. Apparently there is a lot to sift through. I'll let you know if we make any discoveries useful for others.

Thanks!
Justin

----- Original Message ----
From: Michael McCandless <luc...@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Thu, June 24, 2010 4:04:47 AM
Subject: Re: Problems with homebrew ParallelWriter

I agree w/ Shai -- from your description it looks like your docs should be in sync (assuming no exceptions, and a serial doc/del stream going in).

If you turn on infoStream for all the writers & post the results, we can look for where they diverge...

Mike

On Wed, Jun 23, 2010 at 11:48 PM, Shai Erera <ser...@gmail.com> wrote:
> How do you add documents to the index? Is it synchronized (such that
> basically only one thread can add documents at a time)? The same goes
> for removing documents as well.
>
> Also, did you encounter any exceptions during the run? If, say, an addDoc
> fails on one of the slices, then you need to revert that addDoc in all
> previous slices.
>
> I remember running into such an exception when working on the Parallel
> Index stuff, but I don't remember what caused it.
>
> About merging, note that if you use LogDocMP, then you can guarantee that
> all slices will be in sync, but some merges could still happen on some
> slices when you didn't intend them to -- for example, during a flush of
> one addDoc on one of the slices, before the others' addDoc finished. But
> if you didn't see any exceptions and didn't terminate the process
> mid-action, then this should not happen.
>
> I hope this helps. Unfortunately I had to shift focus from LUCENE-1879.
> Perhaps I'll get back to it one day. But if you advanced on PI somehow,
> perhaps you can diff the patch that's there against your code, and if
> you've made progress, upload another patch?
>
> Shai
>
> On Thu, Jun 24, 2010 at 1:44 AM, Justin <cry...@yahoo.com> wrote:
>
>> Hi all,
>>
>> We've been waiting for LUCENE-1879 and LUCENE-2425 and have written
>> our own ParallelWriter class in the meantime.
>> Apparently our indexes are falling out of sync (I suspect my colleague
>> is seeing error messages from ParallelReader stating that the number
>> of documents must be the same).
>>
>> Here's a code snippet from our ParallelWriter, which extends Object:
>>
>>     writer1 = new IndexWriter(dir, analyzer, create,
>>                               new IndexWriter.MaxFieldLength(MFL));
>>     writer1.setMergePolicy(new LogDocMergePolicy());
>>     writer1.setMergeScheduler(new SerialMergeScheduler());
>>     writer1.setMaxBufferedDocs(MBD);
>>     writer1.setRAMBufferSizeMB(IndexWriter.DISABLE_AUTO_FLUSH);
>>
>> My colleague suspects that merging or flushing is being triggered on
>> something other than the doc count, which leads to the writers'
>> different behaviors. I suspect our next step is to scatter breakpoints
>> around the Lucene source (we've got tr...@926791 to take advantage of
>> the latest NRT readers).
>>
>> Does anyone have ideas on how the indexes could get out of sync?
>> Process close, committing, optimizing, ... they all should work okay?
>>
>> Thanks,
>> Justin
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
>> For additional commands, e-mail: java-user-h...@lucene.apache.org
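[Editor's note: Shai's advice -- serialize adds across slices and revert an addDoc on earlier slices when a later slice fails -- can be sketched in plain Java. This is a minimal illustration of the lockstep/rollback pattern only; `Slice`, `rollbackLast`, and `ParallelWriterSketch` are hypothetical stand-ins, not Lucene's `IndexWriter` API.]

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the lockstep-with-rollback idea: every slice must apply the
// same add in the same order, and a failure on slice i must be undone on
// slices 0..i-1 so doc counts stay equal across all slices.
public class ParallelWriterSketch {

    /** Hypothetical stand-in for one index slice (not a real IndexWriter). */
    static class Slice {
        final List<String> docs = new ArrayList<>();
        final boolean failOnAdd;          // simulate an addDocument failure

        Slice(boolean failOnAdd) { this.failOnAdd = failOnAdd; }

        void addDocument(String doc) {
            if (failOnAdd) throw new RuntimeException("addDocument failed");
            docs.add(doc);
        }

        void rollbackLast() { docs.remove(docs.size() - 1); }
    }

    private final List<Slice> slices;

    ParallelWriterSketch(List<Slice> slices) { this.slices = slices; }

    /**
     * Synchronized so only one thread adds at a time -- all slices see a
     * strictly serial doc stream. If a slice fails, revert the add on the
     * slices that already succeeded, then rethrow.
     */
    public synchronized void addDocument(String doc) {
        int applied = 0;                  // how many slices took the doc
        try {
            for (Slice s : slices) {
                s.addDocument(doc);
                applied++;
            }
        } catch (RuntimeException e) {
            for (int i = 0; i < applied; i++) {
                slices.get(i).rollbackLast();
            }
            throw e;
        }
    }
}
```

The same wrap-and-revert shape would apply to deletes; without it, a single failed add leaves the slices with unequal doc counts, which is exactly the "number of documents must be the same" symptom ParallelReader reports.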