OK then this is exactly what was happening with LUCENE-1573 -- Lucene was not properly handling the interrupt request, and instead of throwing an exception, entered a spin loop that starves the merge threads.
Can you try 2.9.0 (and turn back on your Future.cancel(true) call) and see if it's fixed? Mike On Fri, Oct 16, 2009 at 7:11 PM, Christopher Tignor <ctig...@thinkmap.com> wrote: > the doWait() call is synchronized on IndexWriter but it is also, as you > suggest in a loop in a block synchronized on IndexWriter. > > The doWait() call returns immediately, still holding the IndexWriter lock > from the loop in the synchornized block as my stack trace shows, without > blocking and giving the merger thread a chance to merge. It keeps repeating > this doWait procedure which unfortunately never actually waits and the > MergerThread is starved.... > > C>T> > > On Fri, Oct 16, 2009 at 6:53 PM, Michael McCandless < > luc...@mikemccandless.com> wrote: > >> I'm glad you worked around it! But I don't fully understand the >> issue. That doWait is inside a sync(writer) block... if the Future >> manages to interrupt it, then that thread will release the lock when >> it exits that sync block. >> >> Actually, if the thread was indeed interrupted, you may be hitting this: >> >> http://issues.apache.org/jira/browse/LUCENE-1573 >> >> (which is fixed in 2.9). >> >> Mike >> >> On Fri, Oct 16, 2009 at 6:01 PM, Christopher Tignor >> <ctig...@thinkmap.com> wrote: >> > I discovered the problem and fixed it's effect on my code: >> > >> > Using the source for Lucene version 2.4.1, >> > in IndexWriter.optimize() there is a call to doWait() on line 2283 >> > this method attempts to wait for a second in order to give other threads >> it >> > has spawned a chance to acquire it's mutex and complete index merges. >> > It checks if there are any merges left after it wakes up. if there >> aren't >> > int proceeds. >> > >> > However, using using a Future object associated with this thread and >> calling >> > cancel(true) doesn't allow the Thread to enter into Object.wait() (or the >> > executorservice observing this Future immediately wakes it up or >> something) >> > so it returns immediately and the thread holding the lock on the >> IndexWriter >> > never sceeds it's lock to threads running a merge. >> > >> > I re-wrote my code such that it didn't need to call the interruptible >> > version of Future.cancel() to solve it, i.e. Future.cancel(false) >> > >> > thanks, >> > >> > C>T> >> > >> > On Fri, Oct 16, 2009 at 4:44 PM, Christopher Tignor < >> ctig...@thinkmap.com>wrote: >> > >> >> Indeed it looks like the thread running MergerThread started (After >> passing >> >> off to ConcurentMergeScheduler) by the thread calling >> IndexWriter.optimize() >> >> is indeed waiting on the mutex for the IndexWriter to be free so it can >> use >> >> the object to call mergeInit(). >> >> >> >> The IndexWriter however has entered a synchronized() waiting loop, >> waking >> >> up every second ( in doWait() ) and checking if there are running merges >> >> left - which of course there are as the thread responsible for doing the >> >> merging can't get in. Deadlocked stack traces are below: >> >> >> >> Thread [pool-2-thread-5] (Suspended) >> >> owns: IndexWriter (id=71) >> >> owns: LuceneResultsPersister (id=85) >> >> IndexWriter.optimize(int, boolean) line: 2283 >> >> IndexWriter.optimize(boolean) line: 2218 >> >> IndexWriter.optimize() line: 2198 >> >> LuceneResultsPersister.commit() line: 97 >> >> PersistenceJobQueue.persistAndCommitBatch(PersistenceJobConsumer) >> line: >> >> 105 >> >> PersistenceJobConsumer.consume() line: 46 >> >> PersistenceJobConsumer.run() line: 67 >> >> Executors$RunnableAdapter<T>.call() line: 441 >> >> FutureTask$Sync.innerRun() line: 303 >> >> FutureTask<V>.run() line: 138 >> >> ThreadPoolExecutor$Worker.runTask(Runnable) line: 886 >> >> ThreadPoolExecutor$Worker.run() line: 908 >> >> Thread.run() line: 619 >> >> >> >> Daemon Thread [Lucene Merge Thread #0] (Suspended) >> >> waiting for: IndexWriter (id=71) >> >> IndexWriter.mergeInit(MergePolicy$OneMerge) line: 3971 >> >> IndexWriter.merge(MergePolicy$OneMerge) line: 3879 >> >> ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 205 >> >> ConcurrentMergeScheduler$MergeThread.run() line: 260 >> >> >> >> I don't really understand how this code is supposed to (and has) worked >> >> before this and what the problem thus might be here: >> >> >> >> In IndexWriter.optimize() at line 2263 we have the synchronized block >> where >> >> doWait becomes true using the parameter-less call to optimize(): >> >> >> >> if (doWait) { >> >> synchronized(this) { >> >> while(true) { >> >> if (mergeExceptions.size() > 0) { >> >> // Forward any exceptions in background merge >> >> // threads to the current thread: >> >> final int size = mergeExceptions.size(); >> >> for(int i=0;i<size;i++) { >> >> final MergePolicy.OneMerge merge = (MergePolicy.OneMerge) >> >> mergeExceptions.get(0); >> >> if (merge.optimize) { >> >> IOException err = new IOException("background merge hit >> >> exception: " + merge.segString(directory)); >> >> final Throwable t = merge.getException(); >> >> if (t != null) >> >> err.initCause(t); >> >> throw err; >> >> } >> >> } >> >> } >> >> >> >> if (optimizeMergesPending()) >> >> doWait(); >> >> else >> >> break; >> >> } >> >> } >> >> >> >> It is holding this lock while the thread it started to do the mergin is >> >> trying to call it's mergeInit() method. >> >> >> >> Any thoughts? >> >> >> >> thanks, >> >> >> >> C>T> >> >> >> >> >> >> On Fri, Oct 16, 2009 at 4:11 PM, Christopher Tignor < >> ctig...@thinkmap.com>wrote: >> >> >> >>> After tracing through the lucene source more it seems that what is >> >>> happening is after I call Future.cancel(true) on my parent thread, >> >>> optimize() is called and this method launches it's own thread using a >> >>> ConcurrentMergeScheduler$MergeThread to do the actual merging. >> >>> >> >>> When this Thread comes around to calling mergeInit() on my index writer >> - >> >>> a synchronized method - it hangs. For some reason it seems to no long >> have >> >>> the mutex perhaps? Trace of this thread's stall below... >> >>> >> >>> Daemon Thread [Lucene Merge Thread #0] (Suspended) >> >>> IndexWriter.mergeInit(MergePolicy$OneMerge) line: 3971 >> >>> IndexWriter.merge(MergePolicy$OneMerge) line: 3879 >> >>> ConcurrentMergeScheduler.doMerge(MergePolicy$OneMerge) line: 205 >> >>> ConcurrentMergeScheduler$MergeThread.run() line: 260 >> >>> >> >>> thanks again, >> >>> >> >>> C>T> >> >>> >> >>> >> >>> >> >>> On Fri, Oct 16, 2009 at 3:53 PM, Michael McCandless < >> >>> luc...@mikemccandless.com> wrote: >> >>> >> >>>> But if the Future.cancel call turns out to be a no-op (simply waits >> >>>> instead of interrupting the thread), how could it be that the deadlock >> >>>> only happens when you call it? Weird. Are you really sure it's not >> >>>> actually calling Thread.interrupt? >> >>>> >> >>>> That stack trace looks like a normal "optimize is waiting for the >> >>>> background merges to complete". Is it possible your background merges >> >>>> are hitting exceptions? You should see them on your error console if >> >>>> so... >> >>>> >> >>>> Mike >> >>>> >> >>>> On Fri, Oct 16, 2009 at 3:17 PM, Christopher Tignor >> >>>> <ctig...@thinkmap.com> wrote: >> >>>> > It doesn't look like my Future.cancel(true) is actually interrupting >> >>>> the >> >>>> > thread. It only does so "if necessary" and in this case seems to be >> >>>> letting >> >>>> > the Thread finish gracefully without need for interruption. >> >>>> > >> >>>> > The stack trace leading up to the hanging IndexWriter.optimize() >> method >> >>>> is >> >>>> > below, though not terribly useful I imagine. Here only 5 documents >> are >> >>>> > trying to be added to an index taking up only 39.2 MB on disk. >> Again >> >>>> this >> >>>> > deadlocking only happens after I use the Future for this task to >> cancel >> >>>> the >> >>>> > task... >> >>>> > >> >>>> > Thread [pool-3-thread-5] (Suspended) >> >>>> > IndexWriter.doWait() line: 4494 >> >>>> > IndexWriter.optimize(int, boolean) line: 2283 >> >>>> > IndexWriter.optimize(boolean) line: 2218 >> >>>> > IndexWriter.optimize() line: 2198 >> >>>> > LuceneResultsPersister.commit() line: 97 >> >>>> > PersistenceJobQueue.persistAndCommitBatch(PersistenceJobConsumer) >> >>>> line: >> >>>> > 105 >> >>>> > PersistenceJobConsumer.consume() line: 46 >> >>>> > PersistenceJobConsumer.run() line: 67 >> >>>> > Executors$RunnableAdapter<T>.call() line: 441 >> >>>> > FutureTask$Sync.innerRun() line: 303 >> >>>> > FutureTask<V>.run() line: 138 >> >>>> > ThreadPoolExecutor$Worker.runTask(Runnable) line: 886 >> >>>> > ThreadPoolExecutor$Worker.run() line: 908 >> >>>> > Thread.run() line: 619 >> >>>> > >> >>>> > thanks, >> >>>> > >> >>>> > C>T> >> >>>> > >> >>>> > On Fri, Oct 16, 2009 at 1:58 PM, Michael McCandless < >> >>>> > luc...@mikemccandless.com> wrote: >> >>>> > >> >>>> >> My guess is it's the invocation of Thread.interrupt (which >> >>>> >> Future.cancel(true) calls if the task is running) that lead to the >> >>>> >> deadlock. >> >>>> >> >> >>>> >> Is it possible to get the stack trace of the thrown exception when >> the >> >>>> >> thread was interrupted? Maybe indeed something in IW isn't >> cleaning >> >>>> >> up its state on being interrupted. >> >>>> >> >> >>>> >> Mike >> >>>> >> >> >>>> >> On Fri, Oct 16, 2009 at 1:43 PM, Christopher Tignor >> >>>> >> <ctig...@thinkmap.com> wrote: >> >>>> >> > thanks for getting back. >> >>>> >> > >> >>>> >> > I do not lock on the IndexWriter object itself but all methods in >> my >> >>>> >> > consumer class that use IndexWriter are synchronized (locking my >> >>>> >> singleton >> >>>> >> > consumer object itself). >> >>>> >> > The thread is waiting at IndexWriter.doWait(). What might cuase >> >>>> this? >> >>>> >> > >> >>>> >> > thanks - >> >>>> >> > >> >>>> >> > C>T> >> >>>> >> > >> >>>> >> > On Fri, Oct 16, 2009 at 12:58 PM, Uwe Schindler <u...@thetaphi.de >> > >> >>>> wrote: >> >>>> >> > >> >>>> >> >> Do you use the IndexWriter as mutex in a synchronized() block? >> This >> >>>> is >> >>>> >> not >> >>>> >> >> supported and may hang. Never lock on IndexWriter instances. >> >>>> IndexWriter >> >>>> >> >> itself is thread safe. >> >>>> >> >> >> >>>> >> >> ----- >> >>>> >> >> Uwe Schindler >> >>>> >> >> H.-H.-Meier-Allee 63, D-28213 Bremen >> >>>> >> >> http://www.thetaphi.de >> >>>> >> >> eMail: u...@thetaphi.de >> >>>> >> >> >> >>>> >> >> > -----Original Message----- >> >>>> >> >> > From: Christopher Tignor [mailto:ctig...@thinkmap.com] >> >>>> >> >> > Sent: Friday, October 16, 2009 6:50 PM >> >>>> >> >> > To: java-user >> >>>> >> >> > Subject: IndexWriter optimize() deadlock >> >>>> >> >> > >> >>>> >> >> > Hello, >> >>>> >> >> > >> >>>> >> >> > I am trying to track down the cause of my code hanging on >> calling >> >>>> >> >> > IndexWriter.optimize() at its doWait() method. >> >>>> >> >> > It appears, thus that it is watiing on other merges to happen >> >>>> which is >> >>>> >> a >> >>>> >> >> > bit >> >>>> >> >> > confusing to me: >> >>>> >> >> > >> >>>> >> >> > My application is a simple producer consumer model where >> >>>> documents are >> >>>> >> >> > added >> >>>> >> >> > to a queue by producers and then one consumer with one >> >>>> indexwriter >> >>>> >> (the >> >>>> >> >> > only >> >>>> >> >> > in the application) periodically calls addDocument() on a >> batch >> >>>> of >> >>>> >> these >> >>>> >> >> > jobs and then calls optimize(), commit(). and then close(). >> >>>> There is >> >>>> >> >> only >> >>>> >> >> > one thread running the consumer so I am confused as to how the >> >>>> >> >> indexwriter >> >>>> >> >> > might be deadlocking itself. Indeed this is the only thread >> >>>> active >> >>>> >> when >> >>>> >> >> > the >> >>>> >> >> > deadlock occurs so it seems to be a problem of reentry. >> >>>> >> >> > >> >>>> >> >> > Importantly, the deadlocking occurs only when the thread is >> >>>> trying to >> >>>> >> >> > shutdown - that is the Thread running this lucene consumer has >> a >> >>>> >> Future >> >>>> >> >> > that >> >>>> >> >> > has had its cancel(true) interrupting method called. Is it >> >>>> possible >> >>>> >> that >> >>>> >> >> > an >> >>>> >> >> > internal Lucene lock is obtained during addDocument() and on >> >>>> >> interruption >> >>>> >> >> > is >> >>>> >> >> > never released so the subsequent optimize() call hangs? This >> >>>> doesn't >> >>>> >> >> > appear >> >>>> >> >> > to be happening... >> >>>> >> >> > >> >>>> >> >> > Any help appreciated. >> >>>> >> >> > >> >>>> >> >> > thanks, >> >>>> >> >> > >> >>>> >> >> > C>T> >> >>>> >> >> > >> >>>> >> >> > what might I be missing here? >> >>>> >> >> > >> >>>> >> >> > -- >> >>>> >> >> > TH!NKMAP >> >>>> >> >> > >> >>>> >> >> > Christopher Tignor | Senior Software Architect >> >>>> >> >> > 155 Spring Street NY, NY 10012 >> >>>> >> >> > p.212-285-8600 x385 f.212-285-8999 >> >>>> >> >> >> >>>> >> >> >> >>>> >> >> >> >>>> --------------------------------------------------------------------- >> >>>> >> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >>>> >> >> For additional commands, e-mail: >> java-user-h...@lucene.apache.org >> >>>> >> >> >> >>>> >> >> >> >>>> >> > >> >>>> >> > >> >>>> >> > -- >> >>>> >> > TH!NKMAP >> >>>> >> > >> >>>> >> > Christopher Tignor | Senior Software Architect >> >>>> >> > 155 Spring Street NY, NY 10012 >> >>>> >> > p.212-285-8600 x385 f.212-285-8999 >> >>>> >> > >> >>>> >> >> >>>> >> >> --------------------------------------------------------------------- >> >>>> >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >>>> >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >>>> >> >> >>>> >> >> >>>> > >> >>>> > >> >>>> > -- >> >>>> > TH!NKMAP >> >>>> > >> >>>> > Christopher Tignor | Senior Software Architect >> >>>> > 155 Spring Street NY, NY 10012 >> >>>> > p.212-285-8600 x385 f.212-285-8999 >> >>>> > >> >>>> >> >>>> --------------------------------------------------------------------- >> >>>> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> >>>> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >>>> >> >>>> >> >>> >> >>> >> >>> -- >> >>> TH!NKMAP >> >>> >> >>> Christopher Tignor | Senior Software Architect >> >>> 155 Spring Street NY, NY 10012 >> >>> p.212-285-8600 x385 f.212-285-8999 >> >>> >> >> >> >> >> >> >> >> -- >> >> TH!NKMAP >> >> >> >> Christopher Tignor | Senior Software Architect >> >> 155 Spring Street NY, NY 10012 >> >> p.212-285-8600 x385 f.212-285-8999 >> >> >> > >> > >> > >> > -- >> > TH!NKMAP >> > >> > Christopher Tignor | Senior Software Architect >> > 155 Spring Street NY, NY 10012 >> > p.212-285-8600 x385 f.212-285-8999 >> > >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > > -- > TH!NKMAP > > Christopher Tignor | Senior Software Architect > 155 Spring Street NY, NY 10012 > p.212-285-8600 x385 f.212-285-8999 > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org