It looks like merging is running too slowly in your environment, relative to indexing; all of your indexing threads are stuck wanting to launch a new merge but there's already the max allowed (1) concurrent merge running and so IW (intentionally) stalls them.
Are you sure you passed 2 for numThreads to ThreadedIndexWriter? I see 5 threads in your stack traces, I think. What kind of IO system do you have? Can you setInfoStream in the IW and post the output? Could give more clues... Mike McCandless http://blog.mikemccandless.com On Wed, Aug 3, 2011 at 10:16 AM, Pierre-Henri Toussaint <pierrehenri.toussa...@gmail.com> wrote: > Hi All, > > I'm currently testing the new DocumentsWriterPerThread in Lucene 4.0.0 > (latest build). I use the wikipedia full english article dump as a source > for indexing and the ThreadedIndexWriter implementation proposed in LIA to > achieve concurrent indexing. > Indexing performance seems good at the beginning (good throughput, cpu > maxed, no drop while flushing!!), but then drop suddenly to nearly null > after 20 minutes. > > See cpu usage drop bellow. IO is very low as well, and nothing get indexed > from few minutes to an hour : > http://lucene.472066.n3.nabble.com/file/n3222427/merge_problem_2.png > > After investigations, it appears that the performance drop while the merge > occurs. I did a thread dump (see bellow) and it's pretty clear that > everything wait for the merge worker to complete, which can take a long > time. > > My config : > -RAMBufferSizeMB:256mb > -Indexer thread pool : 2/5 using the implementation proposed in LIA > (ThreadedIndexWriter) > -NIOFSDirectory > -ConcurrentMergeScheduler, using default settings > -osx10/core2duo@2.4ghz/4GBram > > What could cause this problem ? > I use no external synchronisation around IndexWriter (as explained in > javadoc). I tried w > > Thanks in advance for taking your time. > > /Full thread dump Java HotSpot(TM) 64-Bit Server VM (19.1-b02-334 mixed > mode): > > "pool-1-thread-454645" prio=5 tid=101940000 nid=0x10cf90000 in Object.wait() > [10cf8f000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at java.lang.Object.wait(Object.java:485) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:319) > - locked <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1943) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1938) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1934) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1548) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1300) > at lia.admin.ThreadedIndexWriter.access$0(ThreadedIndexWriter.java:1) > at lia.admin.ThreadedIndexWriter$Job.run(ThreadedIndexWriter.java:52) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > "pool-1-thread-451632" prio=5 tid=10193b000 nid=0x10ce8d000 in Object.wait() > [10ce8c000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at java.lang.Object.wait(Object.java:485) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:319) > - locked <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1943) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1938) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1934) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1548) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1300) > at lia.admin.ThreadedIndexWriter.access$0(ThreadedIndexWriter.java:1) > at lia.admin.ThreadedIndexWriter$Job.run(ThreadedIndexWriter.java:52) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > "pool-1-thread-442636" prio=5 tid=10194e000 nid=0x10bef1000 in Object.wait() > [10bef0000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at java.lang.Object.wait(Object.java:485) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:319) > - locked <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1943) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1938) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1934) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1548) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1300) > at lia.admin.ThreadedIndexWriter.access$0(ThreadedIndexWriter.java:1) > at lia.admin.ThreadedIndexWriter$Job.run(ThreadedIndexWriter.java:52) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > "pool-1-thread-427357" prio=5 tid=102b0d000 nid=0x10d901000 in Object.wait() > [10d900000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at java.lang.Object.wait(Object.java:485) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:319) > - locked <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1943) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1938) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1934) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1548) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1300) > at lia.admin.ThreadedIndexWriter.access$0(ThreadedIndexWriter.java:1) > at lia.admin.ThreadedIndexWriter$Job.run(ThreadedIndexWriter.java:52) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > "Lucene Merge Thread #4" daemon prio=6 tid=102ae6000 nid=0x10bdee000 in > Object.wait() [10bded000] > java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7d0a16870> (a > org.apache.lucene.index.MergePolicy$OneMerge) > at > org.apache.lucene.index.MergePolicy$OneMerge.checkAborted(MergePolicy.java:132) > - locked <7d0a16870> (a org.apache.lucene.index.MergePolicy$OneMerge) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3566) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3205) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447) > > "pool-1-thread-412998" prio=5 tid=10193d800 nid=0x10cc87000 in Object.wait() > [10cc86000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at java.lang.Object.wait(Object.java:485) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:319) > - locked <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1943) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1938) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1934) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1548) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1300) > at lia.admin.ThreadedIndexWriter.access$0(ThreadedIndexWriter.java:1) > at lia.admin.ThreadedIndexWriter$Job.run(ThreadedIndexWriter.java:52) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:680) > > "Lucene Merge Thread #3" daemon prio=6 tid=102b0b800 nid=0x10c901000 in > Object.wait() [10c900000] > java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7c0222a80> (a > org.apache.lucene.index.MergePolicy$OneMerge) > at > org.apache.lucene.index.MergePolicy$OneMerge.checkAborted(MergePolicy.java:132) > - locked <7c0222a80> (a org.apache.lucene.index.MergePolicy$OneMerge) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3566) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3205) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447) > > "Lucene Merge Thread #2" daemon prio=6 tid=102adc000 nid=0x10cd8a000 in > Object.wait() [10cd89000] > java.lang.Thread.State: TIMED_WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7faeb8> (a > org.apache.lucene.index.MergePolicy$OneMerge) > at > org.apache.lucene.index.MergePolicy$OneMerge.checkAborted(MergePolicy.java:132) > - locked <7bd7faeb8> (a org.apache.lucene.index.MergePolicy$OneMerge) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3515) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3205) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447) > > "Lucene Merge Thread #1" daemon prio=6 tid=102ae0000 nid=0x10ca81000 > runnable [10ca80000] > java.lang.Thread.State: RUNNABLE > at sun.nio.ch.FileDispatcher.pread0(Native Method) > at sun.nio.ch.FileDispatcher.pread(FileDispatcher.java:31) > at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:199) > at sun.nio.ch.IOUtil.read(IOUtil.java:175) > at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:612) > at > org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:215) > at > org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:237) > at > org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:50) > at org.apache.lucene.store.DataInput.readVInt(DataInput.java:94) > at > org.apache.lucene.store.BufferedIndexInput.readVInt(BufferedIndexInput.java:205) > at > org.apache.lucene.index.codecs.standard.StandardPostingsReader$SegmentDocsAndPositionsEnum.nextDoc(StandardPostingsReader.java:499) > at > org.apache.lucene.index.codecs.MappingMultiDocsAndPositionsEnum.nextDoc(MappingMultiDocsAndPositionsEnum.java:84) > at > org.apache.lucene.index.codecs.PostingsConsumer.merge(PostingsConsumer.java:79) > at > org.apache.lucene.index.codecs.TermsConsumer.merge(TermsConsumer.java:112) > at > org.apache.lucene.index.codecs.FieldsConsumer.merge(FieldsConsumer.java:53) > at > org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:566) > at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:135) > at > org.apache.lucene.index.IndexWriter.mergeMiddle(IndexWriter.java:3569) > at org.apache.lucene.index.IndexWriter.merge(IndexWriter.java:3205) > at > org.apache.lucene.index.ConcurrentMergeScheduler.doMerge(ConcurrentMergeScheduler.java:379) > at > org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:447) > > "Surrogate Locker Thread (CMS)" daemon prio=5 tid=10284a800 nid=0x10af01000 > waiting on condition [00000000] > java.lang.Thread.State: RUNNABLE > > "Finalizer" daemon prio=8 tid=102842000 nid=0x10aa8a000 in Object.wait() > [10aa89000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7c43e8> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:118) > - locked <7bd7c43e8> (a java.lang.ref.ReferenceQueue$Lock) > at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:134) > at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159) > > "Reference Handler" daemon prio=10 tid=102841000 nid=0x10a987000 in > Object.wait() [10a986000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7ca638> (a java.lang.ref.Reference$Lock) > at java.lang.Object.wait(Object.java:485) > at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:116) > - locked <7bd7ca638> (a java.lang.ref.Reference$Lock) > > "main" prio=5 tid=101801000 nid=0x100601000 in Object.wait() [1005ff000] > java.lang.Thread.State: WAITING (on object monitor) > at java.lang.Object.wait(Native Method) > - waiting on <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at java.lang.Object.wait(Object.java:485) > at > org.apache.lucene.index.ConcurrentMergeScheduler.merge(ConcurrentMergeScheduler.java:319) > - locked <7bd7dfd10> (a > org.apache.lucene.index.ConcurrentMergeScheduler) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1943) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1938) > at > org.apache.lucene.index.IndexWriter.maybeMerge(IndexWriter.java:1934) > at > org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1548) > at > org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1300) > at lia.admin.ThreadedIndexWriter.access$0(ThreadedIndexWriter.java:1) > at lia.admin.ThreadedIndexWriter$Job.run(ThreadedIndexWriter.java:52) > at > java.util.concurrent.ThreadPoolExecutor$CallerRunsPolicy.rejectedExecution(ThreadPoolExecutor.java:1746) > at > java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:767) > at > java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:658) > at > lia.admin.ThreadedIndexWriter.addDocument(ThreadedIndexWriter.java:82) > at org.pache.soundcloud.wikisearch.Indexer.addContact(Indexer.java:69) > at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.apache.commons.beanutils.MethodUtils.invokeMethod(MethodUtils.java:282) > at org.apache.commons.digester.SetNextRule.end(SetNextRule.java:220) > at org.apache.commons.digester.Rule.end(Rule.java:257) > at org.apache.commons.digester.Digester.endElement(Digester.java:1345) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:601) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1782) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2938) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:648) > at > com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:511) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:808) > at > com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:737) > at > com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:119) > at > com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1205) > at > com.sun.org.apache.xerces.internal.jaxp.SAXParserImpl$JAXPSAXParser.parse(SAXParserImpl.java:522) > at org.apache.commons.digester.Digester.parse(Digester.java:1871) > at org.pache.soundcloud.wikisearch.Indexer.main(Indexer.java:119) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > at java.lang.reflect.Method.invoke(Method.java:597) > at > org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:58)/ > > > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Thread-locking-and-extreme-low-perfs-while-merging-ConcurrentMergeScheduler-issue-tp3222427p3222427.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org