OK, so the problem definitely comes from the slow merging. I slightly increased the max merge count and max thread count to avoid the problem described previously. But, as expected, that only delayed it!
Results: 75 minutes to index the 33GB XML file, and 150 minutes to finish the merge after indexer.close. See the uploaded log file http://lucene.472066.n3.nabble.com/file/n3223874/slowmerge, which contains: logs (timems:numberofdocsindexed/current_title) + infoStream + random thread dumps. You can spot "indexer.close (no optimize)" (line 5721), which marks indexing completion and the beginning of the merging nightmare.

conf:

    conf.setRAMBufferSizeMB(512);
    ConcurrentMergeScheduler mergeScheduler = new ConcurrentMergeScheduler();
    mergeScheduler.setMaxMergeCount(6);
    mergeScheduler.setMaxThreadCount(4);
    conf.setMergeScheduler(mergeScheduler);
    writer = new ThreadedIndexWriter(directory, analyzer, true, 2, 5, conf);

>> Everything else is default. No optimize is called.

documents:

    pageDocument.add(new Field("title", page.getTitle(), Field.Store.YES, Field.Index.NO));
    pageDocument.add(new Field("text", page.getText(), Field.Store.NO, Field.Index.ANALYZED));
    if (page.getContributorUserName() != null)
        pageDocument.add(new Field("contributorUserName", page.getContributorUserName(), Field.Store.NO, Field.Index.ANALYZED));

infoStream info:

    setInfoStream deletionPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy@2dafae45
    dir=org.apache.lucene.store.NIOFSDirectory@/Users/ptoussaint/Documents/workspace/wikisearch/index2 lockFactory=org.apache.lucene.store.NativeFSLockFactory@39dd3812
    index=
    version=4.0-SNAPSHOT
    matchVersion=LUCENE_40
    analyzer=org.pache.soundcloud.wikisearch.Indexer$WikiAnalyzer
    delPolicy=org.apache.lucene.index.KeepOnlyLastCommitDeletionPolicy
    commit=null
    openMode=CREATE_OR_APPEND
    similarityProvider=org.apache.lucene.search.DefaultSimilarityProvider
    termIndexInterval=32
    mergeScheduler=org.apache.lucene.index.ConcurrentMergeScheduler
    default WRITE_LOCK_TIMEOUT=1000
    writeLockTimeout=1000
    maxBufferedDeleteTerms=-1
    ramBufferSizeMB=512.0
    maxBufferedDocs=-1
    mergedSegmentWarmer=null
    codecProvider=org.apache.lucene.index.codecs.CoreCodecProvider@6a8c436b
    mergePolicy=[TieredMergePolicy: maxMergeAtOnce=10, maxMergeAtOnceExplicit=30, maxMergedSegmentMB=5120.0, floorSegmentMB=2.0, expungeDeletesPctAllowed=10.0, segmentsPerTier=10.0, useCompoundFile=true, noCFSRatio=0.1
    indexerThreadPool=org.apache.lucene.index.ThreadAffinityDocumentsWriterThreadPool@1e9e5c73
    readerPooling=false
    readerTermsIndexDivisor=1
    flushPolicy=org.apache.lucene.index.FlushByRamOrCountsPolicy@2ec791b9
    perThreadHardLimitMB=1945

--
View this message in context: http://lucene.472066.n3.nabble.com/Thread-locking-while-merging-ConcurrentMergeScheduler-issue-tp3222427p3223874.html
Sent from the Lucene - Java Users mailing list archive at Nabble.com.
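
To put the timings reported in this message in proportion, here is a minimal sketch of the arithmetic in plain Java (no Lucene involved). The class name MergeTiming and its two methods are hypothetical helpers invented for illustration, and 1 GB is assumed to be 1024 MB.

```java
import java.util.Locale;

// Rough arithmetic on the timings reported in this message; not Lucene code.
// MergeTiming and its methods are hypothetical names used only for illustration.
public final class MergeTiming {

    /** Fraction of total wall-clock time spent in close(), i.e. after indexing finished. */
    public static double closeShare(double indexMinutes, double closeMinutes) {
        return closeMinutes / (indexMinutes + closeMinutes);
    }

    /** Average throughput in MB/s for inputGb gigabytes handled in the given minutes (assumes 1 GB = 1024 MB). */
    public static double throughputMbPerSec(double inputGb, double minutes) {
        return inputGb * 1024.0 / (minutes * 60.0);
    }

    public static void main(String[] args) {
        double indexMinutes = 75;   // time to index the XML file
        double closeMinutes = 150;  // time for the merges to finish after indexer.close
        double inputGb = 33;        // size of the XML input

        System.out.println(String.format(Locale.ROOT,
                "close() share of total time: %.1f%%",
                100.0 * closeShare(indexMinutes, closeMinutes)));
        System.out.println(String.format(Locale.ROOT,
                "throughput, indexing phase only: %.2f MB/s",
                throughputMbPerSec(inputGb, indexMinutes)));
        System.out.println(String.format(Locale.ROOT,
                "throughput, indexing + close(): %.2f MB/s",
                throughputMbPerSec(inputGb, indexMinutes + closeMinutes)));
    }
}
```

With the numbers from this run, close() accounts for two thirds of the total wall-clock time, which is why the end-to-end throughput is a third of what the indexing phase alone achieves.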