Hi,

I am debugging a bulk indexing performance issue while upgrading from 4.5.0 to 6.6. Commits are disabled while I index a total of 85G of data over 7 hours. At the end of the run I want 30 or so big segments, but I am getting 3000 segments. I deleted the index and enabled infoStream logging; I have attached the log from the point where the first segment is flushed. Here are a few questions:

1. When a segment is flushed, is it permanent, or can more documents be written to it (besides the merge scenario)?

2. It seems that 330+ threads are writing in parallel. Will each one of them become one segment when written to disk? In that case, should I decrease concurrency?

3. One possibility is to delay flushing. The flush is getting triggered at 10000MB, probably coming from <ramBufferSizeMB>10000</ramBufferSizeMB>; however, the segment that is flushed is only 115MB. Is this limit for the combined size of all in-memory segments? If so, is it OK to increase it further to use more of my heap (48GB)?

4. How can I decrease the concurrency? Maybe the solution is to use fewer in-memory segments?

In the previous run, there were 110k files in the index folder after I stopped indexing. Before committing, I noticed that the file count continued to decrease every few minutes until it reached 27k or so. (I committed after it stabilized.)

My indexConfig is this:

<indexConfig>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <maxIndexingThreads>10</maxIndexingThreads>
  <useCompoundFile>false</useCompoundFile>
  <ramBufferSizeMB>10000</ramBufferSizeMB>
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">5</int>
    <int name="segmentsPerTier">3000</int>
    <int name="maxMergeAtOnceExplicit">10</int>
    <int name="floorSegmentMB">16</int>
    <!-- 200 gb since we want few big segments during full indexing -->
    <double name="maxMergedSegmentMB">200000</double>
    <double name="forceMergeDeletesPctAllowed">1</double>
  </mergePolicyFactory>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">10</int>
    <int name="maxMergeCount">10</int>
  </mergeScheduler>
  <lockType>${solr.lock.type:native}</lockType>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream>true</infoStream>
  <applyAllDeletesOnFlush>false</applyAllDeletesOnFlush>
</indexConfig>

Thanks,
Nawab
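
For reference, the figures quoted in this message work out roughly as follows. This is only a back-of-envelope sketch in Java. The class and method names (SegmentMath, evenShareMB) are hypothetical, "85G" is read as 85 * 1024 MB, and the even split of the RAM buffer across 330 writers is an assumption made for illustration, not a statement of how Lucene actually allocates the buffer (that is what question 3 asks).

```java
class SegmentMath {
    /** Size of each piece in MB when totalMB is divided evenly into `parts` pieces. */
    static double evenShareMB(double totalMB, int parts) {
        if (parts <= 0) {
            throw new IllegalArgumentException("parts must be positive");
        }
        return totalMB / parts;
    }

    public static void main(String[] args) {
        double indexMB = 85 * 1024.0;     // "85G" of data, read as 85 * 1024 MB (assumption)
        double ramBufferMB = 10000.0;     // <ramBufferSizeMB> from the config
        double flushedSegmentMB = 115.0;  // size of the first flushed segment, per the log

        // Average segment size at the observed and at the desired segment count.
        System.out.printf("avg segment at 3000 segments: %.1f MB%n", evenShareMB(indexMB, 3000));
        System.out.printf("avg segment at 30 segments:   %.1f MB%n", evenShareMB(indexMB, 30));

        // Hypothetical even split of the RAM buffer across the 330 writers seen in the log.
        System.out.printf("even buffer share, 330 writers: %.1f MB%n", evenShareMB(ramBufferMB, 330));

        // How many 115 MB segments would fit into one full 10000 MB buffer.
        System.out.printf("115 MB segments per full buffer: %.1f%n", ramBufferMB / flushedSegmentMB);
    }
}
```

The point of the sketch is only that the desired segment size (a few GB each) is about two orders of magnitude larger than either the observed average segment or an even per-writer share of the buffer.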