This is very unusual. I ran four Solr 6 clusters in prod for a few years and never saw any index corruption. One had 50+ million documents with hourly updates. We resized it seasonally between 24 and 64 nodes. Another had two nodes but 120 million documents with incremental updates (every time a new user registered).
I actually can’t remember any index corruption in Solr, and I’ve run versions from 1.3 to 9.1 under both high query load (Netflix) and massive content (LexisNexis). I would look at system-level causes, not Solr.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Jun 18, 2024, at 8:07 AM, mtn search <search...@gmail.com> wrote:
>
> Hello,
>
> We are in the process of moving to SolrCloud Solr 9, but my team is still
> maintaining a *large Solr 6 farm (Solr/Lucene 6.4.2).*
>
> In the last 6 months we have noticed a good number of CorruptIndexException
> errors. We attribute most, if not all, of them to an event on vSphere where
> a large number of VMs lost the ability to write to disk (not good!). As we
> encounter these errors, we have re-indexed the affected Solr cores to fix
> them. A good bit of work! We have tried to be proactive and use the Lucene
> index checker tool (CheckIndex) to detect corrupt cores, but it carries a
> lot of overhead and takes a long time to run.
>
> I am interested in learning more about what may cause index corruption
> (I'm wondering whether other circumstances, beyond the temporary loss of
> disk writes, might be causing these errors in our Solr farm).
> - Is it always a problem with the Linux VM file system or storage, or could
> it be an issue with Lucene?
> - Is there some misbehavior within Lucene/Solr (or in its handling of a
> specific scenario, such as a high number of atomic updates) that can result
> in corruption?
> - Is there a size limit for a Solr core beyond which cores become more
> vulnerable to corruption?
>
> Below is one of the types of corruption-related error messages we have
> observed:
>
> Caused by: org.apache.lucene.index.CorruptIndexException: codec header mismatch: actual header=-1527899865 vs expected header=1071082519 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/data/solr/instance-1/C23491/content/index/_vj4.cfs") [slice=_vj4.fnm]))
>     at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:196)
>     at org.apache.lucene.codecs.CodecUtil.checkIndexHeader(CodecUtil.java:255)
>     at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:117)
>     at org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:1063)
>     at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1079)
>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:968)
>     at org.apache.solr.update.SolrIndexWriter.<init>(SolrIndexWriter.java:125)
>     at org.apache.solr.update.SolrIndexWriter.create(SolrIndexWriter.java:100)
>     at org.apache.solr.update.DefaultSolrCoreState.createMainIndexWriter(DefaultSolrCoreState.java:240)
>     at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:114)
>     at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1852)
>     ... 40 more
>     Suppressed: org.apache.lucene.index.CorruptIndexException: codec footer mismatch (file truncated?): actual footer=-548541180 vs expected footer=-1071082520 (resource=BufferedChecksumIndexInput(MMapIndexInput(path="/var/data/solr/instance-1/C23491/content/index/_vj4.cfs") [slice=_vj4.fnm]))
>         at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:499)
>         at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:411)
>         at org.apache.lucene.codecs.CodecUtil.checkFooter(CodecUtil.java:459)
>         at org.apache.lucene.codecs.lucene60.Lucene60FieldInfosFormat.read(Lucene60FieldInfosFormat.java:171)
>         ... 48 more
>
> Thanks,
> Matt
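For anyone decoding a trace like the one above: the "expected" values are Lucene's fixed magic numbers. 1071082519 is CodecUtil.CODEC_MAGIC (0x3FD76C17), which, as I understand the 5.x/6.x file formats, opens every codec file and every sub-file slice inside a .cfs compound file. -1071082520 is FOOTER_MAGIC, its bitwise complement, which opens the 16-byte footer (magic, checksum algorithm ID, CRC32 value) at the end of the file. When both the header and the footer of the same slice hold other values, the bytes on disk are probably not what Lucene wrote, which is consistent with a storage-level failure. The Java program below is a minimal sketch, not part of Lucene or Solr: the class HeaderFooterCheck is hypothetical, it assumes the big-endian header/footer layout of Lucene 5.x/6.x files, checks only the outer file (not individual slices inside a .cfs), and ignores codec names, versions, and segment IDs. It is much cheaper than a full CheckIndex run but also much weaker, so treat it as a quick triage pass only.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.zip.CRC32;

// Hypothetical triage helper, not part of Lucene or Solr. Assumes the
// big-endian codec header/footer layout of Lucene 5.x/6.x index files.
final class HeaderFooterCheck {
    // Lucene's CodecUtil.CODEC_MAGIC, written at offset 0 of each codec file.
    static final int CODEC_MAGIC = 0x3FD76C17;
    // CodecUtil.FOOTER_MAGIC is the bitwise complement of CODEC_MAGIC.
    static final int FOOTER_MAGIC = ~CODEC_MAGIC;
    // Footer: magic (int) + checksum algorithm ID (int) + CRC32 value (long).
    static final int FOOTER_LENGTH = 16;

    // Returns "OK" or a short description of the first problem found.
    static String check(String path) throws IOException {
        try (RandomAccessFile f = new RandomAccessFile(path, "r")) {
            long len = f.length();
            if (len < 4 + FOOTER_LENGTH) {
                return "too short for a header and footer (" + len + " bytes)";
            }
            int header = f.readInt(); // RandomAccessFile.readInt() is big-endian
            if (header != CODEC_MAGIC) {
                return "header mismatch: actual=" + header + " expected=" + CODEC_MAGIC;
            }
            f.seek(len - FOOTER_LENGTH);
            int footer = f.readInt();
            if (footer != FOOTER_MAGIC) {
                return "footer mismatch: actual=" + footer + " expected=" + FOOTER_MAGIC;
            }
            // The stored CRC32 covers every byte before it, i.e. len - 8 bytes.
            CRC32 crc = new CRC32();
            byte[] buf = new byte[64 * 1024];
            f.seek(0);
            long remaining = len - 8;
            while (remaining > 0) {
                int n = f.read(buf, 0, (int) Math.min(buf.length, remaining));
                if (n < 0) {
                    throw new IOException("unexpected end of file: " + path);
                }
                crc.update(buf, 0, n);
                remaining -= n;
            }
            long stored = f.readLong(); // file pointer is now at len - 8
            if (stored != crc.getValue()) {
                return "checksum mismatch: stored=" + stored + " computed=" + crc.getValue();
            }
            return "OK";
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("CODEC_MAGIC  = " + CODEC_MAGIC);  // 1071082519
        System.out.println("FOOTER_MAGIC = " + FOOTER_MAGIC); // -1071082520
        for (String path : args) {
            System.out.println(path + ": " + check(path));
        }
    }
}
```

Run it against a suspect file, e.g. `java HeaderFooterCheck.java /var/data/solr/instance-1/C23491/content/index/_vj4.cfs` (single-file launch needs Java 11 or later). It reads each file sequentially once, so it should be I/O-bound, but it only tells you that a file is damaged, not which segments or documents are affected.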