Here's checkindex:

NOTE: testing will be more thorough if you run java with '-ea:org.apache.lucene', so assertions are enabled
Opening index @ /vol/solr/data/index/

Segments file=segments_vxx numSegments=8 version=FORMAT_HAS_PROX [Lucene 2.4]
  1 of 8: name=_ks4 docCount=2504982
    compound=false
    hasProx=true
    numFiles=11
    size (MB)=3,965.695
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [343 fields]
    test: terms, freq, prox...OK [37238560 terms; 161527224 terms/docs pairs; 186273362 tokens]
    test: stored fields.......OK [55813402 total field count; avg 22.281 fields per doc]
    test: term vectors........OK [7998458 total vector count; avg 3.193 term/freq vector fields per doc]

  2 of 8: name=_oaw docCount=514635
    compound=false
    hasProx=true
    numFiles=12
    size (MB)=746.887
    has deletions [delFileName=_oaw_1rb.del]
    test: open reader.........OK [155528 deleted docs]
    test: fields, norms.......OK [172 fields]
    test: terms, freq, prox...OK [7396227 terms; 28146962 terms/docs pairs; 17298364 tokens]
    test: stored fields.......OK [5736012 total field count; avg 15.973 fields per doc]
    test: term vectors........OK [1045176 total vector count; avg 2.91 term/freq vector fields per doc]

  3 of 8: name=_tll docCount=827949
    compound=false
    hasProx=true
    numFiles=12
    size (MB)=761.782
    has deletions [delFileName=_tll_2fs.del]
    test: open reader.........OK [39283 deleted docs]
    test: fields, norms.......OK [180 fields]
    test: terms, freq, prox...OK [10925397 terms; 43361019 terms/docs pairs; 42123294 tokens]
    test: stored fields.......OK [8673255 total field count; avg 10.997 fields per doc]
    test: term vectors........OK [880272 total vector count; avg 1.116 term/freq vector fields per doc]

  4 of 8: name=_tdx docCount=18372
    compound=false
    hasProx=true
    numFiles=12
    size (MB)=56.856
    has deletions [delFileName=_tdx_9.del]
    test: open reader.........OK [18368 deleted docs]
    test: fields, norms.......OK [50 fields]
    test: terms, freq, prox...OK [261974 terms; 2018842 terms/docs pairs; 150 tokens]
    test: stored fields.......OK [76 total field count; avg 19 fields per doc]
    test: term vectors........OK [14 total vector count; avg 3.5 term/freq vector fields per doc]

  5 of 8: name=_te8 docCount=19929
    compound=false
    hasProx=true
    numFiles=12
    size (MB)=60.475
    has deletions [delFileName=_te8_a.del]
    test: open reader.........OK [19900 deleted docs]
    test: fields, norms.......OK [72 fields]
    test: terms, freq, prox...OK [276045 terms; 2166958 terms/docs pairs; 1196 tokens]
    test: stored fields.......OK [522 total field count; avg 18 fields per doc]
    test: term vectors........OK [132 total vector count; avg 4.552 term/freq vector fields per doc]

  6 of 8: name=_tej docCount=22201
    compound=false
    hasProx=true
    numFiles=12
    size (MB)=65.827
    has deletions [delFileName=_tej_o.del]
    test: open reader.........OK [22171 deleted docs]
    test: fields, norms.......OK [50 fields]
    test: terms, freq, prox...FAILED
    WARNING: would remove reference to this segment (-fix was not specified); full exception:
java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
        at org.apache.lucene.util.BitVector.get(BitVector.java:91)
        at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
        at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
        at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:222)
        at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:433)

  7 of 8: name=_1agw docCount=1717926
    compound=false
    hasProx=true
    numFiles=12
    size (MB)=2,390.413
    has deletions [delFileName=_1agw_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields, norms.......OK [438 fields]
    test: terms, freq, prox...OK [20959015 terms; 101603282 terms/docs pairs; 123561985 tokens]
    test: stored fields.......OK [26248407 total field count; avg 15.279 fields per doc]
    test: term vectors........OK [4911368 total vector count; avg 2.859 term/freq vector fields per doc]

  8 of 8: name=_1agz docCount=1
    compound=false
    hasProx=true
    numFiles=8
    size (MB)=0
    no deletions
    test: open reader.........OK
    test: fields, norms.......OK [6 fields]
    test: terms, freq, prox...OK [6 terms; 6 terms/docs pairs; 6 tokens]
    test: stored fields.......OK [6 total field count; avg 6 fields per doc]
    test: term vectors........OK [0 total vector count; avg 0 term/freq vector fields per doc]

WARNING: 1 broken segments detected
WARNING: 30 documents would be lost if -fix were specified

NOTE: would write new segments file [-fix was not specified]
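
For reference, a rough sketch of how the check above and a repair run would be invoked -- the classpath/jar name is a guess for a stock 2.4 install, and -fix should only be run against a backup with Solr stopped, since per the output it would drop the broken segment (_tej) and the 30 live docs still in it:

  # read-only check; -ea enables Lucene's assertions for a more thorough run
  java -ea:org.apache.lucene... -cp lucene-core-2.4.0.jar \
      org.apache.lucene.index.CheckIndex /vol/solr/data/index/

  # same run with -fix, which removes the reference to the broken segment
  java -ea:org.apache.lucene... -cp lucene-core-2.4.0.jar \
      org.apache.lucene.index.CheckIndex /vol/solr/data/index/ -fix
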
On Fri, Jan 2, 2009 at 3:47 PM, Brian Whitman <br...@echonest.com> wrote:

> I will but I bet I can guess what happened -- this index has many
> duplicates in it as well (same uniqueKey id multiple times) - this happened
> to us once before and it was because the solr server went down during an
> add. We may have to re-index, but I will run checkIndex now. Thanks
> (Thread for dupes here:
> http://mail-archives.apache.org/mod_mbox/lucene-solr-user/200803.mbox/%3c4ed8c459-1b0f-41cc-986c-4ffceef82...@variogr.am%3e)
>
>
> On Fri, Jan 2, 2009 at 3:44 PM, Michael McCandless <luc...@mikemccandless.com> wrote:
>
>> It looks like your index has some kind of corruption. Were there any other
>> exceptions prior to this one, or, any previous problems with the OS/IO
>> system?
>>
>> Can you run CheckIndex (java org.apache.lucene.index.CheckIndex to see
>> usage) and post the output?
>>
>> Mike
>>
>> Brian Whitman <br...@echonest.com> wrote:
>>
>> > I am getting this on a 10GB index (via solr 1.3) during an optimize:
>> >
>> > Jan 2, 2009 6:51:52 PM org.apache.solr.common.SolrException log
>> > SEVERE: java.io.IOException: background merge hit exception:
>> > _ks4:C2504982 _oaw:C514635 _tll:C827949 _tdx:C18372 _te8:C19929
>> > _tej:C22201 _1agw:C1717926 _1agz:C1 into _1ah2 [optimize]
>> >   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2346)
>> >   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2280)
>> >   at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:355)
>> >   at org.apache.solr.update.processor.RunUpdateProcessor.processCommit(RunUpdateProcessorFactory.java:77)
>> >   ...
>> >
>> > Exception in thread "Lucene Merge Thread #2"
>> > org.apache.lucene.index.MergePolicy$MergeException:
>> > java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
>> >   at org.apache.lucene.index.ConcurrentMergeScheduler.handleMergeException(ConcurrentMergeScheduler.java:314)
>> >   at org.apache.lucene.index.ConcurrentMergeScheduler$MergeThread.run(ConcurrentMergeScheduler.java:291)
>> > Caused by: java.lang.ArrayIndexOutOfBoundsException: Array index out of range: 34950
>> >   at org.apache.lucene.util.BitVector.get(BitVector.java:91)
>> >   at org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:125)
>> >   at org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:98)
>> >   ...
>> >
>> > Does anyone know how this is caused and how I can fix it? It happens with
>> > every optimize. Commits were very slow on this index as well (40x as slow
>> > as a similar index on another machine) I have plenty of disk space (many
>> > 100s of GB) free.