ArrayIndexOutOfBoundsException when iterating over TermDocs
-----------------------------------------------------------

                 Key: LUCENE-2666
                 URL: https://issues.apache.org/jira/browse/LUCENE-2666
             Project: Lucene - Java
          Issue Type: Bug
          Components: Index
    Affects Versions: 3.0.2
            Reporter: Shay Banon


A user got this very strange exception, and I managed to get the index that it 
happens on. Basically, iterating over the TermDocs causes an AAOIB exception. I 
easily reproduced it using the FieldCache which does exactly that (the field in 
question is indexed as numeric). Here is the exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at 
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
        at 
org.apache.lucene.search.FieldCacheImpl$LongCache.createValue(FieldCacheImpl.java:501)
        at 
org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:183)
        at 
org.apache.lucene.search.FieldCacheImpl.getLongs(FieldCacheImpl.java:470)
        at TestMe.main(TestMe.java:56)

It happens on the following segment: _26t docCount: 914 delCount: 1 
delFileName: _26t_1.del

And as you can see, it smells like a corner case (it fails for document number 
912, the AIOOB happens from the deleted docs). The code to recreate it is 
simple:

        FSDirectory dir = FSDirectory.open(new File("index"));
        IndexReader reader = IndexReader.open(dir, true);

        IndexReader[] subReaders = reader.getSequentialSubReaders();
        for (IndexReader subReader : subReaders) {
            Field field = 
subReader.getClass().getSuperclass().getDeclaredField("si");
            field.setAccessible(true);
            SegmentInfo si = (SegmentInfo) field.get(subReader);
            System.out.println("--> " + si);
            if (si.getDocStoreSegment().contains("_26t")) {
                // this is the probleatic one...
                System.out.println("problematic one...");
                FieldCache.DEFAULT.getLongs(subReader, "__documentdate", 
FieldCache.NUMERIC_UTILS_LONG_PARSER);
            }
        }

Here is the result of a check index on that segment:

  8 of 10: name=_26t docCount=914
    compound=true
    hasProx=true
    numFiles=2
    size (MB)=1.641
    diagnostics = {optimize=false, mergeFactor=10, 
os.version=2.6.18-194.11.1.el5.centos.plus, os=Linux, mergeDocStores=true, 
lucene.version=3.0.2 953716 - 2010-06-11 17:13:53, source=merge, os.arch=amd64, 
java.version=1.6.0, java.vendor=Sun Microsystems Inc.}
    has deletions [delFileName=_26t_1.del]
    test: open reader.........OK [1 deleted docs]
    test: fields..............OK [32 fields]
    test: field norms.........OK [32 fields]
    test: terms, freq, prox...ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at 
org.apache.lucene.index.SegmentTermDocs.next(SegmentTermDocs.java:127)
        at 
org.apache.lucene.index.SegmentTermPositions.next(SegmentTermPositions.java:102)
        at org.apache.lucene.index.CheckIndex.testTermIndex(CheckIndex.java:616)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:509)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
        at TestMe.main(TestMe.java:47)
    test: stored fields.......ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at 
org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
        at 
org.apache.lucene.index.CheckIndex.testStoredFields(CheckIndex.java:684)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:512)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
        at TestMe.main(TestMe.java:47)
    test: term vectors........ERROR [114]
java.lang.ArrayIndexOutOfBoundsException: 114
        at org.apache.lucene.util.BitVector.get(BitVector.java:104)
        at 
org.apache.lucene.index.ReadOnlySegmentReader.isDeleted(ReadOnlySegmentReader.java:34)
        at 
org.apache.lucene.index.CheckIndex.testTermVectors(CheckIndex.java:721)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:515)
        at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:299)
        at TestMe.main(TestMe.java:47)



The creation of the index does not do something fancy (all defaults), though 
there is usage of the near real time aspect (IndexWriter#getReader) which does 
complicate deleted docs handling. Seems like the deleted docs got written 
without matching the number of docs?. Sadly, I don't have something that 
recreates it from scratch, but I do have the index if someone want to have a 
look at it (mail me directly and I will provide a download link).

I will continue to investigate why this might happen, just wondering if someone 
stumbled on this exception before. Lucene 3.0.2 is used.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to