Re: Corrupt index (IndexOutOfBoundsException)

René Zöpnek Tue, 24 Mar 2009 14:00:53 -0700

Thank you for your help, Michael. I've solved the problem by recreating the 
index.
An OutOfMemoryError had killed the thread responsible for index 
maintenance, so the earlier index recreation failed without an error message. 
After recreating the index again, the problem is solved.
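
For anyone who runs into the same silent failure: the createIndex() code 
quoted below only catches Exception, and java.lang.OutOfMemoryError is an 
Error, so it would pass through that catch block and end the thread. The 
following is a minimal sketch of that behaviour and of one way to make it 
visible with an uncaught-exception handler. The class name, the thread name 
and the simulated error are made up for illustration and are not taken from 
the real application.

```java
import java.util.concurrent.atomic.AtomicReference;

public class SilentErrorDemo {

    /** Runs a task shaped like createIndex() on its own thread and returns whatever escaped it. */
    static Throwable runAndCapture() throws InterruptedException {
        final AtomicReference<Throwable> escaped = new AtomicReference<Throwable>();

        Thread worker = new Thread(new Runnable() {
            public void run() {
                try {
                    // Simulated failure; a real one would be raised by the JVM itself.
                    throw new OutOfMemoryError("simulated");
                } catch (Exception e) {
                    // Never reached: OutOfMemoryError is not a subclass of Exception.
                    System.err.println("create index failed: " + e);
                }
            }
        }, "index-maintenance");

        // The handler is called in the dying thread, just before it terminates.
        worker.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
            public void uncaughtException(Thread t, Throwable e) {
                escaped.set(e);
                System.err.println("thread " + t.getName() + " died: " + e);
            }
        });

        worker.start();
        worker.join();
        return escaped.get();
    }

    public static void main(String[] args) throws InterruptedException {
        Throwable t = runAndCapture();
        System.out.println(t instanceof OutOfMemoryError); // prints "true"
    }
}
```

In a real application the handler would write to the application log rather 
than to System.err. Catching Throwable in the worker itself is another 
option, though after a genuine OutOfMemoryError there may not be enough heap 
left to do much more than log and stop.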


Sorry for the inconvenience.

Greetz
René

Michael McCandless wrote:
> When I run checkIndex on your index, I see a new exception:
>
> org.apache.lucene.index.CorruptIndexException: Incompatible format version: 119865344 expected 1 or lower
>     at org.apache.lucene.index.FieldsReader.<init>(FieldsReader.java:116)
>     at org.apache.lucene.index.SegmentReader.initialize(SegmentReader.java:519)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:468)
>     at org.apache.lucene.index.SegmentReader.get(SegmentReader.java:382)
>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:395)
>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:689)
>
> Is this what you see?
>
> Mike
>
> "René Zöpnek" <[email protected]> wrote:
>  
>> Thanks for your answer, Mike.
>>
>> Unfortunately I have no direct access to the server with the corrupt index. 
>> So changing the creation process of the index is not possible.
>>
>> I've uploaded the index to http://drop.io/hlu53sl (9 MB).
>>
>>
>>
>> Here is the code for creating the index:
>>
>> public static void createIndex()
>> {
>>  log.info("create index");
>>  long start = System.currentTimeMillis();
>>  IndexWriter index = null;
>>  InitialContext ic = null;
>>  Connection connect = null;
>>  PreparedStatement query = null;
>>  ResultSet result = null;
>>  try{
>>          //create index
>>          index = new IndexWriter("/var/content/index", getAnalyzer(), true);
>>
>>          //get content data
>>          ic = new InitialContext();
>>          javax.sql.DataSource source = (javax.sql.DataSource) 
>> ic.lookup("java:/ContentDS");
>>          connect = source.getConnection();
>>          query = connect.prepareStatement("SELECT DISTINCT C.* FROM "
>>                  + "TAB_CONTENT C,TAB_PROJCONTENT PC WHERE C.CONTENT_ID = PC.CONTENT_ID AND NOT "
>>                  + "C.STORAGE = 'CONTAINER'");
>>          result = query.executeQuery();
>>
>>          while(result.next())
>>          {
>>                  // map file info
>>                  TabContentData data = TabContentMapper.getMapped(result);
>>                  // index metadata
>>                  try{
>>                          indexMetadata(data, index);
>>                  }catch(Exception e)
>>                  {
>>                          log.error("Failed to index "+data.getFileId()
>>                                  +" with id "+data.getContentId(),e);
>>                  }
>>          }
>>          log.info("indexing done");
>>  }catch(Exception e)
>>  {
>>         log.error("create index failed",e);
>>  }
>>  finally
>>  {
>>         //clean up
>>         try{ result.close(); }catch(Exception e){};
>>         try{ query.close(); }catch(Exception e){};
>>         try{ connect.close(); }catch(Exception e){};
>>         try{ ic.close(); }catch(Exception e){};
>>         try{ index.optimize(); }catch(Exception e){};
>>         try{ index.close(); }catch(Exception e){};
>>  }
>> }
>>
>> The indexMetadata(data, index) method just maps the column names and the 
>> column contents of one content record into a Lucene document, which is then 
>> added to the index.
>>
>>
>> If you have any further questions, don't hesitate to ask and thank you for 
>> your help.
>>
>> Greetz!
>> René
>>
>>
>>
>> Michael McCandless wrote:
>>    
>>> Something appears to be wrong with your _X.tii file (inside the compound 
>>> file).
>>>
>>> Can you post the code that recreates this broken index?
>>>
>>> Since it appears to be repeatable, could you regenerate your index with 
>>> compound file off, confirm the problem still happens, and then post the 
>>> _X.tii file?  I can try to look at it.
>>>
>>> Mike
>>>
>>> René Zöpnek wrote:
>>>
>>>      
>>>> Hello,
>>>>
>>>> I'm using Lucene 2.3.2 and had no problems until now.
>>>>
>>>> But now I have a corrupt index. When searching, a 
>>>> java.lang.OutOfMemoryError is thrown. I've written the following test 
>>>> program:
>>>>
>>>> private static void search(String index, String query) throws 
>>>> CorruptIndexException, IOException, ParseException
>>>> {
>>>>     IndexReader reader = IndexReader.open(index);
>>>>     //reader.setTermInfosIndexDivisor(10);
>>>>     Collection col = reader.getFieldNames(IndexReader.FieldOption.INDEXED);
>>>>     Iterator it = col.iterator();
>>>>     String[] fields = new String[col.size()];
>>>>     int i = 0;
>>>>     while(it.hasNext())
>>>>     {
>>>>         fields[i] = (String)it.next();
>>>>         System.out.println("field["+i+"]: "+fields[i]);
>>>>         i++;
>>>>     }
>>>>     Analyzer analyzer = new StandardAnalyzer();
>>>>     MultiFieldQueryParser parser = new MultiFieldQueryParser(fields, 
>>>> analyzer);
>>>>     parser.setAllowLeadingWildcard(true);
>>>>     Query quer = parser.parse(query);
>>>>     System.out.println("Query: "+quer.toString());
>>>>     quer = quer.rewrite(reader);
>>>>     System.out.println("rewritten Query: "+quer.toString());
>>>>     reader.close();
>>>> }
>>>>
>>>> If reader.setTermInfosIndexDivisor() is commented out, the stacktrace 
>>>> looks like this:
>>>>
>>>> Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:155)
>>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>>>     at diplom.lucene.Index.search(Index.java:59)
>>>>     at diplom.lucene.Index.main(Index.java:28)
>>>>
>>>> The index is only 59 MB, so it is strange to get an OutOfMemoryError. 
>>>> With reader.setTermInfosIndexDivisor() set to 10, the stack trace 
>>>> instead looks like this:
>>>>
>>>> java.lang.IndexOutOfBoundsException: Index: 103, Size: 54
>>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>>>     at java.util.ArrayList.get(ArrayList.java:322)
>>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>>>     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:159)
>>>>     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:202)
>>>>     at org.apache.lucene.index.TermInfosReader.terms(TermInfosReader.java:277)
>>>>     at org.apache.lucene.index.SegmentReader.terms(SegmentReader.java:643)
>>>>     at org.apache.lucene.search.PrefixQuery.rewrite(PrefixQuery.java:42)
>>>>     at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:385)
>>>>     at diplom.lucene.Index.search(Index.java:59)
>>>>     at diplom.lucene.Index.main(Index.java:28)
>>>>
>>>>
>>>> CheckIndex prints the following:
>>>>
>>>> Segments file=segments_1zrx5 numSegments=1 version=FORMAT_SHARED_DOC_STORE [Lucene 2.3]
>>>>  1 of 1: name=_5pa8 docCount=117378
>>>>    compound=true
>>>>    numFiles=1
>>>>    size (MB)=57,573
>>>>    no deletions
>>>>    test: open reader.........OK
>>>>    test: fields, norms.......OK [54 fields]
>>>>    test: terms, freq, prox...FAILED
>>>>    WARNING: would remove reference to this segment (-fix was not specified); full exception:
>>>> java.lang.IndexOutOfBoundsException: Index: 110, Size: 54
>>>>     at java.util.ArrayList.RangeCheck(ArrayList.java:547)
>>>>     at java.util.ArrayList.get(ArrayList.java:322)
>>>>     at org.apache.lucene.index.FieldInfos.fieldInfo(FieldInfos.java:260)
>>>>     at org.apache.lucene.index.FieldInfos.fieldName(FieldInfos.java:249)
>>>>     at org.apache.lucene.index.TermBuffer.read(TermBuffer.java:68)
>>>>     at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:123)
>>>>     at org.apache.lucene.index.CheckIndex.check(CheckIndex.java:182)
>>>>     at diplom.lucene.Index.check(Index.java:67)
>>>>     at diplom.lucene.Index.main(Index.java:28)
>>>>
>>>> WARNING: 1 broken segments detected
>>>> WARNING: 117378 documents would be lost if -fix were specified
>>>>
>>>> NOTE: would write new segments file [-fix was not specified]
>>>>
>>>> Index correct: false
>>>>
>>>>
>>>>
>>>> Recreating the index didn't solve the problem, and I have no idea how to 
>>>> solve it, so any help is greatly appreciated.
>>>>
>>>> Thanks in advance.
>>>> Rene
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
>>>>