Here's the issue:

    https://issues.apache.org/jira/browse/LUCENE-3255

It's because we interpret the first 0 int as an ancient segments file
format, and the next 0 int as meaning there are no segments.  Yuck!
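
To make that concrete, here is a minimal sketch of the header parse,
assuming a simplified layout (a negative first int is a format marker
followed by a long version and an int counter; a non-negative first int
means the legacy layout with no marker). The class and method names are
hypothetical and this is not the actual SegmentInfos code; it only
illustrates why an all-zero file does not trip any check:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ZeroSegmentsSketch {

    // Hypothetical, simplified header parse; NOT Lucene's real SegmentInfos.read().
    // Returns the number of segments the header claims to contain.
    static int readSegmentCount(byte[] fileBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
        int first = in.readInt();
        if (first < 0) {
            // Assumed newer layout: explicit (negative) format marker,
            // then a long version and an int name counter.
            in.readLong();
            in.readInt();
        }
        // Otherwise: assumed legacy layout with no marker, so the first
        // int (0 for an all-zero file) is taken to be the name counter.
        return in.readInt(); // the next int is the segment count
    }

    public static void main(String[] args) throws IOException {
        byte[] allZeros = new byte[16]; // stand-in for the corrupt file
        System.out.println("segments: " + readSegmentCount(allZeros)); // prints "segments: 0"
    }
}
```

Nothing in that path can tell "legacy file with zero segments" apart
from "file full of zeros", which is why the reader comes back with 0
docs instead of an exception.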

This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
supporting this ancient format... but I don't see any easy way to fix
this pre-3.x where we must (by our back compat rules) support such an
ancient index.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 10:09 AM, mark harwood <markharw...@yahoo.co.uk> wrote:
> I've got Greg's bad segments file and it does look to be all zeros, and if I
> drop it into an existing index directory with the name segments_N+1 it
> reproduces the error, i.e. IndexReader opens the index as if it contains
> zero docs. Preparing a Jira as we speak.
>
>
> ----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Tue, 28 June, 2011 14:59:48
> Subject: Re: Corrupt segments file full of zeros
>
> On Tue, Jun 28, 2011 at 9:29 AM, mark harwood <markharw...@yahoo.co.uk> wrote:
>> Hi Mike.
>>> Hmmm -- what code are you running here, to print the number of docs?
>>
>> SegmentInfos.setInfoStream(System.out);
>> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
>> IndexReader r = IndexReader.open(dir, true);
>> System.out.println("index has "+r.maxDoc()+" docs");
>>
>> From my own tests outside of Greg's environment I've found Lucene to be doing
>> all the right things and IndexReader falls back gracefully to the previous
>> commit, e.g. here is the output from when I deliberately killed an update
>> after prepareToCommit, leaving segments_2 and segments_3, and then
>> vandalised segments_3 with all zero bytes:
>>   SIS [main]: directory listing genA=3
>>   SIS [main]: fallback check: 2; 2
>>   SIS [main]: segments.gen check: genB=2
>>   SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
>>   SIS [main]: fallback to prior segment file 'segments_2'
>>   SIS [main]: success on fallback segments_2
>>
>> Lucene does the right thing going back to _2. I can't yet see why in Greg's
>> environment (NFS based) it fails to see _4vc as corrupt in the same way the
>> above test correctly sees _3 as corrupt.
>
> Hmm.  Mark, if you vandalise segments_3 with 0s, and then remove
> segments_2, what happens when you try to open the IndexReader?  (I
> would expect exc).
>
> Greg, can you post the full stdout you see from SIS after enabling its
> infoStream in the case that returns an IR with 0 docs (ie when you
> delete segments_4vb).
>
> Also: if you don't delete any of the segments_N files, and run the same
> code, how many docs do you get?
>
> Mike
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org
