Hi Mike. 
>>Hmmm -- what code are you running here, to print the number of docs?

SegmentInfos.setInfoStream(System.out);
FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
IndexReader r = IndexReader.open(dir, true);
System.out.println("index has "+r.maxDoc()+" docs");  

From my own tests outside of Greg's environment I've found that Lucene does 
all the right things and IndexReader falls back gracefully to the previous 
commit. For example, here is the output from when I deliberately killed an 
update after prepareCommit(), leaving segments_2 and segments_3, and then 
vandalised segments_3 with all zero bytes:
   SIS [main]: directory listing genA=3
   SIS [main]: fallback check: 2; 2
   SIS [main]: segments.gen check: genB=2
   SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
   SIS [main]: fallback to prior segment file 'segments_2'
   SIS [main]: success on fallback segments_2
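
For anyone following along, the sequence in that log can be sketched roughly 
as below. This is a simplified illustration of the observed behaviour, not 
Lucene's actual implementation: FallbackSketch, CommitReader and 
openNewestReadable are hypothetical names, and the lambda in main only 
simulates a broken segments_3.

```java
import java.io.IOException;

public class FallbackSketch {

    /** Hypothetical stand-in for "parse the segments_N file of this generation". */
    interface CommitReader {
        String read(long generation) throws IOException;
    }

    /**
     * Try the newest generation first; if it cannot be read, retry once with
     * the previous generation. If that also fails, report the original error.
     */
    static String openNewestReadable(long newestGen, CommitReader reader) throws IOException {
        try {
            return reader.read(newestGen);
        } catch (IOException primary) {
            if (newestGen <= 1) {
                throw primary; // nothing older to fall back to
            }
            try {
                return reader.read(newestGen - 1);
            } catch (IOException secondary) {
                throw primary;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Simulated index: generation 3 is unreadable, generation 2 is fine.
        CommitReader reader = gen -> {
            if (gen == 3) {
                throw new IOException("read past EOF");
            }
            return "segments_" + Long.toString(gen, Character.MAX_RADIX);
        };
        System.out.println(openNewestReadable(3, reader)); // prints "segments_2"
    }
}
```

The point of the sketch is only that the fallback depends on the read of the 
newest file actually throwing.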

Lucene does the right thing going back to segments_2. I can't yet see why in 
Greg's environment (NFS based) it fails to see segments_4vc as corrupt in the 
same way the above test correctly sees segments_3 as corrupt.
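
In case it helps with diagnosis, a quick way to confirm that a segments_N 
file really is nothing but zero bytes is a small standalone check like the 
one below. It is a minimal sketch that uses only the JDK (Java 7 or later 
assumed), not any Lucene API; ZeroFileCheck and isAllZeros are hypothetical 
names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ZeroFileCheck {

    /** Returns true if the file is non-empty and every byte in it is 0x00. */
    static boolean isAllZeros(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            boolean sawData = false;
            int n;
            while ((n = in.read(buf)) != -1) {
                for (int i = 0; i < n; i++) {
                    if (buf[i] != 0) {
                        return false; // found a non-zero byte
                    }
                }
                if (n > 0) {
                    sawData = true;
                }
            }
            return sawData; // an empty file is not reported as "all zeros"
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java ZeroFileCheck <path-to-segments_N>");
            return;
        }
        Path file = Paths.get(args[0]);
        System.out.println(file + " all zeros: " + isAllZeros(file));
    }
}
```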

Cheers
Mark


----- Original Message ----
From: Michael McCandless <luc...@mikemccandless.com>
To: java-user@lucene.apache.org
Sent: Tue, 28 June, 2011 14:04:40
Subject: Re: Corrupt segments file full of zeros

On Tue, Jun 28, 2011 at 8:53 AM, Tarr, Gregory <gregory.t...@detica.com> wrote:
> Michael
>
> We are not using commit points unfortunately.

That's fine -- even if you don't keep multiple commit points in your
index, when a commit() op fails, then you can end up with two
segments_N files.  The older one is "good" (last successful commit)
and the new one is broken.

> This was a scheduled update to our index, and on observation the index
> directory had two segments_N files:
>
> segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
> segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB)

OK, so you have 2 segments_N files because something went wrong during
commit of the 2nd one.

> We were not sure which one of these was the real one, so we deleted 4vb and
> got the following from SegmentInfos:

It will always be the "older" one that was the last successful commit,
unless you keep multiple commit points in the index.
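
As an aside on telling "older" from "newer": as far as I know, the suffix of 
a segments_N file is the commit generation written in base 36 
(Character.MAX_RADIX), which is how segments_4vb and segments_4vc line up 
with the generation numbers 6311 and 6312 in the SegmentInfos output quoted 
in this thread. A minimal JDK-only sketch of that mapping follows; 
SegmentsGeneration, generationOf and fileNameFor are hypothetical helper 
names, not Lucene API.

```java
public class SegmentsGeneration {

    static final String PREFIX = "segments_";

    /** Decode the base-36 suffix of a segments_N file name into its generation. */
    static long generationOf(String fileName) {
        if (!fileName.startsWith(PREFIX)) {
            throw new IllegalArgumentException("not a segments_N file: " + fileName);
        }
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    /** Encode a generation back into the corresponding file name. */
    static String fileNameFor(long generation) {
        return PREFIX + Long.toString(generation, Character.MAX_RADIX);
    }

    public static void main(String[] args) {
        System.out.println(generationOf("segments_4vb")); // prints 6311
        System.out.println(generationOf("segments_4vc")); // prints 6312
        System.out.println(fileNameFor(6311));            // prints segments_4vb
    }
}
```

The file with the lower generation is the older commit.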

> Directory listing genA=6312
> Fallback check: 6311; 6311
> Segments.gen check: genB=6311
> Index has 0 docs

Hmmm -- what code are you running here, to print the number of docs?
new IndexWriter(), with create=true?  I would have expected IR.open to
throw an exc here.

> We then deleted 4vc and got the following:
>
> Directory listing genA=6311
> Fallback check: 6311; 6311
> Segments.gen check: genB=6311
> Index has 40022898 docs
>
> Opening 4vc in an octal editor yields only ASCII zeros (0000000 nul nul nul
> nul nul nul nul....etc). It may be that Windows is responsible for this, as
> our indexes are accessed through a fileserver and we know that a delayed
> write occurred.
>
> My question is: why does an index with 4vc open?

I'm not sure, unless you are opening with IW and create=true.

Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
