Here's the issue: https://issues.apache.org/jira/browse/LUCENE-3255
It happens because we interpret the first 0 int as an ancient segments
file format, and the next 0 int as meaning there are no segments. Yuck!

This format pre-dates Lucene 1.9, so the fix for 3.x is to stop
supporting this ancient format... but I don't see any easy way to fix
this pre-3.x, where we must (by our back compat rules) support such an
ancient index.

Mike McCandless

http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 10:09 AM, mark harwood <markharw...@yahoo.co.uk> wrote:
> I've got Greg's bad segments file and it does look to be all zeros, and if I
> drop it into an existing index directory with the name segments_N+1 it
> reproduces the error, i.e. IndexReader opens the index as if it contains
> zero docs.
> Preparing a Jira as we speak.
>
>
> ----- Original Message ----
> From: Michael McCandless <luc...@mikemccandless.com>
> To: java-user@lucene.apache.org
> Sent: Tue, 28 June, 2011 14:59:48
> Subject: Re: Corrupt segments file full of zeros
>
> On Tue, Jun 28, 2011 at 9:29 AM, mark harwood <markharw...@yahoo.co.uk> wrote:
>> Hi Mike.
>>>>Hmmm -- what code are you running here, to print the number of docs?
>>
>> SegmentInfos.setInfoStream(System.out);
>> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
>> IndexReader r = IndexReader.open(dir, true);
>> System.out.println("index has "+r.maxDoc()+" docs");
>>
>> From my own tests outside of Greg's environment I've found Lucene to be
>> doing all the right things, and IndexReader falls back gracefully to the
>> previous commit. E.g. here is the output from when I deliberately killed
>> an update after prepareToCommit, leaving segments_2 and segments_3, and
>> then vandalised segments_3 with all zero bytes:
>>
>> SIS [main]: directory listing genA=3
>> SIS [main]: fallback check: 2; 2
>> SIS [main]: segments.gen check: genB=2
>> SIS [main]: primary Exception on 'segments_3': java.io.IOException: read past EOF'; will retry: retry=false; gen = 3
>> SIS [main]: fallback to prior segment file 'segments_2'
>> SIS [main]: success on fallback segments_2
>>
>> Lucene does the right thing going back to _2. I can't yet see why in Greg's
>> environment (NFS based) it fails to see _4vc as corrupt in the same way the
>> above test correctly sees _3 as corrupt.
>
> Hmm. Mark, if you vandalise segments_3 with 0s, and then remove
> segments_2, what happens when you try to open the IndexReader? (I
> would expect exc).
>
> Greg, can you post the full stdout you see from SIS after enabling its
> infoStream in the case that returns an IR with 0 docs (i.e. when you
> delete segments_4vb).
>
> Also: if you don't delete any of the segments_N files, and run the same
> code, how many docs do you get?
>
> Mike

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
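
As a rough illustration of the misreading described at the top of this message, here is a minimal, self-contained sketch. It does not use Lucene; the class ZeroSegmentsSketch, the method readSegmentCount and the header layout it skips are hypothetical simplifications, assumed only for the purpose of showing how two zero ints can parse as "ancient format, no segments" instead of raising an error.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ZeroSegmentsSketch {

    // Hypothetical, simplified parser (not the real Lucene code).
    // Returns the number of segments the file appears to list.
    public static int readSegmentCount(byte[] fileBytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes))) {
            int first = in.readInt();
            if (first < 0) {
                // Assumed layout for newer files: a negative format marker,
                // followed by a long version and an int counter.
                in.readLong();
                in.readInt();
            }
            // Otherwise (assumption): an ancient file with no format marker,
            // so the first int is silently accepted as legacy data.
            // Either way, the next int is taken as the segment count.
            return in.readInt();
        }
    }

    public static void main(String[] args) throws IOException {
        // Eight zero bytes: two ints, both 0.
        byte[] allZeros = new byte[8];
        // Prints "segments seen: 0": the corrupt file looks like a valid, empty index.
        System.out.println("segments seen: " + readSegmentCount(allZeros));
    }
}
```

In this sketch, nothing distinguishes a file of zeros from a legitimate, empty, pre-marker file, which is why simply rejecting the old layout is the straightforward fix only where back compat no longer requires reading it.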