Corrupt segments file full of zeros

2011-06-28 Thread Tarr, Gregory
We have a problem with our fileserver where our indexes are hosted remotely, using Lucene 2.9.3. This can mean that a segments file is written which is full of ASCII NUL (zero) bytes. Using the od -ah command, we get:

    000 nul nul nul nul nul nul nul etc

If opened in Luke, the index opens successfull

Re: Corrupt segments file full of zeros

2011-06-28 Thread Shai Erera
You can try the CheckIndex tool. You feed it a directory and call .check() and it reports the results.

Shai

On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory wrote:
> We have a problem with our fileserver where our indexes are hosted
> remotely, using Lucene 2.9.3.
>
> This can mean that a segment

RE: Corrupt segments file full of zeros

2011-06-28 Thread Tarr, Gregory
Yes I have done that, and you just get "No problems were detected with this index" Surely there is a major problem with this index? Also the check() procedure takes a long time - is there any way you can just do a health check on the segments file? Thanks Greg -Original Message- From:
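A cheap pre-check for the symptom described in this thread is to scan the segments file for any non-zero byte before running the slower full check. The following is only a minimal sketch using the Java standard library; the class name `AllZeroFileCheck` and the method `isAllZeros` are hypothetical and not part of Lucene, and the sketch does not validate the file in any other way.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene.
final class AllZeroFileCheck {

    /**
     * Returns true if the file contains no non-zero byte.
     * Note: an empty file also returns true.
     */
    static boolean isAllZeros(Path file) throws IOException {
        try (InputStream in = new BufferedInputStream(Files.newInputStream(file))) {
            int b;
            while ((b = in.read()) != -1) {
                if (b != 0) {
                    return false; // found real data, stop early
                }
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a segments_N file (supplied by the caller).
        Path file = Paths.get(args[0]);
        System.out.println(file + (isAllZeros(file) ? " contains only zero bytes" : " contains non-zero data"));
    }
}
```

Because the scan stops at the first non-zero byte, a healthy file is rejected as "all zeros" almost immediately, so the cost is only noticeable on files that really are zero-filled.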

RE: Corrupt segments file full of zeros

2011-06-28 Thread Uwe Schindler
So where is the problem at all? Why should a segments file not contain lots of zeroes? If the index is not corrupt all is fine. - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de > -Original Message- > From: Tarr, Gregory [mailto:grego

RE: Corrupt segments file full of zeros

2011-06-28 Thread Tarr, Gregory
The segments file containing lots of zeros means that the index has no segments. We could run the following to check this:

    SegmentInfos sis = new SegmentInfos();
    sis.read(indexDir);
    int numSegments = sis.size();
    if (numSegments < 1) {
        // index has no segments
    }

Greg -Original Message-

Re: Corrupt segments file full of zeros

2011-06-28 Thread mark harwood
According to the spec there should at least be an Int32 of -9 to declare the Format - http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File - Original Message From: Uwe Schindler To: java-user@lucene.apache.org Sent: Tue, 28 June, 2011 12:32:34 Subject: RE: Corrupt segme
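The format marker mentioned here can be checked without opening the index at all. The sketch below reads the first four bytes of a segments file as a big-endian Int32 and compares it with -9, the value quoted in this thread for Lucene 2.9.3. The class and method names are hypothetical, the expected value would differ for other Lucene versions, and the sketch looks only at the header, not at the rest of the file.

```java
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Lucene.
final class SegmentsHeaderCheck {

    // Format marker quoted in the thread for Lucene 2.9.3 (assumption for other versions).
    static final int EXPECTED_FORMAT = -9;

    /** Reads the first four bytes of the file as a big-endian Int32. */
    static int readFirstInt(Path segmentsFile) throws IOException {
        try (DataInputStream in = new DataInputStream(Files.newInputStream(segmentsFile))) {
            return in.readInt(); // throws EOFException if fewer than 4 bytes are present
        }
    }

    /** Returns true only if the file starts with the expected format marker. */
    static boolean hasExpectedFormat(Path segmentsFile) throws IOException {
        try {
            return readFirstInt(segmentsFile) == EXPECTED_FORMAT;
        } catch (EOFException tooShort) {
            return false; // too short to contain a header at all
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the path to a segments_N file (supplied by the caller).
        Path file = Paths.get(args[0]);
        System.out.println(file + ": format marker " + (hasExpectedFormat(file) ? "found" : "missing"));
    }
}
```

A zero-filled file yields 0 for the first Int32, so it fails this check straight away.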

RE: Corrupt segments file full of zeros

2011-06-28 Thread Tarr, Gregory
We don't have a -9 in the file. It isn't a valid lucene segments file, as it only contains zeros. We're wondering why this opens in Luke, and why the CheckIndex reports that the index is OK. -Original Message- From: mark harwood [mailto:markharw...@yahoo.co.uk] Sent: 28 June 2011 13:09

Re: Corrupt segments file full of zeros

2011-06-28 Thread Michael McCandless
Is there only one segments_N file in the index (the one with all 0s)? Or is there a segments_(N-1) too?

Mike McCandless
http://blog.mikemccandless.com

On Tue, Jun 28, 2011 at 8:17 AM, Tarr, Gregory wrote:
> We don't have a -9 in the file. It isn't a valid lucene segments file,
> as it only cont

RE: Corrupt segments file full of zeros

2011-06-28 Thread Tarr, Gregory
There was a segments_(N-1), which was a valid segments file and opened correctly in luke. The trouble came because we had to manually rename these files in order to prevent the index from being wiped. Thanks Greg -Original Message- From: Michael McCandless [mailto:luc...@mikemccandle

Re: Corrupt segments file full of zeros

2011-06-28 Thread Michael McCandless
OK, this is why Lucene (and Luke) consider the index fine, ie, if Lucene has problems opening segments_N (all 0s is definitely not a valid segments_N file), it falls back to the last commit (segments_(N-1)) and opens that instead. Ie, IR.open and new IW(...) open the last successful commit. Mike
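The fallback described above can be pictured as "try the newest segments_N first, then step back to older ones until one is readable". The sketch below is an illustration of that idea only and is not Lucene's own code: the class name, the method names and the caller-supplied `looksValid` predicate are hypothetical, and the generation suffix is assumed to be a base-36 number, which is consistent with names such as segments_4vb and segments_4vc.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

// Hypothetical illustration of "fall back to the last good commit"; not Lucene code.
final class CommitFallbackSketch {

    private static final String PREFIX = "segments_";

    /** Parses the generation from a name like "segments_4vc" (suffix assumed to be base 36). */
    static long generationOf(String fileName) {
        return Long.parseLong(fileName.substring(PREFIX.length()), Character.MAX_RADIX);
    }

    /** Returns the highest-generation segments_N file accepted by the predicate, if any. */
    static Optional<Path> newestReadable(Path indexDir, Predicate<Path> looksValid) throws IOException {
        List<Path> candidates = new ArrayList<>();
        // The glob matches segments_N files but not segments.gen.
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(indexDir, PREFIX + "*")) {
            for (Path p : stream) {
                candidates.add(p);
            }
        }
        // Newest generation first.
        candidates.sort(
                Comparator.comparingLong((Path p) -> generationOf(p.getFileName().toString())).reversed());
        return candidates.stream().filter(looksValid).findFirst();
    }
}
```

With a directory holding a valid segments_4vb and a zero-filled segments_4vc, a predicate that rejects the zero-filled file would make `newestReadable` return segments_4vb, which mirrors the behaviour described in this message.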

RE: Corrupt segments file full of zeros

2011-06-28 Thread Tarr, Gregory
Michael

We are not using commit points unfortunately. This was a scheduled update to our index, and on observation the index directory had two segments_N files:

    segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB)
    segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB)

We were not sure

Re: Corrupt segments file full of zeros

2011-06-28 Thread Michael McCandless
On Tue, Jun 28, 2011 at 8:53 AM, Tarr, Gregory wrote:
> Michael
>
> We are not using commit points unfortunately.

That's fine -- even if you don't keep multiple commit points in your index, when a commit() op fails, then you can end up with two segments_N files. The older one is "good" (last suc

Re: Corrupt segments file full of zeros

2011-06-28 Thread mark harwood
Hi Mike.

>>Hmmm -- what code are you running here, to print the number of docs?

    SegmentInfos.setInfoStream(System.out);
    FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
    IndexReader r = IndexReader.open(dir, true);
    System.out.println("index has "+r.maxDoc()+" docs");

From my

Re: Corrupt segments file full of zeros

2011-06-28 Thread Michael McCandless
On Tue, Jun 28, 2011 at 9:29 AM, mark harwood wrote:
> Hi Mike.
>>>Hmmm -- what code are you running here, to print the number of docs?
>
> SegmentInfos.setInfoStream(System.out);
> FSDirectory dir = FSDirectory.open(new File("j:/indexes/myindex"));
> IndexReader r = IndexReader.open(dir, true);
>

Re: Corrupt segments file full of zeros

2011-06-28 Thread mark harwood
I've got Greg's bad segments file and it does look to be all zeros. If I drop it into an existing index directory with the name segments_(N+1), it reproduces the error, i.e. IndexReader opens the index as if it contains zero docs. Preparing a Jira as we speak. - Original Message From: Mic

Re: Corrupt segments file full of zeros

2011-06-28 Thread Michael McCandless
Here's the issue: https://issues.apache.org/jira/browse/LUCENE-3255 It's because we read the first 0 int to be an ancient segments file format, and the next 0 int to mean there are no segments. Yuck! This format pre-dates Lucene 1.9, so the fix for 3.x is to stop supporting this ancient for
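To make the misreading concrete, the sketch below mimics the decision described above in a deliberately simplified form: a negative first Int32 is taken as a format marker, while a non-negative first Int32 is treated as the ancient layout and the following Int32 is taken as the segment count. This is an illustration under those assumptions and not the actual SegmentInfos code; the class and method names are hypothetical and the newer layout is not modelled at all.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical, simplified illustration of the misparse; not Lucene code.
final class LegacyMisparseSketch {

    /**
     * Returns the segment count that a reader accepting the ancient layout would
     * report, or -1 if the first Int32 is a negative format marker (newer layout,
     * not modelled here). Throws EOFException if fewer than 8 bytes are supplied
     * for the ancient layout.
     */
    static int legacySegmentCount(byte[] fileBytes) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(fileBytes));
        int first = in.readInt(); // big-endian Int32
        if (first < 0) {
            return -1; // negative value: explicit format marker, handled elsewhere
        }
        // Non-negative first int: treated as the ancient layout, so the next
        // Int32 is read as the number of segments.
        return in.readInt();
    }

    public static void main(String[] args) throws IOException {
        byte[] zeroFilled = new byte[16];
        System.out.println("segments reported for a zero-filled file: " + legacySegmentCount(zeroFilled));
    }
}
```

For a zero-filled buffer both reads return 0, so the file is accepted as a legal index with no segments, which matches the zero-document index seen earlier in the thread.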

Re: Corrupt segments file full of zeros

2011-06-28 Thread Trejkaz
On Wed, Jun 29, 2011 at 2:24 AM, Michael McCandless wrote:
> Here's the issue:
>
>    https://issues.apache.org/jira/browse/LUCENE-3255
>
> It's because we read the first 0 int to be an ancient segments file
> format, and the next 0 int to mean there are no segments.  Yuck!
>
> This format pre-dat