Michael We are not using commit points unfortunately.
This was a scheduled update to our index, and on observation the index directory had two segments_N files: segments_4vb (modified 24 June 2011 02:05:38 size 7.61KB) segments_4vc (modified 24 June 2011 02:20:42 size 5.91KB) We were not sure which one of these was the real one, so we deleted 4vb and got the following from SegmentInfos: Directory listing genA=6312 Fallback check: 6311; 6311 Segments.gen check: genB=6311 Index has 0 docs We then deleted 4vc and got the following: Directory listing genA=6311 Fallback check: 6311; 6311 Segments.gen check: genB=6311 Index has 40022898 docs Opening 4vc in an octal editor yields only ASCII zeros (0000000 nul nul nul nul nul nul nul....etc). It may be that Windows is responsible for this, as our indexes are accessed through a fileserver and we know that a delayed write occurred. My question is: why does an index with 4vc open? Thanks Greg -----Original Message----- From: Michael McCandless [mailto:luc...@mikemccandless.com] Sent: 28 June 2011 13:36 To: java-user@lucene.apache.org Subject: Re: Corrupt segments file full of zeros OK, this is why Lucene (and Luke) consider the index fine, ie, if Lucene has problems opening segments_N (all 0s is definitely not a valid segments_N file), it falls back to the last commit (segments_(N-1)) and opens that instead. Ie, IR.open and new IW(...) open the last successful commit. Mike McCandless http://blog.mikemccandless.com On Tue, Jun 28, 2011 at 8:28 AM, Tarr, Gregory <gregory.t...@detica.com> wrote: > There was a segments_(N-1), which was a valid segments file and opened > correctly in luke. > > The trouble came because we had to manually rename these files in order to > prevent the index from being wiped. > > Thanks > > Greg > > -----Original Message----- > From: Michael McCandless [mailto:luc...@mikemccandless.com] > Sent: 28 June 2011 13:26 > To: java-user@lucene.apache.org > Subject: Re: Corrupt segments file full of zeros > > Is there only one segments_N file in the index (the one with all 0s)? > Or is there a segments_(N-1) too? > > Mike McCandless > > http://blog.mikemccandless.com > > On Tue, Jun 28, 2011 at 8:17 AM, Tarr, Gregory <gregory.t...@detica.com> > wrote: >> We don't have a -9 in the file. It isn't a valid lucene segments >> file, as it only contains zeros. >> >> We're wondering why this opens in Luke, and why the CheckIndex >> reports that the index is OK. >> >> -----Original Message----- >> From: mark harwood [mailto:markharw...@yahoo.co.uk] >> Sent: 28 June 2011 13:09 >> To: java-user@lucene.apache.org >> Subject: Re: Corrupt segments file full of zeros >> >> According to the spec there should at least be an Int32 of -9 to >> declare the Format - >> http://lucene.apache.org/java/2_9_3/fileformats.html#Segments File >> >> >> >> ----- Original Message ---- >> From: Uwe Schindler <u...@thetaphi.de> >> To: java-user@lucene.apache.org >> Sent: Tue, 28 June, 2011 12:32:34 >> Subject: RE: Corrupt segments file full of zeros >> >> So where is the problem at all? Why should a segments file not >> contain lots of zeroes? If the index is not corrupt all is fine. >> >> ----- >> Uwe Schindler >> H.-H.-Meier-Allee 63, D-28213 Bremen >> http://www.thetaphi.de >> eMail: u...@thetaphi.de >> >> >>> -----Original Message----- >>> From: Tarr, Gregory [mailto:gregory.t...@detica.com] >>> Sent: Tuesday, June 28, 2011 11:56 AM >>> To: java-user@lucene.apache.org >>> Subject: RE: Corrupt segments file full of zeros >>> >>> Yes I have done that, and you just get "No problems were detected >>> with >> this >>> index" >>> >>> Surely there is a major problem with this index? >>> >>> Also the check() procedure takes a long time - is there any way you >> can >> just >>> do a health check on the segments file? >>> >>> Thanks >>> >>> Greg >>> >>> -----Original Message----- >>> From: Shai Erera [mailto:ser...@gmail.com] >>> Sent: 28 June 2011 10:36 >>> To: java-user@lucene.apache.org >>> Subject: Re: Corrupt segments file full of zeros >>> >>> You can try the CheckIndex tool. You feed it a directory and call >>> .check() and it reports the results. >>> >>> Shai >>> >>> On Tue, Jun 28, 2011 at 11:46 AM, Tarr, Gregory >>> <gregory.t...@detica.com>wrote: >>> >>> > We have a problem with our fileserver where our indexes are hosted >>> > remotely, using Lucene 2.9.3. >>> > >>> > This can mean that a segments file is written which is full of >>> > ASCII zeros. Using the od -ah command, we get: >>> > >>> > 0000000 nul nul nul nul nul nul nul....etc >>> > >>> > If opened in Luke, the index opens successfully but has zero >>> documents. >>> > >>> > Why does this open correctly in luke, and is there a procedure in >> the >>> > lucene code that can verify a segments file, e.g. check whether it >>> > refers to any segments? >>> > >>> > Thanks >>> > >>> > Greg >>> > >>> > >>> > Please consider the environment before printing this email. >>> > >>> > This message should be regarded as confidential. If you have >> received >>> > this email in error please notify the sender and destroy it >>> immediately. >>> > >>> > Statements of intent shall only become binding when confirmed in >> hard >>> > copy by an authorised signatory. The contents of this email may >>> > relate to dealings with other companies under the control of >>> > Detica Limited, details of which can be found at >>> http://www.detica.com/statutory-information. >>> > >>> > Detica Limited is registered in England under No: 1337451. >>> > Registered offices: Surrey Research Park, Guildford, Surrey, GU2 >> 7YP, >>> > England. >>> > >>> > >>> Please consider the environment before printing this email. >>> >>> This message should be regarded as confidential. If you have >>> received >> this >>> email in error please notify the sender and destroy it immediately. >>> >>> Statements of intent shall only become binding when confirmed in >>> hard >> copy >>> by an authorised signatory. The contents of this email may relate >>> to >> dealings >>> with other companies under the control of Detica Limited, details of >> which >>> can be found at http://www.detica.com/statutory-information. >>> >>> Detica Limited is registered in England under No: 1337451. >>> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 >>> 7YP, England. >>> >>> -------------------------------------------------------------------- >>> - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >>> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> Please consider the environment before printing this email. >> >> This message should be regarded as confidential. If you have received this >> email in error please notify the sender and destroy it immediately. >> >> Statements of intent shall only become binding when confirmed in hard copy >> by an authorised signatory. The contents of this email may relate to >> dealings with other companies under the control of Detica Limited, details >> of which can be found at http://www.detica.com/statutory-information. >> >> Detica Limited is registered in England under No: 1337451. >> Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, >> England. >> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org >> For additional commands, e-mail: java-user-h...@lucene.apache.org >> >> > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > Please consider the environment before printing this email. > > This message should be regarded as confidential. If you have received this > email in error please notify the sender and destroy it immediately. > > Statements of intent shall only become binding when confirmed in hard copy by > an authorised signatory. The contents of this email may relate to dealings > with other companies under the control of Detica Limited, details of which > can be found at http://www.detica.com/statutory-information. > > Detica Limited is registered in England under No: 1337451. > Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England. > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org Please consider the environment before printing this email. This message should be regarded as confidential. If you have received this email in error please notify the sender and destroy it immediately. Statements of intent shall only become binding when confirmed in hard copy by an authorised signatory. The contents of this email may relate to dealings with other companies under the control of Detica Limited, details of which can be found at http://www.detica.com/statutory-information. Detica Limited is registered in England under No: 1337451. Registered offices: Surrey Research Park, Guildford, Surrey, GU2 7YP, England. --------------------------------------------------------------------- To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org