Sorry to resend. I'll change the IO to local storage. Is there any way to recover the first index? It cannot be opened by CheckIndex now. We are building an index of 7 billion webpages, so rebuilding it would take a long time.
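For reference, the "special number" 1071082519 in the errors quoted below is 0x3FD76C17, the magic value that Lucene's CodecUtil writes at the start of each index file; the footer begins with its bitwise complement, -1071082520. The following is only a rough diagnostic sketch, not a recovery tool and not part of Lucene: the class name `LuceneMagicCheck` and its methods are hypothetical, and it assumes the big-endian layout and 16-byte footer used by Lucene 5.x/6.x files. It reports whether a file such as segments_5t3 or _1kqn.cfs still carries the expected magic values or has been zeroed.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a Lucene class): checks the magic values at the
// start and end of an index file without needing Lucene on the classpath.
final class LuceneMagicCheck {
    static final int CODEC_MAGIC = 0x3FD76C17;     // 1071082519, header magic
    static final int FOOTER_MAGIC = ~CODEC_MAGIC;  // -1071082520, footer magic
    // Assumed footer layout: magic (4 bytes) + algorithm id (4) + checksum (8).
    static final int FOOTER_LENGTH = 16;

    // Reads one big-endian 32-bit int at the given byte offset.
    static int readIntAt(Path file, long offset) throws IOException {
        try (RandomAccessFile raf = new RandomAccessFile(file.toFile(), "r")) {
            raf.seek(offset);
            return raf.readInt();
        }
    }

    // True if the first four bytes hold the header magic.
    static boolean headerLooksValid(Path file) throws IOException {
        return Files.size(file) >= 4 && readIntAt(file, 0) == CODEC_MAGIC;
    }

    // True if the footer magic sits FOOTER_LENGTH bytes before the end of file.
    static boolean footerLooksValid(Path file) throws IOException {
        long size = Files.size(file);
        return size >= FOOTER_LENGTH
                && readIntAt(file, size - FOOTER_LENGTH) == FOOTER_MAGIC;
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java LuceneMagicCheck <index-file>...");
            return;
        }
        for (String arg : args) {
            Path file = Paths.get(arg);
            System.out.printf("%s: header %s, footer %s%n", file,
                    headerLooksValid(file) ? "ok" : "BAD",
                    footerLooksValid(file) ? "ok" : "BAD");
        }
    }
}
```

A file whose first four bytes are zero, as the first exception suggests for segments_5t3, would be reported with a bad header. Passing this check says nothing about the rest of the file; CheckIndex remains the real test.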
On Sun, Sep 25, 2016 at 5:31 PM, Ziming Dong <dzm1016397...@gmail.com> wrote:

> I'll change IO to local. Is there anyway to recover first index? now it
> can be opened by checkIndex, we are building index of 7 billion webpages,
> it costs much time to rebuild.
>
> On Sat, Sep 24, 2016 at 2:54 AM, Michael McCandless
> <luc...@mikemccandless.com> wrote:
>
>> The 'sync' option for an NFS client just means that every write is
>> sent immediately across the network. And it really is useless
>> performance loss as long as your app (like Lucene) does the "right
>> thing" with fsync.
>>
>> The more important question is why fsync sent to your NFS client and
>> then to the Mac Mini's NFS server failed to actually move all written
>> bytes to durable storage.
>>
>> Can you reproduce this issue if you use a more well trodden IO system,
>> e.g. Linux with ext4 on a local IO device?
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>> On Fri, Sep 23, 2016 at 12:00 AM, Ziming Dong <dzm1016397...@gmail.com>
>> wrote:
>> > I use the macmini on NFS server side. It seems mount option sync is
>> > useless, just slows down the index program.
>> >
>> > On Fri, Sep 23, 2016 at 4:43 AM, Michael McCandless
>> > <luc...@mikemccandless.com> wrote:
>> >>
>> >> OK sorry I meant your first index, and it seems to have only one
>> >> (broken) segments file. Can you post the "ls -l" output of that first
>> >> index? It looks like the file was (illegally) filled with 0s, or at
>> >> least the first 4 bytes were.
>> >>
>> >> Lucene writes this file, fsyncs it, does an atomic rename, and fsyncs
>> >> the directory, so this should not happen, if your IO system honors
>> >> fsync.
>> >>
>> >> What IO devices are used by the NFS server?
>> >>
>> >> NFS is not well tested and has several known problems with Lucene so
>> >> this is already risky ground...
>> >>
>> >> Mike McCandless
>> >>
>> >> http://blog.mikemccandless.com
>> >>
>> >> On Thu, Sep 22, 2016 at 11:33 AM, Ziming Dong <dzm1016397...@gmail.com>
>> >> wrote:
>> >> > second index is recovered by checkIndex, I don't know what are in
>> >> > second index directory before recover.
>> >> > checkIndex can't read first index. index filenames are attached.
>> >> > I use lucene6.0.0 at the beginning, then I upgrade to lucene6.1.0 to
>> >> > continue index.
>> >> >
>> >> > On Thu, Sep 22, 2016 at 10:17 PM, Michael McCandless
>> >> > <luc...@mikemccandless.com> wrote:
>> >> >>
>> >> >> Do you have 2 separate segments files in that 2nd index?
>> >> >>
>> >> >> Which exact Lucene version is this?
>> >> >>
>> >> >> Mike McCandless
>> >> >>
>> >> >> http://blog.mikemccandless.com
>> >> >>
>> >> >> On Thu, Sep 22, 2016 at 7:44 AM, Ziming Dong <dzm1016397...@gmail.com>
>> >> >> wrote:
>> >> >> > I used checkIndex to recover second index though I lost many docs
>> >> >> > in index, but first index can't be read by checkIndex, error is
>> >> >> >
>> >> >> >> java -cp lucene-core-6.1.0.jar -ea:org.apache.lucene...
>> >> >> >>   org.apache.lucene.index.CheckIndex
>> >> >> >>   /Volumes/HPT8_56T/infomall-index/index0
>> >> >> >> Opening index @ /Volumes/HPT8_56T/infomall-index/index0
>> >> >> >> ERROR: could not read any segments file in directory
>> >> >> >> org.apache.lucene.index.IndexFormatTooOldException: Format version
>> >> >> >> is not supported (resource
>> >> >> >> BufferedChecksumIndexInput(MMapIndexInput(path="/Volumes/HPT8_56T/infomall-index/index0/segments_5t3"))):
>> >> >> >> 0 (needs to be between 1071082519 and 1071082519). This version of
>> >> >> >> Lucene only supports indexes created with release 5.0 and later.
>> >> >> >>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:295)
>> >> >> >>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
>> >> >> >>     at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:507)
>> >> >> >>     at org.apache.lucene.index.CheckIndex.doCheck(CheckIndex.java:2595)
>> >> >> >>     at org.apache.lucene.index.CheckIndex.doMain(CheckIndex.java:2497)
>> >> >> >>     at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:2423)
>> >> >> >
>> >> >> > I use NFS, but I set mount option as
>> >> >> > mount -t nfs -o tcp,sync,retrans=10
>> >> >> > The index program has run 1 month without any problem before power
>> >> >> > failure.
>> >> >> >
>> >> >> > On Thu, Sep 22, 2016 at 6:06 PM, Michael McCandless
>> >> >> > <luc...@mikemccandless.com> wrote:
>> >> >> >>
>> >> >> >> Hmm I'm no longer so sure this is an IW bug: on commit we fsync the
>> >> >> >> pending_segments_N and then do an atomic rename to segments_N.
>> >> >> >>
>> >> >> >> Can you describe your IO system? Is it possible it does not
>> >> >> >> implement fsync or atomic renames correctly?
>> >> >> >>
>> >> >> >> Also, your 2nd exception indicates the segments_N file was intact
>> >> >> >> but the .cfs file was corrupt, which is also hard to explain unless
>> >> >> >> fsync isn't working on your IO system.
>> >> >> >>
>> >> >> >> Mike McCandless
>> >> >> >>
>> >> >> >> http://blog.mikemccandless.com
>> >> >> >>
>> >> >> >> On Thu, Sep 22, 2016 at 5:10 AM, Michael McCandless
>> >> >> >> <luc...@mikemccandless.com> wrote:
>> >> >> >> > Sorry for the slow reply here. Curious that both of these
>> >> >> >> > exceptions are from IW.init. I think this may be a real bug,
>> >> >> >> > caused by this:
>> >> >> >> >
>> >> >> >> > https://github.com/apache/lucene-solr/commit/981bfba841144d08df1d1a183d39fcd6f195ad56
>> >> >> >> >
>> >> >> >> > I'll see if I can make a standalone test case showing this.
>> >> >> >> >
>> >> >> >> > If you open those indices with an IndexReader instead, does it
>> >> >> >> > succeed?
>> >> >> >> >
>> >> >> >> > If you run CheckIndex, what does it report?
>> >> >> >> >
>> >> >> >> > Mike McCandless
>> >> >> >> >
>> >> >> >> > http://blog.mikemccandless.com
>> >> >> >> >
>> >> >> >> > On Wed, Sep 14, 2016 at 1:22 AM, Ziming Dong
>> >> >> >> > <dzm1016397...@gmail.com> wrote:
>> >> >> >> >> I have 6 machines and 6 index directories, each machine builds
>> >> >> >> >> index into one index directory. After power failure last night,
>> >> >> >> >> two of those machines can't start index program.
>> >> >> >> >>
>> >> >> >> >> one error is
>> >> >> >> >>
>> >> >> >> >>> INFO: 2016-09-14 12:31:38 [main]
>> >> >> >> >>> sewm.bdbox.search.InfomallIndexer$Builder:ignoreCollectionsFile(227):
>> >> >> >> >>> Loaded 2146 ignored collections from
>> >> >> >> >>> /mnt/HPT8_56T/infomall-index/index0/ignored_collections.txt
>> >> >> >> >>> ERROR: 2016-09-14 12:31:39 [main]
>> >> >> >> >>> sewm.bdbox.util.LogUtil:error(71):
>> >> >> >> >>> org.apache.lucene.index.IndexFormatTooOldException: Format
>> >> >> >> >>> version is not supported (resource
>> >> >> >> >>> BufferedChecksumIndexInput(MMapIndexInput(path="/mnt/HPT8_56T/infomall-index/index0/segments_5t3"))):
>> >> >> >> >>> 0 (needs to be between 1071082519 and 1071082519). This version
>> >> >> >> >>> of Lucene only supports indexes created with release 5.0 and
>> >> >> >> >>> later.
>> >> >> >> >>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:295)
>> >> >> >> >>>     at org.apache.lucene.index.SegmentInfos.readCommit(SegmentInfos.java:284)
>> >> >> >> >>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:910)
>> >> >> >> >>>     at sewm.bdbox.search.InfomallIndexer.<init>(InfomallIndexer.java:60)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:28)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:21)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer$Builder.build(ThreadedInfomallIndexer.java:72)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer.main(ThreadedInfomallIndexer.java:129)
>> >> >> >> >>
>> >> >> >> >> another is
>> >> >> >> >>
>> >> >> >> >>> INFO: 2016-09-14 01:11:06 [main]
>> >> >> >> >>> sewm.bdbox.search.InfomallIndexer$Builder:ignoreCollectionsFile(227):
>> >> >> >> >>> Loaded 8575 ignored collections from
>> >> >> >> >>> /mnt/HPT8/infomall-index/index5/ignored_collections.txt
>> >> >> >> >>> ERROR: 2016-09-14 01:11:09 [main]
>> >> >> >> >>> sewm.bdbox.util.LogUtil:error(71):
>> >> >> >> >>> org.apache.lucene.index.CorruptIndexException: codec footer
>> >> >> >> >>> mismatch (file truncated?): actual footer=0 vs expected
>> >> >> >> >>> footer=-1071082520
>> >> >> >> >>> (resource=MMapIndexInput(path="/mnt/HPT8/infomall-index/index5/_1kqn.cfs"))
>> >> >> >> >>>     at org.apache.lucene.codecs.CodecUtil.validateFooter(CodecUtil.java:448)
>> >> >> >> >>>     at org.apache.lucene.codecs.CodecUtil.retrieveChecksum(CodecUtil.java:433)
>> >> >> >> >>>     at org.apache.lucene.codecs.lucene50.Lucene50CompoundReader.<init>(Lucene50CompoundReader.java:86)
>> >> >> >> >>>     at org.apache.lucene.codecs.lucene50.Lucene50CompoundFormat.getCompoundReader(Lucene50CompoundFormat.java:71)
>> >> >> >> >>>     at org.apache.lucene.index.IndexWriter.readFieldInfos(IndexWriter.java:1016)
>> >> >> >> >>>     at org.apache.lucene.index.IndexWriter.getFieldNumberMap(IndexWriter.java:1033)
>> >> >> >> >>>     at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:938)
>> >> >> >> >>>     at sewm.bdbox.search.InfomallIndexer.<init>(InfomallIndexer.java:60)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:28)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer.<init>(ThreadedInfomallIndexer.java:21)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer$Builder.build(ThreadedInfomallIndexer.java:72)
>> >> >> >> >>>     at sewm.bdbox.search.ThreadedInfomallIndexer.main(ThreadedInfomallIndexer.java:129)
>> >> >> >> >>
>> >> >> >> >> it seems 1071082519 is a special number.
>> >> >> >> >>
>> >> >> >> >> --
>> >> >> >> >> Ziming Dong
>> >> >> >> >> http://suiyuan2009.github.io/
>> >> >> >
>> >> >> > --
>> >> >> > Ziming Dong
>> >> >> > http://suiyuan2009.github.io/
>> >> >
>> >> > --
>> >> > Ziming Dong
>> >> > http://suiyuan2009.github.io/
>> >
>> > --
>> > Ziming Dong
>> > http://suiyuan2009.github.io/
>
> --
> Ziming Dong
> http://suiyuan2009.github.io/

--
Ziming Dong
http://suiyuan2009.github.io/