Hi,

The checksum is also written for a second reason: Java VMs often have optimization bugs (you may know the Java 7 GA disaster and the Java 7u40 vector optimization bugs that Lucene discovered). The checksums will often catch those bugs, too.
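
As a rough illustration of the idea (plain java.util.zip, not Lucene's actual code; the class name ChecksumSketch and the bare 8-byte CRC32 footer are assumptions made up for this mail), here is a sketch that computes the checksum as a by-product of reading and then compares it against a stored footer. A single flipped bit is detected no matter where it came from:

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;

public class ChecksumSketch {

    /** Returns payload followed by an 8-byte CRC32 footer (simplified layout, not Lucene's). */
    public static byte[] writeWithFooter(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return ByteBuffer.allocate(payload.length + Long.BYTES)
                .put(payload)
                .putLong(crc.getValue())
                .array();
    }

    /** Reads the payload once, checksumming as a by-product, then checks the footer. */
    public static byte[] readAndVerify(byte[] file) throws IOException {
        int payloadLength = file.length - Long.BYTES;
        if (payloadLength < 0) {
            throw new IOException("file too short to hold a footer");
        }
        CheckedInputStream checked =
                new CheckedInputStream(new ByteArrayInputStream(file), new CRC32());
        DataInputStream in = new DataInputStream(checked);

        byte[] payload = new byte[payloadLength];
        in.readFully(payload);

        // Capture the running checksum before the footer bytes are consumed.
        long actual = checked.getChecksum().getValue();
        long expected = in.readLong();
        if (actual != expected) {
            throw new IOException("checksum mismatch: expected=" + expected + " actual=" + actual);
        }
        return payload;
    }

    public static void main(String[] args) throws IOException {
        byte[] file = writeWithFooter("terms index".getBytes(StandardCharsets.UTF_8));
        readAndVerify(file); // clean copy passes silently

        file[3] ^= 0x01; // simulate one flipped bit in the payload
        try {
            readAndVerify(file);
            System.out.println("corruption missed");
        } catch (IOException e) {
            System.out.println("corruption detected");
        }
    }
}
```

In Lucene terms this corresponds only roughly to directory.openChecksumInput(...) followed by CodecUtil.checkFooter(...), as opposed to a separate full pass like CodecUtil.checksumEntireFile(...).
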

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

> -----Original Message-----
> From: Michael McCandless [mailto:luc...@mikemccandless.com]
> Sent: Tuesday, December 6, 2016 12:30 PM
> To: Duke DAI <duke.dai....@gmail.com>
> Cc: Lucene Users <java-user@lucene.apache.org>
> Subject: Re: Hardcoded checksum mechanism in BlockTreeTermsReader
>
> I see. Bits can also be flipped by the network as they are travelling
> to/from the DB. The end to end checksum Lucene does now would catch
> that.
>
> Anyway, that BlockTree index file that is being entirely checksummed
> is a very small file. And, using the first pattern is not easy for it
> because it needs to seek to the end to load its directory location,
> and then seek back to that location to read each field's information.
> Do you see a simple way to change it to the first pattern?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Tue, Dec 6, 2016 at 6:00 AM, Duke DAI <duke.dai....@gmail.com> wrote:
> > Thanks for your quick response, Mike.
> >
> > Database has its own raw page management over OS page management, and most
> > likely database has its own checksum on page level, that's why I want to
> > avoid checksum in Lucene Directory level.
> >
> > Certainly checksum is good, I like the pattern(rewrite openChecksumInput
> > according to real case):
> >     inputStream = directory.openChecksumInput(...);
> >     // at the end check checksum, as by-product
> >     CodecUtil.checkFooter(...)
> >
> > But I do not like the pattern:
> >     CodecUtil.checksumEntireFile(..), its purpose is pure checksum via reading
> > all data, not the by-product.
> > If the design/API is pluggable with default way, it'll be good enough for
> > various scenario.
> >
> > Best regards,
> > Duke
> > If not now, when? If not me, who?
> >
> > On Tue, Dec 6, 2016 at 6:36 PM, Michael McCandless
> > <luc...@mikemccandless.com> wrote:
> >>
> >> We have learned over time not to trust the underlying store to
> >> correctly record the bytes we wrote to it.
> >>
> >> This is why checksumming is very strongly built into Lucene at this
> >> point. If you disable checksumming, when bits do flip, you get exotic
> >> exceptions at search time that might look like Lucene bugs and can
> >> cost a lot of time to explain.
> >>
> >> It's not just the BlockTreeTermsReader; many other codec components
> >> check the checksum with CodecUtil.checkFooter at search time.
> >>
> >> Can you explain why it's necessary to remove it for your database
> >> files based Directory?
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Tue, Dec 6, 2016 at 5:25 AM, Duke DAI <duke.dai....@gmail.com> wrote:
> >> > Hi all,
> >> >
> >> > I'm customizing Lucene Directory, which extends o.a.l.store.Directory
> >> > based on database files. I do not need checksum again on IndexInput and
> >> > IndexOutput.
> >> >
> >> > But in BlockTreeTermsReader constructor, following code open a
> >> > hard-coded BufferedChecksumIndexInput to checksum on raw IndexInput. I
> >> > have to use CRC32 on IndexOutput to make through it. Is there any more
> >> > graceful way to do checksum, such as let Directory construct a checksum
> >> > instance instead of API Directory.openChecksumInput ?
> >> >
> >> >
> >> >     String indexName = IndexFileNames.segmentFileName(segment,
> >> >         state.segmentSuffix, TERMS_INDEX_EXTENSION);
> >> >     indexIn = state.directory.openInput(indexName, state.context);
> >> >     CodecUtil.checkIndexHeader(indexIn, TERMS_INDEX_CODEC_NAME, version,
> >> >         version, state.segmentInfo.getId(), state.segmentSuffix);
> >> >     CodecUtil.checksumEntireFile(indexIn);
> >> >
> >> >
> >> > Best regards,
> >> > Duke
> >> > If not now, when? If not me, who?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
> For additional commands, e-mail: java-user-h...@lucene.apache.org