Bytes seem the same but checksums differ
I'm attempting to create a Directory implementation (lucene-core 2.4.1), written in Scala, to sit on top of Google's App Engine Datastore. In the process I found something odd for which I'm hoping there is a relatively simple solution.

When instantiating a new IndexWriter with my Directory implementation (which uses Datastore-based IndexInput and IndexOutput classes), the checksum of the segments file (segments_1, because there is nothing in the index yet) differs when calculated by ChecksumIndexOutput vs ChecksumIndexInput. Of course, this causes a CorruptIndexException to be thrown at line 248. The interesting thing is that the array of bytes written by my DatastoreIndexOutput is the same array of bytes read by DatastoreIndexInput. I've also noticed that the difference between the checksums is consistent: checksumThen (line 246 in SegmentInfos) is always one less than checksumNow (line 245 in SegmentInfos).

In an attempt to learn more about this problem I added CRC32 objects to my IndexInput and IndexOutput definitions in order to peek at their values while debugging, and it seems the IndexInput and IndexOutput classes I defined have the same checksum after reading and writing all the bytes.

The source for my implementation to this point can be found at http://github.com/bryanjswift/quotidian/tree/search-checksums under src/main/scala/quotidian/search

Any insight or assistance would be very much appreciated at this point.

Cheers.
Bryan
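For readers following along, here is a minimal sketch of the kind of trailing-checksum check described above, written against plain java.io and java.util.zip.CRC32 rather than Lucene's own classes. The class and method names (TrailingChecksum, writeWithChecksum, verify) are hypothetical; only the variable names checksumNow and checksumThen are borrowed from the message. It assumes the file is a payload followed by its CRC32 stored as an 8-byte long.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.CRC32;

public class TrailingChecksum {
    // Writes the payload followed by its CRC32 as an 8-byte long.
    static byte[] writeWithChecksum(byte[] payload) {
        try {
            CRC32 crc = new CRC32();
            crc.update(payload, 0, payload.length);
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.write(payload);
            out.writeLong(crc.getValue());
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Recomputes the CRC32 over the payload and compares it with the stored long.
    static boolean verify(byte[] file) {
        try {
            int payloadLength = file.length - 8;
            CRC32 crc = new CRC32();
            crc.update(file, 0, payloadLength);
            long checksumNow = crc.getValue();
            DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(file, payloadLength, 8));
            long checksumThen = in.readLong();
            return checksumNow == checksumThen;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] file = writeWithChecksum(new byte[] {1, 2, 3, 4});
        System.out.println("intact file verifies: " + verify(file));
        file[file.length - 1] ^= 1; // corrupt only the stored checksum
        System.out.println("stale checksum verifies: " + verify(file));
    }
}
```

Note that in this layout the payload bytes can match exactly on both sides while the check still fails, if only the stored trailing value is stale.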
Re: Bytes seem the same but checksums differ
Right, so after looking again at what was happening in SegmentInfos, I noticed I was saving to the Datastore on IndexOutput.flush but not on close. Persisting the file on close solved this particular problem. Sorry about that.

On Sat, Aug 15, 2009 at 1:03 AM, Bryan Swift wrote:
> I'm attempting to create a Directory implementation (lucene-core 2.4.1) to
> sit on top of Google's App Engine Datastore (written in Scala). [...]
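A rough sketch of why persisting only on flush can leave a stale final value. As far as I recall, ChecksumIndexOutput in 2.4 writes checksum-1 and flushes during the prepare phase, then seeks back and writes the real checksum just before close; that would be consistent with checksumThen being exactly one less, but treat it as an assumption and check the source. Everything below is a hypothetical stand-in: STORE plays the role of the Datastore, StoreOutput the role of the IndexOutput, and the single bytes 9 and 10 stand for checksum-1 and checksum.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

public class PersistOnClose {
    // Hypothetical stand-in for the Datastore: file name -> persisted bytes.
    static final Map<String, byte[]> STORE = new HashMap<>();

    static class StoreOutput {
        private final String name;
        private final boolean persistOnClose;
        private byte[] buffer = new byte[16];
        private int position = 0;
        private int length = 0;

        StoreOutput(String name, boolean persistOnClose) {
            this.name = name;
            this.persistOnClose = persistOnClose;
        }

        void writeByte(byte b) {
            if (position == buffer.length) {
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
            buffer[position++] = b;
            length = Math.max(length, position);
        }

        void seek(int newPosition) { position = newPosition; }

        // Persists a snapshot of everything written so far.
        void flush() { STORE.put(name, Arrays.copyOf(buffer, length)); }

        // Writes made after the last flush are lost unless close persists too.
        void close() { if (persistOnClose) flush(); }
    }

    // Mimics a two-phase commit: placeholder, flush, then overwrite and close.
    static byte[] commit(String name, boolean persistOnClose) {
        StoreOutput out = new StoreOutput(name, persistOnClose);
        for (byte b : new byte[] {1, 2, 3}) out.writeByte(b);
        out.writeByte((byte) 9);  // placeholder, standing in for checksum-1
        out.flush();
        out.seek(3);
        out.writeByte((byte) 10); // final value, standing in for checksum
        out.close();
        return STORE.get(name);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(commit("flush-only", false)));
        System.out.println(Arrays.toString(commit("flush-and-close", true)));
    }
}
```

With persistOnClose set to false the stored file keeps the placeholder; with it set to true the final value reaches the store.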
Lucene-Core test failures
I was running the tests for lucene-core version 2.4 and noticed a failure in org.apache.lucene.index.TestIndexInput, in testRead at line 89. According to the comments in the file, the assertions in question have to do with reading "Modified UTF-8 null bytes". It seems these modified null bytes are not being retrieved properly.

This failure is of particular interest to me because I am getting a similar failure in the specs for the IndexInput implementation I am working on for the App Engine Datastore. Is this test failure something to be concerned about, or can it be ignored/removed with relative safety?

Cheers.
Bryan
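For context on what a "Modified UTF-8 null byte" is: in Java's modified UTF-8, the character U+0000 is encoded as the two bytes 0xC0 0x80 instead of a single zero byte. The sketch below shows this with the JDK's DataOutputStream.writeUTF and DataInputStream.readUTF, which use that encoding behind a 2-byte length prefix. It does not use Lucene's IndexInput, so it only illustrates the byte pattern the test is presumably exercising; the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class ModifiedUtf8Null {
    // Encodes a string with writeUTF: 2-byte length, then modified UTF-8.
    static byte[] encode(String s) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.writeUTF(s);
            out.flush();
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes bytes produced by encode back into a string.
    static String decode(byte[] bytes) {
        try {
            return new DataInputStream(new ByteArrayInputStream(bytes)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = encode("a\u0000b");
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // The embedded null appears as C0 80 between 61 ('a') and 62 ('b').
        System.out.println(hex.toString().trim());
        System.out.println(decode(encoded).equals("a\u0000b"));
    }
}
```

A reader that expects standard UTF-8 only would need to handle the C0 80 pair explicitly to get the null character back.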