Bytes seem the same but checksums differ

2009-08-14 Thread Bryan Swift
I'm attempting to create a Directory implementation (lucene-core 2.4.1) to
sit on top of Google's App Engine Datastore (written in Scala). In the
process of doing this I found something odd for which I'm hoping there is a
relatively simple solution.
When instantiating a new IndexWriter with my Directory implementation (which
uses Datastore-based IndexInput and IndexOutput classes), the checksum of the
segments file (segments_1, because there is nothing in the index yet) differs
when calculated by ChecksumIndexOutput vs ChecksumIndexInput. Of course,
this causes a CorruptIndexException to be thrown at line 248 of SegmentInfos.
The interesting thing is that the array of bytes being written by my
DatastoreIndexOutput is the same array of bytes being read by
DatastoreIndexInput. I've also noticed that checksumThen (line 246 in
SegmentInfos) is consistently one less than checksumNow (line 245 in
SegmentInfos).

In an attempt to gain further information about this problem, I added CRC32
objects to my IndexInput and IndexOutput definitions in order to peek at
their values while debugging, and it seems the IndexInput and IndexOutput
classes I defined have the same checksum after reading and writing all the
bits.
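
The kind of side-by-side CRC32 comparison described above can be sketched
roughly as follows. This is only an illustration using java.util.zip.CRC32
from the JDK; the class ChecksumCheck and its method names are hypothetical
and are not part of Lucene or of the linked repository.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.CRC32;

// Hypothetical helper for comparing checksums on the write and read sides.
class ChecksumCheck {
    /** CRC32 over the whole array in a single update call. */
    static long crcOf(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes, 0, bytes.length);
        return crc.getValue();
    }

    /** CRC32 fed one byte at a time, as a byte-oriented stream would do. */
    static long crcByteAtATime(byte[] bytes) {
        CRC32 crc = new CRC32();
        for (byte b : bytes) {
            crc.update(b); // update(int) only uses the low 8 bits
        }
        return crc.getValue();
    }

    public static void main(String[] args) {
        // Placeholder payload, not real segments file content.
        byte[] written = "example payload".getBytes(StandardCharsets.UTF_8);
        byte[] readBack = written.clone();

        // Identical bytes give identical checksums, however they are fed in.
        System.out.println(crcOf(written) == crcByteAtATime(readBack));

        // Print both values for a copy that is missing its last byte, to
        // compare what a partially persisted file would produce.
        byte[] truncated = Arrays.copyOf(written, written.length - 1);
        System.out.println(crcOf(written) + " vs " + crcOf(truncated));
    }
}
```

If both sides agree on the bytes they actually see, as here, a mismatch
reported elsewhere suggests that the bytes persisted to storage are not the
same as the bytes that were written.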

The source for my implementation to this point can be found at
http://github.com/bryanjswift/quotidian/tree/search-checksums under
src/main/scala/quotidian/search

Any insight or assistance would be very much appreciated at this point.

Cheers.
Bryan


Re: Bytes seem the same but checksums differ

2009-08-14 Thread Bryan Swift
Right, so after looking at what was happening in SegmentInfos again I
noticed I was saving to the Datastore on IndexOutput.flush but not on close.
Persisting the file on close solved this particular problem.
Sorry about that.
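
For anyone hitting the same thing, here is a minimal sketch of the mistake
and the fix. It assumes that some bytes are written after the last flush
and before close; the in-memory map stands in for the Datastore, and
BufferedStoreOutput is a hypothetical class, not Lucene's IndexOutput API.

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

class PersistOnCloseSketch {
    /** Hypothetical output that buffers writes and persists them to a map. */
    static class BufferedStoreOutput {
        private final Map<String, byte[]> store; // stand-in for the Datastore
        private final String name;
        private final boolean persistOnClose;
        private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

        BufferedStoreOutput(Map<String, byte[]> store, String name, boolean persistOnClose) {
            this.store = store;
            this.name = name;
            this.persistOnClose = persistOnClose;
        }

        void writeByte(byte b) {
            buffer.write(b);
        }

        /** Persist everything buffered so far. */
        void flush() {
            store.put(name, buffer.toByteArray());
        }

        /** The fix: persist on close too, so late writes are not lost. */
        void close() {
            if (persistOnClose) {
                flush();
            }
        }
    }

    /** Returns how many bytes end up persisted for a three-byte file. */
    static int persistedLength(boolean persistOnClose) {
        Map<String, byte[]> store = new HashMap<>();
        BufferedStoreOutput out = new BufferedStoreOutput(store, "segments_1", persistOnClose);
        out.writeByte((byte) 1);
        out.writeByte((byte) 2);
        out.flush();
        out.writeByte((byte) 3); // written after the last flush
        out.close();
        return store.get("segments_1").length;
    }

    public static void main(String[] args) {
        System.out.println(persistedLength(false)); // prints 2: last byte lost
        System.out.println(persistedLength(true));  // prints 3: all bytes saved
    }
}
```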



Lucene-Core test failures

2009-08-18 Thread Bryan Swift
I was running the tests with lucene-core version 2.4 and I noticed a
failure in org.apache.lucene.index.TestIndexInput, in testRead at line
89. The assertions in question have to do with reading "Modified UTF-8
null bytes" according to the comments in the file. It seems these
modified null bytes are not being retrieved properly.
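
For reference, "modified UTF-8" as used by the JDK's DataOutputStream
encodes the null character U+0000 as the two bytes 0xC0 0x80 rather than a
single zero byte. The sketch below only illustrates that encoding with JDK
classes; ModifiedUtf8Sketch is a hypothetical name, and Lucene's IndexInput
does its own string decoding, which this does not reproduce.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class ModifiedUtf8Sketch {
    /** Encodes with DataOutputStream.writeUTF: 2-byte length, then modified UTF-8. */
    static byte[] encode(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    /** Decodes bytes produced by encode using DataInputStream.readUTF. */
    static String decode(byte[] encoded) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded))) {
            return in.readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] encoded = encode("a\u0000b");
        StringBuilder hex = new StringBuilder();
        for (byte b : encoded) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        // prints "00 04 61 C0 80 62": length 4, then 'a', null as C0 80, 'b'
        System.out.println(hex.toString().trim());
        // prints "true": the null character survives the round trip
        System.out.println(decode(encoded).equals("a\u0000b"));
    }
}
```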


This failure is of particular interest to me because I am getting a
similar failure in the specs for the implementation of IndexInput I
am working on for the App Engine Datastore. I was wondering whether this
test failure is something to be concerned about, or whether it could be
ignored/removed with relative safety?


Cheers.
Bryan
