On Tue, Jan 21, 2014 at 7:54 PM, Steven Schlansker <ste...@likeness.com> wrote:

Hi,

Firstly, thanks to all of you for your insights.

> How can two byte arrays be equal if they have different lengths?
> Same way as two Strings with differing lengths can never be equal, two
> byte arrays with different lengths will never be equivalent.

Indeed. As Michael pointed out, I had misunderstood what "length"
meant in the code. Thanks for clearing that up!

> copyBytes doesn’t change the length of the BytesRef, so two unequal BytesRef
> instances cannot become equal solely through a copyBytes call, by my reading?

Certainly, but my problem persists even when I do not call copyBytes.
I spent the whole night debugging the code, to no avail. In fact,
when I run a series of tests on my application, the following happens
about one time in ten (this is the log produced by some System.out
calls):

Payload: toString=Nc6, bytes=[4e 63 36], offset=3, length=3, hashcode=78081
map.keySet()=[[4e 63 36]]
Now testing map.contains(payload)
map.contains(payload)==false
Now testing map.isEmpty()
map.isEmpty()==false
Map is not empty. Manually iterating keys.
Key n°1: toString=Nc6, bytes=[4e 63 36], offset=3, length=3, hashcode=78081
Verifying key.equals(payload)==true
Verifying map.containsKey(payload)==false
Verifying map.containsKey(key)==false

As you can see, the map returns the key I am looking for when I
iterate over it, but it cannot find that same key on lookup! Stepping
through the HashMap internals, I found that the entry was indeed
stored under a different hashCode (73787).

I do not understand how this could happen. I thought there might be a
concurrency issue with the payload itself - as if it were reused in
concurrent scoring processes (I use the payload sent back by
DefaultSimilarity) - but the faulty hashCode, as far as I can see,
should not be generated by my test data set.
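
For illustration only, the symptom in the log (a key that is visible
when iterating but that containsKey cannot find) can be reproduced in
plain Java when a key's bytes change after it has been put into a
HashMap. The sketch below is not a diagnosis of the application and
does not use Lucene: MutableKey is a hypothetical stand-in for a
BytesRef-like object, its 31-based hash is an assumption that happens
to yield 78081 for the bytes 4e 63 36, and the initial bytes "JSd" are
arbitrary values chosen only because they hash to 73787.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical mutable byte-slice key (a stand-in, not Lucene's BytesRef).
class MutableKey {
    byte[] bytes;
    int offset;
    int length;

    MutableKey(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    // Hash over the slice only, same recurrence as String.hashCode().
    @Override
    public int hashCode() {
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + bytes[i];
        }
        return h;
    }

    // Two keys are equal when their slices hold the same bytes.
    @Override
    public boolean equals(Object o) {
        if (!(o instanceof MutableKey)) {
            return false;
        }
        MutableKey other = (MutableKey) o;
        if (length != other.length) {
            return false;
        }
        for (int i = 0; i < length; i++) {
            if (bytes[offset + i] != other.bytes[other.offset + i]) {
                return false;
            }
        }
        return true;
    }
}

class StaleHashDemo {
    // Returns { containsKey(probe), probe found by iterating keySet() }.
    static boolean[] run() {
        // A shared buffer that some other code is assumed to reuse later.
        byte[] shared = {0, 0, 0, 'J', 'S', 'd'};
        MutableKey key = new MutableKey(shared, 3, 3);

        Map<MutableKey, String> map = new HashMap<>();
        map.put(key, "value"); // the map records the hash of "JSd" (73787)

        // The buffer is overwritten in place; the map is not told about it.
        shared[3] = 'N';
        shared[4] = 'c';
        shared[5] = '6';

        MutableKey probe = new MutableKey(new byte[] {'N', 'c', '6'}, 0, 3);
        boolean foundByLookup = map.containsKey(probe);
        boolean foundByIteration = false;
        for (MutableKey k : map.keySet()) {
            if (k.equals(probe)) {
                foundByIteration = true;
            }
        }
        return new boolean[] {foundByLookup, foundByIteration};
    }

    public static void main(String[] args) {
        boolean[] result = run();
        System.out.println("containsKey(probe) = " + result[0]);
        System.out.println("found by iteration = " + result[1]);
    }
}
```

The lookup fails because HashMap compares the hash it recorded at
insertion time before it ever calls equals. If buffer reuse were the
cause here, which the log alone does not prove, putting a private copy
of the key into the map (for a BytesRef, BytesRef.deepCopyOf) would
keep the recorded hash and the key's contents in step.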

I'll try looking at the code again with fresh eyes, but in the
meantime, do not hesitate to tell me if this makes sense to you.

> Not all bytes are valid representations of Strings, so don’t do this unless
> you are very sure you are dealing with character data and know the encoding.

This would not be a problem in my use case, as the provided text is
generated by the application, and uses only certain ASCII chars.

> What differently-sized byte arrays would you expect to compare as equals?

Arrays that contain an equal slice of values (the logical value),
where each array may also hold a varying number of leading bytes that
are discarded as technical junk. This is how I understood the BytesRef
structure.
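
As a rough illustration of that reading (a logical value defined by an
offset and a length inside a possibly larger backing array), the
sketch below uses java.nio.ByteBuffer from the standard library as a
stand-in. It is not Lucene code, and the junk bytes and the helper
names are made up for the example. It shows that backing arrays of
different sizes can carry equal slices, while slices of different
lengths never compare as equal.

```java
import java.nio.ByteBuffer;

class SliceEqualityDemo {
    // Wraps the region [offset, offset + length) of the backing array.
    // ByteBuffer.equals() compares only that remaining region.
    static ByteBuffer slice(byte[] backing, int offset, int length) {
        return ByteBuffer.wrap(backing, offset, length);
    }

    // Backing arrays of 6 and 3 bytes, same logical value 4e 63 36.
    static boolean sameLogicalValue() {
        byte[] padded = {9, 9, 9, 0x4e, 0x63, 0x36}; // three junk bytes first
        byte[] exact = {0x4e, 0x63, 0x36};
        return slice(padded, 3, 3).equals(slice(exact, 0, 3));
    }

    // Same backing array and offset, but slice lengths 3 and 2.
    static boolean differentLengths() {
        byte[] padded = {9, 9, 9, 0x4e, 0x63, 0x36};
        return slice(padded, 3, 3).equals(slice(padded, 3, 2));
    }

    public static void main(String[] args) {
        System.out.println("same logical value: " + sameLogicalValue());
        System.out.println("different lengths:  " + differentLengths());
    }
}
```

In this model the size of the backing array is irrelevant to equality;
only the slice length and the bytes inside the slice matter.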

Kind regards.

