On Tue, Jan 21, 2014 at 7:54 PM, Steven Schlansker <ste...@likeness.com> wrote:
Hi,

Firstly, thanks to all of you for your insights.

> How can two byte arrays be equal if they have different lengths?

> Same way as two Strings with differing lengths can never be equal, two
> byte arrays with different lengths will never be equivalent.

Indeed. As Michael pointed out, I had misunderstood what "length" meant in the code. Thanks for clearing that up!

> copyBytes doesn’t change the length of the BytesRef, so two unequal BytesRef
> instances cannot become equal solely through a copyBytes call, by my reading?

Certainly, but my problem persists even if I do not call it. I spent the whole night debugging the code, to no avail. As a matter of fact, when I run a series of tests on my application, the following happens about one time in ten (this is the resulting log of some sysout calls):

    Payload: toString=Nc6, bytes=[4e 63 36], offset=3, length=3, hashcode=78081
    map.keySet()=[[4e 63 36]]
    Now testing map.contains(payload)
    map.contains(payload)==false
    Now testing map.isEmpty()
    map.isEmpty()==false
    Map is not empty. Manually iterating keys.
    Key n°1: toString=Nc6, bytes=[4e 63 36], offset=3, length=3, hashcode=78081
    Verifying key.equals(payload)==true
    Verifying map.containsKey(payload)==false
    Verifying map.containsKey(key)==false

As you can see, the map hands back the key I am looking for, but it cannot find that same key again! Inspecting the HashMap's internal structure, I found that the entry was indeed stored under a different hashCode (73787). I do not understand how this could happen. I thought there might be a concurrency issue with the payload itself, as if it were reused in concurrent scoring processes (I use the payload sent back by DefaultSimilarity), but as far as I can see, my test data set should not generate the faulty hashCode.

I'll try looking at the code again with fresh eyes, but in the meantime, do not hesitate to tell me if this makes sense to you.
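To illustrate the symptom outside Lucene, here is a minimal sketch. It assumes the bytes behind a key are modified after the key was put in the map, which is only my suspicion and not something I have confirmed in my application. `SliceKey` and `StaleHashDemo` are hypothetical names, and `SliceKey` is a simplified stand-in, not Lucene's `BytesRef`. Its `equals` compares only the slice defined by offset and length, and its 31-based hash happens to give 78081 for the bytes 4e 63 36, the value shown in the log above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a byte-slice key; NOT Lucene's BytesRef.
final class SliceKey {
    final byte[] bytes;   // backing array, possibly shared with other code
    final int offset;     // start of the logical value inside the array
    final int length;     // number of bytes in the logical value

    SliceKey(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof SliceKey)) return false;
        SliceKey other = (SliceKey) o;
        if (length != other.length) return false;
        // Compare only the slices, ignoring bytes outside offset..offset+length.
        for (int i = 0; i < length; i++) {
            if (bytes[offset + i] != other.bytes[other.offset + i]) return false;
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Simple 31-based polynomial hash over the slice (an assumption of this sketch).
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + bytes[i];
        }
        return h;
    }
}

final class StaleHashDemo {
    // Returns {foundBeforeMutation, foundAfterMutation, foundByIteration}.
    static boolean[] run() {
        // Three leading "junk" bytes, then the logical value 4e 62 36.
        byte[] shared = {0x00, 0x00, 0x00, 0x4e, 0x62, 0x36};
        SliceKey key = new SliceKey(shared, 3, 3);

        Map<SliceKey, String> map = new HashMap<>();
        map.put(key, "value");               // the map records the hash computed now
        boolean before = map.containsKey(key);

        shared[4] = 0x63;                    // backing bytes change to 4e 63 36 after insertion
        boolean after = map.containsKey(key); // lookup uses the new hash, entry holds the old one

        boolean byIteration = false;
        for (SliceKey k : map.keySet()) {    // iteration ignores hashes, so the key is still visible
            if (k.equals(key)) byIteration = true;
        }
        return new boolean[] {before, after, byIteration};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("containsKey before mutation: " + r[0]); // true
        System.out.println("containsKey after mutation:  " + r[1]); // false
        System.out.println("found by iterating keySet:   " + r[2]); // true
    }
}
```

If something like this is what happens in my case, storing a private copy of the bytes as the map key, rather than an instance whose backing array may be reused, should avoid it.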

> Not all bytes are valid representations of Strings, so don’t do this unless
> you are very sure you are dealing with character data and know the encoding.

This would not be a problem in my use case, as the text is generated by the application and uses only certain ASCII characters.

> What differently-sized byte arrays would you expect to compare as equals?

Arrays that contain an equal slice of values (the logical value): one would discard a variable number of leading bits, considered technical (junk). This is how I understood the BytesRef structure.

Kind regards.