I agree that comparing the BytesRef lengths in an equals() method seems counter 
to the purpose of having a BytesRef class. 

I'd recommend taking a look at BytesRefHash, which maps BytesRef objects to 
unique ids; it may be more efficient than converting to Strings. 

Stuart


-----Original Message-----
From: Yann-Erwan Perio [mailto:ye.pe...@gmail.com] 
Sent: Tuesday, January 21, 2014 7:33 AM
To: java-user@lucene.apache.org
Subject: BytesRef equals() method

Hello,

I have been working a bit with BytesRef recently, and I wonder whether the 
content of the equals() method, and more specifically the content of the 
bytesEquals(BytesRef other) method, is the intended one.

Here is my use case. I work with Lucene 4.6.0. During indexing, using a custom 
tokenizer, I have added some payloads onto some tokens. Using an extension of 
the Default Similarity, I was then able to retrieve these payloads, passing 
them to a collector of mine, so as to perform aggregation calculations. It 
occurred to me that the BytesRef instances retrieved were not exactly the same 
as the indexed ones: their real content was the same, but their offsets would 
differ.

I was made aware of this because I used a Map<BytesRef, ...> in the collector, 
and the map would sometimes give inconsistent results.
Checking out the source code, the hashCode() method looks valid to me, but the 
bytesEquals() method looks strange: before comparing the real values of the 
two BytesRef instances, it checks their lengths, and AIUI these may differ 
even though the instances are logically equal.
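
To make the question concrete, here is a minimal sketch of equality over 
(bytes, offset, length) views. ByteSlice is a hypothetical stand-in written 
for this illustration, not Lucene's BytesRef. It assumes that length counts 
the valid bytes starting at offset, not the size of the backing array. Under 
that assumption, two views with different offsets can still compare equal, 
and a length check only rejects views whose logical contents differ in size.

```java
// Hypothetical stand-in for a (bytes, offset, length) view; NOT Lucene's BytesRef.
final class ByteSlice {
    final byte[] bytes;
    final int offset;
    final int length; // assumed: number of valid bytes starting at offset

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof ByteSlice)) {
            return false;
        }
        ByteSlice other = (ByteSlice) o;
        // Different logical lengths can never hold the same content.
        if (length != other.length) {
            return false;
        }
        // Compare the viewed ranges, each relative to its own offset.
        for (int i = 0; i < length; i++) {
            if (bytes[offset + i] != other.bytes[other.offset + i]) {
                return false;
            }
        }
        return true;
    }

    @Override
    public int hashCode() {
        // Hash only the viewed range so that equal views hash alike.
        int h = 0;
        for (int i = offset; i < offset + length; i++) {
            h = 31 * h + bytes[i];
        }
        return h;
    }
}
```

With this stand-in, a view over {1, 2, 3} at offset 0 and a view over 
{9, 9, 1, 2, 3, 9} at offset 2, both of length 3, are equal and share a hash 
code.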

I am not familiar at all with the internals of Lucene (this includes the 
BytesRef mechanics), so I may be completely wrong here. FWIW, I solved my 
problem by creating fresh BytesRef from the ones sent by the similarity, using 
the copyBytes method. I could also have used the string representation of the 
BytesRef, but this appears to be slower than copying the bytes, by a factor 
of about 2.5.
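
The copy-before-use workaround can be sketched as follows. PayloadKey and 
PayloadCounter are hypothetical names invented for this illustration and use 
only the Java standard library, not Lucene. The sketch assumes the byte array 
handed to the collector may be overwritten by its producer afterwards; the 
thread does not confirm that this is what happens here. Under that 
assumption, taking a private copy before using the bytes as a map key keeps 
the key stable.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable map key that owns a private copy of its bytes.
final class PayloadKey {
    private final byte[] bytes;

    private PayloadKey(byte[] ownedBytes) {
        this.bytes = ownedBytes;
    }

    // Copies the range [offset, offset + length) so later writes to src cannot affect the key.
    static PayloadKey copyOf(byte[] src, int offset, int length) {
        return new PayloadKey(Arrays.copyOfRange(src, offset, offset + length));
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof PayloadKey && Arrays.equals(bytes, ((PayloadKey) o).bytes);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(bytes);
    }
}

// Hypothetical collector-side aggregation: counts occurrences per payload.
final class PayloadCounter {
    private final Map<PayloadKey, Integer> counts = new HashMap<>();

    void add(byte[] shared, int offset, int length) {
        // Copy first: the caller is assumed to be free to reuse 'shared'.
        counts.merge(PayloadKey.copyOf(shared, offset, length), 1, Integer::sum);
    }

    int count(byte[] payload) {
        return counts.getOrDefault(PayloadKey.copyOf(payload, 0, payload.length), 0);
    }
}
```

Because every key owns its bytes, overwriting the shared buffer between 
calls to add() does not disturb entries already stored in the map.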

Kind regards.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org

