[ https://issues.apache.org/jira/browse/KAFKA-3499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15235799#comment-15235799 ]
Michael Noll edited comment on KAFKA-3499 at 4/11/16 8:05 PM:
--------------------------------------------------------------

FWIW, we ran into the same problem when handling byte[] in Twitter Algebird. Back then I introduced a custom [Bytes|https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Bytes.scala] wrapper for byte arrays ([original PR|https://github.com/twitter/algebird/pull/399/files]), which happens to use java.nio.ByteBuffer.

The Bytes.scala code might be a good starting point; it includes sane implementations of hashCode, ordering/compare, equals, etc. Note that, by design (performance reasons), this wrapper is not enforcing immutability. See the javadocs in the source link above for details.


was (Author: miguno):
FWIW, we ran into the same problem when handling byte[] in Twitter Algebird. Back then I introduced a custom [Bytes|https://github.com/twitter/algebird/blob/develop/algebird-core/src/main/scala/com/twitter/algebird/Bytes.scala] wrapper for byte arrays ([original PR|https://github.com/twitter/algebird/pull/399/files]), which happens to use java.nio.ByteBuffer.

The Bytes.scala code might be a good starting point; it includes a sane implementations of hashCode, ordering/compare, equals, etc. Note that, by design (performance reasons), this wrapper is not enforcing immutability. See the javadocs in the source link above for details.

> byte[] should not be used as Map key nor Set member
> ---------------------------------------------------
>
>                 Key: KAFKA-3499
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3499
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: streams
>            Reporter: josh gruenberg
>              Labels: user-experience
>             Fix For: 0.10.0.0
>
>
> On the JVM, Array.equals and Array.hashCode do not incorporate array
> contents; they inherit Object.equals/hashCode.
> This implies that Collections that rely upon equals/hashCode (e.g.,
> HashMap/HashSet and variants) treat two arrays with equal contents as
> distinct elements.
> Many of the Kafka Streams internal classes currently use generic HashMaps and
> Sets to manage caches and invalidation status. For example,
> RocksDBStore.cacheDirtyKeys is a HashSet<K>. Then, in RocksDBWindowStore, the
> Elements are constructed as RocksDBStore<byte[], byte[]>.
> Similarly, the MemoryLRUCache<K, RocksDBCacheEntry> internally holds a
> LinkedHashMap<K,V> map and a HashSet<K> keys, and these end up holding
> byte[] keys. Finally, user code may attempt to use any of these provided
> types with byte[], with undesirable results.
> Keys that are byte arrays should be wrapped in a type that incorporates the
> content in its computation of equals/hashCode. java.nio.ByteBuffer is one
> such type that could be used, but a purpose-built immutable class would
> likely be a better solution.
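
For illustration, here is a minimal sketch of the behaviour described in the issue and of the two suggested remedies. The names ByteArrayKeyDemo and ByteArrayKey are hypothetical and do not come from the Kafka or Algebird code bases. The wrapper copies its input to stay immutable, which is one possible trade-off and not necessarily the one Kafka would choose.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical demo class; not part of Kafka Streams.
class ByteArrayKeyDemo {
    public static void main(String[] args) {
        byte[] first = {1, 2, 3};
        byte[] second = {1, 2, 3}; // same contents, different object

        // Raw arrays inherit identity-based equals/hashCode from Object,
        // so the set keeps both arrays.
        Set<byte[]> rawKeys = new HashSet<>();
        rawKeys.add(first);
        rawKeys.add(second);
        System.out.println("byte[] entries:       " + rawKeys.size()); // 2

        // ByteBuffer compares its remaining contents, so the second add
        // is recognised as a duplicate.
        Set<ByteBuffer> bufferKeys = new HashSet<>();
        bufferKeys.add(ByteBuffer.wrap(first));
        bufferKeys.add(ByteBuffer.wrap(second));
        System.out.println("ByteBuffer entries:   " + bufferKeys.size()); // 1

        // The purpose-built wrapper behaves the same way.
        Set<ByteArrayKey> wrappedKeys = new HashSet<>();
        wrappedKeys.add(new ByteArrayKey(first));
        wrappedKeys.add(new ByteArrayKey(second));
        System.out.println("ByteArrayKey entries: " + wrappedKeys.size()); // 1
    }
}

// Hypothetical immutable wrapper whose equals/hashCode use the array contents.
final class ByteArrayKey {
    private final byte[] bytes;
    private final int hash;

    ByteArrayKey(byte[] bytes) {
        // Defensive copy: later changes to the caller's array cannot
        // alter this key (assumed trade-off: one extra allocation).
        this.bytes = bytes.clone();
        this.hash = Arrays.hashCode(this.bytes);
    }

    // Returns a copy so callers cannot mutate the internal state.
    byte[] toByteArray() {
        return bytes.clone();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof ByteArrayKey)) {
            return false;
        }
        return Arrays.equals(bytes, ((ByteArrayKey) other).bytes);
    }

    @Override
    public int hashCode() {
        return hash;
    }
}
```

Note that a ByteBuffer's hashCode and equals depend on its current position, limit and contents, so a buffer that is modified after being inserted into a hash-based collection can no longer be found there. That is one reason a purpose-built immutable class may be preferable.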