Definitely +1 for advertising this in the docs. What I can't figure out is the upgrade path. If my application assumes that all data for a single user is in one partition (it subscribes to a single partition and expects everything about a specific subset of users to be there), that assumption will not survive an upgrade to 0.8.2.X. I think assuming that hash partitions stay stable across upgrades is pretty reasonable (i.e. I've made that assumption about a gazillion times without thinking twice). Note that in this story my app wasn't even upgraded: it broke because a producer upgraded to the new API.

If we advertise "upgrading to the new producer API may break consumers", we may need to offer a work-around that lets people upgrade producers anyway. Perhaps we can say "wait for Sriharsha's partitioner patch and write a custom partitioner that uses hashCode()".

Gwen

On Sun, Apr 26, 2015 at 7:57 AM, Jay Kreps <jay.kr...@gmail.com> wrote:

> This was actually intentional.
>
> The problem with relying on hashCode is that
> (1) it is often a very bad hash function,
> (2) it is not guaranteed to be consistent from run to run (i.e. if you
> restart the jvm the value of hashing the same key can change!),
> (3) it is not available outside the jvm so non-java producers can't use the
> same function.
>
> In general at the moment different producers don't use the same hash code
> so I think this is not quite as bad as it sounds. Though it would be good
> to standardize things.
>
> I think the most obvious thing we could do here would be to do a much
> better job of advertising this in the docs, though, so people don't get
> bitten by it.
>
> -Jay
>
> On Fri, Apr 24, 2015 at 5:48 PM, James Cheng <jch...@tivo.com> wrote:
>
> > Hi,
> >
> > I was playing with the new producer in 0.8.2.1 using partition keys
> > ("semantic partitioning" I believe is the phrase?). I noticed that the
> > default partitioner in 0.8.2.1 does not partition items the same way as the
> > old 0.8.1.1 default partitioner was doing. For a test item, the old
> > producer was sending it to partition 0, whereas the new producer was
> > sending it to partition 4.
> >
> > Digging in the code, it appears that the partitioning logic is different
> > between the old and new producers. Both of them hash the key, but they use
> > different hashing algorithms.
> >
> > Old partitioner:
> > ./core/src/main/scala/kafka/producer/DefaultPartitioner.scala:
> >
> >   def partition(key: Any, numPartitions: Int): Int = {
> >     Utils.abs(key.hashCode) % numPartitions
> >   }
> >
> > New partitioner:
> > ./clients/src/main/java/org/apache/kafka/clients/producer/internals/Partitioner.java:
> >
> >         } else {
> >             // hash the key to choose a partition
> >             return Utils.abs(Utils.murmur2(record.key())) % numPartitions;
> >         }
> >
> > Where murmur2 is a custom hashing algorithm. (I'm assuming that murmur2
> > isn't the same logic as hashCode, especially since hashCode is
> > overridable).
> >
> > Was it intentional that the hashing algorithm would change between the old
> > and new producer? If so, was this documented? I don't know if anyone was
> > relying on the old default partitioner, as opposed to going round-robin or
> > using their own custom partitioner. Do you expect it to change in the
> > future? I'm guessing that one of the main reasons to have a custom hashing
> > algorithm is so that you are in full control of the partitioning and can keep
> > it stable (as opposed to being reliant on hashCode()).
> >
> > Thanks,
> > -James
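
To make the work-around concrete, here is a rough sketch of the partition computation a hashCode-based custom partitioner would need to reproduce. It is not taken from either code base: the class name HashCodePartitioning and the helpers nonNegative and partitionFor are made up for illustration, and the behaviour of the old Scala Utils.abs (Int.MinValue mapped to 0, otherwise the absolute value) is an assumption that should be checked against the 0.8.1.1 source. How this would plug into the producer depends on the partitioner patch, so only the arithmetic is shown.

```java
// Hypothetical helper, not part of Kafka: mirrors the shape of the old
// DefaultPartitioner quoted above, i.e. abs(key.hashCode) % numPartitions.
final class HashCodePartitioning {
    private HashCodePartitioning() {}

    // ASSUMPTION: stands in for the old Scala Utils.abs. Math.abs alone is not
    // enough, because Math.abs(Integer.MIN_VALUE) is still negative in Java.
    static int nonNegative(int n) {
        return n == Integer.MIN_VALUE ? 0 : Math.abs(n);
    }

    // The key must be the original key object (e.g. a String), not its
    // serialized bytes: byte[] uses identity hashCode, which is not stable
    // across JVM runs, whereas String.hashCode is specified by the Javadoc.
    static int partitionFor(Object key, int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive");
        }
        return nonNegative(key.hashCode()) % numPartitions;
    }

    public static void main(String[] args) {
        // "a".hashCode() is 97, and 97 % 8 is 1.
        System.out.println(partitionFor("a", 8));
        // Integer.hashCode() is the int value itself: |-7| % 4 is 3.
        System.out.println(partitionFor(Integer.valueOf(-7), 4));
    }
}
```

This only keeps consumers working if every producer writing to the topic uses the same function, and Jay's three caveats about hashCode still apply to any key type whose hashCode is not fully specified.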