Hey Jae,

The rationale for switching was to use a hash code that is cross-language and does not depend on the particular object instance. There are all kinds of gotchas with using Java's hashCode() as a partition assignment strategy (e.g. two byte arrays with the same bytes will generally have different hash codes, because arrays inherit the identity-based Object.hashCode()).
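To make the byte-array gotcha concrete, here is a minimal sketch. The class and method names (PartitionHashDemo, contentPartition, identityPartition) are made up for illustration and are not part of Kafka. Arrays.hashCode is used only as a stand-in for a content-based hash; it is not the murmur2 hash the new producer uses, so the partition numbers it yields will not match the producer's.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Kafka.
class PartitionHashDemo {

    // Content-based assignment: depends only on the bytes of the key.
    // Arrays.hashCode is a stand-in here, NOT the producer's murmur2.
    static int contentPartition(byte[] key, int numPartitions) {
        // Mask the sign bit so the result is never negative.
        return (Arrays.hashCode(key) & 0x7fffffff) % numPartitions;
    }

    // Identity-based assignment: byte[] inherits Object.hashCode(),
    // so the result depends on the array instance, not its contents.
    static int identityPartition(byte[] key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        byte[] a = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] b = "user-42".getBytes(StandardCharsets.UTF_8);

        System.out.println("same bytes:         " + Arrays.equals(a, b));
        // These two may differ from each other and from run to run.
        System.out.println("identity partition: "
                + identityPartition(a, numPartitions) + " vs "
                + identityPartition(b, numPartitions));
        // These two are always equal for equal byte contents.
        System.out.println("content partition:  "
                + contentPartition(a, numPartitions) + " vs "
                + contentPartition(b, numPartitions));
    }
}
```

The identity-based result can also change between JVM runs, which is why it cannot be reproduced by a client written in another language.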
-Jay

On Wed, Sep 17, 2014 at 11:00 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> The major motivation for adopting the new producer before it's released is
> that the old producer is showing terrible throughput for cross-regional
> Kafka mirroring in EC2.
>
> Let me share numbers.
>
> Using iperf, network bandwidth between us-west-2 AWS EC2 and us-east-1 AWS
> EC2 is more than 40 MB/sec. But the old producer's throughput is less than
> 3 MB/sec.
>
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> 2014-09-16 20:22:25:537, 2014-09-16 20:24:13:138, 2, 3000, 200, 286.10, 2.6589, 100000, 929.3594
>
> Even though we increased the socket send buffer on the producer side and
> the recv buffer on the broker side, it didn't work.
> send.buffer.bytes: 8388608
>
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> 2014-09-16 20:48:49:588, 2014-09-16 20:50:03:006, 2, 3000, 200, 286.10, 3.8969, 100000, 1362.0638
>
> But the new producer, which is not released yet, is showing a significant
> performance improvement. Its throughput is more than 30 MB/sec.
>
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> 2014-09-16 20:50:31:720, 2014-09-16 20:50:41:241, 2, 3000, 200, 286.10, 30.0496, 100000, 10503.098
>
> I was excited about the new producer's performance, but its partitioning
> logic is different.
>
> Without a partition number in ProducerRecord, its partitioning logic is
> based on a murmur2 hash of the key. But in the old partitioner, the
> partitioning logic is based on key.hashCode.
>
> Could you make them use the same logic? Otherwise, I have to change the
> implementation of our kafka producer container.