Hey Jae,

The rationale for switching was to use a hash that is cross-language and
does not depend on how the particular key object implements hashCode().
There are all kinds of gotchas with using Java's hashCode() as a partition
assignment strategy (e.g. two byte arrays with the same bytes will have
different hash codes).
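A minimal Java sketch of the byte-array gotcha (the class name, helper, and key are hypothetical, chosen only for illustration). Arrays inherit identity-based equals() and hashCode() from Object, so equal bytes do not imply equal hash codes, while a content hash such as Arrays.hashCode() does agree:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class: compares identity-based and content-based hashes of byte[] keys.
class ByteArrayHashDemo {
    // Hypothetical helper: serialize a key the way a producer might (UTF-8 bytes).
    static byte[] keyBytes(String key) {
        return key.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] a = keyBytes("user-42");
        byte[] b = keyBytes("user-42"); // same bytes, different array object

        // Arrays do not override equals()/hashCode(), so both are identity-based.
        System.out.println("a.equals(b): " + a.equals(b)); // prints "a.equals(b): false"
        // Identity hash codes of two distinct arrays almost always differ.
        System.out.println("a.hashCode() = " + a.hashCode() + ", b.hashCode() = " + b.hashCode());

        // Content-based hashing gives equal results for equal bytes.
        System.out.println("Arrays.hashCode equal: " + (Arrays.hashCode(a) == Arrays.hashCode(b))); // prints "Arrays.hashCode equal: true"
    }
}
```

Even String.hashCode(), which is content-based, is defined by Java's own formula, so a client written in another language would have to reimplement it exactly to send keys to the same partitions; hashing the serialized bytes with a published algorithm avoids that.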

-Jay

On Wed, Sep 17, 2014 at 11:00 AM, Bae, Jae Hyeon <metac...@gmail.com> wrote:
> The main motivation for adopting the new producer before it's released is
> that the old producer shows terrible throughput for cross-regional Kafka
> mirroring in EC2.
>
> Let me share some numbers.
>
> Using iperf, the network bandwidth between EC2 instances in us-west-2 and
> us-east-1 is more than 40 MB/sec, but the old producer's throughput is less
> than 3 MB/sec.
>
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> 2014-09-16 20:22:25:537, 2014-09-16 20:24:13:138, 2, 3000, 200, 286.10, 2.6589, 100000, 929.3594
>
> Increasing the socket send buffer on the producer side and the receive
> buffer on the broker side didn't help much (throughput stayed under 4 MB/sec).
> send.buffer.bytes: 8388608
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> 2014-09-16 20:48:49:588, 2014-09-16 20:50:03:006, 2, 3000, 200, 286.10, 3.8969, 100000, 1362.0638
>
> But the new producer, which is not released yet, shows a significant
> performance improvement: more than 30 MB/sec.
> start.time, end.time, compression, message.size, batch.size, total.data.sent.in.MB, MB.sec, total.data.sent.in.nMsg, nMsg.sec
> 2014-09-16 20:50:31:720, 2014-09-16 20:50:41:241, 2, 3000, 200, 286.10, 30.0496, 100000, 10503.098
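As a sanity check on the benchmark tables, here is a small Java sketch (hypothetical class name) that recomputes MB.sec and nMsg.sec from the start/end timestamps. It assumes each run sent 100000 messages of 3000 bytes and that MB means 1024 * 1024 bytes, which is what makes total.data.sent.in.MB come out to 286.10:

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: recomputes throughput from the benchmark timestamps.
class ThroughputCheck {
    // Timestamp layout as printed in the tables: date, time, then milliseconds after a colon.
    static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss:SSS");

    static double elapsedSeconds(String start, String end) {
        Duration d = Duration.between(LocalDateTime.parse(start, FMT), LocalDateTime.parse(end, FMT));
        return d.toMillis() / 1000.0;
    }

    public static void main(String[] args) {
        long messages = 100_000L;
        // Assumption: 3000-byte messages and MB = 1024 * 1024 bytes (gives 286.10 MB).
        double totalMb = messages * 3_000L / (1024.0 * 1024.0);
        String[][] runs = {
            {"old producer", "2014-09-16 20:22:25:537", "2014-09-16 20:24:13:138"},
            {"old producer, 8 MB send buffer", "2014-09-16 20:48:49:588", "2014-09-16 20:50:03:006"},
            {"new producer", "2014-09-16 20:50:31:720", "2014-09-16 20:50:41:241"},
        };
        for (String[] run : runs) {
            double secs = elapsedSeconds(run[1], run[2]);
            System.out.printf("%-32s %8.3f s  %8.4f MB/sec  %10.3f msg/sec%n",
                    run[0], secs, totalMb / secs, messages / secs);
        }
    }
}
```

Running it should give roughly 2.66, 3.90, and 30.05 MB/sec for the three runs, i.e. the new producer was about 11x faster than the unmodified old producer in this test.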
> I was excited about the new producer's performance, but its partitioning
> logic is different.
>
> When no partition number is given in the ProducerRecord, the new producer
> picks the partition from a murmur2 hash of the key, whereas the old
> partitioner uses key.hashCode.
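To make the difference concrete, here is a rough Java sketch of the two mappings. The murmur2 function is a port of the MurmurHash2 variant (seed 0x9747b28c) found in the Kafka clients' Utils.murmur2, reproduced from memory, so check it against your client version. The class and method names and the sign-bit masking are assumptions for illustration, not the exact code of either partitioner:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch contrasting hashCode-based and murmur2-based partitioning.
class PartitionSketch {
    // ASSUMPTION: clear the sign bit to get a non-negative value; the real
    // partitioners' sign handling has differed between Kafka versions.
    static int nonNegative(int h) {
        return h & 0x7fffffff;
    }

    // Old-producer style: depends on the key object's own hashCode().
    static int legacyPartition(Object key, int numPartitions) {
        return nonNegative(key.hashCode()) % numPartitions;
    }

    // New-producer style: murmur2 over the serialized key bytes.
    static int murmur2Partition(byte[] keyBytes, int numPartitions) {
        return nonNegative(murmur2(keyBytes)) % numPartitions;
    }

    // 32-bit MurmurHash2 with seed 0x9747b28c, processing 4 bytes at a time (little-endian).
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the trailing 1-3 bytes; the cases intentionally fall through.
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        for (String key : new String[] {"user-1", "user-2", "user-42"}) {
            byte[] bytes = key.getBytes(StandardCharsets.UTF_8);
            System.out.printf("%-8s legacy=%d murmur2=%d%n", key,
                    legacyPartition(key, numPartitions), murmur2Partition(bytes, numPartitions));
        }
    }
}
```

Because the two functions hash different inputs (a Java-specific hashCode() versus the serialized bytes), they generally send the same key to unrelated partitions. If the old assignment must be preserved, one option is to compute something like legacyPartition yourself and pass the result as the explicit partition number in the ProducerRecord.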
>
> Could you make them use the same logic? Otherwise, I'll have to change the
> implementation of the Kafka producer container.
