So I’ve been looking at the SSL performance (which potentially also has an 
issue unrelated to this). I’d noticed strange behaviour with poll() on the new 
consumer but I wasn’t sure whether this was a bug or a feature. On closer 
inspection it seems to arise from ConsumerPerformance relying on a call to 
consumer.poll(100). Not all our tests do it this way, I should add.

If you change this to poll(1) you should see reasonable performance ensue (so 
for 1KB messages I see the performance locally jump from ~10MB/s to ~300MB/s 
which matches the old consumer).

Something like: 

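   // was:
   //   val records = consumer.poll(100)
   // try instead (assumed intent: only re-poll when nothing came back):
   var records = consumer.poll(0)
   if (records.isEmpty)
        records = consumer.poll(1)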

The problem appears to be that the client returns empty results even when there 
are messages waiting to be read. This then results in a sleep of the specified 
timeout and overall performance takes a hit.
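
A rough way to see why the timeout matters so much is a back-of-envelope model. This is only a sketch: the object name, the parameters and every input value below are assumptions made up for illustration, not measurements and not how the client is implemented. It assumes some fraction of polls come back empty after blocking for the full timeout, while the rest return a fixed amount of data.

```scala
// Back-of-envelope model only; every number passed in is a made-up assumption,
// not a measurement, and this is not how the Kafka client is implemented.
object PollTimeoutModel {
  /** Estimated MB/s when a fraction of polls come back empty after
    * blocking for the whole timeout.
    *
    * @param timeoutMs     timeout passed to poll()
    * @param emptyFraction share of polls that return nothing (0.0 to 1.0)
    * @param mbPerPoll     data returned by a non-empty poll, in MB
    * @param fetchMs       time a non-empty poll takes, in ms
    */
  def estimateMBps(timeoutMs: Double, emptyFraction: Double,
                   mbPerPoll: Double, fetchMs: Double): Double = {
    val avgMsPerPoll = emptyFraction * timeoutMs + (1 - emptyFraction) * fetchMs
    val avgMbPerPoll = (1 - emptyFraction) * mbPerPoll
    avgMbPerPoll / (avgMsPerPoll / 1000.0)
  }

  def main(args: Array[String]): Unit = {
    // Hypothetical inputs: half the polls empty, 1 MB per useful poll, 1 ms fetch.
    for (timeoutMs <- Seq(100.0, 1.0)) {
      val mbps = estimateMBps(timeoutMs, emptyFraction = 0.5, mbPerPoll = 1.0, fetchMs = 1.0)
      println(f"poll($timeoutMs%.0f) -> $mbps%.1f MB/s (model)")
    }
  }
}
```

Under this model the time lost to empty polls scales with the timeout, so shrinking the timeout raises the estimate sharply even though nothing else changes. The real empty-poll rate is unknown, so treat the output as an illustration of the effect rather than a prediction.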

I’ll dig a little deeper, but for now there appears to be a workaround, so 
that’s something :)


> On 28 Aug 2015, at 07:30, Ewen Cheslack-Postava <> wrote:
> Tried bisecting, but turns out things were broken for some time. We really
> need some system tests in place to avoid letting even new code break for so
> long.
> At 49026f11781181c38e9d5edb634be9d27245c961 (May 14th), we went from good
> performance -> an error due to broker apparently not accepting the
> partition assignment strategy. Since this commit seems to add heartbeats
> and the server side code for partition assignment strategies, I assume we
> were missing something on the client side and by filling in the server
> side, things stopped working.
> On either 84636272422b6379d57d4c5ef68b156edc1c67f8 or
> a5b11886df8c7aad0548efd2c7c3dbc579232f03 (July 17th), I am able to run the
> perf test again, but it's slow -- ~10MB/s for me vs the 2MB/s Jay was
> seeing, but that's still far less than the 600MB/s I saw on the earlier
> commits.
> Added this to the new consumer checklist, marked for 0.8.3, and at least
> for now assigned to Jason since I think he'll probably be able to sort this
> out most quickly:
> -Ewen
> On Thu, Aug 27, 2015 at 8:03 PM, Guozhang Wang <> wrote:
>> 436b7ddc386eb688ba0f12836710f5e4bcaa06c8 is pretty recent and there could
>> be some current consumer improvement patches that introduces some
>> regression. I would suggest doing a binary search in the log from
>> 3f8480ccfb011eb43da774737597c597f703e11b
>> (maybe even earlier?) to do a quick check.
>> Guozhang
>> On Thu, Aug 27, 2015 at 4:39 PM, Jay Kreps <> wrote:
>>> I think this is likely a regression. The two clients had more or less
>>> equivalent performance when we checked in the code (see my post on this
>>> earlier in the year). Looks like maybe we broke something up in the
>>> interim?
>>> On my laptop the new consumer perf seems to have dropped from about
>>> ~200MB/sec to about 2MB/sec.
>>> -Jay
>>> On Thu, Aug 27, 2015 at 4:21 PM, Ewen Cheslack-Postava <
>>> wrote:
>>>> I don't think the commands are really equivalent despite just adding the
>>>> --new-consumer flag. ConsumerPerformance uses a single thread when using
>>>> the new consumer (it literally just allocates the consumer, loops until
>>>> it's consumed enough, then exits), whereas the old consumer uses a bunch of
>>>> additional threads.
>>>> To really compare performance, someone would have to think through a fair
>>>> way to compare them -- the two operate so differently that you'd have to be
>>>> very careful to get an apples-to-apples comparison.
>>>> By the way, membership in consumer groups should be a lot cheaper with the
>>>> new consumer (the ZK coordination issues with lots of consumers aren't a
>>>> problem since ZK is not involved), so you can probably scale up the number
>>>> of consumer threads with little impact. It might be nice to patch the
>>>> consumer perf test to respect the # of threads setting, which might be a
>>>> first step to getting a more reasonable comparison.
>>>> -Ewen
>>>> On Thu, Aug 27, 2015 at 11:25 AM, Poorna Chandra Tejashvi Reddy <
>>>>> wrote:
>>>>> Hi,
>>>>> We have built the latest kafka from
>>>>> based on this commit id 436b7ddc386eb688ba0f12836710f5e4bcaa06c8 .
>>>>> We ran the performance test on a 3 node kafka cluster. There is a huge
>>>>> throughput degradation using the new-consumer compared to the regular
>>>>> consumer. Below are the numbers that explain the same.
>>>>> bin/ --zookeeper zkIp:2181 --broker-list
>>>>> brokerIp:9092 --topics test --messages 5000000 : gives a throughput of
>>>>> 693 K
>>>>> bin/ --zookeeper zkIp:2181 --broker-list
>>>>> brokerIp:9092 --topics test --messages 5000000 --new-consumer : gives a
>>>>> throughput of 51k
>>>>> The whole set up is based on ec2, Kafka brokers running on r3.2x large.
>>>>> Are you guys aware of this performance degradation , do you have a
>>>>> for this, which can be used to track the resolution.
>>>>> Thanks,
>>>>> -Poorna
>>>> --
>>>> Thanks,
>>>> Ewen
>> --
>> -- Guozhang
> -- 
> Thanks,
> Ewen
