So I’ve been looking at the SSL performance (which potentially also has an 
issue unrelated to this). I’d noticed strange behaviour with poll() on the new 
consumer but wasn’t sure whether it was a bug or a feature. On closer 
inspection, the slowdown seems to arise from ConsumerPerformance relying on a 
call to consumer.poll(100). Not all our tests do it this way, I should add. 

If you change this to poll(1) you should see reasonable performance ensue (so 
for 1KB messages I see the performance locally jump from ~10MB/s to ~300MB/s 
which matches the old consumer).

Something like this, replacing

   val records = consumer.poll(100)

with

   var records = consumer.poll(0)
   while (records.isEmpty)
     records = consumer.poll(1)
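
For what it's worth, a bounded version of that loop might look like the sketch below. Poller and pollUntilNonEmpty are made-up names standing in for the real consumer so the snippet compiles on its own, and maxAttempts is my own addition so the loop can't spin forever on an idle topic.

```scala
object PollUntilNonEmpty {
  // Hypothetical minimal interface; the real consumer's poll() returns
  // a records object rather than a Seq.
  trait Poller[A] {
    def poll(timeoutMs: Long): Seq[A]
  }

  // Try a non-blocking poll first, then retry with a 1 ms timeout until
  // something arrives or maxAttempts retries have been made.
  def pollUntilNonEmpty[A](p: Poller[A], maxAttempts: Int): Seq[A] = {
    var records = p.poll(0)
    var attempts = 0
    while (records.isEmpty && attempts < maxAttempts) {
      records = p.poll(1)
      attempts += 1
    }
    records // may still be empty if every attempt came back empty
  }
}
```

The caller still has to handle an empty result, which here just means "nothing arrived within the retry budget".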
      

The problem appears to be that the client returns empty results even when there 
are messages waiting to be read. This then results in a sleep of the specified 
timeout and overall performance takes a hit.
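
To put rough numbers on why the timeout matters, here is a toy model (not the client's actual code). It assumes every second poll comes back empty and costs the full timeout, and that a productive poll takes 1 ms and returns 1 MB. Those figures and all the names are invented for illustration, and the clock is virtual, so nothing actually sleeps.

```scala
object PollTimeoutModel {
  // Simulated consumer: every `emptyEvery`-th poll returns nothing and
  // burns the whole timeout, mimicking the behaviour described above.
  final class FakeConsumer(batchBytes: Long, emptyEvery: Int) {
    private var calls = 0L
    var clockMs = 0L // virtual elapsed time in milliseconds

    def poll(timeoutMs: Long): Long = {
      calls += 1
      if (calls % emptyEvery == 0) {
        clockMs += timeoutMs // empty result: pay the full timeout
        0L
      } else {
        clockMs += 1 // assumption: a productive poll costs 1 ms
        batchBytes
      }
    }
  }

  // Virtual time needed to read totalBytes with the given poll timeout.
  def elapsedMs(timeoutMs: Long, totalBytes: Long): Long = {
    val consumer = new FakeConsumer(batchBytes = 1024L * 1024L, emptyEvery = 2)
    var read = 0L
    while (read < totalBytes) read += consumer.poll(timeoutMs)
    consumer.clockMs
  }
}
```

Under those assumptions, reading 100 MB takes 10000 virtual ms with a timeout of 100 and 199 virtual ms with a timeout of 1, so the empty polls dominate the run time.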

I’ll dig a little deeper, but for now there appears to be a workaround, so 
that’s something :)

B



> On 28 Aug 2015, at 07:30, Ewen Cheslack-Postava <e...@confluent.io> wrote:
> 
> Tried bisecting, but turns out things were broken for some time. We really
> need some system tests in place to avoid letting even new code break for so
> long.
> 
> At 49026f11781181c38e9d5edb634be9d27245c961 (May 14th), we went from good
> performance -> an error due to broker apparently not accepting the
> partition assignment strategy. Since this commit seems to add heartbeats
> and the server side code for partition assignment strategies, I assume we
> were missing something on the client side and by filling in the server
> side, things stopped working.
> 
> On either 84636272422b6379d57d4c5ef68b156edc1c67f8 or
> a5b11886df8c7aad0548efd2c7c3dbc579232f03 (July 17th), I am able to run the
> perf test again, but it's slow -- ~10MB/s for me vs the 2MB/s Jay was
> seeing, but that's still far less than the 600MB/s I saw on the earlier
> commits.
> 
> Added this to the new consumer checklist, marked for 0.8.3, and at least
> for now assigned to Jason since I think he'll probably be able to sort this
> out most quickly: https://issues.apache.org/jira/browse/KAFKA-2486
> 
> -Ewen
> 
> 
> On Thu, Aug 27, 2015 at 8:03 PM, Guozhang Wang <wangg...@gmail.com> wrote:
> 
>> 436b7ddc386eb688ba0f12836710f5e4bcaa06c8 is pretty recent and there could
>> be some current consumer improvement patches that introduce some
>> regression. I would suggest doing a binary search in the log from
>> 3f8480ccfb011eb43da774737597c597f703e11b
>> (maybe even earlier?) to do a quick check.
>> 
>> Guozhang
>> 
>> On Thu, Aug 27, 2015 at 4:39 PM, Jay Kreps <j...@confluent.io> wrote:
>> 
>>> I think this is likely a regression. The two clients had more or less
>>> equivalent performance when we checked in the code (see my post on this
>>> earlier in the year). Looks like maybe we broke something up in the
>>> interim?
>>> 
>>> On my laptop the new consumer perf seems to have dropped from about
>>> ~200MB/sec to about 2MB/sec.
>>> 
>>> -Jay
>>> 
>>> 
>>> On Thu, Aug 27, 2015 at 4:21 PM, Ewen Cheslack-Postava <e...@confluent.io> wrote:
>>> 
>>>> I don't think the commands are really equivalent despite just adding the
>>>> --new-consumer flag. ConsumerPerformance uses a single thread when using
>>>> the new consumer (it literally just allocates the consumer, loops until
>>>> it's consumed enough, then exits), whereas the old consumer uses a bunch
>>>> of additional threads.
>>>> 
>>>> To really compare performance, someone would have to think through a fair
>>>> way to compare them -- the two operate so differently that you'd have to
>>>> be very careful to get an apples-to-apples comparison.
>>>> 
>>>> By the way, membership in consumer groups should be a lot cheaper with the
>>>> new consumer (the ZK coordination issues with lots of consumers aren't a
>>>> problem since ZK is not involved), so you can probably scale up the number
>>>> of consumer threads with little impact. It might be nice to patch the
>>>> consumer perf test to respect the # of threads setting, which might be a
>>>> first step to getting a more reasonable comparison.
>>>> 
>>>> -Ewen
>>>> 
>>>> On Thu, Aug 27, 2015 at 11:25 AM, Poorna Chandra Tejashvi Reddy <pctre...@gmail.com> wrote:
>>>> 
>>>>> Hi,
>>>>> 
>>>>> We have built the latest kafka from https://github.com/apache/kafka based
>>>>> on this commit id 436b7ddc386eb688ba0f12836710f5e4bcaa06c8 .
>>>>> We ran the performance test on a 3 node kafka cluster. There is a huge
>>>>> throughput degradation using the new-consumer compared to the regular
>>>>> consumer. Below are the numbers that explain the same.
>>>>> 
>>>>> bin/kafka-consumer-perf-test.sh --zookeeper zkIp:2181 --broker-list
>>>>> brokerIp:9092 --topics test --messages 5000000 : gives a throughput of
>>>>> 693 K
>>>>> 
>>>>> bin/kafka-consumer-perf-test.sh --zookeeper zkIp:2181 --broker-list
>>>>> brokerIp:9092 --topics test --messages 5000000 --new-consumer : gives a
>>>>> throughput of 51k
>>>>> 
>>>>> The whole set up is based on ec2, Kafka brokers running on r3.2x large.
>>>>> 
>>>>> Are you guys aware of this performance degradation? Do you have a JIRA
>>>>> for this that can be used to track the resolution?
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> 
>>>>> -Poorna
>>>>> 
>>>> 
>>>> 
>>>> 
>>>> --
>>>> Thanks,
>>>> Ewen
>>>> 
>>> 
>> 
>> 
>> 
>> --
>> -- Guozhang
>> 
> 
> 
> 
> -- 
> Thanks,
> Ewen
