I know it is a big ask, but can you try bisecting?

For example, test before/after on commits:
* 45c8195 KAFKA-3025; Added timetamp to Message and use relative offset.
* 5b375d7 KAFKA-3149; Extend SASL implementation to support more mechanisms
* 69d9a66 KAFKA-3618; Handle ApiVersionsRequest before SASL authentication

This may help us nail down the source of the issue.
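
If the before/after comparison can be scripted, git bisect run can drive the search. A minimal sketch follows, assuming a wrapper script has already measured single-producer TLS throughput for the checked-out commit. The function name bisect_verdict and the THRESHOLD value are hypothetical; the threshold is simply picked midway between the ~120k and ~75k messages/s figures quoted below. The exit codes are the ones git bisect run expects: 0 for good, 1 for bad, 125 for "cannot test, skip".

```shell
#!/bin/sh
# Hypothetical helper for use with `git bisect run`; not part of Kafka.
# Maps a measured single-producer throughput (records/sec) to the exit
# codes git bisect run understands: 0 = good, 1 = bad, 125 = skip.

# Assumed cut-off, roughly midway between the ~120k (0.9.0.1) and
# ~75k (0.10.0.0 RC) messages/s reported in this thread.
THRESHOLD="${THRESHOLD:-97500}"

bisect_verdict() {
    measured="$1"
    # No usable integer measurement (e.g. the build or perf run failed):
    # ask git bisect to skip this commit.
    case "$measured" in
        ''|*[!0-9]*) return 125 ;;
    esac
    if [ "$measured" -ge "$THRESHOLD" ]; then
        return 0   # fast enough: treat the commit as good
    fi
    return 1       # too slow: treat the commit as bad
}
```

The wrapper would build the checkout, run the TLS producer perf test, and end by calling bisect_verdict with the measured rate and exiting with its status. Bisecting would then start with "git bisect start <bad> <good>" (for instance 9404680 as bad and the 0.9.0.1 release as good) followed by "git bisect run" on the wrapper.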

Gwen

On Thu, May 12, 2016 at 1:38 PM, Tom Crayford <tcrayf...@heroku.com> wrote:
> Yep, confirm.
>
> On Thu, May 12, 2016 at 9:37 PM, Gwen Shapira <g...@confluent.io> wrote:
>
>> Just to confirm:
>> You tested both versions with plain text and saw no performance drop?
>>
>>
>> On Thu, May 12, 2016 at 1:26 PM, Tom Crayford <tcrayf...@heroku.com>
>> wrote:
>> > We've started running our usual suite of performance tests against Kafka
>> > 0.10.0.0 RC. These tests orchestrate multiple consumer/producer machines
>> > to run a fairly normal mixed workload of producers and consumers (each
>> > producer/consumer is just an instance of Kafka's built-in
>> > consumer/producer perf tests). We've found about a 33% performance drop
>> > in the producer if TLS is used (compared to 0.9.0.1).
>> >
>> > We've seen notable producer performance degradations between 0.9.0.1 and
>> > 0.10.0.0 RC. We're running as of commit 9404680 right now.
>> >
>> > Our specific test case runs Kafka on 8 EC2 machines, with enhanced
>> > networking. Nothing is changed between the instances, and I've reproduced
>> > this over 4 different sets of clusters now. We're seeing about a 33%
>> > performance drop between 0.9.0.1 and 0.10.0.0 as of commit 9404680.
>> > Please note that this doesn't match up with
>> > https://issues.apache.org/jira/browse/KAFKA-3565, because our performance
>> > tests are with compression off, and this seems to be a TLS-only issue.
>> >
>> > Under 0.10.0-rc4, we see an 8 node cluster with replication factor of 3,
>> > and 13 producers max out at around 1 million 100 byte messages a second.
>> > Under 0.9.0.1, the same cluster does 1.5 million messages a second. Both
>> > tests were with TLS on. I've reproduced this on multiple clusters now (5
>> > or so of each version) to account for the inherent performance variance of
>> > EC2. There's no notable performance difference without TLS on these runs;
>> > it appears to be entirely a TLS regression.
>> >
>> > A single producer with TLS under 0.10 does about 75k messages/s. Under
>> > 0.9.0.1 it does around 120k messages/s.
>> >
>> > The exact producer-perf line we're using is this:
>> >
>> > bin/kafka-producer-perf-test --topic "bench" --num-records "500000000"
>> > --record-size "100" --throughput "100" --producer-props acks="-1"
>> > bootstrap.servers=REDACTED ssl.keystore.location=client.jks
>> > ssl.keystore.password=REDACTED ssl.truststore.location=server.jks
>> > ssl.truststore.password=REDACTED
>> > ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1 security.protocol=SSL
>> >
>> > We're using the same setup, machine type etc for each test run.
>> >
>> > We've tried using both 0.9.0.1 producers and 0.10.0.0 producers and the
>> > TLS performance impact was there for both.
>> >
>> > I've glanced over the code between 0.9.0.1 and 0.10.0.0 and haven't seen
>> > anything that seemed to have this kind of impact - indeed the TLS code
>> > doesn't seem to have changed much between 0.9.0.1 and 0.10.0.0.
>> >
>> > Any thoughts? Should I file an issue and see about reproducing a more
>> > minimal test case?
>> >
>> > I don't think this is related to
>> > https://issues.apache.org/jira/browse/KAFKA-3565 - that is for
>> > compression on and plaintext, and this is for TLS only.
>>
