Tom,

Thanks for reporting this. A few quick comments.

1. Did you send the right command for producer-perf? The command you sent
limits the throughput to 100 msgs/sec, so I'm not sure how a single producer
could reach 75K msgs/sec.

2. Could you collect some stats (e.g. average batch size) in the producer
and see if there is any noticeable difference between 0.9 and 0.10?

3. Is the broker-to-broker communication also on SSL? Could you do another
test with replication factor 1 and see if you still see the degradation?
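
On point 1, a quick sanity check of the numbers in the quoted command. This
is a sketch only: the values are copied from the command line and from the
reported single-producer rate under 0.10, and it assumes the --throughput
throttle is actually honored.

```shell
# Values copied from the quoted producer-perf command line.
num_records=500000000   # --num-records
throttle=100            # --throughput, in msgs/sec

# Time needed to send every record if the throttle is honored.
seconds=$((num_records / throttle))
days=$((seconds / 86400))
echo "run time at ${throttle} msgs/sec: ${seconds} s (about ${days} days)"

# Reported single-producer rate under 0.10 compared with the throttle.
reported=75000
echo "reported rate is $((reported / throttle))x the throttle"
```

If the throttle were in effect, the reported 75K msgs/sec would not be
reachable, which is why the command actually used is worth double-checking.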

Finally, email is probably not the best way to discuss performance results.
If you have more of them, could you create a jira and attach your findings
there?

Thanks,

Jun



On Thu, May 12, 2016 at 1:26 PM, Tom Crayford <tcrayf...@heroku.com> wrote:

> We've started running our usual suite of performance tests against Kafka
> 0.10.0.0 RC. These tests orchestrate multiple consumer/producer machines to
> run a fairly normal mixed workload of producers and consumers (each
> producer/consumer is just an instance of Kafka's built-in consumer/producer
> perf tests). We've found about a 33% performance drop in the producer if
> TLS is used (compared to 0.9.0.1).
>
> We've seen notable producer performance degradations between 0.9.0.1 and
> 0.10.0.0 RC. We're running as of the commit 9404680 right now.
>
> Our specific test case runs Kafka on 8 EC2 machines, with enhanced
> networking. Nothing is changed between the instances, and I've reproduced
> this over 4 different sets of clusters now. We're seeing about a 33%
> performance drop between 0.9.0.1 and 0.10.0.0 as of commit 9404680. Please
> note that this doesn't match up with
> https://issues.apache.org/jira/browse/KAFKA-3565, because our performance
> tests are with compression off, and this seems to be a TLS-only issue.
>
> Under 0.10.0-rc4, we see an 8 node cluster with replication factor of 3,
> and 13 producers max out at around 1 million 100 byte messages a second.
> Under 0.9.0.1, the same cluster does 1.5 million messages a second. Both
> tests were with TLS on. I've reproduced this on multiple clusters now (5 or
> so of each version) to account for the inherent performance variance of
> EC2. There's no notable performance difference without TLS on these runs -
> it appears to be entirely a TLS regression.
>
> A single producer with TLS under 0.10 does about 75k messages/s. Under
> 0.9.0.1 it does around 120k messages/s.
>
> The exact producer-perf line we're using is this:
>
> bin/kafka-producer-perf-test --topic "bench" --num-records "500000000"
> --record-size "100" --throughput "100" --producer-props acks="-1"
> bootstrap.servers=REDACTED ssl.keystore.location=client.jks
> ssl.keystore.password=REDACTED ssl.truststore.location=server.jks
> ssl.truststore.password=REDACTED
> ssl.enabled.protocols=TLSv1.2,TLSv1.1,TLSv1 security.protocol=SSL
>
> We're using the same setup, machine type etc for each test run.
>
> We've tried using both 0.9.0.1 producers and 0.10.0.0 producers and the TLS
> performance impact was there for both.
>
> I've glanced over the code between 0.9.0.1 and 0.10.0.0 and haven't seen
> anything that seemed to have this kind of impact - indeed the TLS code
> doesn't seem to have changed much between 0.9.0.1 and 0.10.0.0.
>
> Any thoughts? Should I file an issue and see about reproducing a more
> minimal test case?
>
> I don't think this is related to
> https://issues.apache.org/jira/browse/KAFKA-3565 - that is for compression
> on and plaintext, and this is for TLS only.
>
