All,

I asked a number of questions of the group over the last week, and I'm happy to 
report that I've had great success getting Kafka up and running in AWS. I am 
using 3 EC2 instances, each of which is an M2 High-Memory Quadruple Extra Large 
with 8 cores and 68.4 GiB of memory according to the AWS specs. I have 
co-located Zookeeper instances next to Kafka on each machine.

I am able to publish in a repeatable fashion 273,000 events per second, with 
each event payload consisting of a fixed size of 2048 bytes! This represents 
the maximum throughput possible on this configuration, as the servers became 
CPU constrained, averaging 97% utilization in a relatively flat line. This 
isn't a "burst" speed – it represents a sustained throughput from 20 
multi-threaded Kafka producers running on M1 Large EC2 instances. Putting this 
into perspective, if my log retention period were a month, I'd be aggregating 
roughly 1.3 petabytes of data on my disk drives. Suffice it to say, I don't see 
us retaining data for more than a few hours!
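
As a quick sanity check on those numbers, here is a minimal back-of-the-envelope sketch. It assumes "a month" means 30 days and uses binary units (MiB, PiB); both choices are assumptions, not something stated above.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
EVENTS_PER_SECOND = 273_000   # sustained publish rate from the post
PAYLOAD_BYTES = 2048          # fixed event size from the post
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 30           # assumption: "a month" = 30 days

MIB = 2 ** 20
PIB = 2 ** 50

bytes_per_second = EVENTS_PER_SECOND * PAYLOAD_BYTES
bytes_per_month = bytes_per_second * SECONDS_PER_DAY * RETENTION_DAYS

print(f"{bytes_per_second / MIB:.1f} MiB/s across the cluster")
print(f"{bytes_per_month / PIB:.2f} PiB per {RETENTION_DAYS} days")
```

Under those assumptions the monthly total comes out close to the 1.3 petabyte figure; in decimal units it would be somewhat higher.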

Here were the keys to tuning for future folks to consider:

First and foremost, be sure to configure your Java heap size accordingly when 
you launch Kafka. The default is around 512MB, which in my case left virtually 
all of my RAM inaccessible to Kafka.

Second, stay away from OpenJDK. No, seriously – this was a huge thorn in my 
side, and I almost gave up on Kafka because of the problems I encountered. The 
OpenJDK NIO functions repeatedly resulted in Kafka crashing and burning in 
dramatic fashion. The moment I switched over to Oracle's JDK for Linux, Kafka 
didn't puke once – I mean, not even a hiccup.

Third, know your message size. In my opinion, the more you understand about 
your event payload characteristics, the better you can tune the system. The two 
knobs to really turn are log.flush.interval and 
log.default.flush.interval.ms. The values here are intrinsically connected to 
the types of payloads you are putting through the system.

Fourth and finally, to maximize throughput you have to code against the async 
paradigm, and be prepared to tweak the batch size, queue properties, and 
compression codec (wait for it…) in a way that matches the message payload you 
are putting through the system and the capabilities of the producer system 
itself.
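
To make the link between message size and the flush and batch knobs a bit more concrete, here is a rough sizing sketch. It assumes log.flush.interval is a message count and log.default.flush.interval.ms is a time cap, that load is spread evenly over the 3 brokers with one log each, and it uses made-up setting values purely for illustration; none of these values are the ones actually used on this cluster.

```python
PAYLOAD_BYTES = 2048  # fixed event size quoted in the post


def bytes_per_group(message_count, payload_bytes=PAYLOAD_BYTES):
    """Payload bytes in a group of messages (one flush interval or one batch)."""
    return message_count * payload_bytes


def ms_to_reach_count(message_count, messages_per_second):
    """Milliseconds needed to accumulate message_count at a steady rate."""
    return 1000.0 * message_count / messages_per_second


# Hypothetical settings, chosen only for illustration (not from the post).
flush_interval_messages = 10_000   # stands in for log.flush.interval
flush_interval_ms = 1_000          # stands in for log.default.flush.interval.ms
batch_size_messages = 200          # stands in for the async producer batch size

# Assumption: 273,000 events/s spread evenly over 3 brokers, one log each.
per_log_rate = 273_000 / 3

flush_bytes = bytes_per_group(flush_interval_messages)
batch_bytes = bytes_per_group(batch_size_messages)
count_trigger_ms = ms_to_reach_count(flush_interval_messages, per_log_rate)
count_fires_first = count_trigger_ms < flush_interval_ms

print(f"bytes per count-based flush: {flush_bytes:,}")
print(f"bytes per producer batch:    {batch_bytes:,}")
print(f"count threshold reached in:  {count_trigger_ms:.0f} ms")
print(f"count-based flush fires before the time cap: {count_fires_first}")
```

Plugging in your own payload size and rates shows quickly whether the count-based or the time-based flush setting is the one actually in effect, and how large each producer batch is on the wire before compression.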


Jason




