Thanks for sharing your experience with the community, Jason!

-Neha

On Wed, May 22, 2013 at 1:42 PM, Jason Weiss <jason_we...@rapid7.com> wrote:
> All,
>
> I asked a number of questions of the group over the last week, and I'm
> happy to report that I've had great success getting Kafka up and running
> in AWS. I am using 3 EC2 instances, each of which is an M2 High-Memory
> Quadruple Extra Large with 8 cores and 68.4 GiB of memory according to
> the AWS specs. I have co-located Zookeeper instances next to Kafka on
> each machine.
>
> I am able to publish, in a repeatable fashion, 273,000 events per second,
> with each event payload consisting of a fixed size of 2048 bytes! This
> represents the maximum throughput possible on this configuration, as the
> servers became CPU constrained, averaging 97% utilization in a relatively
> flat line. This isn't a "burst" speed – it represents a sustained
> throughput from 20 multi-threaded Kafka producers running on M1 Large
> EC2 instances. Putting this into perspective, if my log retention period
> were a month, I'd be aggregating 1.3 petabytes of data on my disk drives.
> Suffice to say, I don't see us retaining data for more than a few hours!
>
> Here were the keys to tuning for future folks to consider:
>
> First and foremost, be sure to configure your Java heap size accordingly
> when you launch Kafka. The default is around 512 MB, which in my case
> left virtually all of my RAM inaccessible to Kafka.
> Second, stay away from OpenJDK. No, seriously – this was a huge thorn in
> my side, and I almost gave up on Kafka because of the problems I
> encountered. The OpenJDK NIO functions repeatedly resulted in Kafka
> crashing and burning in dramatic fashion. The moment I switched over to
> Oracle's JDK for Linux, Kafka didn't puke once. I mean, not even a
> hiccup.
> Third, know your message size. In my opinion, the more you understand
> about your event payload characteristics, the better you can tune the
> system. The two knobs to really turn are log.flush.interval and
> log.default.flush.interval.ms.
> The values here are intrinsically connected to the types of payloads you
> are putting through the system.
> Fourth and finally, to maximize throughput you have to code against the
> async paradigm, and be prepared to tweak the batch size, queue
> properties, and compression codec (wait for it…) in a way that matches
> the message payload you are putting through the system and the
> capabilities of the producer system itself.
>
> Jason
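
For anyone wanting to sanity-check the retention figure in the quoted message, here is a rough sketch of the arithmetic. It uses only the two numbers Jason reports (273,000 events per second, 2048-byte payloads). The 30-day month is an assumption, and the calculation ignores per-message overhead, replication, and compression, none of which are described in the message.

```python
# Back-of-the-envelope check of the figures quoted in the message.
EVENTS_PER_SECOND = 273_000   # sustained publish rate, from the message
PAYLOAD_BYTES = 2048          # fixed event payload size, from the message
RETENTION_DAYS = 30           # assumption: "a month" taken as 30 days


def bytes_per_second(events_per_second: int, payload_bytes: int) -> int:
    """Raw payload throughput, ignoring message and protocol overhead."""
    return events_per_second * payload_bytes


def retained_bytes(rate_bytes_per_second: int, days: int) -> int:
    """Total payload bytes accumulated over a retention window."""
    return rate_bytes_per_second * days * 24 * 60 * 60


rate = bytes_per_second(EVENTS_PER_SECOND, PAYLOAD_BYTES)
total = retained_bytes(rate, RETENTION_DAYS)

print(f"throughput: {rate / 1e6:.1f} MB/s ({rate / 2**20:.1f} MiB/s)")
print(f"{RETENTION_DAYS}-day retention: {total / 1e15:.2f} PB ({total / 2**50:.2f} PiB)")
```

Under these assumptions the raw payload rate comes to about 559 MB/s, and a 30-day window to about 1.45 PB (roughly 1.29 PiB), which lines up with the 1.3 petabytes mentioned above if binary units are used.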