Jason,

Thanks for sharing. This is very interesting. Normally, Kafka brokers don't use much CPU. Is most of the 750% CPU actually being used by the Kafka brokers?
Jun

On Wed, May 22, 2013 at 6:11 PM, Jason Weiss <jason_we...@rapid7.com> wrote:
> >>Did you check that you were using all cores?
>
> top was reporting over 750%
>
> Jason
>
> ________________________________________
> From: Ken Krugler [kkrugler_li...@transpac.com]
> Sent: Wednesday, May 22, 2013 20:59
> To: users@kafka.apache.org
> Subject: Re: Apache Kafka in AWS
>
> Hi Jason,
>
> On May 22, 2013, at 3:35pm, Jason Weiss wrote:
>
> > Ken,
> >
> > Great question! I should have indicated I was using EBS, 500GB with 2000 provisioned IOPS.
>
> OK, thanks. Sounds like you were pegged on CPU usage.
>
> But that does surprise me a bit. Did you check that you were using all cores?
>
> Thanks,
>
> -- Ken
>
> PS - back in 2006 I spent a week of hell debugging an occasional job failure on Hadoop (this is when it was still part of Nutch). Turns out one of our 12 slaves was accidentally using OpenJDK, and this had a JIT compiler bug that would occasionally rear its ugly head. Obviously the Sun/Oracle JRE isn't bug-free, but it gets a lot more stress testing. So one of my basic guidelines in the ops portion of the Hadoop class I teach is that every server must have exactly the same version of Oracle's JRE.
>
> > ________________________________________
> > From: Ken Krugler [kkrugler_li...@transpac.com]
> > Sent: Wednesday, May 22, 2013 17:23
> > To: users@kafka.apache.org
> > Subject: Re: Apache Kafka in AWS
> >
> > Hi Jason,
> >
> > Thanks for the notes.
> >
> > I'm curious whether you went with using local drives (ephemeral storage) or EBS, and if with EBS then what IOPS.
> >
> > Thanks,
> >
> > -- Ken
> >
> > On May 22, 2013, at 1:42pm, Jason Weiss wrote:
> >
> >> All,
> >>
> >> I asked a number of questions of the group over the last week, and I'm happy to report that I've had great success getting Kafka up and running in AWS.
> >> I am using 3 EC2 instances, each of which is an M2 High-Memory Quadruple Extra Large with 8 cores and 58.4 GiB of memory according to the AWS specs. I have co-located Zookeeper instances next to Kafka on each machine.
> >>
> >> I am able to publish in a repeatable fashion 273,000 events per second, with each event payload consisting of a fixed size of 2048 bytes! This represents the maximum throughput possible on this configuration, as the servers became CPU constrained, averaging 97% utilization in a relatively flat line. This isn't a "burst" speed – it represents a sustained throughput from 20 M1 Large EC2 Kafka multi-threaded producers. Putting this into perspective, if my log retention period was a month, I'd be aggregating 1.3 petabytes of data on my disk drives. Suffice to say, I don't see us retaining data for more than a few hours!
> >>
> >> Here were the keys to tuning for future folks to consider:
> >>
> >> First and foremost, be sure to configure your Java heap size accordingly when you launch Kafka. The default is like 512MB, which in my case left virtually all of my RAM inaccessible to Kafka.
> >> Second, stay away from OpenJDK. No, seriously – this was a huge thorn in my side, and I almost gave up on Kafka because of the problems I encountered. The OpenJDK NIO functions repeatedly resulted in Kafka crashing and burning in dramatic fashion. The moment I switched over to Oracle's JDK for Linux, Kafka didn't puke once – I mean, like not even a hiccup.
> >> Third, know your message size. In my opinion, the more you understand about your event payload characteristics, the better you can tune the system. The two knobs to really turn are log.flush.interval and log.default.flush.interval.ms. The values here are intrinsically connected to the types of payloads you are putting through the system.
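[Editor's note: the heap and flush tuning described above might be sketched as below. The property names are the ones Jason cites from the 0.7-era broker config (roughly, messages per flush and maximum milliseconds between flushes); the heap variable and all values are illustrative assumptions, not recommendations, and names have changed in later Kafka releases.]

```
# Environment for kafka-server-start.sh: give the broker JVM a real heap
# (the stock script defaults to a small heap; the variable name varies by version)
KAFKA_HEAP_OPTS="-Xms8g -Xmx8g"

# server.properties: the flush knobs mentioned above (illustrative values only)
log.flush.interval=10000
log.default.flush.interval.ms=1000
```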
> >> Fourth and finally, to maximize throughput you have to code against the async paradigm, and be prepared to tweak the batch size, queue properties, and compression codec (wait for it…) in a way that matches the message payload you are putting through the system and the capabilities of the producer system itself.
> >>
> >> Jason
> >>
> >> This electronic message contains information which may be confidential or privileged. The information is intended for the use of the individual or entity named above. If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic transmission in error, please notify us by e-mail at (postmas...@rapid7.com) immediately.
> >
> > --------------------------
> > Ken Krugler
> > +1 530-210-6378
> > http://www.scaleunlimited.com
> > custom big data solutions & training
> > Hadoop, Cascading, Cassandra & Solr
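[Editor's note: the async producer tuning Jason describes might look like the sketch below. It only builds the configuration properties so it stands alone without the Kafka jars; the property names (`producer.type`, `batch.size`, `queue.size`, `queue.time`, `compression.codec`) follow the 0.7-era Java client and changed in later releases, and the hosts and values are hypothetical.]

```java
import java.util.Properties;

// Sketch of an async producer configuration for the 0.7-era Kafka Java client.
// Property names changed across releases (e.g. 0.8 renamed batch.size to
// batch.num.messages), so treat these as illustrative, not authoritative.
public class AsyncProducerConfigSketch {

    static Properties asyncProducerProps() {
        Properties props = new Properties();
        props.put("zk.connect", "zk1:2181,zk2:2181,zk3:2181"); // hypothetical hosts
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        props.put("producer.type", "async"); // batch sends on a background thread
        props.put("batch.size", "200");      // messages sent per batch
        props.put("queue.size", "10000");    // bounded in-memory send queue
        props.put("queue.time", "500");      // max ms an event waits in the queue
        props.put("compression.codec", "1"); // 1 = GZIP in the 0.7-era client
        return props;
    }

    public static void main(String[] args) {
        // These properties would normally be handed to kafka.producer.ProducerConfig
        // and then to kafka.javaapi.producer.Producer; that step is omitted so the
        // sketch compiles without the Kafka jars on the classpath.
        Properties props = asyncProducerProps();
        System.out.println(props.getProperty("producer.type"));
    }
}
```

The point of the sketch is the knobs, not the values: batch size, queue depth/latency, and codec trade producer-side CPU and memory against broker throughput, which is why Jason ties them to payload characteristics.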