Curious if you tested with larger message sizes, like around 20-30kb (you mentioned 2kb).
Any numbers on that size?

On Thu, May 23, 2013 at 10:12 AM, Jason Weiss <jason_we...@rapid7.com> wrote:

> Bummer.
>
> Yes, but it will be several days. I'll post back to the forum with a URL
> once I'm done.
>
> Jason
>
> On 5/23/13 10:11 AM, "Jun Rao" <jun...@gmail.com> wrote:
>
> > Jason,
> >
> > Unfortunately, Apache mailing lists don't support attachments. Could you
> > document your experience (with the graphs) in a blog (or a wiki page in
> > Kafka)?
> >
> > Thanks,
> >
> > Jun
> >
> > On Thu, May 23, 2013 at 2:00 AM, Jason Weiss <jason_we...@rapid7.com>
> > wrote:
> >
> > > Jun,
> > >
> > > Here is a screenshot from AWS's statistics (per-minute sampling is the
> > > finest granularity I believe that they chart). I don't have a
> > > screenshot of the top output.
> > >
> > > This shows that when I added a 4th machine to the cluster with the
> > > same number of clients, my CPU utilization fell, but remained
> > > constant. The flatline is pretty obvious in the extended 4 minute
> > > test: it ramps up, flat lines, then ramps down.
> > >
> > > Jason
> > >
> > > ________________________________________
> > > From: Jun Rao [jun...@gmail.com]
> > > Sent: Thursday, May 23, 2013 00:17
> > > To: users@kafka.apache.org
> > > Subject: Re: Apache Kafka in AWS
> > >
> > > Jason,
> > >
> > > Thanks for sharing. This is very interesting. Normally, Kafka brokers
> > > don't use too much CPU. Are most of the 750% CPU actually used by
> > > Kafka brokers?
> > >
> > > Jun
> > >
> > > On Wed, May 22, 2013 at 6:11 PM, Jason Weiss <jason_we...@rapid7.com>
> > > wrote:
> > >
> > > > > Did you check that you were using all cores?
> > > >
> > > > top was reporting over 750%
> > > >
> > > > Jason
> > > >
> > > > ________________________________________
> > > > From: Ken Krugler [kkrugler_li...@transpac.com]
> > > > Sent: Wednesday, May 22, 2013 20:59
> > > > To: users@kafka.apache.org
> > > > Subject: Re: Apache Kafka in AWS
> > > >
> > > > Hi Jason,
> > > >
> > > > On May 22, 2013, at 3:35pm, Jason Weiss wrote:
> > > >
> > > > > Ken,
> > > > >
> > > > > Great question! I should have indicated I was using EBS, 500GB
> > > > > with 2000 provisioned IOPS.
> > > >
> > > > OK, thanks. Sounds like you were pegged on CPU usage.
> > > >
> > > > But that does surprise me a bit. Did you check that you were using
> > > > all cores?
> > > >
> > > > Thanks,
> > > >
> > > > -- Ken
> > > >
> > > > PS - back in 2006 I spent a week of hell debugging an occasional job
> > > > failure on Hadoop (this is when it was still part of Nutch). Turns
> > > > out one of our 12 slaves was accidentally using OpenJDK, and this
> > > > had a JIT compiler bug that would occasionally rear its ugly head.
> > > > Obviously the Sun/Oracle JRE isn't bug-free, but it gets a lot more
> > > > stress testing. So one of my basic guidelines in the ops portion of
> > > > the Hadoop class I teach is that every server must have exactly the
> > > > same version of Oracle's JRE.
> > > >
> > > > > ________________________________________
> > > > > From: Ken Krugler [kkrugler_li...@transpac.com]
> > > > > Sent: Wednesday, May 22, 2013 17:23
> > > > > To: users@kafka.apache.org
> > > > > Subject: Re: Apache Kafka in AWS
> > > > >
> > > > > Hi Jason,
> > > > >
> > > > > Thanks for the notes.
> > > > >
> > > > > I'm curious whether you went with using local drives (ephemeral
> > > > > storage) or EBS, and if with EBS then what IOPS.
> > > > >
> > > > > Thanks,
> > > > >
> > > > > -- Ken
> > > > >
> > > > > On May 22, 2013, at 1:42pm, Jason Weiss wrote:
> > > > >
> > > > > > All,
> > > > > >
> > > > > > I asked a number of questions of the group over the last week,
> > > > > > and I'm happy to report that I've had great success getting
> > > > > > Kafka up and running in AWS. I am using 3 EC2 instances, each of
> > > > > > which is an M2 High-Memory Quadruple Extra Large with 8 cores
> > > > > > and 58.4 GiB of memory according to the AWS specs. I have
> > > > > > co-located Zookeeper instances next to Kafka on each machine.
> > > > > >
> > > > > > I am able to publish in a repeatable fashion 273,000 events per
> > > > > > second, with each event payload consisting of a fixed size of
> > > > > > 2048 bytes! This represents the maximum throughput possible on
> > > > > > this configuration, as the servers became CPU constrained,
> > > > > > averaging 97% utilization in a relatively flat line. This isn't
> > > > > > a "burst" speed; it represents a sustained throughput from 20 M1
> > > > > > Large EC2 Kafka multi-threaded producers. Putting this into
> > > > > > perspective, if my log retention period was a month, I'd be
> > > > > > aggregating 1.3 petabytes of data on my disk drives. Suffice to
> > > > > > say, I don't see us retaining data for more than a few hours!
> > > > > >
> > > > > > Here were the keys to tuning for future folks to consider:
> > > > > >
> > > > > > First and foremost, be sure to configure your Java heap size
> > > > > > accordingly when you launch Kafka. The default is like 512MB,
> > > > > > which in my case left virtually all of my RAM inaccessible to
> > > > > > Kafka.
> > > > > >
> > > > > > Second, stay away from OpenJDK. No, seriously, this was a huge
> > > > > > thorn in my side, and I almost gave up on Kafka because of the
> > > > > > problems I encountered. The OpenJDK NIO functions repeatedly
> > > > > > resulted in Kafka crashing and burning in dramatic fashion.
> > > > > > The moment I switched over to Oracle's JDK for Linux, Kafka
> > > > > > didn't puke once- I mean, like not even a hiccup.
> > > > > >
> > > > > > Third, know your message size. In my opinion, the more you
> > > > > > understand about your event payload characteristics, the better
> > > > > > you can tune the system. The two knobs to really turn are the
> > > > > > log.flush.interval and log.default.flush.interval.ms. The values
> > > > > > here are intrinsically connected to the types of payloads you
> > > > > > are putting through the system.
> > > > > >
> > > > > > Fourth and finally, to maximize throughput you have to code
> > > > > > against the async paradigm, and be prepared to tweak the batch
> > > > > > size, queue properties, and compression codec (wait for it...)
> > > > > > in a way that matches the message payload you are putting
> > > > > > through the system and the capabilities of the producer system
> > > > > > itself.
> > > > > >
> > > > > > Jason
> > > > > >
> > > > > > This electronic message contains information which may be
> > > > > > confidential or privileged. The information is intended for the
> > > > > > use of the individual or entity named above. If you are not the
> > > > > > intended recipient, be aware that any disclosure, copying,
> > > > > > distribution or use of the contents of this information is
> > > > > > prohibited. If you have received this electronic transmission in
> > > > > > error, please notify us by e-mail at (postmas...@rapid7.com)
> > > > > > immediately.
> > > > >
> > > > > --------------------------
> > > > > Ken Krugler
> > > > > +1 530-210-6378
> > > > > http://www.scaleunlimited.com
> > > > > custom big data solutions & training
> > > > > Hadoop, Cascading, Cassandra & Solr
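
For anyone checking the retention figure quoted in Jason's original post (273,000 events per second at 2048 bytes each, kept for a month), the arithmetic can be sketched in a few lines of Python. This is not from the thread: it assumes a 30-day month, counts payload bytes only (no replication, compression, or log overhead), and the function names are made up for illustration.

```python
# Rough check of the retention estimate quoted in the thread.
# Assumptions (not from the thread): a 30-day month, payload bytes only.

SECONDS_PER_DAY = 24 * 60 * 60


def bytes_per_second(events_per_second: int, payload_bytes: int) -> int:
    """Sustained payload volume written per second."""
    return events_per_second * payload_bytes


def retained_bytes(events_per_second: int, payload_bytes: int,
                   retention_days: int) -> int:
    """Payload bytes accumulated over the whole retention window."""
    per_second = bytes_per_second(events_per_second, payload_bytes)
    return per_second * SECONDS_PER_DAY * retention_days


def to_pib(num_bytes: int) -> float:
    """Convert bytes to binary petabytes (PiB, 2**50 bytes)."""
    return num_bytes / 2**50


if __name__ == "__main__":
    rate = bytes_per_second(273_000, 2048)
    month = retained_bytes(273_000, 2048, 30)
    print(f"{rate / 1e6:.0f} MB/s sustained")          # 559 MB/s sustained
    print(f"{to_pib(month):.2f} PiB over 30 days")     # 1.29 PiB over 30 days
```

Under those assumptions the result is about 1.29 PiB (roughly 1.45 PB in decimal units), which is consistent with the "1.3 petabytes" in the post. The same functions accept any payload size, but the thread gives no measured event rate for 20-30 KB messages, so nothing is plugged in for that case here.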