Nope, it was specific to the application I'm working on carrying real-world data. Sorry :-(
Jason

________________________________________
From: S Ahmed [sahmed1...@gmail.com]
Sent: Wednesday, May 29, 2013 13:40
To: users@kafka.apache.org
Subject: Re: Apache Kafka in AWS

Is the code you used to benchmark open source by any chance?

On Tue, May 28, 2013 at 4:29 PM, Jason Weiss <jason_we...@rapid7.com> wrote:

> Nope, sorry.
>
> ________________________________________
> From: S Ahmed [sahmed1...@gmail.com]
> Sent: Tuesday, May 28, 2013 15:47
> To: users@kafka.apache.org
> Subject: Re: Apache Kafka in AWS
>
> Curious if you tested with larger message sizes, like around 20-30kb (you mentioned 2kb).
>
> Any numbers on that size?
>
> On Thu, May 23, 2013 at 10:12 AM, Jason Weiss <jason_we...@rapid7.com> wrote:
>
> > Bummer.
> >
> > Yes, but it will be several days. I'll post back to the forum with a URL once I'm done.
> >
> > Jason
> >
> > On 5/23/13 10:11 AM, "Jun Rao" <jun...@gmail.com> wrote:
> >
> > >Jason,
> > >
> > >Unfortunately, Apache mailing lists don't support attachments. Could you document your experience (with the graphs) in a blog (or a wiki page in Kafka)?
> > >
> > >Thanks,
> > >
> > >Jun
> > >
> > >On Thu, May 23, 2013 at 2:00 AM, Jason Weiss <jason_we...@rapid7.com> wrote:
> > >
> > >> Jun,
> > >>
> > >> Here is a screenshot from AWS's statistics (per-minute sampling is the finest granularity I believe that they chart). I don't have a screenshot of the top output.
> > >>
> > >> This shows when I added a 4th machine to the cluster with the same number of clients, my CPU utilization fell- but remained constant. The flatline is pretty obvious in the extended 4 minute test-- it ramps up, flat lines, then ramps down.
> > >>
> > >> Jason
> > >>
> > >> ________________________________________
> > >> From: Jun Rao [jun...@gmail.com]
> > >> Sent: Thursday, May 23, 2013 00:17
> > >> To: users@kafka.apache.org
> > >> Subject: Re: Apache Kafka in AWS
> > >>
> > >> Jason,
> > >>
> > >> Thanks for sharing. This is very interesting. Normally, Kafka brokers don't use too much CPU. Are most of the 750% CPU actually used by Kafka brokers?
> > >>
> > >> Jun
> > >>
> > >> On Wed, May 22, 2013 at 6:11 PM, Jason Weiss <jason_we...@rapid7.com> wrote:
> > >>
> > >> > >>Did you check that you were using all cores?
> > >> >
> > >> > top was reporting over 750%
> > >> >
> > >> > Jason
> > >> >
> > >> > ________________________________________
> > >> > From: Ken Krugler [kkrugler_li...@transpac.com]
> > >> > Sent: Wednesday, May 22, 2013 20:59
> > >> > To: users@kafka.apache.org
> > >> > Subject: Re: Apache Kafka in AWS
> > >> >
> > >> > Hi Jason,
> > >> >
> > >> > On May 22, 2013, at 3:35pm, Jason Weiss wrote:
> > >> >
> > >> > > Ken,
> > >> > >
> > >> > > Great question! I should have indicated I was using EBS, 500GB with 2000 provisioned IOPS.
> > >> >
> > >> > OK, thanks. Sounds like you were pegged on CPU usage.
> > >> >
> > >> > But that does surprise me a bit. Did you check that you were using all cores?
> > >> >
> > >> > Thanks,
> > >> >
> > >> > -- Ken
> > >> >
> > >> > PS - back in 2006 I spent a week of hell debugging an occasional job failure on Hadoop (this is when it was still part of Nutch). Turns out one of our 12 slaves was accidentally using OpenJDK, and this had a JIT compiler bug that would occasionally rear its ugly head. Obviously the Sun/Oracle JRE isn't bug-free, but it gets a lot more stress testing.
> > >> > So one of my basic guidelines in the ops portion of the Hadoop class I teach is that every server must have exactly the same version of Oracle's JRE.
> > >> >
> > >> > > ________________________________________
> > >> > > From: Ken Krugler [kkrugler_li...@transpac.com]
> > >> > > Sent: Wednesday, May 22, 2013 17:23
> > >> > > To: users@kafka.apache.org
> > >> > > Subject: Re: Apache Kafka in AWS
> > >> > >
> > >> > > Hi Jason,
> > >> > >
> > >> > > Thanks for the notes.
> > >> > >
> > >> > > I'm curious whether you went with using local drives (ephemeral storage) or EBS, and if with EBS then what IOPS.
> > >> > >
> > >> > > Thanks,
> > >> > >
> > >> > > -- Ken
> > >> > >
> > >> > > On May 22, 2013, at 1:42pm, Jason Weiss wrote:
> > >> > >
> > >> > >> All,
> > >> > >>
> > >> > >> I asked a number of questions of the group over the last week, and I'm happy to report that I've had great success getting Kafka up and running in AWS. I am using 3 EC2 instances, each of which is a M2 High-Memory Quadruple Extra Large with 8 cores and 58.4 GiB of memory according to the AWS specs. I have co-located Zookeeper instances next to Kafka on each machine.
> > >> > >>
> > >> > >> I am able to publish in a repeatable fashion 273,000 events per second, with each event payload consisting of a fixed size of 2048 bytes! This represents the maximum throughput possible on this configuration, as the servers became CPU constrained, averaging 97% utilization in a relatively flat line. This isn't a "burst" speed; it represents a sustained throughput from 20 M1 Large EC2 Kafka multi-threaded producers. Putting this into perspective, if my log retention period was a month, I'd be aggregating 1.3 petabytes of data on my disk drives.
> > >> > >> Suffice to say, I don't see us retaining data for more than a few hours!
> > >> > >>
> > >> > >> Here were the keys to tuning for future folks to consider:
> > >> > >>
> > >> > >> First and foremost, be sure to configure your Java heap size accordingly when you launch Kafka. The default is like 512MB, which in my case left virtually all of my RAM inaccessible to Kafka.
> > >> > >>
> > >> > >> Second, stay away from OpenJDK. No, seriously, this was a huge thorn in my side, and I almost gave up on Kafka because of the problems I encountered. The OpenJDK NIO functions repeatedly resulted in Kafka crashing and burning in dramatic fashion. The moment I switched over to Oracle's JDK for linux, Kafka didn't puke once- I mean, like not even a hiccup.
> > >> > >>
> > >> > >> Third, know your message size. In my opinion, the more you understand about your event payload characteristics, the better you can tune the system. The two knobs to really turn are the log.flush.interval and log.default.flush.interval.ms. The values here are intrinsically connected to the types of payloads you are putting through the system.
> > >> > >>
> > >> > >> Fourth and finally, to maximize throughput you have to code against the async paradigm, and be prepared to tweak the batch size, queue properties, and compression codec (wait for it...) in a way that matches the message payload you are putting through the system and the capabilities of the producer system itself.
> > >> > >>
> > >> > >> Jason
> > >> > >>
> > >> > >> This electronic message contains information which may be confidential or privileged. The information is intended for the use of the individual or entity named above.
> > >> > >> If you are not the intended recipient, be aware that any disclosure, copying, distribution or use of the contents of this information is prohibited. If you have received this electronic transmission in error, please notify us by e-mail at (postmas...@rapid7.com) immediately.
> > >> > >
> > >> > > --------------------------
> > >> > > Ken Krugler
> > >> > > +1 530-210-6378
> > >> > > http://www.scaleunlimited.com
> > >> > > custom big data solutions & training
> > >> > > Hadoop, Cascading, Cassandra & Solr
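
For anyone wanting to sanity-check the month-of-retention figure quoted in the original post, here is a minimal back-of-the-envelope sketch in Python. It uses only the two numbers given in the thread (273,000 events per second and 2048-byte payloads). The 30-day month is an assumption, as is ignoring replication, compression, and per-message overhead; the function and constant names are made up for illustration.

```python
# Rough check of the retention estimate from the thread.
# Assumptions (not from the original post): a 30-day month, no replication,
# no compression, and no per-message overhead.

EVENTS_PER_SECOND = 273_000   # sustained rate reported in the thread
PAYLOAD_BYTES = 2048          # fixed event payload size reported in the thread
SECONDS_PER_DAY = 24 * 60 * 60
RETENTION_DAYS = 30           # assumed length of "a month"


def retained_bytes(events_per_second, payload_bytes, retention_days):
    """Total payload bytes accumulated over the retention period."""
    return events_per_second * payload_bytes * SECONDS_PER_DAY * retention_days


bytes_per_second = EVENTS_PER_SECOND * PAYLOAD_BYTES
total = retained_bytes(EVENTS_PER_SECOND, PAYLOAD_BYTES, RETENTION_DAYS)

print(f"ingest rate: {bytes_per_second / 1e6:.0f} MB/s")
print(f"30-day total: {total / 1e15:.2f} PB (decimal), {total / 2**50:.2f} PiB (binary)")
```

Under these assumptions the total comes to about 1.45 PB in decimal units, or about 1.29 PiB in binary units, which is in line with the "1.3 petabytes" quoted if binary units were meant. The same ingest rate works out to roughly 2 TB per hour, which gives a feel for what "a few hours" of retention would need.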