Cool. By the way, I do mean you should use 'atop'. That was not a typo on my part.
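If it helps, a typical way to use it is to run "atop 5" for a live view that refreshes every 5 seconds, or "atop -w /tmp/atop.raw 10" to log a sample every 10 seconds into a raw file you can replay later with "atop -r /tmp/atop.raw" (the interval and file path here are just examples). Keep an eye on the CPU, DSK, and NET lines for whatever looks saturated.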
http://www.atoptool.nl/downloadatop.php

apt-get install atop on Ubuntu systems.

Philip

On May 21, 2013, at 4:51 PM, Jason Weiss <jason_we...@rapid7.com> wrote:

> Philip,
>
> Thanks for the response. I used top yesterday and determined that part of my problem was that the kafka shell script is pre-configured to only use 512M of RAM, and thus it wasn't using memory efficiently. That has helped out tremendously. Adding an echo at the start of the script noting that it defaults to such a low value would probably have saved me some time. In the same vein, I should have inspected the launch command more closely.
>
> The virtualization of AWS makes it difficult to truly know what your performance is, IMHO. There are lots of people arguing on the web about the value of bare metal versus virtualization. I am still baffled how companies like Urban Airship are purportedly seeing bursts of 750,000 messages per second on a 3-machine cluster, but by playing with the knobs in a controlled manner, I'm starting to better understand their relationship to, and effect on, the overall system.
>
> Jason
>
>
> On 5/21/13 11:44 AM, "Philip O'Toole" <phi...@loggly.com> wrote:
>
>> As a test, why not just use a disk with 4000 provisioned IOPS? Just as a test - see if it improves.
>>
>> Also, you have not supplied any metrics regarding the VM's performance. Is the CPU busy? Is IO maxed out? Network? Disk? Use a tool like atop, and tell us what you find.
>>
>> Philip
>>
>> On May 20, 2013, at 6:43 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:
>>
>>> Hi Jason,
>>>
>>> On May 20, 2013, at 10:01am, Jason Weiss wrote:
>>>
>>>> Hi Scott.
>>>>
>>>> I'm using Kafka 0.7.2. I am using the default replication factor, since I don't recall changing that configuration at all.
>>>>
>>>> I'm using provisioned IOPS, which was presented as the "fastest storage option" for EC2 at the AWS event in NYC a few weeks ago. A number of partners presented success stories in terms of throughput with provisioned IOPS. I've tried to follow that model.
>>>
>>> In my experience, directly hitting an ephemeral drive on an m1.large is faster than using EBS.
>>>
>>> I've seen some articles where RAIDing multiple EBS volumes can exceed the performance of ephemeral drives, but with high variability.
>>>
>>> If you want to maximize performance, set up a (smaller) cluster of SSD-backed instances with 10Gb Ethernet in the same cluster placement group.
>>>
>>> E.g. test with three cr1.8xlarge instances.
>>>
>>> -- Ken
>>>
>>>
>>>> On 5/20/13 12:56 PM, "Scott Clasen" <sc...@heroku.com> wrote:
>>>>
>>>>> My guess: EBS is likely your bottleneck. Try running on instance-local disks, and compare your results. Is this 0.8? What replication factor are you using?
>>>>>
>>>>>
>>>>> On Mon, May 20, 2013 at 8:11 AM, Jason Weiss <jason_we...@rapid7.com> wrote:
>>>>>
>>>>>> I'm trying to maximize my throughput and seem to have hit a ceiling. Everything described below is running in AWS.
>>>>>>
>>>>>> I have configured a Kafka cluster with 5 machines (m1.large), with 600 provisioned IOPS storage for each EC2 instance. I have a Zookeeper server (we aren't in production yet, so I didn't take the time to set up a ZK cluster). Publishing to a single topic from 7 different clients, I seem to max out at around 20,000 eps with a fixed 2K message size.
>>>>>> Each broker defines 10 file segments, with a 25000 message / 5 second flush configuration in server.properties. I have stuck with 8 threads. My producers (Java) are configured with batch.num.messages at 50 and queue.buffering.max.messages at 100.
>>>>>>
>>>>>> When I went from 4 servers in the cluster to 5 servers, I only saw an increase of about 500 events per second in throughput. In sharp contrast, when I run a complete environment on my MacBook Pro, tuned as described above but with a single ZK and a single Kafka broker, I am seeing 61,000 events per second. I don't think I'm network constrained in the AWS environment (producer side), because when I add one more client, my MacBook Pro, I see a proportionate decrease in EC2 client throughput, and the net result is an identical 20,000 eps. Stated differently, my EC2 instances give up throughput when my local MacBook Pro joins the array of producers, such that the total throughput stays exactly the same.
>>>>>>
>>>>>> Does anyone have any additional suggestions on what else I could tune to try to hit our goal of 50,000 eps with a 5-machine cluster? Based on the published whitepapers, LinkedIn describes a peak of 170,000 events per second across their cluster. My 20,000 seems so far away from their production figures.
>>>>>>
>>>>>> What is the relationship, in terms of performance, between ZK and Kafka? Do I need a more performant ZK cluster, the same, or does it really not matter in terms of maximizing throughput?
>>>>>>
>>>>>> Thanks for any suggestions. I've been pulling knobs and turning levers on this for several days now.
>>>>>>
>>>>>>
>>>>>> Jason
>>>
>>> --------------------------
>>> Ken Krugler
>>> +1 530-210-6378
>>> http://www.scaleunlimited.com
>>> custom big data solutions & training
>>> Hadoop, Cascading, Cassandra & Solr
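For anyone digging through the archives later, the async producer settings Jason mentions translate to something roughly like the sketch below. This is only a sketch, not his actual code: the batching property names are copied verbatim from the thread (and differ between Kafka 0.7.x and 0.8, so check your version's docs), and the ZooKeeper address, topic name, and payload are placeholders.

import java.util.Properties;

import kafka.javaapi.producer.Producer;
import kafka.javaapi.producer.ProducerData;
import kafka.producer.ProducerConfig;

public class AsyncProducerSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Placeholders: point these at your own ZooKeeper ensemble and serializer.
        props.put("zk.connect", "zk-host:2181");
        props.put("serializer.class", "kafka.serializer.StringEncoder");
        // Async mode batches messages in a background thread before sending.
        props.put("producer.type", "async");
        // Batching knobs as named in the thread; 0.7.x uses different names
        // for the same ideas, so adjust for the version you are running.
        props.put("batch.num.messages", "50");
        props.put("queue.buffering.max.messages", "100");

        Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));

        // "test-topic" and the payload are placeholders; the thread used
        // a fixed 2K message size.
        producer.send(new ProducerData<String, String>("test-topic", "payload"));
        producer.close();
    }
}

Timing a few hundred thousand sends from a single instance of something like this is a quick way to sanity-check per-producer throughput before scaling back up to the full 7-client test.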