As a test, why not just use a disk with 4,000 provisioned IOPS and see if throughput improves?
Also, you have not supplied any metrics on the VM's performance. Is the CPU busy? Is I/O maxed out? Network? Disk? Use a tool like atop and tell us what you find.

Philip

On May 20, 2013, at 6:43 PM, Ken Krugler <kkrugler_li...@transpac.com> wrote:

> Hi Jason,
>
> On May 20, 2013, at 10:01am, Jason Weiss wrote:
>
>> Hi Scott.
>>
>> I'm using Kafka 0.7.2. I am using the default replication factor, since I
>> don't recall changing that configuration at all.
>>
>> I'm using provisioned IOPS, which at the AWS event in NYC a few weeks ago
>> was presented as the "fastest storage option" for EC2. A number of
>> partners presented success stories in terms of throughput with
>> provisioned IOPS. I've tried to follow that model.
>
> In my experience, directly hitting an ephemeral drive on an m1.large is
> faster than using EBS.
>
> I've seen some articles where RAIDing multiple EBS volumes can exceed the
> performance of ephemeral drives, but with high variability.
>
> If you want to maximize performance, set up a (smaller) cluster of
> SSD-backed instances with 10 Gb Ethernet in the same placement group.
>
> E.g. test with three cr1.8xlarge instances.
>
> -- Ken
>
>
>> On 5/20/13 12:56 PM, "Scott Clasen" <sc...@heroku.com> wrote:
>>
>>> My guess: EBS is likely your bottleneck. Try running on instance-local
>>> disks and compare your results. Is this 0.8? What replication factor
>>> are you using?
>>>
>>>
>>> On Mon, May 20, 2013 at 8:11 AM, Jason Weiss <jason_we...@rapid7.com>
>>> wrote:
>>>
>>>> I'm trying to maximize my throughput and seem to have hit a ceiling.
>>>> Everything described below is running in AWS.
>>>>
>>>> I have configured a Kafka cluster of 5 machines (m1.large) with 600
>>>> provisioned IOPS of storage for each EC2 instance. I have a single
>>>> ZooKeeper server (we aren't in production yet, so I didn't take the
>>>> time to set up a ZK cluster). Publishing to a single topic from 7
>>>> different clients, I seem to max out at around 20,000 events per
>>>> second (eps) with a fixed 2 KB message size. Each broker defines 10
>>>> file segments, with a 25,000-message / 5-second flush configuration
>>>> in server.properties. I have stuck with 8 threads. My producers
>>>> (Java) are configured with batch.num.messages at 50 and
>>>> queue.buffering.max.messages at 100.
>>>>
>>>> When I went from 4 servers in the cluster to 5, I only saw an
>>>> increase of about 500 events per second in throughput. In sharp
>>>> contrast, when I run a complete environment on my MacBook Pro, tuned
>>>> as described above but with a single ZK and a single Kafka broker, I
>>>> see 61,000 events per second. I don't think I'm network-constrained
>>>> on the producer side in the AWS environment, because when I add one
>>>> more client (my MacBook Pro) I see a proportionate decrease in EC2
>>>> client throughput, and the net result is an identical 20,000 eps.
>>>> Stated differently, my EC2 instances give up throughput when my
>>>> MacBook Pro joins the array of producers, so that total throughput
>>>> stays exactly the same.
>>>>
>>>> Does anyone have any additional suggestions on what else I could tune
>>>> to try to hit our goal of 50,000 eps with a 5-machine cluster? Based
>>>> on the whitepapers published, LinkedIn describes a peak of 170,000
>>>> events per second across their cluster. My 20,000 seems so far away
>>>> from their production figures.
>>>>
>>>> What is the relationship, in terms of performance, between ZK and
>>>> Kafka? Do I need a more performant ZK cluster, the same, or does it
>>>> really not matter in terms of maximizing throughput?
>>>>
>>>> Thanks for any suggestions. I've been pulling knobs and turning
>>>> levers on this for several days now.
>>>>
>>>>
>>>> Jason
>
>
> --------------------------
> Ken Krugler
> +1 530-210-6378
> http://www.scaleunlimited.com
> custom big data solutions & training
> Hadoop, Cascading, Cassandra & Solr
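For reference, a minimal sketch of the async producer settings Jason describes above, written against the Kafka 0.7 Java API. The ZooKeeper address, topic name, and payload are placeholders, and the exact property keys (batch.num.messages, queue.buffering.max.messages) are taken from Jason's description rather than verified against a specific release, so treat this as illustrative rather than definitive.

    import java.util.Properties;

    import kafka.javaapi.producer.Producer;
    import kafka.javaapi.producer.ProducerData;
    import kafka.producer.ProducerConfig;

    public class AsyncProducerSketch {
        public static void main(String[] args) {
            Properties props = new Properties();
            // Placeholder ZooKeeper endpoint -- replace with the real ensemble.
            props.put("zk.connect", "zk-host:2181");
            props.put("serializer.class", "kafka.serializer.StringEncoder");
            // Async mode so the client batches messages before sending.
            props.put("producer.type", "async");
            // Batch/queue sizes as described in the thread; key names may vary by Kafka version.
            props.put("batch.num.messages", "50");
            props.put("queue.buffering.max.messages", "100");

            Producer<String, String> producer =
                new Producer<String, String>(new ProducerConfig(props));
            // "test-topic" is a placeholder; the string stands in for the fixed 2 KB payload.
            producer.send(new ProducerData<String, String>("test-topic", "2KB payload here"));
            producer.close();
        }
    }

With batches of only 50 messages and a 100-message client queue, each send round-trip carries roughly 100 KB of the 2 KB payloads, so raising the batch and queue sizes is one of the knobs worth trying alongside the storage changes suggested above.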