This is absolutely your bottleneck, as Brandon mentioned before. Your client machine is maxing out at 37K requests per second.
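A quick back-of-envelope check (the 50-thread and 37K figures come from the messages quoted below; the per-request latency is derived, not measured): with a fixed pool of synchronous client threads, throughput is capped at threads / latency, so beyond that point adding server nodes changes nothing.

    # Client-side ceiling: N synchronous threads can do at most N / latency requests/sec.
    threads = 50                      # stress.py thread count on the single client machine
    observed_reads_per_sec = 37000    # plateau seen on the dedicated-hardware cluster
    implied_latency_s = threads / float(observed_reads_per_sec)
    print("implied per-request latency: %.2f ms" % (implied_latency_s * 1000))             # ~1.35 ms
    target = 100000                   # hypothetical target throughput
    print("threads needed for %d reads/sec: %.0f" % (target, target * implied_latency_s))  # ~135

To push the cluster harder, either raise the stress.py thread count or sum the reqs/sec of several client machines, as Dave describes below.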
-----Original Message-----
From: "David Schoonover" <david.schoono...@gmail.com>
Sent: Monday, July 19, 2010 12:30pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud

> How many physical client machines are running stress.py?

One with 50 threads; it is remote from the cluster but within the same DC in both cases. I also ran the test with multiple clients and saw similar results when summing the reqs/sec.

On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> How many physical client machines are running stress.py?
>
> -----Original Message-----
> From: "David Schoonover" <david.schoono...@gmail.com>
> Sent: Monday, July 19, 2010 12:11pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>
> Hello all, I'm Oren's partner in crime on all this. I've got a few more
> numbers to add.
>
> In an effort to eliminate everything but the scaling issue, I set up a
> cluster on dedicated hardware (non-virtualized; 8-core, 16G RAM).
>
> No data was loaded into Cassandra -- 100% of requests were misses. This is,
> so far as we can reason about the problem, as fast as the database can
> perform; disk is out of the picture, and the hardware is certainly more
> than sufficient.
>
> nodes    reads/sec
>     1       53,000
>     2       37,000
>     4       37,000
>
> I ran this test previously on the cloud, with similar results:
>
> nodes    reads/sec
>     1       24,000
>     2       21,000
>     3       21,000
>     4       21,000
>     5       21,000
>     6       21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second
> time), with essentially identical results.
>
> Other notes:
> - stress.py was run in both random and gaussian mode; there was no
>   difference.
> - Runs lasted 10+ minutes; the numbers above are averages that exclude the
>   beginning and end of each run.
> - The supplied node lists covered all boxes in the cluster.
> - Data and commitlog directories were deleted between runs.
> - Tokens were evenly spaced across the ring and changed to match the
>   cluster size before each run.
>
> If anyone has explanations or suggestions, they would be quite welcome.
> This is surprising, to say the least.
>
> Cheers,
>
> Dave
>
>
> On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:
>
>> Hey Oren,
>>
>> The Cloud Servers REST API returns a "hostId" for each server that
>> indicates which physical host you are on. I'm not sure if you can see it
>> from the control panel, but a quick curl session should get you the answer.
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Oren Benjamin" <o...@clearspring.com>
>> Sent: Monday, July 19, 2010 10:30am
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>>
>> Certainly I'm using multiple cloud servers for the multiple-client tests.
>> Whether or not they are resident on the same physical machine, I just
>> don't know.
>>
>> -- Oren
>>
>> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
>>
>> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <o...@clearspring.com> wrote:
>> Thanks for the info. Very helpful in validating what I've been seeing. As
>> for the scaling limit...
>>
>>>> The above was single-node testing. I'd expect to be able to add nodes
>>>> and scale throughput. Unfortunately, I seem to be running into a cap of
>>>> 21,000 reads/s regardless of the number of nodes in the cluster.
>>>
>>> This is what I would expect if a single machine is handling all the
>>> Thrift requests. Are you spreading the client connections to all the
>>> machines?
>>
>> Yes - in all tests I add all nodes in the cluster to the --nodes list. The
>> client requests are in fact being dispersed among all the nodes, as
>> evidenced by the intermittent TimedOutExceptions in the log, which show up
>> against the various nodes in the input list. Could it be a result of all
>> the virtual nodes being hosted on the same physical hardware? Am I running
>> into some connection limit? I don't see anything pegged in the JMX stats.
>>
>> It's unclear whether you're using multiple client machines for stress.py
>> or not; a limit of 24k/21k for a single quad-proc machine is normal in my
>> experience.
>>
>> -Brandon


--
LOVE DAVE
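For reference, a minimal sketch of the even token spacing Dave describes, assuming RandomPartitioner's 0..2**127 token space; assigning node i of an N-node cluster the token i * 2**127 / N is the usual recipe, not something quoted from the thread:

    # Evenly spaced tokens for RandomPartitioner; recompute whenever the cluster size changes.
    def even_tokens(node_count):
        ring_size = 2 ** 127
        return [i * (ring_size // node_count) for i in range(node_count)]

    for node, token in enumerate(even_tokens(4)):
        print("node %d: InitialToken = %d" % (node, token))

Each value goes into the corresponding node's InitialToken setting (storage-conf.xml in the 0.6 era) before the node is started, so every run begins with an evenly balanced ring.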