This is absolutely your bottleneck, as Brandon mentioned before. Your client machine is maxing out at 37K requests per second.
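A quick back-of-envelope check (the 50-thread and 37K figures come from the messages quoted below; the per-request latency is derived, not measured): with a fixed pool of synchronous client threads, throughput is capped at threads / latency, so beyond that point adding server nodes changes nothing.

    # Client-side ceiling: N synchronous threads can do at most N / latency requests/sec.
    threads = 50                      # stress.py thread count on the single client machine
    observed_reads_per_sec = 37000    # plateau seen on the dedicated-hardware cluster
    implied_latency_s = threads / float(observed_reads_per_sec)
    print("implied per-request latency: %.2f ms" % (implied_latency_s * 1000))             # ~1.35 ms
    target = 100000                   # hypothetical target throughput
    print("threads needed for %d reads/sec: %.0f" % (target, target * implied_latency_s))  # ~135

To push the cluster harder, either raise the stress.py thread count or sum the reqs/sec of several client machines, as Dave describes below.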
-----Original Message-----
From: "David Schoonover" <david.schoono...@gmail.com>
Sent: Monday, July 19, 2010 12:30pm
To: user@cassandra.apache.org
Subject: Re: Cassandra benchmarking on Rackspace Cloud

> How many physical client machines are running stress.py?

One with 50 threads; it is remote from the cluster but within the same DC in both cases. I also ran the test with multiple clients and saw similar results when summing the reqs/sec.

On Mon, Jul 19, 2010 at 1:22 PM, Stu Hood <stu.h...@rackspace.com> wrote:
> How many physical client machines are running stress.py?
>
> -----Original Message-----
> From: "David Schoonover" <david.schoono...@gmail.com>
> Sent: Monday, July 19, 2010 12:11pm
> To: user@cassandra.apache.org
> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>
> Hello all, I'm Oren's partner in crime on all this. I've got a few more
> numbers to add.
>
> In an effort to eliminate everything but the scaling issue, I set up a
> cluster on dedicated hardware (non-virtualized; 8-core, 16G RAM).
>
> No data was loaded into Cassandra -- 100% of requests were misses. This is,
> so far as we can reason about the problem, as fast as the database can
> perform; disk is out of the picture, and the hardware is certainly more
> than sufficient.
>
> nodes    reads/sec
>     1       53,000
>     2       37,000
>     4       37,000
>
> I ran this test previously on the cloud, with similar results:
>
> nodes    reads/sec
>     1       24,000
>     2       21,000
>     3       21,000
>     4       21,000
>     5       21,000
>     6       21,000
>
> In fact, I ran it twice out of disbelief (on different nodes the second
> time), with essentially identical results.
>
> Other notes:
> - stress.py was run in both random and gaussian mode; there was no
>   difference.
> - Runs lasted 10+ minutes; the numbers above are averages that exclude the
>   beginning and end of each run.
> - The supplied node lists covered all boxes in the cluster.
> - Data and commitlog directories were deleted between runs.
> - Tokens were evenly spaced across the ring and changed to match the
>   cluster size before each run.
>
> If anyone has explanations or suggestions, they would be quite welcome.
> This is surprising, to say the least.
>
> Cheers,
>
> Dave
>
>
> On Jul 19, 2010, at 11:42 AM, Stu Hood wrote:
>
>> Hey Oren,
>>
>> The Cloud Servers REST API returns a "hostId" for each server that
>> indicates which physical host you are on. I'm not sure if you can see it
>> from the control panel, but a quick curl session should get you the answer.
>>
>> Thanks,
>> Stu
>>
>> -----Original Message-----
>> From: "Oren Benjamin" <o...@clearspring.com>
>> Sent: Monday, July 19, 2010 10:30am
>> To: "user@cassandra.apache.org" <user@cassandra.apache.org>
>> Subject: Re: Cassandra benchmarking on Rackspace Cloud
>>
>> Certainly I'm using multiple cloud servers for the multiple-client tests.
>> Whether or not they are resident on the same physical machine, I just
>> don't know.
>>
>> -- Oren
>>
>> On Jul 18, 2010, at 11:35 PM, Brandon Williams wrote:
>>
>> On Sun, Jul 18, 2010 at 8:45 PM, Oren Benjamin <o...@clearspring.com> wrote:
>> Thanks for the info. Very helpful in validating what I've been seeing. As
>> for the scaling limit...
>>
>>>> The above was single-node testing. I'd expect to be able to add nodes
>>>> and scale throughput. Unfortunately, I seem to be running into a cap of
>>>> 21,000 reads/s regardless of the number of nodes in the cluster.
>>>
>>> This is what I would expect if a single machine is handling all the
>>> Thrift requests. Are you spreading the client connections to all the
>>> machines?
>>
>> Yes - in all tests I add all nodes in the cluster to the --nodes list. The
>> client requests are in fact being dispersed among all the nodes, as
>> evidenced by the intermittent TimedOutExceptions in the log, which show up
>> against the various nodes in the input list. Could it be a result of all
>> the virtual nodes being hosted on the same physical hardware? Am I running
>> into some connection limit? I don't see anything pegged in the JMX stats.
>>
>> It's unclear whether you're using multiple client machines for stress.py
>> or not; a limit of 24k/21k for a single quad-proc machine is normal in my
>> experience.
>>
>> -Brandon


--
LOVE DAVE
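For reference, a minimal sketch of the even token spacing Dave describes, assuming RandomPartitioner's 0..2**127 token space; assigning node i of an N-node cluster the token i * 2**127 / N is the usual recipe, not something quoted from the thread:

    # Evenly spaced tokens for RandomPartitioner; recompute whenever the cluster size changes.
    def even_tokens(node_count):
        ring_size = 2 ** 127
        return [i * (ring_size // node_count) for i in range(node_count)]

    for node, token in enumerate(even_tokens(4)):
        print("node %d: InitialToken = %d" % (node, token))

Each value goes into the corresponding node's InitialToken setting (storage-conf.xml in the 0.6 era) before the node is started, so every run begins with an evenly balanced ring.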