From the stress.py code, it looks like the default storage-conf.xml was used
(at least schema-wise).  I'll give that a go for now.
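
In case it helps anyone reproducing the same test, the run I have in mind is
roughly the two passes below (an insert pass to load keys, then a read pass).
The flags are quoted from memory of contrib/py_stress, so treat this as a
sketch and check stress.py --help for the real option names; node1,node2,node3
are placeholders for the actual hosts:

    python stress.py -d node1,node2,node3 -o insert -n 10000000 -t 50
    python stress.py -d node1,node2,node3 -o read -n 10000000 -t 50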

On Jul 1, 2010, at 1:31 PM, Oren Benjamin wrote:

As a first step, I'd like to reproduce the test from 
http://spyced.blogspot.com/2010/01/cassandra-05.html on my current setup.

Can you post the storage-conf.xml that was used so that I can match the 
settings as much as possible?

Thanks,

   -- Oren

On Jul 1, 2010, at 3:15 AM, Oren Benjamin wrote:

Thanks Jonathan,

It's great that you still manage to help out individual users.  I first came 
across your blog while looking for a good reusable bloom filter implementation 
a while back.  Having surveyed every other Java implementation I could find, I 
ended up extracting the implementation from Cassandra along with the unit tests 
as you suggested in the post.  I added a few tests of my own and have been 
using it in projects ever since.  Saved me the trouble of reimplementing and 
testing - drinks are on me if we ever run into each other.
// End of digression

Yes, I did increase the heap size; however, the pauses were occurring during
normal operations (no streaming, compacting, flushing, etc.) and the heap was
nowhere near full.  After discovering
https://issues.apache.org/jira/browse/CASSANDRA-1214, I changed the disk access
mode to standard IO and things appear to have stabilized somewhat (albeit at a
steep performance cost).
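
For anyone following along, the change boils down to a single element in
storage-conf.xml.  This is a sketch from memory of the stock 0.6 file, so
check it against your own copy:

    <!-- default is "auto", which means mmap on a 64-bit JVM;
         "standard" forces plain buffered I/O -->
    <DiskAccessMode>standard</DiskAccessMode>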

I haven't seen any examples of Cassandra configurations for Rackspace Cloud, so 
I'll post what I've got running now and the results I've seen so far.

Overview:

6 8GB Rackspace Cloud servers (each configured identically with the exception
of two nodes acting as Cassandra seeds)
Applications [mem allocated to JVM]: Cassandra [5GB], Tomcat [500MB], Zabbix
agent (for monitoring)
storage-conf.xml: <http://docs.google.com/leaf?id=0B-f3cU2kufYpODc4NmJlOWQtYTY5Ni00MjllLThiNDUtODVjNjliZTAyODVh&sort=name&layout=list&num=50>

Setup:

The Tomcat instance hosts a servlet which communicates only with the Cassandra 
node on localhost (via Hector for monitoring and connection pooling).  The web 
service provided by the servlet is accessed through HAProxy (although I've done 
testing both with and without the LB).

Testing:

In my current test setup, I have the key cache on (600,000 keys / node) but row 
cache and mmap disabled.  The DB is preloaded with 200,000,000 fabricated test 
keys ("key0", "key1", "key2", ...).  Each key has 3 columns with a small amount 
of data (between 4 and 64 bytes per column).
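
For completeness, the per-column-family cache settings amount to something
like the following in storage-conf.xml (attribute names as I remember them
from the 0.6 config; Standard1 stands in for the actual column family):

    <ColumnFamily Name="Standard1"
                  CompareWith="BytesType"
                  KeysCached="600000"
                  RowsCached="0"/>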

Right now I'm testing reads only.  I have 4 servers (also Rackspace Cloud) 
running multi-threaded query agents to generate concurrent query load 
(unfortunately, I only just discovered stress.py in contrib - I'll post test 
results from stress.py as soon as I can).  Each request is for a single key.
The cache hit rate is 50%.
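
In case the shape of the read matters: each agent request amounts to a
single-column get for one key.  The agents actually go through the
servlet/Hector path described above, but the raw Thrift equivalent is roughly
the sketch below.  It's written from memory against the 0.6 Thrift API with an
unframed transport (the 0.6 default), and Keyspace1/Standard1/col0/key42 are
just stand-ins for the default schema and one of the fabricated keys:

    import org.apache.cassandra.thrift.Cassandra;
    import org.apache.cassandra.thrift.ColumnOrSuperColumn;
    import org.apache.cassandra.thrift.ColumnPath;
    import org.apache.cassandra.thrift.ConsistencyLevel;
    import org.apache.thrift.protocol.TBinaryProtocol;
    import org.apache.thrift.transport.TSocket;
    import org.apache.thrift.transport.TTransport;

    public class SingleKeyRead {
        public static void main(String[] args) throws Exception {
            // Connect to the local node's Thrift port (9160 by default).
            TTransport transport = new TSocket("localhost", 9160);
            Cassandra.Client client =
                new Cassandra.Client(new TBinaryProtocol(transport));
            transport.open();
            try {
                // Read one column of one row, e.g. column "col0" of row "key42".
                ColumnPath path = new ColumnPath("Standard1");
                path.setColumn("col0".getBytes("UTF-8"));
                ColumnOrSuperColumn result =
                    client.get("Keyspace1", "key42", path, ConsistencyLevel.ONE);
                System.out.println(new String(result.getColumn().getValue(), "UTF-8"));
            } finally {
                transport.close();
            }
        }
    }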

Before switching to standard IO, aggregate reads/sec across the cluster would 
briefly spike to as much as 1000 reads/sec before quickly dropping off, 
presumably having used up all available RAM.  After switching to standard IO, 
reads/sec stays relatively stable at 210 reads/sec.  The average read latency 
across the cluster is about 40 milliseconds.

I realize the dataset is rather large - perhaps more nodes with less RAM would 
perform better?  On deck is a test with 12 4GB nodes for comparison.

Again, thanks for any pointers that would help in optimizing and validating the 
installation.  If I can get to a state of performance in the cloud that's in 
line with expectations from other installations, I'd gladly post the setup 
instructions and results to help fill out this page: 
http://wiki.apache.org/cassandra/CloudConfig (Rackspace is conspicuously 
missing).

  -- Oren


On Jun 30, 2010, at 1:58 AM, Jonathan Ellis wrote:

You could be seeing GC pauses. Did you increase the heap size you gave
Cassandra when you increased your VM size?
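
If you did bump the heap, GC logging will confirm or rule this out quickly.
These are standard HotSpot flags; append them to wherever your startup script
builds the JVM options (JVM_OPTS in the stock cassandra.in.sh, if memory
serves), with the log path being just an example:

    -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
    -XX:+PrintGCApplicationStoppedTime -Xloggc:/var/log/cassandra/gc.log

Long pauses will show up there as lengthy stop-the-world collections.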

On Tue, Jun 29, 2010 at 11:57 AM, Oren Benjamin <o...@clearspring.com> wrote:
Hi all - first timer here.

I'm experimenting with Cassandra on Rackspace Cloud.  Started with 4GB nodes 
and saw read latency spikes while streaming was taking place, so I increased to 
8GB to see if limited memory was the issue.  Now I'm seeing very strange 
behavior during any period in which writes are taking place.  The entire (6-node)
cluster seems to pause for periods of as much as 5-8 seconds.  By that I mean all
the stats (CPU, disk, and network IO, monitored via dstat) drop to zero or near
zero on all nodes simultaneously.  Does anyone have experience with Cassandra on
Rackspace or any idea what's going on here?

The pauses are short enough that it's difficult to introspect the application 
and determine what it's doing during the pause, but long enough to cause 
unacceptable latency for any service built on top of it.

Any ideas or debugging methods would be greatly appreciated,

 -- Oren



--
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com


