I'm trying to determine if there are any practical limits on the amount of data 
that a single node can handle efficiently, and if so, whether I've hit that 
limit or not.

We've just set up a new 7-node cluster running Cassandra 1.1.2 under 
OpenJDK 6. Each node is a 12-core Xeon with 24GB of RAM, attached over dual 
SATA-3 links to a stripe of ten 3TB disk mirrors (20 spindles per node). I can 
read and write around 900MB/s sequentially on the arrays. I started out with 
Cassandra at its default settings, except for the compaction throughput, which 
I increased from 16MB/s to 100MB/s. The defaults put the heap size at 6GB.
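
For reference, the only setting we deliberately changed is in cassandra.yaml; 
cassandra-env.sh was left alone, and its auto-sizing is what produces the 6GB 
heap on these 24GB machines (heap overrides later went through MAX_HEAP_SIZE 
and HEAP_NEWSIZE in cassandra-env.sh):

    # cassandra.yaml (the default is 16)
    compaction_throughput_mb_per_sec: 100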

Our schema is pretty simple: only four column families, each with one 
secondary index. The replication factor is set to four, and compression is 
disabled. Our access pattern is intended to be roughly equal numbers of 
inserts and selects, with no updates and only the occasional delete.
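
For a concrete picture, each column family is defined with something along 
these lines in cassandra-cli (the names here are invented; only the shape, the 
replication factor of four, and the single KEYS index per CF reflect our 
actual schema):

    create keyspace OurKeyspace
      with placement_strategy = 'SimpleStrategy'
      and strategy_options = {replication_factor: 4};

    use OurKeyspace;

    create column family events
      with comparator = UTF8Type
      and column_metadata = [{column_name: source_id,
                              validation_class: UTF8Type,
                              index_type: KEYS}];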

The first thing we did was begin loading data into the cluster. We could 
perform about 3000 inserts per second, and that rate stayed mostly flat. 
Things started to go wrong around the time the nodes exceeded 800GB each: 
Cassandra began to log a lot of dropped MUTATION message warnings, and started 
complaining that the heap was over 75% full.
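
In case it helps with diagnosis, this is how I've been watching it (the log 
path below is just our install's default):

    # per-message-type dropped counts (MUTATION, READ, ...)
    nodetool -h localhost tpstats

    # the heap warnings come from GCInspector in the system log
    grep -i "heap is" /var/log/cassandra/system.log | tail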

At that point, we stopped all activity on the cluster and attempted a repair, 
so we could be sure the data was fully consistent before continuing. Our 
mistake was probably trying to repair all of the nodes simultaneously: within 
an hour, Java terminated on one of the nodes with a heap out-of-memory error. 
I then increased all of the heap sizes to 8GB, reduced heap_newsize to 800MB, 
and restarted all of the nodes with no outside activity on the cluster. I 
began a repair on a single node, and within a few hours it ran out of heap and 
exited again. I increased the heap to 12GB and attempted the same thing; this 
time the repair ran for about 7 hours before dying with an OOM.
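
For concreteness, the heap changes were made in cassandra-env.sh and the 
repair was just a plain nodetool repair; would restricting it to one column 
family and the node's primary range (the -pr flag, if 1.1.2 supports it) make 
a meaningful difference?

    # cassandra-env.sh on each node (8G at first, then 12G, now 16G)
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="800M"

    # what I ran on the single node
    nodetool -h <node> repair

    # the narrower variant I'm wondering about
    nodetool -h <node> repair -pr <keyspace> <column_family>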

By now, the repair had increased the amount of data on some of the nodes to 
over 1.2TB. There is no going back to a 6GB heap size - Cassandra now exits 
with an OOM during startup unless the heap is set higher. It's at 16GB now, and 
a single node has been repairing for a couple of days. Though I have no 
personal experience with this, I've been told that Java's garbage collector 
doesn't perform well with heaps above 8GB. I'm wary of setting it higher, but I 
can add up to 192GB of RAM to each node if necessary.
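
One thing I can still do on our side is turn on GC logging, so we can actually 
see what the collector is doing with a heap this size; cassandra-env.sh ships 
with lines roughly like these commented out (the log path is just an example):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"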

How much heap does Cassandra need for this amount of data with only four 
column families? Am I scaling this cluster in completely the wrong direction? 
Is there a magic garbage-collection setting I need to add to cassandra-env.sh 
that isn't there by default?

Thanks,

  - .Dustin
