Hi,

We have a small production cluster with two nodes. The load on the nodes is
very light, around 20 reads / sec and about the same for writes. There are
around 2.5 million keys in the cluster and an RF of 2.

About 2.4 million of the rows are skinny (6 columns) and around 3 KB each.
We are currently running scripts that access all of the keys in time order
to do some calculations.

While running the scripts, the nodes go down and then come back up 6-7
minutes later. This seems to be due to GC. I get lines like this in the log:
INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600
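
To put numbers on that log line, here is a small Python sketch that pulls
out the pause length and the heap usage. The regular expression and the
function name summarize_gc_line are my own, written only to match the line
quoted above; they are not part of Cassandra.

```python
import re

# Hypothetical helper: the pattern is written to match the GCInspector line
# quoted above, it is not taken from Cassandra itself.
GC_LINE = re.compile(
    r"GC for (?P<collector>\w+): (?P<ms>\d+) ms for (?P<count>\d+) collections, "
    r"(?P<used>\d+) used; max is (?P<max>\d+)"
)


def summarize_gc_line(line):
    """Return collector name, pause in minutes and heap usage fraction, or None."""
    match = GC_LINE.search(line)
    if match is None:
        return None
    return {
        "collector": match.group("collector"),
        # The log reports the pause in milliseconds.
        "pause_minutes": int(match.group("ms")) / 60000,
        # Bytes in use after the collection divided by the maximum heap size.
        "heap_used_fraction": int(match.group("used")) / int(match.group("max")),
    }


line = (
    "INFO [ScheduledTasks:1] 2013-04-15 14:00:02,749 GCInspector.java (line 122) "
    "GC for ParNew: 338798 ms for 1 collections, 592212416 used; max is 1046937600"
)
summary = summarize_gc_line(line)
print(summary)
```

For the line above this works out to a ParNew pause of roughly 5.6 minutes
with the heap at about 57% of its maximum.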

However, the heap is not full. Heap usage follows a jagged pattern, climbing
from 60% to 70% over 5 minutes and then dropping back to 60% over the next 5
minutes, and so on. I get no "Heap is X full..." messages. Every once in a
while, at one of these peaks, I get a stop-the-world GC pause lasting 6-7
minutes. Why does GC take so much time even though the heap isn't full?

I am aware that my access pattern makes a high key cache hit ratio very
unlikely. And indeed, my average key cache hit ratio while the scripts run
is around 0.5%. I tried disabling key caching on the accessed column family
(UPDATE COLUMN FAMILY cf WITH caching=none;) through the cassandra-cli, but
I get the same behaviour. Does turning the key cache off take effect
immediately?

Stop-the-world GC pauses are fine if they last a few seconds, but pauses of
several minutes are not workable. Any other suggestions for getting rid of
them?

Best regards,
Joel Samuelsson
