In a 4-node cluster running Cassandra 1.1.5 with Sun JVM 1.6.0_29-b11
(64-bit), the nodes often get "stuck" in a state where CMS collections
of the old generation are constantly running.

The JVM configuration uses the standard settings in cassandra-env --
the relevant settings are included below.  The max heap is currently set
to 5 GB with 800 MB for the new generation.  I don't believe the cluster
is overly busy, and it seems to be performing well enough apart from
this issue.  Once nodes get into this state they never seem to leave it
(by freeing up old-generation memory) without restarting Cassandra.
They typically enter this state while running "nodetool repair -pr",
but once that starts happening, restarting them only "fixes" it for a
couple of hours.

Compactions are completing and are generally not queued up.  All CFs use
STCS.  The busiest CF consumes about 100 GB of disk space, is
write-heavy, and all of its columns have a TTL of 3 days.  Overall there
are 41 CFs, including those used for the system keyspace and secondary
indexes.  The number of SSTables per node currently varies from 185 to 212.
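
For reference, this is roughly how I'm checking the compaction backlog
and the per-node SSTable total -- the awk just sums the "SSTable count"
lines from cfstats:

$> nodetool compactionstats      # active/pending compaction tasks
$> nodetool cfstats | grep "SSTable count" | \
   awk '{ sum += $3 } END { print sum }'   # total SSTables on this node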

Other than frequent log warnings like "GCInspector - Heap is 0.xxx full..."
and "StorageService - Flushing CFS(...) to relieve memory pressure",
there are no log entries indicating a problem.
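
In case it helps, this is the sort of thing I'm grepping for (assuming
the default system.log location for a package install):

$> grep -E "GCInspector.*Heap is|StorageService.*Flushing CFS" \
   /var/log/cassandra/system.log | tail -5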

Does the memory needed vary with the amount of data stored?  If so, how
can I predict how much JVM heap is needed?  I don't want to make the
heap too large, since that's bad too.  Or maybe there's a memory leak
related to compaction that prevents metadata from being purged?


-Bryan


The host has 12 GB of RAM, with ~6 GB used by Java and ~6 GB for the OS
and buffer cache.
$> free -m
             total       used       free     shared    buffers     cached
Mem:         12001      11870        131          0          4       5778
-/+ buffers/cache:       6087       5914
Swap:            0          0          0


JVM settings in cassandra-env:
MAX_HEAP_SIZE="5G"
HEAP_NEWSIZE="800M"

# GC tuning options
JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseCompressedOops"


jstat shows about 12 full collections per minute, with old-generation
usage constantly above 75%, so CMS is always over the
CMSInitiatingOccupancyFraction threshold.

$> jstat -gcutil -t 22917 5000 4
Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC     FGCT     GCT
       132063.0  34.70   0.00  26.03  82.29  59.88  21580  506.887 17523 3078.941 3585.829
       132068.0  34.70   0.00  50.02  81.23  59.88  21580  506.887 17524 3079.220 3586.107
       132073.1   0.00  24.92  46.87  81.41  59.88  21581  506.932 17525 3079.583 3586.515
       132078.1   0.00  24.92  64.71  81.40  59.88  21581  506.932 17527 3079.853 3586.785
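
The ~12 full collections per minute is just a rough estimate from
diffing the FGC column across consecutive 5-second samples, e.g.:

$> jstat -gcutil -t 22917 5000 13 | \
   awk '$1+0 > 0 { if (prev != "") { rate = ($9 - prev) * 12; print rate " full GCs/min" } prev = $9 }'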


Other hosts that are not currently experiencing the high CPU load have
an old generation that is less than 75% full.

$> jstat -gcutil -t 6063 5000 4
Timestamp         S0     S1     E      O      P     YGC     YGCT    FGC     FGCT     GCT
       520731.6   0.00  12.70  36.37  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
       520736.5   0.00  12.70  53.25  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
       520741.5   0.00  12.70  68.92  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
       520746.5   0.00  12.70  83.11  71.33  59.26  46453 1688.809 14785 2130.779 3819.588
