Re: cassandra performance degrades after 12 hours

Mohit Anchlia Mon, 03 Oct 2011 11:34:11 -0700

I am wondering whether you are seeing these issues because compactions
are kicking in more frequently. Is the load primarily writes, or reads too?
While the test is running, gather data such as:

1. cfstats
2. tpstats
3. compactionstats
4. netstats
5. iostat
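A minimal shell sketch for capturing the list above at intervals during the test, so the snapshots can be lined up against the point where throughput drops. Assumptions not taken from the thread: nodetool is on the PATH (override with NODETOOL), the node answers on 127.0.0.1 as in the quoted ring command, iostat comes from the sysstat package (override with IOSTAT), and the function name snapshot_stats and the output directory are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: capture one round of diagnostics into OUTDIR.
# Assumes nodetool and iostat are installed; both can be overridden
# through the NODETOOL and IOSTAT environment variables.
NODETOOL=${NODETOOL:-nodetool}
IOSTAT=${IOSTAT:-iostat}
HOST=${HOST:-127.0.0.1}

snapshot_stats() {
  local outdir=$1
  local ts cmd
  ts=$(date +%Y%m%d-%H%M%S)
  mkdir -p "$outdir" || return 1
  # The four nodetool views from the list, one timestamped file each.
  for cmd in cfstats tpstats compactionstats netstats; do
    "$NODETOOL" -h "$HOST" "$cmd" > "$outdir/$ts-$cmd.txt" 2>&1
  done
  # Extended disk stats: two 5-second reports. The first one covers the
  # time since boot, so the second reflects the current load.
  "$IOSTAT" -x 5 2 > "$outdir/$ts-iostat.txt" 2>&1
}
```

For example, calling snapshot_stats /tmp/cassandra-stats from a loop with sleep 300 between calls gives a five-minute history across the 12-hour mark.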

Your RSS is close to 17GB. Maybe someone else can advise whether that is
due to mmap. You might want to lower your heap size to 6-8GB and see if
that helps.
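For context on the current ~10GB heap: the quoted process listing shows -Xms10059M -Xmx10059M -Xmn1200M. As far as I understand the 0.8-era cassandra-env.sh, that is what its automatic sizing produces when MAX_HEAP_SIZE and HEAP_NEWSIZE are unset: half of the RAM reported by free -m for the heap, and min(100MB per core, heap/4) for the young generation. The sketch re-creates only that arithmetic; default_heap_sizes is a hypothetical name, not part of Cassandra, and the override values are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical re-creation of the default heap arithmetic (assumed to match
# the 0.8-era cassandra-env.sh): heap = RAM/2, young gen = min(100MB/core, heap/4).
default_heap_sizes() {
  local mem_mb=$1 cores=$2
  local heap_mb=$((mem_mb / 2))
  local yg_per_core_mb=$((100 * cores))
  local yg_quarter_mb=$((heap_mb / 4))
  local yg_mb=$((yg_per_core_mb < yg_quarter_mb ? yg_per_core_mb : yg_quarter_mb))
  echo "${heap_mb}M ${yg_mb}M"
}

# On a 20GB VM with 12 cores (20602812k / 1024 = 20119 MB per top):
#   default_heap_sizes 20119 12   ->   10059M 1200M
#
# To try the smaller heap, set both variables explicitly in
# conf/cassandra-env.sh (illustrative values within the 6-8G range):
#   MAX_HEAP_SIZE="8G"
#   HEAP_NEWSIZE="1200M"
```

The stock script's comments ask for MAX_HEAP_SIZE and HEAP_NEWSIZE to be set together; keeping HEAP_NEWSIZE at the current 1200M changes only the total heap between runs.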

Also, check that jna.jar is deployed and that you see the "JNA mlockall
successful" message in the logs.
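A small shell sketch of that JNA check. Assumptions: the jar lives in the lib/ directory of the 0.8.6 install, and the log is at /var/log/cassandra/system.log (the usual log4j-server.properties default); adjust both if yours differ. check_jna is a hypothetical helper name. It may also be worth noting that the -cp list in the quoted process listing contains no jna jar.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report whether a jna jar is present in the Cassandra
# lib directory and whether the log shows a successful mlockall.
# Default paths are assumptions; pass your own as arguments.
check_jna() {
  local libdir=${1:-./apache-cassandra-0.8.6/lib}
  local log=${2:-/var/log/cassandra/system.log}
  local jars status=0

  jars=$(ls "$libdir"/jna*.jar 2>/dev/null)
  if [ -n "$jars" ]; then
    echo "jna jar: found ($jars)"
  else
    echo "jna jar: missing from $libdir"
    status=1
  fi

  if grep -q "JNA mlockall successful" "$log" 2>/dev/null; then
    echo "mlockall: successful"
  else
    echo "mlockall: no success message in $log"
    status=1
  fi
  return $status
}
```

Each check prints one line; a non-zero return means at least one of them failed.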

On Mon, Oct 3, 2011 at 10:36 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
> We have 5 CFs. Attached is the output of the describe command. We don't
> have row cache enabled.
> Thanks
> Ramesh
> Keyspace: MSA:
>   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>   Durable Writes: true
>     Options: [replication_factor:3]
>   Column Families:
>     ColumnFamily: admin
>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 200000.0/14400
>       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>       GC grace seconds: 3600
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
>     ColumnFamily: modseq
>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 500000.0/14400
>       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>       GC grace seconds: 3600
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
>     ColumnFamily: msgid
>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 500000.0/14400
>       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>       GC grace seconds: 864000
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
>     ColumnFamily: participants
>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 500000.0/14400
>       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>       GC grace seconds: 3600
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
>     ColumnFamily: uid
>       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>       Row cache size / save period in seconds: 0.0/0
>       Key cache size / save period in seconds: 2000000.0/14400
>       Memtable thresholds: 0.4/1440/121 (millions of ops/minutes/MB)
>       GC grace seconds: 3600
>       Compaction min/max thresholds: 4/32
>       Read repair chance: 1.0
>       Replicate on write: true
>       Built indexes: []
>
> On Mon, Oct 3, 2011 at 12:26 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>
>> On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>> > I am running a Cassandra cluster of 6 nodes on RHEL6, virtualized by
>> > ESXi 5.0. Each VM is configured with 20GB of RAM and 12 cores. Our test
>> > setup performs about 3000 inserts per second. The Cassandra data
>> > partition is on an XFS filesystem mounted with options
>> > (noatime,nodiratime,nobarrier,logbufs=8). We have no swap enabled on the
>> > VMs, and vm.swappiness is set to 0. To avoid contention, our Cassandra
>> > VMs run no application other than Cassandra.
>> > The test runs fine for about 12 hours. After that, performance starts
>> > to degrade to about 1500 inserts per second, and by 18-20 hours inserts
>> > drop to 300 per second.
>> > If I do a truncate, it starts clean and runs for a few hours (though not
>> > as clean as after a reboot).
>> > We see a direct correlation between kswapd kicking in after 12 hours or
>> > so and the performance degradation. The cached memory is close to 10GB.
>> > I am not getting an OOM error in Cassandra, so it looks like we are not
>> > running out of memory. Can someone explain whether we can tune this so
>> > that kswapd doesn't kick in?
>> >
>> > Our top output shows:
>> > top - 16:23:54 up 2 days, 23:17,  4 users,  load average: 2.21, 2.08, 2.02
>> > Tasks: 213 total,   1 running, 212 sleeping,   0 stopped,   0 zombie
>> > Cpu(s):  1.6%us,  0.8%sy,  0.0%ni, 90.9%id,  6.3%wa,  0.0%hi,  0.2%si,  0.0%st
>> > Mem:  20602812k total, 20320424k used,   282388k free,     1020k buffers
>> > Swap:        0k total,        0k used,        0k free, 10145516k cached
>> >
>> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>> >  2586 root      20   0 36.3g  17g 8.4g S 32.1 88.9   8496:37 java
>> >
>> > java output
>> > root      2453     1 99 Sep30 pts/0    9-13:51:38 java -ea
>> > -javaagent:./apache-cassandra-0.8.6/bin/../lib/jamm-0.2.2.jar
>> > -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42
>> > -Xms10059M -Xmx10059M -Xmn1200M
>> > -XX:+HeapDumpOnOutOfMemoryError -Xss128k
>> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled
>> > -XX:SurvivorRatio=8 -XX:MaxTenuringThreshold=1
>> > -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly
>> > -Djava.net.preferIPv4Stack=true
>> > -Dcom.sun.management.jmxremote.port=7199
>> > -Dcom.sun.management.jmxremote.ssl=false
>> > -Dcom.sun.management.jmxremote.authenticate=false
>> > -Djava.rmi.server.hostname=10.19.104.14 -Djava.net.preferIPv4Stack=true
>> > -Dlog4j.configuration=log4j-server.properties
>> > -Dlog4j.defaultInitOverride=true -cp
>> > ./apache-cassandra-0.8.6/bin/../conf:./apache-cassandra-0.8.6/bin/../build/classes/main:./apache-cassandra-0.8.6/bin/../build/classes/thrift:./apache-cassandra-0.8.6/bin/../lib/antlr-3.2.jar:./apache-cassandra-0.8.6/bin/../lib/apache-cassandra-0.8.6.jar:./apache-cassandra-0.8.6/bin/../lib/apache-cassandra-thrift-0.8.6.jar:./apache-cassandra-0.8.6/bin/../lib/avro-1.4.0-fixes.jar:./apache-cassandra-0.8.6/bin/../lib/avro-1.4.0-sources-fixes.jar:./apache-cassandra-0.8.6/bin/../lib/commons-cli-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/commons-codec-1.2.jar:./apache-cassandra-0.8.6/bin/../lib/commons-collections-3.2.1.jar:./apache-cassandra-0.8.6/bin/../lib/commons-lang-2.4.jar:./apache-cassandra-0.8.6/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/guava-r08.jar:./apache-cassandra-0.8.6/bin/../lib/high-scale-lib-1.1.2.jar:./apache-cassandra-0.8.6/bin/../lib/jackson-core-asl-1.4.0.jar:./apache-cassandra-0.8.6/bin/../lib/jackson-mapper-asl-1.4.0.jar:./apache-cassandra-0.8.6/bin/../lib/jamm-0.2.2.jar:./apache-cassandra-0.8.6/bin/../lib/jline-0.9.94.jar:./apache-cassandra-0.8.6/bin/../lib/json-simple-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/libthrift-0.6.jar:./apache-cassandra-0.8.6/bin/../lib/log4j-1.2.16.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-examples.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-impl.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-jmx.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-remote.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-rimpl.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-rjmx.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-tools.jar:./apache-cassandra-0.8.6/bin/../lib/servlet-api-2.5-20081211.jar:./apache-cassandra-0.8.6/bin/../lib/slf4j-api-1.6.1.jar:./apache-cassandra-0.8.6/bin/../lib/slf4j-log4j12-1.6.1.jar:./apache-cassandra-0.8.6/bin/../lib/snakeyaml-1.6.jar
>> > org.apache.cassandra.thrift.CassandraDaemon
>> >
>> >
>> > Ring output
>> > [root@CAP4-CNode4 apache-cassandra-0.8.6]# ./bin/nodetool -h 127.0.0.1 ring
>> > Address         DC          Rack        Status State   Load            Owns    Token
>> >                                                                                141784319550391026443072753096570088105
>> > 10.19.104.11    datacenter1 rack1       Up     Normal  19.92 GB        16.67%  0
>> > 10.19.104.12    datacenter1 rack1       Up     Normal  19.3 GB         16.67%  28356863910078205288614550619314017621
>> > 10.19.104.13    datacenter1 rack1       Up     Normal  18.57 GB        16.67%  56713727820156410577229101238628035242
>> > 10.19.104.14    datacenter1 rack1       Up     Normal  19.34 GB        16.67%  85070591730234615865843651857942052863
>> > 10.19.105.11    datacenter1 rack1       Up     Normal  19.88 GB        16.67%  113427455640312821154458202477256070484
>> > 10.19.105.12    datacenter1 rack1       Up     Normal  20 GB           16.67%  141784319550391026443072753096570088105
>> > [root@CAP4-CNode4 apache-cassandra-0.8.6]#
>>
>> How many CFs do you have? Can you describe the CFs and post the
>> configuration? Do you have row cache enabled?
>
>
