On Mon, Oct 3, 2011 at 1:19 PM, Ramesh Natarajan <rames...@gmail.com> wrote:
> Thanks for the pointers. I checked the system, and iostat showed that we are
> saturating the disk at 100%. The disk is a SCSI device exposed by ESXi,
> running on a dedicated LUN as RAID10 (4 x 600GB 15k drives) connected to the
> ESX host via iSCSI.
> When I run compactionstats I see we are compacting a column family that has
> about 10GB of data. During this time I also see dropped messages in the
> system.log file.
> Since my I/O rates are constant in my tests, I think the compaction is
> throwing things off. Is there a way I can throttle compaction in Cassandra?
> Rather than running multiple compactions at the same time, I would like to
> throttle it by I/O rate. Is that possible?
> If instead of having 5 big column families I create, say, 1000 each (5000
> total), do you think it will help in this case? (Smaller files, and so a
> smaller load on compaction.)
> Is it normal to have 5000 column families?
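[Editor's note on the throttling question: 0.8-era Cassandra has a yaml knob for
exactly this. The value below is illustrative, not a recommendation; verify the
setting exists in your particular 0.8.x cassandra.yaml before relying on it.]

```yaml
# cassandra.yaml -- cap total compaction I/O for the node, in MB/s.
# 16 is the illustrative value here; 0 disables throttling entirely.
compaction_throughput_mb_per_sec: 16
```

[Later releases also expose a runtime counterpart,
`nodetool setcompactionthroughput <MB/s>`; whether your 0.8.x build has it
should be checked with `nodetool help`.]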
I don't think it's a good idea to design your cluster this way. Cassandra can
be scaled out; you might want to add more nodes instead.

> thanks
> Ramesh
>
>
> On Mon, Oct 3, 2011 at 2:50 PM, Chris Goffinet <c...@chrisgoffinet.com> wrote:
>>
>> Most likely what is happening is that you are running single-threaded
>> compaction. Look at cassandra.yaml for how to enable multi-threaded
>> compaction. As more data comes into the system, bigger files get created
>> during compaction. You could be in a situation where you are compacting at
>> a higher bucket level N while compactions build up at lower buckets.
>> Run "nodetool -host localhost compactionstats" to get an idea of what's
>> going on.
>>
>> On Mon, Oct 3, 2011 at 12:05 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>>
>>> In order to understand what's going on, you might want to first do just a
>>> write test, look at the results, then do just a read test, and then do
>>> combined read/write tests.
>>>
>>> Since you mentioned high updates/deletes, I should also ask what your CL
>>> is for writes/reads. With high updates/deletes plus a high CL, I think one
>>> should expect reads to slow down when sstables have not been compacted.
>>>
>>> You have 20G of RAM, 17G of it used by your process, and I also see 36G
>>> VIRT, which I don't really understand given that swap is disabled. Look at
>>> the sar -r output too, to make sure no swapping is occurring. Also, verify
>>> that jna.jar is installed.
>>>
>>> On Mon, Oct 3, 2011 at 11:52 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>> > I will start another test run to collect these stats. Our test model is
>>> > in the neighborhood of 4500 inserts, 8000 updates & deletes, and 1500
>>> > reads every second across 6 servers.
>>> > Can you elaborate more on reducing the heap space? Do you think 17G RSS
>>> > is a problem?
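[Editor's note: a back-of-the-envelope check, using only numbers quoted in this
thread, of why 36G VIRT / 17G RES need not mean a leak when SSTables are
mmap'd. It assumes disk_access_mode is mmap or auto on a 64-bit JVM, which was
the 0.8 default but should be confirmed in cassandra.yaml.]

```python
# Illustrative arithmetic only; all inputs are copied from the thread's
# top output, java command line, and nodetool ring output.

heap_gb = 10059 / 1024   # -Xms10059M / -Xmx10059M on the java command line
data_gb = 19.34          # per-node load for 10.19.104.14 from nodetool ring
virt_gb = 36.3           # VIRT of the java process in top
res_gb = 17.0            # RES of the java process in top

# With mmap'd data files, the SSTables are mapped into the process address
# space, so VIRT is at least heap + data even though no swap is configured.
explained_virt = heap_gb + data_gb
assert explained_virt <= virt_gb  # ~29.2G of 36.3G; rest is stacks, JVM, jars

# RES exceeds the heap by however much of the mapped data is currently
# resident in page cache; that memory is reclaimable, not leaked.
mapped_resident_gb = res_gb - heap_gb
print(f"heap ~ {heap_gb:.1f}G, mapped-and-resident ~ {mapped_resident_gb:.1f}G")
```

[The same reasoning suggests that lowering the heap to 6-8G, as advised above,
leaves more room for the page cache rather than shrinking the process.]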
>>> > thanks
>>> > Ramesh
>>> >
>>> >
>>> > On Mon, Oct 3, 2011 at 1:33 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>> >>
>>> >> I am wondering if you are seeing issues because more frequent
>>> >> compactions are kicking in. Is this primarily write ops, or reads too?
>>> >> During the test period, gather data like:
>>> >>
>>> >> 1. cfstats
>>> >> 2. tpstats
>>> >> 3. compactionstats
>>> >> 4. netstats
>>> >> 5. iostat
>>> >>
>>> >> You have RSS memory close to 17GB. Maybe someone can give further
>>> >> advice on whether that could be because of mmap. You might want to
>>> >> lower your heap size to 6-8G and see if that helps.
>>> >>
>>> >> Also, check that you have jna.jar deployed and that you see the malloc
>>> >> successful message in the logs.
>>> >>
>>> >> On Mon, Oct 3, 2011 at 10:36 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>> >> > We have 5 CFs. Attached is the output from the describe command. We
>>> >> > don't have row cache enabled.
>>> >> > Thanks
>>> >> > Ramesh
>>> >> >
>>> >> > Keyspace: MSA:
>>> >> >   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>> >> >   Durable Writes: true
>>> >> >     Options: [replication_factor:3]
>>> >> >   Column Families:
>>> >> >     ColumnFamily: admin
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 200000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: modseq
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 500000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: msgid
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 500000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 864000
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: participants
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 500000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: uid
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 2000000.0/14400
>>> >> >       Memtable thresholds: 0.4/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >
>>> >> > On Mon, Oct 3, 2011 at 12:26 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>> >> >>
>>> >> >> On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>> >> >> > I am running a Cassandra cluster of 6 nodes on RHEL6, virtualized
>>> >> >> > by ESXi 5.0. Each VM is configured with 20GB of RAM and 12 cores.
>>> >> >> > Our test setup performs about 3000 inserts per second. The
>>> >> >> > Cassandra data partition is on an XFS filesystem mounted with the
>>> >> >> > options (noatime,nodiratime,nobarrier,logbufs=8). We have no swap
>>> >> >> > enabled on the VMs, and vm.swappiness is set to 0. To avoid
>>> >> >> > contention issues, our Cassandra VMs are not running any
>>> >> >> > application other than Cassandra.
>>> >> >> > The test runs fine for about 12 hours or so. After that,
>>> >> >> > performance starts to degrade to about 1500 inserts per second.
>>> >> >> > By 18-20 hours the inserts go down to 300 per second.
>>> >> >> > If I do a truncate, it starts clean and runs for a few hours
>>> >> >> > (though not as clean as rebooting).
>>> >> >> > We find a direct correlation between kswapd kicking in after 12
>>> >> >> > hours or so and the performance degradation. If I look at the
>>> >> >> > cached memory, it is close to 10G.
>>> >> >> > I am not getting an OOM error in Cassandra, so it looks like we
>>> >> >> > are not running out of memory. Can someone explain whether we can
>>> >> >> > optimize this so that kswapd doesn't kick in?
>>> >> >> >
>>> >> >> > Our top output shows:
>>> >> >> > top - 16:23:54 up 2 days, 23:17, 4 users, load average: 2.21, 2.08, 2.02
>>> >> >> > Tasks: 213 total, 1 running, 212 sleeping, 0 stopped, 0 zombie
>>> >> >> > Cpu(s): 1.6%us, 0.8%sy, 0.0%ni, 90.9%id, 6.3%wa, 0.0%hi, 0.2%si, 0.0%st
>>> >> >> > Mem:  20602812k total, 20320424k used, 282388k free, 1020k buffers
>>> >> >> > Swap:        0k total,        0k used,      0k free, 10145516k cached
>>> >> >> >
>>> >> >> >  PID USER  PR NI  VIRT RES SHR  S %CPU %MEM   TIME+  COMMAND
>>> >> >> > 2586 root  20  0 36.3g 17g 8.4g S 32.1 88.9 8496:37  java
>>> >> >> >
>>> >> >> > java process command line (from ps):
>>> >> >> > root 2453 1 99 Sep30 pts/0 9-13:51:38 java -ea
>>> >> >> > -javaagent:./apache-cassandra-0.8.6/bin/../lib/jamm-0.2.2.jar
>>> >> >> > -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10059M
>>> >> >> > -Xmx10059M -Xmn1200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
>>> >> >> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>> >> >> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>> >> >> > -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>>> >> >> > -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
>>> >> >> > -Dcom.sun.management.jmxremote.port=7199
>>> >> >> > -Dcom.sun.management.jmxremote.ssl=false
>>> >> >> > -Dcom.sun.management.jmxremote.authenticate=false
>>> >> >> > -Djava.rmi.server.hostname=10.19.104.14
>>> >> >> > -Djava.net.preferIPv4Stack=true
>>> >> >> > -Dlog4j.configuration=log4j-server.properties
>>> >> >> > -Dlog4j.defaultInitOverride=true -cp
>>> >> >> > ./apache-cassandra-0.8.6/bin/../conf:./apache-cassandra-0.8.6/bin/../build/classes/main:./apache-cassandra-0.8.6/bin/../build/classes/thrift:./apache-cassandra-0.8.6/bin/../lib/antlr-3.2.jar:./apache-cassandra-0.8.6/bin/../lib/apache-cassandra-0.8.6.jar:./apache-cassandra-0.8.6/bin/../lib/apache-cassandra-thrift-0.8.6.jar:./apache-cassandra-0.8.6/bin/../lib/avro-1.4.0-fixes.jar:./apache-cassandra-0.8.6/bin/../lib/avro-1.4.0-sources-fixes.jar:./apache-cassandra-0.8.6/bin/../lib/commons-cli-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/commons-codec-1.2.jar:./apache-cassandra-0.8.6/bin/../lib/commons-collections-3.2.1.jar:./apache-cassandra-0.8.6/bin/../lib/commons-lang-2.4.jar:./apache-cassandra-0.8.6/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/guava-r08.jar:./apache-cassandra-0.8.6/bin/../lib/high-scale-lib-1.1.2.jar:./apache-cassandra-0.8.6/bin/../lib/jackson-core-asl-1.4.0.jar:./apache-cassandra-0.8.6/bin/../lib/jackson-mapper-asl-1.4.0.jar:./apache-cassandra-0.8.6/bin/../lib/jamm-0.2.2.jar:./apache-cassandra-0.8.6/bin/../lib/jline-0.9.94.jar:./apache-cassandra-0.8.6/bin/../lib/json-simple-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/libthrift-0.6.jar:./apache-cassandra-0.8.6/bin/../lib/log4j-1.2.16.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-examples.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-impl.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-jmx.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-remote.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-rimpl.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-rjmx.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-tools.jar:./apache-cassandra-0.8.6/bin/../lib/servlet-api-2.5-20081211.jar:./apache-cassandra-0.8.6/bin/../lib/slf4j-api-1.6.1.jar:./apache-cassandra-0.8.6/bin/../lib/slf4j-log4j12-1.6.1.jar:./apache-cassandra-0.8.6/bin/../lib/snakeyaml-1.6.jar
>>> >> >> > org.apache.cassandra.thrift.CassandraDaemon
>>> >> >> >
>>> >> >> > Ring output:
>>> >> >> > [root@CAP4-CNode4 apache-cassandra-0.8.6]# ./bin/nodetool -h 127.0.0.1 ring
>>> >> >> > Address       DC          Rack   Status  State   Load      Owns    Token
>>> >> >> >                                                                    141784319550391026443072753096570088105
>>> >> >> > 10.19.104.11  datacenter1 rack1  Up      Normal  19.92 GB  16.67%  0
>>> >> >> > 10.19.104.12  datacenter1 rack1  Up      Normal  19.3 GB   16.67%  28356863910078205288614550619314017621
>>> >> >> > 10.19.104.13  datacenter1 rack1  Up      Normal  18.57 GB  16.67%  56713727820156410577229101238628035242
>>> >> >> > 10.19.104.14  datacenter1 rack1  Up      Normal  19.34 GB  16.67%  85070591730234615865843651857942052863
>>> >> >> > 10.19.105.11  datacenter1 rack1  Up      Normal  19.88 GB  16.67%  113427455640312821154458202477256070484
>>> >> >> > 10.19.105.12  datacenter1 rack1  Up      Normal  20 GB     16.67%  141784319550391026443072753096570088105
>>> >> >> > [root@CAP4-CNode4 apache-cassandra-0.8.6]#
>>> >> >>
>>> >> >> How many CFs? Can you describe the CFs and post the configuration?
>>> >> >> Do you have row cache enabled?
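[Editor's note: the tokens in the ring output above are the standard evenly
spaced layout over RandomPartitioner's 0..2**127 token space, which the sketch
below reproduces. `balanced_tokens` is a hypothetical helper for illustration,
not part of any Cassandra tool.]

```python
# Evenly spaced initial tokens over RandomPartitioner's 0 .. 2**127 range.
def balanced_tokens(node_count):
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]

tokens = balanced_tokens(6)
# tokens[1] is the token assigned to 10.19.104.12 in the ring output:
# 28356863910078205288614550619314017621
```

[Each node owning exactly 16.67% confirms the ring is balanced, so the
degradation discussed in this thread is not a hot-spot problem.]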