On Mon, Oct 3, 2011 at 1:19 PM, Ramesh Natarajan <rames...@gmail.com> wrote:
> Thanks for the pointers. I checked the system, and iostat showed that we are
> saturating the disk at 100%. The disk is a SCSI device exposed by ESXi,
> running on a dedicated LUN as RAID10 (4 x 600GB 15k drives) connected to the
> ESX host via iSCSI.
> When I run compactionstats I see we are compacting a column family that has
> about 10GB of data. During this time I also see dropped messages in the
> system.log file.
> Since my I/O rates are constant in my tests, I think the compaction is
> throwing things off. Is there a way I can throttle compaction in Cassandra?
> Rather than running multiple compactions at the same time, I would like to
> throttle it by I/O rate. Is that possible?
> If instead of having 5 big column families I create, say, 1000 each (5000
> total), do you think it will help in this case? (Smaller files, and so a
> smaller load on compaction.)
> Is it normal to have 5000 column families?
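[Editor's note on the throttling question: 0.8-era Cassandra has a yaml knob for
exactly this. The value below is illustrative, not a recommendation; verify the
setting exists in your particular 0.8.x cassandra.yaml before relying on it.]

```yaml
# cassandra.yaml -- cap total compaction I/O for the node, in MB/s.
# 16 is the illustrative value here; 0 disables throttling entirely.
compaction_throughput_mb_per_sec: 16
```

[Later releases also expose a runtime counterpart,
`nodetool setcompactionthroughput <MB/s>`; whether your 0.8.x build has it
should be checked with `nodetool help`.]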
I don't think it's a good idea to design your cluster this way. Cassandra can
be scaled out; you might want to add more nodes instead.

> thanks
> Ramesh
>
>
> On Mon, Oct 3, 2011 at 2:50 PM, Chris Goffinet <c...@chrisgoffinet.com> wrote:
>>
>> Most likely what is happening is that you are running single-threaded
>> compaction. Look at cassandra.yaml for how to enable multi-threaded
>> compaction. As more data comes into the system, bigger files get created
>> during compaction. You could be in a situation where you are compacting at
>> a higher bucket level N while compactions build up at lower buckets.
>> Run "nodetool -host localhost compactionstats" to get an idea of what's
>> going on.
>>
>> On Mon, Oct 3, 2011 at 12:05 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>>
>>> In order to understand what's going on, you might want to first do just a
>>> write test, look at the results, then do just a read test, and then do
>>> combined read/write tests.
>>>
>>> Since you mentioned high updates/deletes, I should also ask what your CL
>>> is for writes/reads. With high updates/deletes plus a high CL, I think one
>>> should expect reads to slow down when sstables have not been compacted.
>>>
>>> You have 20G of RAM, 17G of it used by your process, and I also see 36G
>>> VIRT, which I don't really understand given that swap is disabled. Look at
>>> the sar -r output too, to make sure no swapping is occurring. Also, verify
>>> that jna.jar is installed.
>>>
>>> On Mon, Oct 3, 2011 at 11:52 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>> > I will start another test run to collect these stats. Our test model is
>>> > in the neighborhood of 4500 inserts, 8000 updates & deletes, and 1500
>>> > reads every second across 6 servers.
>>> > Can you elaborate more on reducing the heap space? Do you think 17G RSS
>>> > is a problem?
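[Editor's note: a back-of-the-envelope check, using only numbers quoted in this
thread, of why 36G VIRT / 17G RES need not mean a leak when SSTables are
mmap'd. It assumes disk_access_mode is mmap or auto on a 64-bit JVM, which was
the 0.8 default but should be confirmed in cassandra.yaml.]

```python
# Illustrative arithmetic only; all inputs are copied from the thread's
# top output, java command line, and nodetool ring output.

heap_gb = 10059 / 1024   # -Xms10059M / -Xmx10059M on the java command line
data_gb = 19.34          # per-node load for 10.19.104.14 from nodetool ring
virt_gb = 36.3           # VIRT of the java process in top
res_gb = 17.0            # RES of the java process in top

# With mmap'd data files, the SSTables are mapped into the process address
# space, so VIRT is at least heap + data even though no swap is configured.
explained_virt = heap_gb + data_gb
assert explained_virt <= virt_gb  # ~29.2G of 36.3G; rest is stacks, JVM, jars

# RES exceeds the heap by however much of the mapped data is currently
# resident in page cache; that memory is reclaimable, not leaked.
mapped_resident_gb = res_gb - heap_gb
print(f"heap ~ {heap_gb:.1f}G, mapped-and-resident ~ {mapped_resident_gb:.1f}G")
```

[The same reasoning suggests that lowering the heap to 6-8G, as advised above,
leaves more room for the page cache rather than shrinking the process.]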
>>> > thanks
>>> > Ramesh
>>> >
>>> >
>>> > On Mon, Oct 3, 2011 at 1:33 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>> >>
>>> >> I am wondering if you are seeing issues because more frequent
>>> >> compactions are kicking in. Is this primarily write ops, or reads too?
>>> >> During the test period, gather data like:
>>> >>
>>> >> 1. cfstats
>>> >> 2. tpstats
>>> >> 3. compactionstats
>>> >> 4. netstats
>>> >> 5. iostat
>>> >>
>>> >> You have RSS memory close to 17GB. Maybe someone can give further
>>> >> advice on whether that could be because of mmap. You might want to
>>> >> lower your heap size to 6-8G and see if that helps.
>>> >>
>>> >> Also, check that you have jna.jar deployed and that you see the malloc
>>> >> successful message in the logs.
>>> >>
>>> >> On Mon, Oct 3, 2011 at 10:36 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>> >> > We have 5 CFs. Attached is the output from the describe command. We
>>> >> > don't have row cache enabled.
>>> >> > Thanks
>>> >> > Ramesh
>>> >> >
>>> >> > Keyspace: MSA:
>>> >> >   Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
>>> >> >   Durable Writes: true
>>> >> >     Options: [replication_factor:3]
>>> >> >   Column Families:
>>> >> >     ColumnFamily: admin
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 200000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: modseq
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 500000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: msgid
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 500000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 864000
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: participants
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 500000.0/14400
>>> >> >       Memtable thresholds: 0.5671875/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >     ColumnFamily: uid
>>> >> >       Key Validation Class: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Default column value validator: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Columns sorted by: org.apache.cassandra.db.marshal.UTF8Type
>>> >> >       Row cache size / save period in seconds: 0.0/0
>>> >> >       Key cache size / save period in seconds: 2000000.0/14400
>>> >> >       Memtable thresholds: 0.4/1440/121 (millions of ops/minutes/MB)
>>> >> >       GC grace seconds: 3600
>>> >> >       Compaction min/max thresholds: 4/32
>>> >> >       Read repair chance: 1.0
>>> >> >       Replicate on write: true
>>> >> >       Built indexes: []
>>> >> >
>>> >> > On Mon, Oct 3, 2011 at 12:26 PM, Mohit Anchlia <mohitanch...@gmail.com> wrote:
>>> >> >>
>>> >> >> On Mon, Oct 3, 2011 at 10:12 AM, Ramesh Natarajan <rames...@gmail.com> wrote:
>>> >> >> > I am running a Cassandra cluster of 6 nodes on RHEL6, virtualized
>>> >> >> > by ESXi 5.0. Each VM is configured with 20GB of RAM and 12 cores.
>>> >> >> > Our test setup performs about 3000 inserts per second. The
>>> >> >> > Cassandra data partition is on an XFS filesystem mounted with the
>>> >> >> > options (noatime,nodiratime,nobarrier,logbufs=8). We have no swap
>>> >> >> > enabled on the VMs, and vm.swappiness is set to 0. To avoid
>>> >> >> > contention issues, our Cassandra VMs are not running any
>>> >> >> > application other than Cassandra.
>>> >> >> > The test runs fine for about 12 hours or so. After that,
>>> >> >> > performance starts to degrade to about 1500 inserts per second.
>>> >> >> > By 18-20 hours the inserts go down to 300 per second.
>>> >> >> > If I do a truncate, it starts clean and runs for a few hours
>>> >> >> > (though not as clean as rebooting).
>>> >> >> > We find a direct correlation between kswapd kicking in after 12
>>> >> >> > hours or so and the performance degradation. If I look at the
>>> >> >> > cached memory, it is close to 10G.
>>> >> >> > I am not getting an OOM error in Cassandra, so it looks like we
>>> >> >> > are not running out of memory. Can someone explain whether we can
>>> >> >> > optimize this so that kswapd doesn't kick in?
>>> >> >> >
>>> >> >> > Our top output shows:
>>> >> >> > top - 16:23:54 up 2 days, 23:17, 4 users, load average: 2.21, 2.08, 2.02
>>> >> >> > Tasks: 213 total, 1 running, 212 sleeping, 0 stopped, 0 zombie
>>> >> >> > Cpu(s): 1.6%us, 0.8%sy, 0.0%ni, 90.9%id, 6.3%wa, 0.0%hi, 0.2%si, 0.0%st
>>> >> >> > Mem:  20602812k total, 20320424k used, 282388k free, 1020k buffers
>>> >> >> > Swap:        0k total,        0k used,      0k free, 10145516k cached
>>> >> >> >
>>> >> >> >  PID USER  PR NI  VIRT RES SHR  S %CPU %MEM   TIME+  COMMAND
>>> >> >> > 2586 root  20  0 36.3g 17g 8.4g S 32.1 88.9 8496:37  java
>>> >> >> >
>>> >> >> > java process command line (from ps):
>>> >> >> > root 2453 1 99 Sep30 pts/0 9-13:51:38 java -ea
>>> >> >> > -javaagent:./apache-cassandra-0.8.6/bin/../lib/jamm-0.2.2.jar
>>> >> >> > -XX:+UseThreadPriorities -XX:ThreadPriorityPolicy=42 -Xms10059M
>>> >> >> > -Xmx10059M -Xmn1200M -XX:+HeapDumpOnOutOfMemoryError -Xss128k
>>> >> >> > -XX:+UseParNewGC -XX:+UseConcMarkSweepGC
>>> >> >> > -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8
>>> >> >> > -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75
>>> >> >> > -XX:+UseCMSInitiatingOccupancyOnly -Djava.net.preferIPv4Stack=true
>>> >> >> > -Dcom.sun.management.jmxremote.port=7199
>>> >> >> > -Dcom.sun.management.jmxremote.ssl=false
>>> >> >> > -Dcom.sun.management.jmxremote.authenticate=false
>>> >> >> > -Djava.rmi.server.hostname=10.19.104.14
>>> >> >> > -Djava.net.preferIPv4Stack=true
>>> >> >> > -Dlog4j.configuration=log4j-server.properties
>>> >> >> > -Dlog4j.defaultInitOverride=true -cp
>>> >> >> > ./apache-cassandra-0.8.6/bin/../conf:./apache-cassandra-0.8.6/bin/../build/classes/main:./apache-cassandra-0.8.6/bin/../build/classes/thrift:./apache-cassandra-0.8.6/bin/../lib/antlr-3.2.jar:./apache-cassandra-0.8.6/bin/../lib/apache-cassandra-0.8.6.jar:./apache-cassandra-0.8.6/bin/../lib/apache-cassandra-thrift-0.8.6.jar:./apache-cassandra-0.8.6/bin/../lib/avro-1.4.0-fixes.jar:./apache-cassandra-0.8.6/bin/../lib/avro-1.4.0-sources-fixes.jar:./apache-cassandra-0.8.6/bin/../lib/commons-cli-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/commons-codec-1.2.jar:./apache-cassandra-0.8.6/bin/../lib/commons-collections-3.2.1.jar:./apache-cassandra-0.8.6/bin/../lib/commons-lang-2.4.jar:./apache-cassandra-0.8.6/bin/../lib/concurrentlinkedhashmap-lru-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/guava-r08.jar:./apache-cassandra-0.8.6/bin/../lib/high-scale-lib-1.1.2.jar:./apache-cassandra-0.8.6/bin/../lib/jackson-core-asl-1.4.0.jar:./apache-cassandra-0.8.6/bin/../lib/jackson-mapper-asl-1.4.0.jar:./apache-cassandra-0.8.6/bin/../lib/jamm-0.2.2.jar:./apache-cassandra-0.8.6/bin/../lib/jline-0.9.94.jar:./apache-cassandra-0.8.6/bin/../lib/json-simple-1.1.jar:./apache-cassandra-0.8.6/bin/../lib/libthrift-0.6.jar:./apache-cassandra-0.8.6/bin/../lib/log4j-1.2.16.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-examples.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-impl.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-jmx.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-remote.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-rimpl.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-rjmx.jar:./apache-cassandra-0.8.6/bin/../lib/mx4j-tools.jar:./apache-cassandra-0.8.6/bin/../lib/servlet-api-2.5-20081211.jar:./apache-cassandra-0.8.6/bin/../lib/slf4j-api-1.6.1.jar:./apache-cassandra-0.8.6/bin/../lib/slf4j-log4j12-1.6.1.jar:./apache-cassandra-0.8.6/bin/../lib/snakeyaml-1.6.jar
>>> >> >> > org.apache.cassandra.thrift.CassandraDaemon
>>> >> >> >
>>> >> >> > Ring output:
>>> >> >> > [root@CAP4-CNode4 apache-cassandra-0.8.6]# ./bin/nodetool -h 127.0.0.1 ring
>>> >> >> > Address       DC          Rack   Status  State   Load      Owns    Token
>>> >> >> >                                                                    141784319550391026443072753096570088105
>>> >> >> > 10.19.104.11  datacenter1 rack1  Up      Normal  19.92 GB  16.67%  0
>>> >> >> > 10.19.104.12  datacenter1 rack1  Up      Normal  19.3 GB   16.67%  28356863910078205288614550619314017621
>>> >> >> > 10.19.104.13  datacenter1 rack1  Up      Normal  18.57 GB  16.67%  56713727820156410577229101238628035242
>>> >> >> > 10.19.104.14  datacenter1 rack1  Up      Normal  19.34 GB  16.67%  85070591730234615865843651857942052863
>>> >> >> > 10.19.105.11  datacenter1 rack1  Up      Normal  19.88 GB  16.67%  113427455640312821154458202477256070484
>>> >> >> > 10.19.105.12  datacenter1 rack1  Up      Normal  20 GB     16.67%  141784319550391026443072753096570088105
>>> >> >> > [root@CAP4-CNode4 apache-cassandra-0.8.6]#
>>> >> >>
>>> >> >> How many CFs? Can you describe the CFs and post the configuration?
>>> >> >> Do you have row cache enabled?
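[Editor's note: the tokens in the ring output above are the standard evenly
spaced layout over RandomPartitioner's 0..2**127 token space, which the sketch
below reproduces. `balanced_tokens` is a hypothetical helper for illustration,
not part of any Cassandra tool.]

```python
# Evenly spaced initial tokens over RandomPartitioner's 0 .. 2**127 range.
def balanced_tokens(node_count):
    step = 2**127 // node_count
    return [i * step for i in range(node_count)]

tokens = balanced_tokens(6)
# tokens[1] is the token assigned to 10.19.104.12 in the ring output:
# 28356863910078205288614550619314017621
```

[Each node owning exactly 16.67% confirms the ring is balanced, so the
degradation discussed in this thread is not a hot-spot problem.]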