> RF=2

I would recommend moving to RF 3. With RF 2 the QUORUM is 2, so a QUORUM read or write fails whenever either replica is down.

> We can't find anything in the cassandra logs indicating that something's up
> (such as a slow GC or compaction), and there's no corresponding traffic spike
> in the application either

Does the CPU load correlate with compaction or repair times?
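To make the quorum arithmetic concrete, here is a minimal sketch (the `quorum` helper is illustrative, not part of Cassandra's API) — quorum is floor(RF / 2) + 1, which is why RF 3 buys you fault tolerance at QUORUM while RF 2 does not:

```python
# Illustrative helper: Cassandra's quorum is floor(RF / 2) + 1.
def quorum(rf: int) -> int:
    return rf // 2 + 1

# RF=2: quorum is 2, so a QUORUM request needs BOTH replicas up --
# losing either node fails the request.
# RF=3: quorum is still 2, so one replica can be down and QUORUM succeeds.
for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, tolerates {rf - quorum(rf)} replica(s) down")
```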
The node is not waiting on IO and is using all the available CPU, which is a good thing. Have you seen an increase in latency?

Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/10/2012, at 10:25 PM, Adeel Akbar <adeel.ak...@panasiangroup.com> wrote:

> Hi,
>
> We're running a small Cassandra cluster (1.1.4) with two nodes and serving
> data to our Web and Java application. After upgrading Cassandra from
> 1.0.8 to 1.1.4, we're starting to see some weird issues.
>
> If we run the 'ring' command from the second node, it fails to connect
> to port 7199 on node 1:
>
> $ /opt/apache-cassandra-1.1.4/bin/nodetool -h XX.XX.XX.01 ring
> Failed to connect to 'XX.XX.XX.01:7199': Connection refused
>
> We're using a Network Monitoring System and Monit to monitor the servers. In
> the NMS the average CPU usage has increased to around 500% on our quad-core
> Xeon servers with 16 GB RAM, and occasionally through Monit we can see the
> 1-min load average go above 7. Is this common? Does this happen to
> everyone else? And why the spikiness in load? We can't find anything in the
> cassandra logs indicating that something's up (such as a slow GC or
> compaction), and there's no corresponding traffic spike in the application
> either. Should we just add more nodes if any single one gets CPU spikes?
>
> Another explanation could also be that we've configured it wrong. We're
> running pretty much the default config and each node has 16 GB of RAM.
>
> A single keyspace with 15 to 20 column families, RF=2, and we have 260 GB of
> actual data.
> Please find below top and I/O stats for further reference:
>
> top - 14:21:51 up 29 days, 9:52, 1 user, load average: 6.59, 3.16, 1.42
> Tasks: 163 total, 2 running, 161 sleeping, 0 stopped, 0 zombie
> Cpu0 : 29.0%us, 0.0%sy, 0.0%ni, 71.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu1 : 28.0%us, 0.0%sy, 0.0%ni, 72.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu2 : 13.3%us, 0.0%sy, 0.0%ni, 86.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu3 : 23.5%us, 0.7%sy, 0.0%ni, 75.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu4 : 89.4%us, 0.3%sy, 0.0%ni, 10.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
> Cpu5 : 29.2%us, 0.0%sy, 0.0%ni, 70.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu6 : 25.1%us, 0.0%sy, 0.0%ni, 74.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
> Cpu7 : 24.3%us, 0.0%sy, 0.0%ni, 72.0%id, 0.0%wa, 2.3%hi, 1.3%si, 0.0%st
> Mem:  16427844k total, 16317416k used,   110428k free,   128824k buffers
> Swap:        0k total,        0k used,        0k free, 11344696k cached
>
>   PID USER  PR NI  VIRT  RES  SHR S  %CPU %MEM     TIME+ COMMAND
>  5284 root  18  0  265g 7.7g 3.6g S 266.6 49.0 474:24.38 java -ea -javaagent:/opt/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:Thr
>     1 root  15  0 10368  660  548 S   0.0  0.0   0:01.64 init [3]
>
> # iostat -xmn 2 10
> -x and -n options are mutually exclusive
>
> avg-cpu:  %user %nice %system %iowait %steal  %idle
>            9.77  0.03    0.54    0.98   0.00  88.68
>
> Device: rrqm/s wrqm/s   r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
> sda       0.59   3.97  5.54  0.42  0.20  0.02    75.52     0.11 19.10  3.55  2.11
> sda1      0.00   0.00  0.01  0.00  0.00  0.00    88.69     0.00  1.36  1.31  0.00
> sda2      0.59   3.97  5.53  0.42  0.20  0.02    75.51     0.11 19.12  3.55  2.11
> sdb       1.54   7.82 10.39  0.64  0.28  0.03    57.77     0.36 32.61  4.27  4.70
> sdb1      1.54   7.82 10.39  0.64  0.28  0.03    57.77     0.36 32.61  4.27  4.70
> dm-0      0.00   0.00  1.73  0.62  0.02  0.00    19.27     0.02  6.75  0.90  0.21
> dm-1      0.00   0.00 16.32 12.23  0.46  0.05    36.47     0.50 17.67  2.07  5.92
> dm-2      0.00   0.00  0.00  0.00  0.00  0.00     8.00     0.00  7.10  3.41  0.00
>
> Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
>
> avg-cpu:  %user %nice %system %iowait %steal  %idle
>           12.46  0.00    0.00    0.19   0.00  87.35
>
> Device: rrqm/s wrqm/s   r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
> sda       0.00   2.50  0.00  1.00  0.00  0.01    28.00     0.00  0.00  0.00  0.00
> sda1      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> sda2      0.00   2.50  0.00  1.00  0.00  0.01    28.00     0.00  0.00  0.00  0.00
> sdb       0.00   4.50  0.50  1.50  0.00  0.02    28.00     0.01  6.00  6.00  1.20
> sdb1      0.00   4.50  0.50  1.50  0.00  0.02    28.00     0.01  6.00  6.00  1.20
> dm-0      0.00   0.00  0.50  4.50  0.00  0.02     8.80     0.04  8.00  2.40  1.20
> dm-1      0.00   0.00  0.00  5.00  0.00  0.02     8.00     0.00  0.00  0.00  0.00
> dm-2      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
>
> Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
>
> avg-cpu:  %user %nice %system %iowait %steal  %idle
>           12.52  0.00    0.00    0.00   0.00  87.48
>
> Device: rrqm/s wrqm/s   r/s   w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
> sda       0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> sda1      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> sda2      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> sdb       0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> sdb1      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> dm-0      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> dm-1      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
> dm-2      0.00   0.00  0.00  0.00  0.00  0.00     0.00     0.00  0.00  0.00  0.00
>
> Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
>
> Please help us improve the performance of the Cassandra cluster and fix these issues.
> --
>
> Thanks & Regards
>
> Adeel Akbar
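One side note on the paste above: the `iostat -xmn 2 10` invocation reported "-x and -n options are mutually exclusive" on this sysstat version, since `-n` requests the NFS report while `-x` requests extended per-device stats. A sketch of the invocation that keeps the extended stats you want here:

# Drop -n (NFS report); -x gives extended per-device stats,
# -m reports throughput in MB/s; sample every 2 seconds, 10 samples.
iostat -xm 2 10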