Hi,
We're running a small Cassandra cluster (1.1.4) with two nodes,
serving data to our Web and Java application. Since upgrading
Cassandra from 1.0.8 to 1.1.4, we have started to see some weird issues.
If we run the 'ring' command from the second node, it reports that it
failed to connect to port 7199 on node 1:
$ /opt/apache-cassandra-1.1.4/bin/nodetool -h XX.XX.XX.01 ring
Failed to connect to 'XX.XX.XX.01:7199': Connection refused
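
"Connection refused" generally means nothing accepted the TCP connection on that address and port (the process is down, is listening on a different interface, or a firewall rejected it). As a rough way to tell these cases apart, here is a minimal Python sketch; the helper name probe_port is made up for illustration, and 127.0.0.1 is only an assumption for running it locally on node 1. Substitute node 1's real address when probing from node 2.

```python
import socket

def probe_port(host, port, timeout=3.0):
    """Try a plain TCP connect and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing accepts connections on this port.
        return "refused"
    except socket.timeout:
        # No answer at all, e.g. packets silently dropped.
        return "timeout"
    except OSError as exc:
        # Anything else: unreachable network, name resolution failure, ...
        return f"error: {exc}"

if __name__ == "__main__":
    # 7199 is the JMX port from the nodetool error above.
    print(probe_port("127.0.0.1", 7199))
```

If the port shows as open locally on node 1 but refused from node 2, the listening interface or a firewall rule would be the first things to look at.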
We're using a Network Monitoring System (NMS) and Monit to monitor the
servers. In the NMS, the average CPU usage has increased to around 500%
on our quad-core Xeon servers with 16 GB RAM, and occasionally Monit
shows the 1-minute load average going above 7. Is this common? Does
this happen to everyone else? And why is the load so spiky? We can't
find anything in the Cassandra logs indicating that something's up (such
as a slow GC or compaction), and there's no corresponding traffic spike
in the application either. Should we just add more nodes if any single
one gets CPU spikes?
Another explanation could be that we've configured it wrong. We're
running a pretty much default config, and each node has 16 GB of RAM.
We have a single keyspace with 15 to 20 column families, RF=2, and 260
GB of actual data. Please find the top and I/O stats below for further
reference:
top - 14:21:51 up 29 days, 9:52, 1 user, load average: 6.59, 3.16, 1.42
Tasks: 163 total, 2 running, 161 sleeping, 0 stopped, 0 zombie
Cpu0 : 29.0%us, 0.0%sy, 0.0%ni, 71.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 28.0%us, 0.0%sy, 0.0%ni, 72.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 13.3%us, 0.0%sy, 0.0%ni, 86.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 23.5%us, 0.7%sy, 0.0%ni, 75.5%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu4 : 89.4%us, 0.3%sy, 0.0%ni, 10.0%id, 0.0%wa, 0.0%hi, 0.3%si, 0.0%st
Cpu5 : 29.2%us, 0.0%sy, 0.0%ni, 70.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu6 : 25.1%us, 0.0%sy, 0.0%ni, 74.9%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu7 : 24.3%us, 0.0%sy, 0.0%ni, 72.0%id, 0.0%wa, 2.3%hi, 1.3%si, 0.0%st
Mem: 16427844k total, 16317416k used, 110428k free, 128824k buffers
Swap: 0k total, 0k used, 0k free, 11344696k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5284 root 18 0 265g 7.7g 3.6g S 266.6 49.0 474:24.38 java -ea -javaagent:/opt/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:Thr
1 root 15 0 10368 660 548 S 0.0 0.0 0:01.64 init [3]
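
To put the top figures above in proportion, here is a rough sketch of the arithmetic. It assumes top is in its default mode, where 100% in the %CPU column means one fully busy logical CPU, and that the eight rows Cpu0..Cpu7 are logical CPUs (for example four cores with hyper-threading). The constant and function names are illustrative only; the numbers are copied from the output.

```python
# Figures copied from the top output above.
LOGICAL_CPUS = 8             # rows Cpu0..Cpu7
JAVA_CPU_PERCENT = 266.6     # %CPU column of the Cassandra JVM (PID 5284)
LOAD_AVG_1MIN = 6.59         # first value after "load average:"
PER_CPU_USER = [29.0, 28.0, 13.3, 23.5, 89.4, 29.2, 25.1, 24.3]  # %us per CPU

def share_of_machine(process_cpu_percent, logical_cpus):
    """Convert a per-process %CPU (100% = one CPU) to a share of all CPUs."""
    return process_cpu_percent / logical_cpus

def load_per_cpu(load_average, logical_cpus):
    """Load average divided by the number of logical CPUs."""
    return load_average / logical_cpus

def mean(values):
    return sum(values) / len(values)

print(f"JVM share of total CPU capacity: "
      f"{share_of_machine(JAVA_CPU_PERCENT, LOGICAL_CPUS):.1f}%")
print(f"1-min load per logical CPU: "
      f"{load_per_cpu(LOAD_AVG_1MIN, LOGICAL_CPUS):.2f}")
print(f"Mean %us across CPUs: {mean(PER_CPU_USER):.1f}%")
```

Under those assumptions the JVM is using roughly a third of the machine's CPU capacity in this snapshot, and the 1-minute load is about 0.8 per logical CPU. Whether the NMS's 500% figure uses the same scale is not something this output can confirm.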
# iostat -xmn 2 10
-x and -n options are mutually exclusive
avg-cpu: %user %nice %system %iowait %steal %idle
9.77 0.03 0.54 0.98 0.00 88.68
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.59 3.97 5.54 0.42 0.20 0.02 75.52 0.11 19.10 3.55 2.11
sda1 0.00 0.00 0.01 0.00 0.00 0.00 88.69 0.00 1.36 1.31 0.00
sda2 0.59 3.97 5.53 0.42 0.20 0.02 75.51 0.11 19.12 3.55 2.11
sdb 1.54 7.82 10.39 0.64 0.28 0.03 57.77 0.36 32.61 4.27 4.70
sdb1 1.54 7.82 10.39 0.64 0.28 0.03 57.77 0.36 32.61 4.27 4.70
dm-0 0.00 0.00 1.73 0.62 0.02 0.00 19.27 0.02 6.75 0.90 0.21
dm-1 0.00 0.00 16.32 12.23 0.46 0.05 36.47 0.50 17.67 2.07 5.92
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 8.00 0.00 7.10 3.41 0.00
Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
avg-cpu: %user %nice %system %iowait %steal %idle
12.46 0.00 0.00 0.19 0.00 87.35
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 2.50 0.00 1.00 0.00 0.01 28.00 0.00 0.00 0.00 0.00
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 2.50 0.00 1.00 0.00 0.01 28.00 0.00 0.00 0.00 0.00
sdb 0.00 4.50 0.50 1.50 0.00 0.02 28.00 0.01 6.00 6.00 1.20
sdb1 0.00 4.50 0.50 1.50 0.00 0.02 28.00 0.01 6.00 6.00 1.20
dm-0 0.00 0.00 0.50 4.50 0.00 0.02 8.80 0.04 8.00 2.40 1.20
dm-1 0.00 0.00 0.00 5.00 0.00 0.02 8.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
avg-cpu: %user %nice %system %iowait %steal %idle
12.52 0.00 0.00 0.00 0.00 87.48
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sda2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
sdb1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-2 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Device: rMB_nor/s wMB_nor/s rMB_dir/s wMB_dir/s rMB_svr/s wMB_svr/s ops/s rops/s wops/s
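
To pick the busiest device out of an extended iostat report like the ones above, something along these lines could be used. This is only a sketch: the SAMPLE text is three device rows copied from the first report (which, for iostat, covers the time since boot), with each row joined onto one line, and parse_devices is a made-up helper name.

```python
# Column names as printed in the extended device header above.
COLUMNS = ["rrqm/s", "wrqm/s", "r/s", "w/s", "rMB/s", "wMB/s",
           "avgrq-sz", "avgqu-sz", "await", "svctm", "%util"]

# Rows copied from the first report, one device per line.
SAMPLE = """\
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await svctm %util
sda 0.59 3.97 5.54 0.42 0.20 0.02 75.52 0.11 19.10 3.55 2.11
sdb 1.54 7.82 10.39 0.64 0.28 0.03 57.77 0.36 32.61 4.27 4.70
dm-1 0.00 0.00 16.32 12.23 0.46 0.05 36.47 0.50 17.67 2.07 5.92
"""

def parse_devices(text):
    """Return {device: {column: value}} for every numeric device row."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS) + 1:
            continue  # not a device row (e.g. the avg-cpu lines)
        try:
            values = [float(field) for field in fields[1:]]
        except ValueError:
            continue  # header row: column names are not numbers
        rows[fields[0]] = dict(zip(COLUMNS, values))
    return rows

devices = parse_devices(SAMPLE)
busiest = max(devices, key=lambda name: devices[name]["%util"])
print(busiest, devices[busiest]["%util"], devices[busiest]["await"])
```

For the sample rows this picks dm-1 at 5.92 %util, so in these snapshots none of the devices looks anywhere near saturated.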
Please help us improve the performance of the Cassandra cluster and
fix these issues.
--
Thanks & Regards
Adeel Akbar