> RF=2
I would recommend moving to RF 3: with RF 2, QUORUM is also 2, so a quorum read or write fails as soon as either replica is down.
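For reference, QUORUM is floor(RF / 2) + 1 replicas. The snippet below is just that arithmetic (nothing Cassandra-specific), showing why RF 3 buys you fault tolerance at the same quorum size:

```shell
# QUORUM = floor(RF / 2) + 1 replicas must respond.
rf2_quorum=$(( 2 / 2 + 1 ))  # RF=2 -> quorum 2: every replica must be up
rf3_quorum=$(( 3 / 2 + 1 ))  # RF=3 -> quorum 2: one replica may be down
echo "RF=2 quorum: ${rf2_quorum}, RF=3 quorum: ${rf3_quorum}"
```

So at RF 2 a QUORUM operation cannot tolerate a single node failure, while at RF 3 the quorum is still 2 and one node can be down.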

> We can't find anything in the cassandra logs indicating that something's up 
> (such as a slow GC or compaction), and there's no corresponding traffic spike 
> in the application either
Does the CPU load correlate with compaction or repair times?

The node is not waiting on IO and is using all the available CPU, which is a 
good thing. Have you seen an increase in latency? 
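One way to check the correlation is to watch compaction and repair activity directly while the CPU spikes; assuming nodetool is on the PATH and JMX is reachable (substitute your own host), something like:

```shell
# What is compacting right now, and how far along is it?
nodetool -h XX.XX.XX.01 compactionstats

# Active/pending thread pool stages; repair shows up as AntiEntropy stages
nodetool -h XX.XX.XX.01 tpstats

# Streaming activity, e.g. from a running repair
nodetool -h XX.XX.XX.01 netstats
```

If pending compactions or AntiEntropy activity line up with the load spikes, that points at the cause.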

Cheers


-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 8/10/2012, at 10:25 PM, Adeel Akbar <adeel.ak...@panasiangroup.com> wrote:

> Hi,
> 
> We're running a small Cassandra cluster (1.1.4) with two nodes, serving 
> data to our Web and Java applications. After upgrading Cassandra from 
> 1.0.8 to 1.1.4, we've started to see some weird issues. 
> 
> If we run the 'ring' command from the second node, it fails to connect 
> to node 1 on port 7199. 
> 
> $ /opt/apache-cassandra-1.1.4/bin/nodetool -h XX.XX.XX.01  ring
> Failed to connect to 'XX.XX.XX.01:7199': Connection refused
> 
> We're using a Network Monitoring System (NMS) and Monit to monitor the 
> servers. In NMS the average CPU usage has increased to around 500% on our 
> quad-core Xeon servers with 16 GB RAM, and occasionally through Monit we 
> see the 1-min load average go above 7. Is this common? Does this happen to 
> everyone else? And why the spikiness in load? We can't find anything in the 
> cassandra logs indicating that something's up (such as a slow GC or 
> compaction), and there's no corresponding traffic spike in the application 
> either. Should we just add more nodes if any single one gets CPU spikes?
> 
> Another explanation could be that we've configured it wrong. We're 
> running pretty much the default config, and each node has 16 GB of RAM.
> 
> A single keyspace with 15 to 20 column families, RF=2, and 260 GB of 
> actual data. Please find top and iostat output below for reference:
> 
> top - 14:21:51 up 29 days,  9:52,  1 user,  load average: 6.59, 3.16, 1.42
> Tasks: 163 total,   2 running, 161 sleeping,   0 stopped,   0 zombie
> Cpu0  : 29.0%us,  0.0%sy,  0.0%ni, 71.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu1  : 28.0%us,  0.0%sy,  0.0%ni, 72.0%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu2  : 13.3%us,  0.0%sy,  0.0%ni, 86.7%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu3  : 23.5%us,  0.7%sy,  0.0%ni, 75.5%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu4  : 89.4%us,  0.3%sy,  0.0%ni, 10.0%id,  0.0%wa,  0.0%hi,  0.3%si,  0.0%st
> Cpu5  : 29.2%us,  0.0%sy,  0.0%ni, 70.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu6  : 25.1%us,  0.0%sy,  0.0%ni, 74.9%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
> Cpu7  : 24.3%us,  0.0%sy,  0.0%ni, 72.0%id,  0.0%wa,  2.3%hi,  1.3%si,  0.0%st
> Mem:  16427844k total, 16317416k used,   110428k free,   128824k buffers
> Swap:        0k total,        0k used,        0k free, 11344696k cached
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  5284 root      18   0  265g 7.7g 3.6g S 266.6 49.0 474:24.38 java -ea -javaagent:/opt/apache-cassandra-1.1.4/bin/../lib/jamm-0.2.5.jar -XX:+UseThreadPriorities -XX:Thr
>     1 root      15   0 10368  660  548 S  0.0  0.0   0:01.64 init [3]
> 
> # iostat -xmn 2 10
> -x and -n options are mutually exclusive
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>            9.77    0.03    0.54    0.98    0.00   88.68
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.59     3.97  5.54  0.42     0.20     0.02    75.52     0.11   19.10   3.55   2.11
> sda1              0.00     0.00  0.01  0.00     0.00     0.00    88.69     0.00    1.36   1.31   0.00
> sda2              0.59     3.97  5.53  0.42     0.20     0.02    75.51     0.11   19.12   3.55   2.11
> sdb               1.54     7.82 10.39  0.64     0.28     0.03    57.77     0.36   32.61   4.27   4.70
> sdb1              1.54     7.82 10.39  0.64     0.28     0.03    57.77     0.36   32.61   4.27   4.70
> dm-0              0.00     0.00  1.73  0.62     0.02     0.00    19.27     0.02    6.75   0.90   0.21
> dm-1              0.00     0.00 16.32 12.23     0.46     0.05    36.47     0.50   17.67   2.07   5.92
> dm-2              0.00     0.00  0.00  0.00     0.00     0.00     8.00     0.00    7.10   3.41   0.00
> 
> Device:                   rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           12.46    0.00    0.00    0.19    0.00   87.35
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00     2.50  0.00  1.00     0.00     0.01    28.00     0.00    0.00   0.00   0.00
> sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sda2              0.00     2.50  0.00  1.00     0.00     0.01    28.00     0.00    0.00   0.00   0.00
> sdb               0.00     4.50  0.50  1.50     0.00     0.02    28.00     0.01    6.00   6.00   1.20
> sdb1              0.00     4.50  0.50  1.50     0.00     0.02    28.00     0.01    6.00   6.00   1.20
> dm-0              0.00     0.00  0.50  4.50     0.00     0.02     8.80     0.04    8.00   2.40   1.20
> dm-1              0.00     0.00  0.00  5.00     0.00     0.02     8.00     0.00    0.00   0.00   0.00
> dm-2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> 
> Device:                   rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
> 
> avg-cpu:  %user   %nice %system %iowait  %steal   %idle
>           12.52    0.00    0.00    0.00    0.00   87.48
> 
> Device:         rrqm/s   wrqm/s   r/s   w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await  svctm  %util
> sda               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sda1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sda2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sdb               0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> sdb1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> dm-0              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> dm-1              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> dm-2              0.00     0.00  0.00  0.00     0.00     0.00     0.00     0.00    0.00   0.00   0.00
> 
> Device:                   rMB_nor/s    wMB_nor/s    rMB_dir/s    wMB_dir/s    rMB_svr/s    wMB_svr/s     ops/s    rops/s    wops/s
> 
> Please help us improve the performance of the Cassandra cluster and fix 
> these issues. 
> -- 
> 
> Thanks & Regards
> 
> Adeel Akbar
> 
