Severe Reliability Problems - 0.7 RC2

Dan Hendry Mon, 20 Dec 2010 10:49:26 -0800

I have been having severe and strange reliability problems within my
Cassandra cluster. This weekend, all four of my nodes were down at once.
Even now I am loosing one every few hours. I have attached output from all
the system monitoring commands I can think of.


What seems to happen is that the java process locks up and sits and has 100%
system cpu usage (but no user-CPU) (there are 8 cores so 100%=1/8 total
capacity). JMX freezes and the node effectively dies, but there is typically
nothing unusual in the Cassandra logs. About the only thing which seems to
be correlated is the flushing of memtables tables. One of the strangest
stats I am getting when in this state is memory paging: 3727168.00 pages
scanned/second (see sar -B output). Occasionally, if I leave the process
alone (~1 h) it recovers (maybe 1 in 5 times), otherwise the only way to
terminate the Cassandra process is with a kill -9. When this happens,
Cassandra memory usage (as reported by JMX before it dies) is
also reasonable (ex 6 GB out of 12 GB heap and 24 GB system).

This feels more like a system level problem than a Cassandra problem so I
have tried diversifying my cluster, one node runs Ubuntu 10.10, the other
three 10.04. One runs OpenJDK (1.6.0_20), the rest run Sun JDK (1.6.0_22).
Neither change seems be correlated with the problem. These are pretty much
stock ubuntu installs so nothing special on that front.

Now this has been a relatively sudden development and I can potentially
attribute it to a few things:
1. Upgrading to RC2
2. Ever increasing amounts of data (there is less than 100 gb per node so
this should not be the problem).
3. Migrating from a set of machines where data+commit log directories were
on four small raid 5 hds to machines with two 500 gig drives: one for data
and one for commitlog + os. I have seen more IO wait on these new machines.
But they have the same memory and system settings.

I am about at my wits end on this one, any help would be appreciated.

vmstat output

procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa
 1  0   5472  93056   8528 10748436    0    0    90    74   88   11  3  2 94  2
 1  0   5472  93052   8528 10748460    0    0     0     4  409  521  0 18 82  0
 1  0   5472  93052   8528 10748460    0    0     0     0  365  485  0 16 84  0
 1  0   5472  93052   8528 10748460    0    0     0     0  358  481  0 11 89  0
 1  0   5472  93052   8528 10748460    0    0     0     0  369  484  0 12 88  0

TOP Output 

top - 10:58:19 up 12:57,  2 users,  load average: 1.07, 1.21, 1.51
Tasks: 185 total,   1 running, 184 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us, 11.1%sy,  0.0%ni, 88.8%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  24733336k total, 24643380k used,    89956k free,     8752k buffers
Swap:  1952760k total,     5472k used,  1947288k free, 10748616k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
                                      
 1442 dhendry   20   0 91.9g  20g 8.4g S  100 88.4 263:15.73 java               
                                      
 1101 root      20   0 11280  584  448 S    0  0.0   0:46.48 irqbalance         
                                      
 4352 dhendry   20   0 19220 1472 1064 R    0  0.0   0:00.02 top                
                                      
    1 root      20   0 23704 1764 1216 S    0  0.0   0:01.54 init               
                                      
    2 root      20   0     0    0    0 S    0  0.0   0:00.02 kthreadd           
                                      
    3 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0


output of sar -u 1 5

Linux 2.6.32-21-server (talk12)         12/20/2010      _x86_64_        (8 CPU)

11:12:25 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
11:12:26 AM     all      0.00      0.00     18.20      0.00      0.00     81.80
11:12:27 AM     all      0.00      0.00     13.70      0.41      0.00     85.89
11:12:28 AM     all      0.00      0.00     10.52      0.00      0.00     89.48
11:12:29 AM     all      0.00      0.00     16.94      0.00      0.00     83.06
11:12:30 AM     all      0.00      0.00     14.92      0.00      0.00     85.08
Average:        all      0.00      0.00     14.32      0.08      0.00     85.60

Output of sar -B 1 5

Linux 2.6.32-21-server (talk12)         12/20/2010      _x86_64_        (8 CPU)

11:12:59 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s 
pgscand/s pgsteal/s    %vmeff
11:13:00 AM      0.00      0.00     38.00      0.00     61.00      0.00 
3727168.00      0.00      0.00
11:13:01 AM      0.00      0.00     39.00      0.00     61.00      0.00 
3714400.00      0.00      0.00
11:13:02 AM      0.00      0.00     33.00      0.00     61.00      0.00 
3724384.00      0.00      0.00
11:13:03 AM      0.00      0.00     33.00      0.00     61.00      0.00 
3716192.00      0.00      0.00
11:13:04 AM      0.00      0.00     33.00      0.00     61.00      0.00 
3713184.00      0.00      0.00
Average:         0.00      0.00     35.20      0.00     61.00      0.00 
3719065.60      0.00      0.00

iostat output:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           2.30    0.39    1.95    1.83    0.00   93.54

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               2.03        26.04       496.73    1216622   23207610
sdb              21.59      1425.70       697.33   66609722   32579816

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   16.53    0.00    0.00   83.47

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           0.00    0.00   10.59    0.00    0.00   89.41

Device:            tps   Blk_read/s   Blk_wrtn/s   Blk_read   Blk_wrtn
sda               0.00         0.00         0.00          0          0
sdb               0.00         0.00         0.00          0          0


output of free -m

             total       used       free     shared    buffers     cached
Mem:         24153      24039        114          0          9      10459
-/+ buffers/cache:      13570      10582
Swap:         1906          5       1901

Cassandra log:

 INFO [COMMIT-LOG-WRITER] 2010-12-20 10:44:41,936 CommitLogSegment.java (line 
50) Creating new commitlog segment 
/var/lib/cassandra/commitlog/CommitLog-1292863481936.log
 INFO [COMMIT-LOG-WRITER] 2010-12-20 10:45:39,167 CommitLogSegment.java (line 
50) Creating new commitlog segment 
/var/lib/cassandra/commitlog/CommitLog-1292863539166.log
 INFO [MutationStage:26] 2010-12-20 10:48:29,140 ColumnFamilyStore.java (line 
640) switching in a fresh Memtable for UserEventsByUser at 
CommitLogContext(file='/var/lib/cassandra/commitlog/CommitLog-1292863539166.log',
 position=49812421)
 INFO [MutationStage:26] 2010-12-20 10:48:29,140 ColumnFamilyStore.java (line 
944) Enqueuing flush of memtable-usereventsbyu...@1784371754(162606261 bytes, 
2097613 operations)
 INFO [FlushWriter:1] 2010-12-20 10:48:29,140 Memtable.java (line 155) Writing 
memtable-usereventsbyu...@1784371754(162606261 bytes, 2097613 operations)

Severe Reliability Problems - 0.7 RC2

Reply via email to