Re: High CPU usage on all nodes without any read or write

Olivier Rosello Thu, 08 Jul 2010 06:45:34 -0700

Hi,

Thank you for your help.


I don't know if data is writing too fast to the cluster, but I don't think so 
(nodes are heavy, big CPU, 12GB RAM...) and there is no so much data (2000 
inserts/sec for about 300 KB/sec of raw data).


I trashed all data yesterday 6pm (GMT+2) and launched all again.

All was fine since now : one node (total of 4) begins to make timeouts on write 
(see cfstats and tpstats below).

CPU is between 100% and 250% (writing on cluster continues).


r...@cassandra-2:~# iostat -x -k 5
Linux 2.6.31-22-server (cassandra-2)    07/08/2010      _x86_64_        (8 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           3.35    0.00    0.60    0.02    0.00   96.03

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda               0.92    96.85    0.34    2.35    44.31   396.77   328.83     
0.08   28.18   1.09   0.29

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.95    0.00    1.27    0.05    0.00   80.73

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda               0.00    76.00    0.60   10.60     2.40   346.40    62.29     
0.00    0.36   0.36   0.40

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          17.40    0.00    0.15    0.00    0.00   82.45

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz 
avgqu-sz   await  svctm  %util
sda               0.00    23.00    0.00    1.20     0.00    96.80   161.33     
0.00    0.00   0.00   0.00



But in Cassandra output log :
r...@cassandra-2:~#  tail -f /var/log/cassandra/output.log 
 INFO 15:32:05,390 GC for ConcurrentMarkSweep: 1359 ms, 4295787600 reclaimed 
leaving 1684169392 used; max is 6563430400
 INFO 15:32:09,875 GC for ConcurrentMarkSweep: 1363 ms, 4296991416 reclaimed 
leaving 1684201560 used; max is 6563430400
 INFO 15:32:14,370 GC for ConcurrentMarkSweep: 1341 ms, 4295467880 reclaimed 
leaving 1684879440 used; max is 6563430400
 INFO 15:32:18,906 GC for ConcurrentMarkSweep: 1343 ms, 4296386408 reclaimed 
leaving 1685489208 used; max is 6563430400
 INFO 15:32:23,564 GC for ConcurrentMarkSweep: 1511 ms, 4296407088 reclaimed 
leaving 1685488744 used; max is 6563430400
 INFO 15:32:28,068 GC for ConcurrentMarkSweep: 1347 ms, 4295383216 reclaimed 
leaving 1686469448 used; max is 6563430400
 INFO 15:32:32,617 GC for ConcurrentMarkSweep: 1376 ms, 4295689192 reclaimed 
leaving 1687908304 used; max is 6563430400
 INFO 15:32:37,283 GC for ConcurrentMarkSweep: 1468 ms, 4296056176 reclaimed 
leaving 1687916880 used; max is 6563430400
 INFO 15:32:41,811 GC for ConcurrentMarkSweep: 1358 ms, 4296412232 reclaimed 
leaving 1688437064 used; max is 6563430400
 INFO 15:32:46,436 GC for ConcurrentMarkSweep: 1368 ms, 4296105472 reclaimed 
leaving 1691050032 used; max is 6563430400
 INFO 15:32:51,180 GC for ConcurrentMarkSweep: 1545 ms, 4297439832 reclaimed 
leaving 1691033816 used; max is 6563430400
 INFO 15:32:55,703 GC for ConcurrentMarkSweep: 1379 ms, 4295491928 reclaimed 
leaving 1692891456 used; max is 6563430400
 INFO 15:33:00,328 GC for ConcurrentMarkSweep: 1378 ms, 4296657208 reclaimed 
leaving 1694981528 used; max is 6563430400

(this don't appears to other nodes, which are currently ok)


Is this value could be linked to the problem :
Compacted row maximum size: 1202492950
 ?

I supposed that uncompacted, the row may be bigger than 2^31 bytes as written 
here :
http://wiki.apache.org/cassandra/CassandraLimitations?highlight=(related%20limitation)

Keyspace: system
        Read Count: 184
        Read Latency: 156.3704456521739 ms.
        Write Count: 591571
        Write Latency: 0.8834233777517829 ms.
        Pending Tasks: 0
                Column Family: HintsColumnFamily
                SSTable count: 13
                Space used (live): 4974145
                Space used (total): 4974145
                Memtable Columns Count: 30430
                Memtable Data Size: 289085
                Memtable Switch Count: 55
                Read Count: 181
                Read Latency: 158,899 ms.
                Write Count: 591564
                Write Latency: 0,883 ms.
                Pending Tasks: 0
                Key cache capacity: 16
                Key cache size: 2
                Key cache hit rate: 0.75
                Row cache: disabled
                Compacted row minimum size: 1630
                Compacted row maximum size: 336507
                Compacted row mean size: 155823

                Column Family: LocationInfo
                SSTable count: 2
                Space used (live): 1225
                Space used (total): 1225
                Memtable Columns Count: 0
                Memtable Data Size: 0
                Memtable Switch Count: 2
                Read Count: 3
                Read Latency: 3,844 ms.
                Write Count: 7
                Write Latency: 0,222 ms.
                Pending Tasks: 0
                Key cache capacity: 2
                Key cache size: 0
                Key cache hit rate: NaN
                Row cache: disabled
                Compacted row minimum size: 0
                Compacted row maximum size: 0
                Compacted row mean size: 0

----------------
Keyspace: MyKeyspace
        Read Count: 15755
        Read Latency: 44.23593043478261 ms.
        Write Count: 41379835
        Write Latency: 0.07479275296771966 ms.
        Pending Tasks: 0
                Column Family: Index1
                SSTable count: 14
                Space used (live): 813507746
                Space used (total): 813507746
                Memtable Columns Count: 29656
                Memtable Data Size: 919336
                Memtable Switch Count: 68
                Read Count: 3151
                Read Latency: 1,226 ms.
                Write Count: 5362886
                Write Latency: 0,039 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 2488
                Key cache hit rate: 0.0016019223067681217
                Row cache: disabled
                Compacted row minimum size: 241
                Compacted row maximum size: 66680
                Compacted row mean size: 283

                Column Family: Index2
                SSTable count: 10
                Space used (live): 694093939
                Space used (total): 694093939
                Memtable Columns Count: 10916
                Memtable Data Size: 338396
                Memtable Switch Count: 42
                Read Count: 3151
                Read Latency: 5,919 ms.
                Write Count: 3442595
                Write Latency: 0,033 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 2813
                Key cache hit rate: 0.01915041782729805
                Row cache: disabled
                Compacted row minimum size: 238
                Compacted row maximum size: 28563
                Compacted row mean size: 264

                Column Family: RAW
                SSTable count: 18
                Space used (live): 8621580313
                Space used (total): 8621580313
                Memtable Columns Count: 67751
                Memtable Data Size: 23322692
                Memtable Switch Count: 322
                Read Count: 3151
                Read Latency: 141,666 ms.
                Write Count: 23739615
                Write Latency: 0,097 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 20
                Key cache hit rate: 0.5
                Row cache: disabled
                Compacted row minimum size: 558
                Compacted row maximum size: 1202492950
                Compacted row mean size: 84949879

                Column Family: Index3
                SSTable count: 9
                Space used (live): 715605405
                Space used (total): 715605405
                Memtable Columns Count: 90842
                Memtable Data Size: 2816102
                Memtable Switch Count: 49
                Read Count: 3151
                Read Latency: 69,815 ms.
                Write Count: 3710520
                Write Latency: 0,068 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 2369
                Key cache hit rate: 0.09592691282832128
                Row cache: disabled
                Compacted row minimum size: 237
                Compacted row maximum size: 928234
                Compacted row mean size: 269

                Column Family: Index4
                SSTable count: 11
                Space used (live): 1071327924
                Space used (total): 1071327924
                Memtable Columns Count: 91479
                Memtable Data Size: 2835849
                Memtable Switch Count: 59
                Read Count: 3151
                Read Latency: 2,553 ms.
                Write Count: 5124219
                Write Latency: 0,044 ms.
                Pending Tasks: 0
                Key cache capacity: 200000
                Key cache size: 3437
                Key cache hit rate: 0.0534392984379282
                Row cache: disabled
                Compacted row minimum size: 238
                Compacted row maximum size: 861800
                Compacted row mean size: 264



Pool Name                    Active   Pending      Completed
FILEUTILS-DELETE-POOL             0         0            108
STREAM-STAGE                      0         0              0
RESPONSE-STAGE                    0         0       26411958
ROW-READ-STAGE                    0         0              0
LB-OPERATIONS                     0         0              0
MESSAGE-DESERIALIZER-POOL         0         0       56957780
GMFD                              0         0         246541
LB-TARGET                         0         0              0
CONSISTENCY-MANAGER               0         0              0
ROW-MUTATION-STAGE                0         0       41288534
MESSAGE-STREAMING-POOL            0         0              0
LOAD-BALANCER-STAGE               0         0              0
FLUSH-SORTER-POOL                 0         0              0
MEMTABLE-POST-FLUSHER             0         0            465
FLUSH-WRITER-POOL                 0         0            465
AE-SERVICE-STAGE                  0         0              0
HINTED-HANDOFF-POOL               1        43             59



Thank you for your help,


Olivier


----- "Peter Schuller" <peter.schul...@infidyne.com> a écrit :

> De: "Peter Schuller" <peter.schul...@infidyne.com>
> À: user@cassandra.apache.org
> Envoyé: Mercredi 7 Juillet 2010 18:17:19
> Objet: Re: High CPU usage on all nodes without any read or write
>
> > It runs correctly during several days. Last night, we started to
> have timeout exception on insert and high cpu load on all nodes.
> >
> > We stopped inserts. But the CPU remains high (without any insert or
> read).
> 
> Has data been written to the cluster faster than background
> compaction
> is proceeding? If so you may see cassandra eating CPU (and doing I/O)
> in the background for extended periods of time even after you stop
> sending requests to it.
> 
> If this is what is happening it should be visible in the log that
> it's
> doing compaction, and you should see that the data directories
> contain
> lots of files (unless it's just now catching up) rather than the
> fairly few expectation when compaction is up to speed.
> 
> Also consider that even if you're not writing faster than it can
> handle, if you have lots of data in total, the bigger compactions
> will
> take a considerable mount of time so you may see CPU+disk activity
> for
> long periods even if all is otherwise well.
> 
> Of course you say your're seeing timeouts. Is is possible these are
> timeouts that happen during compaction in general? What kind of
> latency are we talking about (a few extra hundre millis or several
> seconds?) and is there a correlation between the timeouts and lots of
> data being flushed to disk (iostat -x -k 1)?
> 
> -- 
> / Peter Schuller

Re: High CPU usage on all nodes without any read or write

Reply via email to