Hi everyone,
This week we upgraded one of our systems from Cassandra 1.2.16 to 2.0.8.
All 3 nodes were upgraded, and the SSTables have been upgraded as well.
Unfortunately, Cassandra now starts to hang every 10 hours or so.
Every time it hangs, we can see that the MemoryMeter is very active,
both in tpstats and in system.log:
INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481)
CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0
(just-counted was 64.0). calculation took 0ms for 0 cells
This line is logged hundreds of times per second (!) while Cassandra is
hanging. The CPU is 100% busy.
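To put a number on "hundreds of times per second", something like the following could bucket the MemoryMeter lines in system.log by second. This is only a rough sketch: the regular expression assumes the log line format shown in the excerpt above, and the function name memory_meter_lines_per_second is made up for illustration.

```python
import re
from collections import Counter

# Captures the timestamp (truncated to the second) of lines logged by a
# MemoryMeter thread. Assumes the format seen in the excerpt:
#   INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481) ...
LINE_RE = re.compile(
    r"INFO \[MemoryMeter:\d+\] (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d+ "
)


def memory_meter_lines_per_second(lines):
    """Count MemoryMeter log lines per one-second timestamp bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing an open system.log file object to the function and printing counts.most_common(10) would show the busiest seconds, which should line up with the hang periods if the MemoryMeter is really the culprit.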
Interestingly, this is only logged for this particular column family. This CF
is used as a queue and contains only a few entries (data files are about
4 KB, only ~100 keys, usually 1-2 active, 98-99 tombstones).
Table: ResponsePortal
SSTable count: 1
Space used (live), bytes: 4863
Space used (total), bytes: 4863
SSTable Compression Ratio: 0.9545454545454546
Number of keys (estimate): 128
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 1
Local read count: 0
Local read latency: 0.000 ms
Local write count: 5
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 176
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 50
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
Table: ResponsePortal
SSTable count: 1
Space used (live), bytes: 4765
Space used (total), bytes: 5777
SSTable Compression Ratio: 0.75
Number of keys (estimate): 128
Memtable cell count: 0
Memtable data size, bytes: 0
Memtable switch count: 12
Local read count: 0
Local read latency: 0.000 ms
Local write count: 1096
Local write latency: 0.000 ms
Pending tasks: 0
Bloom filter false positives: 0
Bloom filter false ratio: 0.00000
Bloom filter space used, bytes: 16
Compacted partition minimum bytes: 43
Compacted partition maximum bytes: 50
Compacted partition mean bytes: 50
Average live cells per slice (last five minutes): 0.0
Average tombstones per slice (last five minutes): 0.0
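To compare the two cfstats outputs above without reading them line by line, a small helper along these lines could list only the fields that differ. It is a sketch that assumes each block has been saved as plain "name: value" text. The helper names parse_cfstats_block and diff_stats are hypothetical and not part of any Cassandra tool.

```python
def parse_cfstats_block(text):
    """Parse 'name: value' lines of one table's cfstats output into a dict."""
    stats = {}
    for line in text.splitlines():
        # Split on the last ': ' so names containing commas or parentheses
        # (e.g. 'Space used (live), bytes') stay intact.
        name, sep, value = line.strip().rpartition(": ")
        if sep:
            stats[name] = value
    return stats


def diff_stats(a, b):
    """Return {name: (a_value, b_value)} for entries whose values differ."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }
```

Applied to the two blocks above, this would show the differing fields, such as the memtable switch count (1 vs. 12) and the local write count (5 vs. 1096).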
Has anyone seen this before, or does anyone have an idea what could be wrong?
It seems that 2.0 does not handle this column family as well as 1.2 did.
Any hints are greatly appreciated :-)
Cheers,
Christian