Hi everyone,

This week we upgraded one of our systems from Cassandra 1.2.16 to 2.0.8. All 3 nodes were upgraded, and the SSTables were upgraded as well.
Unfortunately, Cassandra now starts to hang roughly every 10 hours. Every time it hangs, the MemoryMeter is very active, both in tpstats and in system.log:

INFO [MemoryMeter:1] 2014-06-14 19:24:09,488 Memtable.java (line 481) CFS(Keyspace='MDS', ColumnFamily='ResponsePortal') liveRatio is 64.0 (just-counted was 64.0). calculation took 0ms for 0 cells

While Cassandra is hung, this line is logged hundreds of times per second (!) and the CPU is 100% busy. Interestingly, it is only logged for this particular column family. This CF is used as a queue and contains only a few entries (data files are about 4 KB, with only ~100 keys: usually 1-2 active and 98-99 tombstones).

Table stats for ResponsePortal:

Table: ResponsePortal
	SSTable count: 1
	Space used (live), bytes: 4863
	Space used (total), bytes: 4863
	SSTable Compression Ratio: 0.9545454545454546
	Number of keys (estimate): 128
	Memtable cell count: 0
	Memtable data size, bytes: 0
	Memtable switch count: 1
	Local read count: 0
	Local read latency: 0.000 ms
	Local write count: 5
	Local write latency: 0.000 ms
	Pending tasks: 0
	Bloom filter false positives: 0
	Bloom filter false ratio: 0.00000
	Bloom filter space used, bytes: 176
	Compacted partition minimum bytes: 43
	Compacted partition maximum bytes: 50
	Compacted partition mean bytes: 50
	Average live cells per slice (last five minutes): 0.0
	Average tombstones per slice (last five minutes): 0.0

Table: ResponsePortal
	SSTable count: 1
	Space used (live), bytes: 4765
	Space used (total), bytes: 5777
	SSTable Compression Ratio: 0.75
	Number of keys (estimate): 128
	Memtable cell count: 0
	Memtable data size, bytes: 0
	Memtable switch count: 12
	Local read count: 0
	Local read latency: 0.000 ms
	Local write count: 1096
	Local write latency: 0.000 ms
	Pending tasks: 0
	Bloom filter false positives: 0
	Bloom filter false ratio: 0.00000
	Bloom filter space used, bytes: 16
	Compacted partition minimum bytes: 43
	Compacted partition maximum bytes: 50
	Compacted partition mean bytes: 50
	Average live cells per slice (last five minutes): 0.0
	Average tombstones per slice (last five minutes): 0.0

Has anyone seen this before, or does anyone have an idea what could be wrong? It seems that 2.0 handles this column family less well than 1.2 did. Any hints are greatly appreciated :-)

Cheers,
Christian
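For anyone who wants to check their own system.log for the same pattern, here is a minimal Python sketch (not part of Cassandra's tooling) that counts the liveRatio messages per second and per column family. It assumes every relevant line has the same shape as the MemoryMeter line quoted in this post; the helper names `live_ratio_rate` and `busiest` are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Assumption: liveRatio lines look like the single example quoted in this post,
# i.e. "[MemoryMeter:N] YYYY-MM-DD HH:MM:SS,mmm ... CFS(Keyspace='..', ColumnFamily='..') liveRatio is ..."
LIVE_RATIO_RE = re.compile(
    r"\[MemoryMeter:\d+\] "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),\d{3} "
    r".*?CFS\(Keyspace='(?P<ks>[^']*)', ColumnFamily='(?P<cf>[^']*)'\) liveRatio is"
)


def live_ratio_rate(lines: Iterable[str]) -> Counter:
    """Count liveRatio messages, bucketed by (second, keyspace, column family)."""
    counts: Counter = Counter()
    for line in lines:
        match = LIVE_RATIO_RE.search(line)
        if match:
            # Milliseconds are dropped, so each key is one wall-clock second.
            counts[(match.group("ts"), match.group("ks"), match.group("cf"))] += 1
    return counts


def busiest(counts: Counter, n: int = 5) -> List[Tuple[Tuple[str, str, str], int]]:
    """Return the n (second, keyspace, CF) buckets with the most messages."""
    return counts.most_common(n)
```

To use it, open your system.log (wherever your installation writes it) with `open(path, encoding="utf-8", errors="replace")`, pass the file object to `live_ratio_rate`, and print `busiest(counts)` to see which seconds and which column family produced the most liveRatio messages.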