I appear to be hitting the problem described in
https://issues.apache.org/jira/browse/CASSANDRA-1955. Even at low data
rates, I'm seeing mutation messages dropped because writers are
blocked while a storm of memtable flushes is enqueued. The OpsCenter
memtables also seem to contribute to this:

INFO [OptionalTasks:1] 2013-08-23 01:53:58,522 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-runratecountforiczone@1281182121(14976/120803 serialized/live bytes, 360 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,523 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-runratecountforchannel@705923070(278200/1048576 serialized/live bytes, 6832 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,525 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-solr_resources@1615459594(66362/66362 serialized/live bytes, 4 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,525 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-scheduleddaychannelie@393647337(33203968/36700160 serialized/live bytes, 865620 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,530 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-failediecountfornetwork@1781160199(8680/124903 serialized/live bytes, 273 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,530 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups7200@37425413(6504/236666 serialized/live bytes, 271 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,531 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups60@1943691367(638176/1048576 serialized/live bytes, 39894 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,531 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-events@99567005(1133/1133 serialized/live bytes, 39 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,532 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups300@532892022(184296/1048576 serialized/live bytes, 7679 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,532 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-ie@1309405764(457390051/152043520 serialized/live bytes, 16956160 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,823 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-videoexpectedformat@1530999508(684/24557 serialized/live bytes, 12453 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:58,929 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-failediecountforzone@411870848(9200/95294 serialized/live bytes, 284 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:59,012 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-rollups86400@744253892(456/456 serialized/live bytes, 19 ops)
INFO [OptionalTasks:1] 2013-08-23 01:53:59,364 ColumnFamilyStore.java (line 630) Enqueuing flush of Memtable-peers@2024878954(2006/40629 serialized/live bytes, 452 ops)

I had nodetool tpstats running across all the nodes in my cluster
every 5 seconds or so (a rough sketch of the polling loop follows the
output) and observed the following:

timestamp            host             pool         active  pending  completed  blocked  all-time blocked
2013-08-23T01:53:47  192.168.131.227  FlushWriter  0       0        33         0        0
2013-08-23T01:53:55  192.168.131.227  FlushWriter  0       0        33         0        0
2013-08-23T01:54:00  192.168.131.227  FlushWriter  2       10       37         1        5
2013-08-23T01:54:07  192.168.131.227  FlushWriter  1       1        53         0        11
2013-08-23T01:54:12  192.168.131.227  FlushWriter  1       1        53         0        11
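
For reference, the polling was nothing fancier than a loop like the
one below. This is just a minimal sketch: the host list is
illustrative, and it assumes nodetool is on the PATH and can reach
each node's JMX port from the machine running it.

#!/usr/bin/env python3
# Minimal sketch of the tpstats polling loop (illustrative only).
# Pulls the FlushWriter line from `nodetool tpstats` on each node
# roughly every 5 seconds and prefixes it with a timestamp and host.
import datetime
import subprocess
import time

HOSTS = ["192.168.131.227"]  # illustrative: list every node in the cluster here

while True:
    for host in HOSTS:
        out = subprocess.check_output(["nodetool", "-h", host, "tpstats"], text=True)
        for line in out.splitlines():
            if line.startswith("FlushWriter"):
                print(datetime.datetime.now().strftime("%Y-%m-%dT%H:%M:%S"), host, line)
    time.sleep(5)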

Now, I can increase memtable_flush_queue_size, but based on the above
it looks like solving the problem would mean setting it to roughly
count(CF), i.e. one queue slot per column family. What's the downside
of that approach? It feels like a backwards solution to the real
problem...
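
For context, these are the two cassandra.yaml settings in play as I
understand them on 1.2; the values below are just the defaults for
illustration, not a recommendation.

# cassandra.yaml (1.2.x) - illustrative values, not a recommendation
# memtable_flush_writers defaults to the number of data_file_directories.
memtable_flush_writers: 1
# Per CASSANDRA-1955, full memtables queue up to memtable_flush_queue_size
# waiting for a flush writer; beyond that, writes block. The question above
# is whether this really needs to be ~count(CF).
memtable_flush_queue_size: 4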
