Are you using repairParallelism = sequential or parallel? As Alain said:

- try to decrease streamthroughput to avoid flooding nodes with a lot of (small) streamed sstables (see the command sketch below)
- if you are using parallel repair, switch to sequential
- don't start too many repairs simultaneously
- do you really need to use LCS for your tables? LCS makes the problem even worse. Use it with parsimony ;)
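For reference, a minimal sketch of the nodetool side of this (the throughput value is only illustrative, tune it for your hardware):

    # cap streaming throughput (MB/s) so repair streams don't flood nodes
    # 50 is an illustrative value, not a recommendation
    nodetool setstreamthroughput 50

    # verify the current cap
    nodetool getstreamthroughput

The sequential/parallel choice itself is the repairParallelism option on the Reaper repair run (SEQUENTIAL vs PARALLEL); with plain nodetool on 2.1, sequential is the default and -par is what turns on parallel mode.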
2016-05-06 18:05 GMT+02:00 Jean-Francois Gosselin <jfgosse...@gmail.com>:
> - Cassandra 2.1.13
> - SSDs
> - LeveledCompactionStrategy
> - Range repair (not incremental) with Spotify's Reaper
>   https://github.com/spotify/cassandra-reaper
>
> Problem: When we run a repair job, sometimes the SSTable count goes to
> 10K on one of the nodes (not always the same node). Reaper is smart
> enough to postpone the repair on this node since the number of pending
> compactions is > 20, but the number of SSTables stays around 10K.
> Even if I set compactionthroughput to 0 (disable throttling), the
> SSTable count stays around 10K.
>
> Workaround: If we abort the repair and restart the node, it quickly (in
> 15 minutes) goes back to 200 SSTables ...
>
> Any suggestions as to what I should look at?
>
> When it occurs, I've noticed that nodetool compactionstats and cfstats
> (on the table with 10K SSTables) take minutes to return a result.
>
> I thought that the issue might be related to
> https://issues.apache.org/jira/browse/CASSANDRA-10766 as I see
> MemtablePostFlush waiting on the countdown latch, but the pending
> MemtablePostFlush count is going up and down according to tpstats.
>
> Complete stack trace: http://pastebin.com/K1r3CUff
>
> I took some tpstats samples (roughly every minute). Only these pools
> are not at 0 (Active/Pending).
>
> Pool Name              Active   Pending   Completed   Blocked   All time blocked
> MemtableFlushWriter         2         2      139864         0                  0
> MemtablePostFlush           1        13      223714         0                  0
> CompactionExecutor         10        10      804964         0                  0
>
> MemtableFlushWriter         4         4      139889         0                  0
> MemtablePostFlush           1        12      223744         0                  0
> CompactionExecutor         12        12      805365         0                  0
>
> MemtableFlushWriter         5         5      139896         0                  0
> MemtablePostFlush           1        10      223755         0                  0
> CompactionExecutor          9         9      805503         0                  0
>
> MemtableFlushWriter         4         4      139907         0                  0
> MemtablePostFlush           1        13      223762         0                  0
> CompactionExecutor          9         9      805703         0                  0
>
> MemtableFlushWriter         5         5      139927         0                  0
> MemtablePostFlush           1        14      223783         0                  0
> CompactionExecutor         10        10      805971         0                  0
>
> MemtableFlushWriter         7         7      139956         0                  0
> MemtablePostFlush           1        23      223806         0                  0
> CompactionExecutor         10        10      806428         0                  0
>
> nodetool compactionstats shows 66 pending tasks.
>
> Keyspace: foo
>     Read Count: 6308735
>     Read Latency: 12.132909585836147 ms.
>     Write Count: 15394697
>     Write Latency: 0.09054346675351908 ms.
>     Pending Flushes: 15
>         Table: bar
>         SSTable count: 10326
>         SSTables in each level: [10090/4, 10, 106/100, 112, 0, 0, 0, 0, 0]
>         Space used (live): 69204087872
>         Space used (total): 69206400092
>         Space used by snapshots (total): 2708047105
>         Off heap memory used (total): 35230672
>         SSTable Compression Ratio: 0.339043411676821
>         Number of keys (estimate): 1601158
>         Memtable cell count: 86524
>         Memtable data size: 6508214
>         Memtable off heap memory used: 0
>         Memtable switch count: 22719
>         Local read count: 6310549
>         Local read latency: 12.135 ms
>         Local write count: 15397653
>         Local write latency: 0.091 ms
>         Pending flushes: 10
>         Bloom filter false positives: 2282107
>         Bloom filter false ratio: 0.38494
>         Bloom filter space used: 3244792
>         Bloom filter off heap memory used: 3162168
>         Index summary off heap memory used: 3348360
>         Compression metadata off heap memory used: 28720144
>         Compacted partition minimum bytes: 87
>         Compacted partition maximum bytes: 2816159
>         Compacted partition mean bytes: 69860
>         Average live cells per slice (last five minutes): 817.6059838850788
>         Maximum live cells per slice (last five minutes): 5002.0
>         Average tombstones per slice (last five minutes): 0.0
>         Maximum tombstones per slice (last five minutes): 0.0
>
> Thanks
>
> J-F Gosselin

--
Close the World, Open the Net
http://www.linux-wizard.net
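PS: a quick way to watch whether the node is digging itself out, as a sketch using the foo.bar keyspace/table names from the output above:

    # pending compaction tasks and what the executors are doing
    nodetool compactionstats

    # SSTable count and per-level distribution for the affected table
    nodetool cfstats foo.bar | grep -E 'SSTable count|SSTables in each level'

    # remove the compaction throttle while the node catches up
    nodetool setcompactionthroughput 0

The per-level line above ([10090/4, 10, 106/100, ...]) says it all: ~10K SSTables are sitting in L0 against a target of 4, so L0 -> L1 promotion cannot keep up with the flood of small streamed sstables.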