Are you using repairParallelism = sequential or parallel?

As Alain said:
- try to decrease streamthroughput (nodetool setstreamthroughput) to avoid
flooding nodes with lots of (small) streamed SSTables
- if you are using parallel (//) repair, switch to sequential
- don't start too many repairs simultaneously
- Do you really need to use LCS for your tables? LCS makes the problem even
worse. Use it sparingly ;)
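
To see how far behind LCS is, the "SSTables in each level" line from the cfstats output quoted below can be read mechanically. This is only a rough sketch: it assumes that an entry printed as count/limit means the level holds more SSTables than LCS targets for it, and the helper name overfull_levels is made up for illustration.

```python
import re


def overfull_levels(cfstats_line):
    """Return {level: (count, limit)} for levels printed as count/limit.

    Assumption: cfstats prints "count/limit" only for levels that hold
    more SSTables than LCS targets; plain numbers are within the limit.
    """
    # Grab the bracketed list that follows "SSTables in each level:".
    match = re.search(r"\[(.*?)\]", cfstats_line)
    if match is None:
        return {}
    result = {}
    for level, entry in enumerate(match.group(1).split(",")):
        entry = entry.strip()
        if "/" in entry:
            count, limit = (int(part) for part in entry.split("/"))
            result[level] = (count, limit)
    return result


# Line taken from the cfstats output in the quoted message (unwrapped).
line = "SSTables in each level: [10090/4, 10, 106/100, 112, 0, 0, 0, 0, 0]"
for level, (count, limit) in sorted(overfull_levels(line).items()):
    print(f"L{level}: {count} SSTables, target is at most {limit}")
```

Here almost everything is stuck in L0 (10090 against a target of 4), which fits the picture of repair streaming in many small SSTables faster than compaction can promote them.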



2016-05-06 18:05 GMT+02:00 Jean-Francois Gosselin <jfgosse...@gmail.com>:

> - Cassandra 2.1.13
> - SSDs
> - LeveledCompactionStrategy
> - Range repair (not incremental) with Spotify's Reaper
> https://github.com/spotify/cassandra-reaper
>
> Problem: When we run a repair job, the SSTable count sometimes goes to 10K
> on one of the nodes (not always the same node). The Reaper is smart enough to
> postpone the repair on this node since the number of pending compactions is
> above 20, but the number of SSTables stays around 10K.
> Even if I set compactionthroughput to 0 (disable throttling), the SSTable
> count stays around 10K.
>
> Workaround: If we abort the repair and restart the node, it quickly (in 15
> minutes) goes back to 200 SSTables ...
>
> Any suggestions as to what I should look at?
>
> When it occurs, I've noticed that nodetool compactionstats and cfstats (on
> the table with 10K SSTables) take minutes to return a result.
>
> I thought that the issue might be related to
> https://issues.apache.org/jira/browse/CASSANDRA-10766 as I see the
> MemtablePostFlush waiting on the countdown latch but the Pending
> MemtablePostFlush is going up and down according to tpstats.
>
> Complete stack trace : http://pastebin.com/K1r3CUff
>
> I took a few tpstats samples (roughly one per minute). Only these pools are
> not at 0 (Active/Pending).
>
> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
> MemtableFlushWriter               2         2         139864         0                 0
> MemtablePostFlush                 1        13         223714         0                 0
> CompactionExecutor               10        10         804964         0                 0
>
> MemtableFlushWriter               4         4         139889         0                 0
> MemtablePostFlush                 1        12         223744         0                 0
> CompactionExecutor               12        12         805365         0                 0
>
> MemtableFlushWriter               5         5         139896         0                 0
> MemtablePostFlush                 1        10         223755         0                 0
> CompactionExecutor                9         9         805503         0                 0
>
> MemtableFlushWriter               4         4         139907         0                 0
> MemtablePostFlush                 1        13         223762         0                 0
> CompactionExecutor                9         9         805703         0                 0
>
> MemtableFlushWriter               5         5         139927         0                 0
> MemtablePostFlush                 1        14         223783         0                 0
> CompactionExecutor               10        10         805971         0                 0
>
> MemtableFlushWriter               7         7         139956         0                 0
> MemtablePostFlush                 1        23         223806         0                 0
> CompactionExecutor               10        10         806428         0                 0
>
> nodetool compactionstats shows 66 pending tasks.
>
> Keyspace: foo
>         Read Count: 6308735
>         Read Latency: 12.132909585836147 ms.
>         Write Count: 15394697
>         Write Latency: 0.09054346675351908 ms.
>         Pending Flushes: 15
>                 Table: bar
>                 SSTable count: 10326
>                 SSTables in each level: [10090/4, 10, 106/100, 112, 0, 0, 0, 0, 0]
>                 Space used (live): 69204087872
>                 Space used (total): 69206400092
>                 Space used by snapshots (total): 2708047105
>                 Off heap memory used (total): 35230672
>                 SSTable Compression Ratio: 0.339043411676821
>                 Number of keys (estimate): 1601158
>                 Memtable cell count: 86524
>                 Memtable data size: 6508214
>                 Memtable off heap memory used: 0
>                 Memtable switch count: 22719
>                 Local read count: 6310549
>                 Local read latency: 12.135 ms
>                 Local write count: 15397653
>                 Local write latency: 0.091 ms
>                 Pending flushes: 10
>                 Bloom filter false positives: 2282107
>                 Bloom filter false ratio: 0.38494
>                 Bloom filter space used: 3244792
>                 Bloom filter off heap memory used: 3162168
>                 Index summary off heap memory used: 3348360
>                 Compression metadata off heap memory used: 28720144
>                 Compacted partition minimum bytes: 87
>                 Compacted partition maximum bytes: 2816159
>                 Compacted partition mean bytes: 69860
>                 Average live cells per slice (last five minutes): 817.6059838850788
>                 Maximum live cells per slice (last five minutes): 5002.0
>                 Average tombstones per slice (last five minutes): 0.0
>                 Maximum tombstones per slice (last five minutes): 0.0
>
> Thanks
>
> J-F Gosselin
>
>


-- 
Close the World, Open the Net
http://www.linux-wizard.net
