Any news on this? We also have issues during repairs when using many LCS tables: we end up with ~8k SSTables, many pending compaction tasks and dropped mutations.
We are using Cassandra 2.0.10 on a 24-core server, with multithreaded compactions enabled.

~$ nodetool getstreamthroughput
Current stream throughput: 200 MB/s
~$ nodetool getcompactionthroughput
Current compaction throughput: 16 MB/s

Most SSTables are tiny 4K or 8K/12K files:

~$ ls -sh /var/lib/cassandra/data/xxxx/xxx/*-Data.db | grep -Ev 'M' | wc -l
7405
~$ ls -sh /var/lib/cassandra/data/xxxx/xxx/*-Data.db | wc -l
7440
~$ ls -sh /var/lib/cassandra/data/xxxx/xxx/*-Data.db | grep -Ev 'M' | cut -f1 -d" " | sort | uniq -c
     36
   7003 4.0K
    396 8.0K

Pool Name                    Active   Pending      Completed   Blocked  All time blocked
ReadStage                         0         0      258098148         0                 0
RequestResponseStage              0         0      613994884         0                 0
MutationStage                     0         0      332242206         0                 0
ReadRepairStage                   0         0        3360040         0                 0
ReplicateOnWriteStage             0         0              0         0                 0
GossipStage                       0         0        2471033         0                 0
CacheCleanupExecutor              0         0              0         0                 0
MigrationStage                    0         0              0         0                 0
MemoryMeter                       0         0          25160         0                 0
FlushWriter                       1         1         134083         0               521
ValidationExecutor                1         1          89514         0                 0
InternalResponseStage             0         0              0         0                 0
AntiEntropyStage                  0         0         636471         0                 0
MemtablePostFlusher               1         1         334667         0                 0
MiscStage                         0         0              0         0                 0
PendingRangeCalculator            0         0            181         0                 0
commitlog_archiver                0         0              0         0                 0
CompactionExecutor               24        24        5241768         0                 0
AntiEntropySessions               0         0          15184         0                 0
HintedHandoff                     0         0            278         0                 0

Message type           Dropped
RANGE_SLICE                  0
READ_REPAIR                267
PAGED_RANGE                  0
BINARY                       0
READ                         0
MUTATION                150970
_TRACE                       0
REQUEST_RESPONSE             0
COUNTER_MUTATION             0

2016-02-12 20:08 GMT+01:00 Michał Łowicki <mlowi...@gmail.com>:
> I had to decrease streaming throughput to 10 (from the default 200) in order
> to avoid the effect of a rising number of SSTables and compaction tasks
> while running repair. It's working very slowly but it's stable and doesn't
> hurt the whole cluster. Will try to adjust the configuration gradually to
> see if I can make it any better. Thanks!
>
> On Thu, Feb 11, 2016 at 8:10 PM, Michał Łowicki <mlowi...@gmail.com> wrote:
>>
>> On Thu, Feb 11, 2016 at 5:38 PM, Alain RODRIGUEZ <arodr...@gmail.com>
>> wrote:
>>>
>>> Also, are you using incremental repairs (not sure about the available
>>> options in Spotify Reaper)? What command did you run?
>>>
>> No.
>>
>>> 2016-02-11 17:33 GMT+01:00 Alain RODRIGUEZ <arodr...@gmail.com>:
>>>>>
>>>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses
>>>>
>>>> What is your current compaction throughput? The current value of
>>>> 'concurrent_compactors' (cassandra.yaml or through JMX)?
>>
>> Throughput was initially set to 1024 and I've gradually increased it to
>> 2048, 4K and 16K but haven't seen any changes. Tried to change it both
>> from `nodetool` and also cassandra.yaml (with restart after changes).
>>
>>>> nodetool getcompactionthroughput
>>>>
>>>>> How to speed up compaction? Increased compaction throughput and
>>>>> concurrent compactors but no change. Seems there are plenty of idle
>>>>> resources but can't force C* to use them.
>>>>
>>>> You might want to try un-throttling the compaction throughput through:
>>>>
>>>> nodetool setcompactionthroughput 0
>>>>
>>>> Choose a canary node. Monitor pending compactions and disk throughput
>>>> (make sure the server is OK too - CPU...)
>>
>> Yes, I'll try it out, but if increasing it 16 times didn't help I'm a bit
>> sceptical about it.
>>
>>>> Some other information could be useful:
>>>>
>>>> What is your number of cores per machine and the compaction strategies
>>>> for the 'most compacting' tables? What are the write/update patterns,
>>>> any TTL or tombstones? Do you use a high number of vnodes?
>>
>> I'm using bare-metal boxes, 40 CPU, 64GB, 2 SSDs each. num_tokens is set
>> to 256.
>>
>> Using LCS for all tables. Write / update heavy. No warnings about a large
>> number of tombstones, but we're removing items frequently.
>>
>>>> Also, what is your repair routine and your value for gc_grace_seconds?
>>>> When was your last repair, and do you think your cluster is suffering
>>>> from high entropy?
>>
>> We've been having problems with repair for months (CASSANDRA-9935).
>> gc_grace_seconds is set to 345600 now. Yes, as we haven't run it
>> successfully for a long time, I guess the cluster is suffering from high
>> entropy.
>>
>>>> You can lower the stream throughput to make sure nodes can cope with
>>>> what repairs are feeding them.
>>>>
>>>> nodetool getstreamthroughput
>>>> nodetool setstreamthroughput X
>>
>> Yes, this sounds interesting. As we've been having problems with repair
>> for months, it could be that lots of data is being transferred between
>> nodes.
>>
>> Thanks!
>>
>>>> C*heers,
>>>>
>>>> -----------------
>>>> Alain Rodriguez
>>>> France
>>>>
>>>> The Last Pickle
>>>> http://www.thelastpickle.com
>>>>
>>>> 2016-02-11 16:55 GMT+01:00 Michał Łowicki <mlowi...@gmail.com>:
>>>>>
>>>>> Hi,
>>>>>
>>>>> Using 2.1.12 across 3 DCs. Each DC has 8 nodes. Trying to run repair
>>>>> using Cassandra Reaper, but after a couple of hours nodes are full of
>>>>> pending compaction tasks (regular ones, not validation).
>>>>>
>>>>> CPU load is fine, SSD disks below 30% utilization, no long GC pauses.
>>>>>
>>>>> How to speed up compaction? Increased compaction throughput and
>>>>> concurrent compactors but no change. Seems there are plenty of idle
>>>>> resources but can't force C* to use them.
>>>>>
>>>>> Any clue where there might be a bottleneck?
>>>>>
>>>>> --
>>>>> BR,
>>>>> Michał Łowicki
>>
>> --
>> BR,
>> Michał Łowicki
>
> --
> BR,
> Michał Łowicki

--
Close the World, Open the Net
http://www.linux-wizard.net
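To summarize the knobs discussed above, a minimal sketch of the repair-time
throttling steps (the 10 MB/s stream value is just what was tried in this
thread; adjust per node and watch pending compactions, disk and CPU):

    # check the current throttles
    nodetool getstreamthroughput
    nodetool getcompactionthroughput

    # slow down repair streaming so compaction can keep up
    nodetool setstreamthroughput 10

    # un-throttle compaction on a canary node and watch the backlog drain
    nodetool setcompactionthroughput 0
    nodetool compactionstats

concurrent_compactors is read from cassandra.yaml at startup; as mentioned
above, it can also be changed at runtime through JMX.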