Thank you for the recommendation! Most of the pending compactions are for another (~100 times larger) keyspace. They are always running in the background.
2018-03-16 13:28 GMT+05:00 Nicolas Guyomar <nicolas.guyo...@gmail.com>:

> Hi,
>
> You also have 62 pending compactions at the same time, which is odd for
> such a small dataset IMHO. Are you triggering 'nodetool compact' with some
> kind of cron you may have forgotten after a test, or something else?
> Do you have any monitoring in place? If not, you could let 'dstat -tnrvl 10'
> run for a while and look for inconsistencies (huge I/O wait at some point,
> blocked procs, etc.)
>
> On 16 March 2018 at 07:33, Dmitry Simonov <dimmobor...@gmail.com> wrote:
>
>> Hello!
>>
>> We are experiencing problems with Cassandra 2.2.8.
>> There is a cluster with 3 nodes.
>> The problematic keyspace has RF=3 and contains 3 tables (current table
>> sizes: 1 GB, 700 MB, 12 KB).
>>
>> Several times per day there are bursts of "READ messages were dropped ...
>> for internal timeout" messages in the logs (on every Cassandra node),
>> lasting 5 - 15 minutes each.
>>
>> During the periods of drops there is always a queue of pending ReadStage
>> tasks:
>>
>> Pool Name            Active  Pending   Completed  Blocked  All time blocked
>> ReadStage                32       67  2976548410        0                 0
>> CompactionExecutor        2       62      802136        0                 0
>>
>> All other Active and Pending counters in tpstats are 0.
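As an aside, tpstats output like the snippet above can be scanned mechanically when watching for such bursts. A minimal sketch (the `pending_pools` helper is illustrative, not part of the thread, and assumes the 2.2-era six-column layout shown above):

```python
# Flag thread pools with a non-zero Pending count in `nodetool tpstats` output.
# Illustrative sketch; assumes the six-column layout quoted in the thread.

def pending_pools(tpstats_text):
    """Return {pool_name: pending_count} for pools with Pending > 0."""
    result = {}
    for line in tpstats_text.splitlines():
        parts = line.split()
        # Expect: name, active, pending, completed, blocked, all-time-blocked.
        # The header line splits into more fields and is skipped.
        if len(parts) == 6 and parts[1].isdigit():
            name, pending = parts[0], int(parts[2])
            if pending > 0:
                result[name] = pending
    return result

sample = """\
Pool Name            Active  Pending   Completed  Blocked  All time blocked
ReadStage                32       67  2976548410        0                 0
CompactionExecutor        2       62      802136        0                 0
MutationStage             0        0     1234567        0                 0
"""

print(pending_pools(sample))  # {'ReadStage': 67, 'CompactionExecutor': 62}
```

Fed the thread's numbers, this would flag ReadStage (67 pending) and CompactionExecutor (62 pending), matching what the poster observed by eye.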
>>
>> During the drops, iostat shows no read requests to the disks, probably
>> because all the data fits in the page cache:
>>
>> avg-cpu:  %user  %nice  %system  %iowait  %steal  %idle
>>           56.53   0.94    39.84     0.01    0.00   2.68
>>
>> Device:  rrqm/s  wrqm/s   r/s    w/s  rMB/s  wMB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
>> sda        0.00   11.00  0.00  26.00   0.00   9.09    715.92      0.78  30.31     0.00    30.31   2.46   6.40
>> sdb        0.00   11.00  0.00  33.00   0.00  10.57    655.70      0.83  26.00     0.00    26.00   2.00   6.60
>> sdc        0.00    1.00  0.00  30.50   0.00  10.98    737.07      0.91  30.49     0.00    30.49   2.10   6.40
>> sdd        0.00   31.50  0.00  35.00   0.00  11.17    653.50      0.98  28.17     0.00    28.17   1.83   6.40
>> sde        0.00   31.50  0.00  34.50   0.00  10.82    642.10      0.67  19.54     0.00    19.54   1.39   4.80
>> sdf        0.00    1.00  0.00  24.50   0.00   9.71    811.78      0.60  24.33     0.00    24.33   1.88   4.60
>> sdg        0.00    1.00  0.00  23.00   0.00   8.93    795.15      0.51  22.26     0.00    22.26   1.91   4.40
>> sdh        0.00    1.00  0.00  21.50   0.00   8.37    797.05      0.45  21.02     0.00    21.02   1.86   4.00
>>
>> The disks are SSDs.
>>
>> Before the drops, the "Local write count" for the problematic table
>> increases very fast (10k-30k/sec, while the ordinary write rate is
>> 10-30/sec) for about 1 minute. After that, the drops start.
>>
>> I tried using probabilistic tracing to determine which requests cause
>> the "write count" to increase, but saw no "batch_mutate" queries at all,
>> only reads!
>>
>> There are no GC warnings about long pauses.
>>
>> Could you please help troubleshoot the issue?
>>
>> --
>> Best Regards,
>> Dmitry Simonov
>>

--
Best Regards,
Dmitry Simonov
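The claim above that the disks are essentially idle during the drops can also be sanity-checked mechanically from the iostat output. A minimal sketch (the `busy_devices` helper and the 50% threshold are assumptions for illustration; it expects the extended 14-column `iostat -x` layout quoted in the thread):

```python
# Flag block devices that show read traffic (r/s > 0) or high utilisation
# (%util above a threshold) in extended iostat output.
# Illustrative sketch; assumes the 14-column layout quoted in the thread.

def busy_devices(iostat_text, util_threshold=50.0):
    """Return [(device, r_per_s, util)] for devices with any reads
    or with %util above the threshold."""
    flagged = []
    for line in iostat_text.splitlines():
        parts = line.replace(",", ".").split()  # tolerate comma decimals
        if len(parts) == 14 and parts[0].startswith("sd"):
            r_per_s = float(parts[3])   # r/s column
            util = float(parts[13])     # %util column
            if r_per_s > 0 or util > util_threshold:
                flagged.append((parts[0], r_per_s, util))
    return flagged

sample = """\
sda 0.00 11.00 0.00 26.00 0.00 9.09 715.92 0.78 30.31 0.00 30.31 2.46 6.40
sdb 0.00 11.00 0.00 33.00 0.00 10.57 655.70 0.83 26.00 0.00 26.00 2.00 6.60
"""

print(busy_devices(sample))  # no reads, low %util -> []
```

Against the figures in the thread (r/s of 0.00 and %util under 7% on every device), this returns an empty list, consistent with the poster's conclusion that the reads are being served from the page cache rather than disk.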