Output from nodetool cfhistograms and nodetool compactionstats would be helpful. Compaction is probably behind from streaming, and reads are touching many sstables.
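For example (the keyspace/table names below are just placeholders), something along these lines will show how many sstables each read touches and how far compaction has fallen behind:

    nodetool cfhistograms my_keyspace my_table   # the SSTables column shows sstables read per query
    nodetool compactionstats -H                  # pending tasks show how far compaction is behind

If compaction is behind, temporarily raising the compaction throttle (nodetool setcompactionthroughput) can help it catch up.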
-- Jeff Jirsa

> On Feb 21, 2018, at 8:20 AM, Fd Habash <fmhab...@gmail.com> wrote:
>
> We have had a 15-node cluster across three zones, and cluster repairs using
> ‘nodetool repair -pr’ took about 3 hours to finish. Recently, we shrank the
> cluster to 12 nodes. Since then, the same repair job has taken up to 12 hours
> to finish, and most of the time it never does.
>
> More importantly, at some point during the repair cycle, we see read
> latencies jumping to 1-2 seconds, and applications immediately notice the
> impact.
>
> stream_throughput_outbound_megabits_per_sec is set at 200 and
> compaction_throughput_mb_per_sec at 64. The /data dir on the nodes is around
> ~500GB at 44% usage.
>
> When shrinking the cluster, ‘nodetool decommission’ was uneventful. It
> completed successfully with no issues.
>
> What could possibly cause repairs to have this impact after downsizing the
> cluster? Taking three nodes out does not seem compatible with such a drastic
> effect on repair and read latency.
>
> Any expert insights will be appreciated.
> ----------------
> Thank you
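For reference, the two throttles quoted above are cassandra.yaml properties; a minimal sketch with the values reported in the thread (tune to your own hardware):

    # cassandra.yaml -- values as reported above
    stream_throughput_outbound_megabits_per_sec: 200
    compaction_throughput_mb_per_sec: 64

and the repair in question was run per node as:

    nodetool repair -pr   # repairs only the node's primary token ranges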