Output from nodetool cfhistograms and nodetool compactionstats would be helpful. Compaction is probably behind from streaming, and reads are touching many sstables.
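For example (the keyspace/table names below are just placeholders), something along these lines will show how many sstables each read touches and how far compaction has fallen behind:

    nodetool cfhistograms my_keyspace my_table   # the SSTables column shows sstables read per query
    nodetool compactionstats -H                  # pending tasks show how far compaction is behind

If compaction is behind, temporarily raising the compaction throttle (nodetool setcompactionthroughput) can help it catch up.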
-- Jeff Jirsa

> On Feb 21, 2018, at 8:20 AM, Fd Habash <fmhab...@gmail.com> wrote:
>
> We have had a 15-node cluster across three zones, and cluster repairs using
> ‘nodetool repair -pr’ took about 3 hours to finish. Recently, we shrank the
> cluster to 12 nodes. Since then, the same repair job has taken up to 12 hours
> to finish, and most of the time it never does.
>
> More importantly, at some point during the repair cycle, we see read
> latencies jumping to 1-2 seconds, and applications immediately notice the
> impact.
>
> stream_throughput_outbound_megabits_per_sec is set at 200 and
> compaction_throughput_mb_per_sec at 64. The /data dir on the nodes is around
> ~500GB at 44% usage.
>
> When shrinking the cluster, ‘nodetool decommission’ was uneventful. It
> completed successfully with no issues.
>
> What could possibly cause repairs to have this impact after downsizing the
> cluster? Taking three nodes out does not seem compatible with such a drastic
> effect on repair and read latency.
>
> Any expert insights will be appreciated.
> ----------------
> Thank you
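For reference, the two throttles quoted above are cassandra.yaml properties; a minimal sketch with the values reported in the thread (tune to your own hardware):

    # cassandra.yaml -- values as reported above
    stream_throughput_outbound_megabits_per_sec: 200
    compaction_throughput_mb_per_sec: 64

and the repair in question was run per node as:

    nodetool repair -pr   # repairs only the node's primary token ranges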