Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

Fred Habash Wed, 21 Feb 2018 10:33:04 -0800

One node at a time

On Feb 21, 2018 10:23 AM, "Carl Mueller" <carl.muel...@smartthings.com>
wrote:


> What is your replication factor?
> Single datacenter, three availability zones, is that right?
> You removed one node at a time or three at once?
>
> On Wed, Feb 21, 2018 at 10:20 AM, Fd Habash <fmhab...@gmail.com> wrote:
>
>> We have had a 15 node cluster across three zones and cluster repairs
>> using ‘nodetool repair -pr’ took about 3 hours to finish. Lately, we shrunk
>> the cluster to 12. Since then, same repair job has taken up to 12 hours to
>> finish and most times, it never does.
>>
>>
>>
>> More importantly, at some point during the repair cycle, we see read
>> latencies jumping to 1-2 seconds and applications immediately notice the
>> impact.
>>
>>
>>
>> stream_throughput_outbound_megabits_per_sec is set at 200 and
>> compaction_throughput_mb_per_sec at 64. The /data dir on the nodes is
>> around ~500GB at 44% usage.
>>
>>
>>
>> When shrinking the cluster, the ‘nodetool decommision’ was eventless. It
>> completed successfully with no issues.
>>
>>
>>
>> What could possibly cause repairs to cause this impact following cluster
>> downsizing? Taking three nodes out does not seem compatible with such a
>> drastic effect on repair and read latency.
>>
>>
>>
>> Any expert insights will be appreciated.
>>
>> ----------------
>> Thank you
>>
>>
>>
>
>

Re: Cluster Repairs 'nodetool repair -pr' Cause Severe Increase in Read Latency After Shrinking Cluster

Reply via email to