Repairing the range is an expensive operation, and don't forget--just
because a node is down does not mean it's dead.  I take nodes down for
maintenance all the time--maybe there was a security update that needed to
be applied, for example, or perhaps a kernel update.  There are a multitude
of reasons why a node would be down but not need replacing.

If you really, really wanted this to be automated, it'd be trivial to set up
a cron job that looked for dead nodes and removed them from the
cluster--then ran a repair on all of the nodes in your cluster.  This will
cause load spikes, especially if you have a large cluster.
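
For what it's worth, here is a rough sketch of the "look for dead nodes"
half of such a cron job.  It is only an illustration, not something I'd run
unattended: it assumes the usual `nodetool status` layout (a two-letter
state such as UN or DN in the first column and a UUID host ID later in the
row), and the function names are made up for this example.  It defaults to
a dry run precisely because "down" is not the same as "dead".

```python
import re
import subprocess

# Host IDs in `nodetool status` output are UUIDs.
HOST_ID = re.compile(r"[0-9a-f]{8}(?:-[0-9a-f]{4}){3}-[0-9a-f]{12}")


def down_host_ids(status_output):
    """Return the host IDs of rows whose state column is DN (Down/Normal)."""
    ids = []
    for line in status_output.splitlines():
        fields = line.split()
        # Data rows start with a two-letter state; headers and rulers do not
        # start with "DN", so they are skipped.
        if fields and fields[0] == "DN":
            match = HOST_ID.search(line)
            if match:
                ids.append(match.group(0))
    return ids


def remove_down_nodes(dry_run=True):
    """Print (or run, if dry_run is False) `nodetool removenode` per DN node."""
    status = subprocess.run(
        ["nodetool", "status"], capture_output=True, text=True, check=True
    ).stdout
    for host_id in down_host_ids(status):
        cmd = ["nodetool", "removenode", host_id]
        if dry_run:
            print("would run:", " ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling remove_down_nodes() from cron would only print what it would do;
you'd pass dry_run=False to actually remove nodes, and then still have to
run `nodetool repair` on the remaining nodes, which is where the load
comes from.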

Andrew


On Tue, Jun 3, 2014 at 9:45 AM, Ipremyadav <ipremya...@gmail.com> wrote:

> Thanks Mongo maven:)
> I understand why you need to do this.
> My question was more from the architecture point of view. Why doesn't
> Cassandra just redistribute the data? Is it because of the gossip protocol?
>
> Thanks,
> Prem
>
> On 3 Jun 2014, at 17:35, Curious Patient <mongoma...@gmail.com> wrote:
>
>> Assuming replication factor is >2, if a node dies, why does it matter? If
>> a new node is added, shouldn't it just take the chunk of data it
>> serves as the "primary" node from the other existing nodes?
>> Why do we need to worry about replacing the dead node?
>
>
> The reason this matters is because I am unable to do a nodetool repair on
> my keyspace with the dead node still being listed in nodetool status.  It
> fails, complaining that it can't reach the dead node.
>
>
> On Tue, Jun 3, 2014 at 12:18 PM, Jeremy Jongsma <jer...@barchart.com>
> wrote:
>
>> A dead node is still allocated key ranges, and Cassandra will wait for it
>> to come back online rather than redistributing its data. It needs to be
>> decommissioned or replaced by a new node for it to be truly dead as far as
>> the cluster is concerned.
>>
>>
>> On Tue, Jun 3, 2014 at 11:12 AM, Prem Yadav <ipremya...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> In the last week, we saw at least two emails about dead node
>>> replacement. Though I saw the documentation about how to do this, I am not
>>> sure I understand why this is required.
>>>
>>> Assuming replication factor is >2, if a node dies, why does it matter?
>>> If a new node is added, shouldn't it just take the chunk of data it
>>> serves as the "primary" node from the other existing nodes?
>>> Why do we need to worry about replacing the dead node?
>>>
>>> Thanks
>>>
>>
>>
>
