I think we'd need a new operation type (https://issues.apache.org/jira/browse/CASSANDRA-957) to go from "some of the data gets streamed" to "all of the data gets streamed." A node that claims a token already in the ring is assumed to actually have that data, and IMO trying to guess when to break that assumption would be error-prone -- better to have some explicit signal.
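(For illustration only: such an explicit signal could be as simple as a startup property naming the token being replaced. The property name below is hypothetical -- the actual mechanism is whatever CASSANDRA-957 ends up defining.)

    # hypothetical startup flag telling the node it is REPLACING the
    # holder of this token, so all of that range's data must be streamed
    # over rather than assumed present:
    bin/cassandra -Dcassandra.replace_token=<token-of-dead-node>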
On Sun, Jan 30, 2011 at 1:38 AM, Chris Goffinet <c...@chrisgoffinet.com> wrote:
> I was looking over the Operations wiki, and with the many improvements in
> 0.7, I wanted to bring up a thought.
>
> The two options today for replacing a node that has lost all data are:
>
> (Recommended approach) Bring up the replacement node with a new IP address,
> and AutoBootstrap set to true in storage-conf.xml. This will place the
> replacement node in the cluster and find the appropriate position
> automatically. Then the bootstrap process begins. While this process runs,
> the node will not receive reads until finished. Once this process is
> finished on the replacement node, run nodetool removetoken once, supplying
> the token of the dead node, and nodetool cleanup on each node.
>
> (Alternative approach) Bring up a replacement node with the same IP and
> token as the old one, and run nodetool repair. Until the repair process is
> complete, clients reading only from this node may get no data back. Using
> a higher ConsistencyLevel on reads will avoid this.
>
> For nodes that might have a drive failure, but the same IP address, what
> do you think about supplying the node's same token + AutoBootstrap set to
> true? This process works in trunk, but not all the data seems to be
> streamed over from its replicas. This would provide the option to not let
> a node take on reads until replicas stream the SSTables over, and would
> eliminate the alternative approach of forcing higher consistency levels.
>
> -Chris

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
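(For reference, the two procedures quoted above in command form -- a sketch
against 0.7-era nodetool; host names and the token are placeholders:)

    # Recommended: replacement node gets a NEW IP, with AutoBootstrap set
    # to true in the config; after bootstrap finishes, remove the dead
    # node's old token from any live node:
    nodetool -h any-live-node removetoken <token-of-dead-node>
    # then, on each remaining node:
    nodetool -h node1 cleanup

    # Alternative: replacement node reuses the dead node's IP and token:
    nodetool -h replacement-node repair
    # until repair completes, reads hitting only this node may return no
    # data, so read at a higher ConsistencyLevel in the meantime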