So, what does "anti-entropy repair" do then? It sounds like you have to 'decommission' the dead node and then, I thought, run 'nodeprobe repair' to bring the data back up to a replication factor of 3, right?
Also, what is the method to decommission a dead node? pass in the IP address of the dead node to nodeprobe on a member of the cluster? I've only used 'decommission' to remove the node I ran it on from the cluster... not a different node. It seems like if you decommission a node it should fix the replication factor for data that was on that node in this case...

On Mon, Mar 29, 2010 at 10:32 AM, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, Mar 29, 2010 at 12:27 PM, Ned Wolpert <ned.wolp...@imemories.com> wrote:
> > Folks-
> >
> > Can someone point out what happens during a node failure. Here is the
> > Specific usecase:
> >
> > - Cassandra cluster with 4 nodes, replication factor of 3
> > - One node fails.
> > - At this point, data that existed on the one failed node has copies on 2
> >   live nodes.
> > - The failed node never comes back
> >
> > First question: At what point does Cassandra re-migrate that data that only
> > exists on 2 nodes to another node to retain the replication factor of 3?
>
> When you tell it to decommission the dead one.
>
> > Second question: Given the above case, if a brand new node is added to the
> > cluster, does anything happen to the data that now only exists on 2 nodes?
>
> No, Cassandra doesn't automatically assume that "this node is never
> coming back" w/o intervention, by design. (Temporary failures are
> much more common than permanent ones.)
>
> -Jonathan

--
Virtually, Ned Wolpert

"Settle thy studies, Faustus, and begin..." --Marlowe
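
For anyone following the arithmetic in the 4-node, replication-factor-3 scenario above, here is a toy Python model of it. It is only a sketch, not Cassandra code. It assumes simple ring placement, where each range is stored on its primary owner plus the next nodes clockwise. The node names A to D and the helper names replica_nodes and live_copies are made up for illustration, and the actual streaming of data between nodes is not modelled.

```python
def replica_nodes(ring, primary, rf):
    """Nodes holding the range whose primary owner is `primary`.

    Assumption: replicas go on the primary and the next nodes clockwise.
    """
    start = ring.index(primary)
    count = min(rf, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)] for i in range(count)]


def live_copies(ring, rf, dead):
    """Count live replicas of each range while `dead` nodes stay in the ring."""
    return {
        primary: sum(1 for node in replica_nodes(ring, primary, rf)
                     if node not in dead)
        for primary in ring
    }


ring = ["A", "B", "C", "D"]  # hypothetical node names, in ring order
rf = 3

# D is down but still a ring member: every range D replicated has 2 live copies.
before = live_copies(ring, rf, dead={"D"})
print(before)  # {'A': 3, 'B': 2, 'C': 2, 'D': 2}

# D is taken out of the ring: placement is recomputed over the remaining nodes.
after = live_copies(["A", "B", "C"], rf, dead=set())
print(after)   # {'A': 3, 'B': 3, 'C': 3}
```

In this model nothing changes while D is merely down, because D is still counted as a replica. The copy count only returns to 3 once D is removed from the ring, which matches Jonathan's answer that re-replication happens when you tell the cluster the node is gone.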