I thought a joining node does not participate in quorum. How do we count things like how many replicas ACK a write while a new node is being added for expansion? Token ownership won't change until the new node has fully joined, right?

On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa <jji...@gmail.com> wrote:

> You can't have two nodes with the same token (in the current metadata
> implementation) - it causes problems counting things like how many replicas
> ACK a write, and what happens if the one you're replacing ACKs a write but
> the joining host doesn't? It's harder than it seems to maintain consistency
> guarantees in that model, because you have 2 nodes where either may end up
> becoming the sole true owner of the token, and you have to handle both
> cases where one of them fails.
>
> An easier option is to add it with new token set to old token +1 (as an
> expansion), then decom the leaving node (shrink). That'll minimize
> streaming when you decommission that node.
>
> On Mon, May 8, 2023 at 7:19 PM Runtian Liu <curly...@gmail.com> wrote:
>
>> Hi all,
>>
>> Sometimes we want to replace a node for various reasons, we can replace a
>> node by shutting down the old node and letting the new node stream data
>> from other replicas, but this approach may have availability issues or data
>> consistency issues if one more node in the same cluster went down. Why
>> Cassandra doesn't support replacing a node without shutting down the old
>> one? Can we treat the new node as normal node addition while having exactly
>> the same token ranges as the node to be replaced. After the new node's
>> joining process is complete, we just need to cut off the old node. With
>> this, we don't lose any availability and the token range is not moved so no
>> clean up is needed. Is there any downside of doing this?
>>
>> Thanks,
>> Runtian
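
To make the "old token +1, then decommission" suggestion quoted above concrete, here is a minimal sketch of a toy token ring. It is not Cassandra code: the ring layout, the node names (`A`, `B`, `B_new`, `C`), the token values, and the `owner` helper are all hypothetical, and it assumes one token per node and models primary ownership only (no replication factor, no vnodes, no streaming).

```python
from bisect import bisect_left

def owner(ring, key_token):
    """Return the node that owns key_token in a toy ring.

    A node with token T owns the range (previous_token, T], so the owner
    is the first node whose token is >= key_token, wrapping around.
    """
    tokens = sorted(ring)
    idx = bisect_left(tokens, key_token) % len(tokens)
    return ring[tokens[idx]]

# Hypothetical starting ring: token -> node name.
ring = {100: "A", 200: "B", 300: "C"}

# Step 1 (expansion): add the replacement at old token + 1.
# It takes over only the tiny range (200, 201]; B keeps (100, 200].
expanded = dict(ring)
expanded[201] = "B_new"

# Step 2 (shrink): decommission B. Its range (100, 200] falls to the
# next node on the ring, which is B_new at token 201.
shrunk = {token: node for token, node in expanded.items() if node != "B"}

for key in (50, 150, 200, 201, 250, 350):
    print(key, owner(ring, key), owner(expanded, key), owner(shrunk, key))
```

In this toy model no two nodes ever share a token, and after both steps `B_new` owns essentially the range `B` used to own, while `A` and `C` keep theirs.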