Cool, thank you. This looks like a very good setup for us, and cleanup
should be very fast in this case.

On Tue, May 16, 2023 at 5:53 AM Jeff Jirsa <jji...@gmail.com> wrote:

>
> In-line
>
> On May 15, 2023, at 5:26 PM, Runtian Liu <curly...@gmail.com> wrote:
>
> 
> Hi Jeff,
>
> I tried the setup with 16 vnodes and NetworkTopologyStrategy with
> replication factor 3 and 3 racks in one cluster. When using
> the new node token as the old node token - 1
>
>
> I had said +1, but you’re right that it’s actually -1; sorry about that.
> You want the new node to be lower than the existing host. The lower token
> will take most of the data.
>
> I see the new node is streaming from the old node only. And the decom
> phase of the old node is extremely fast. Does this mean the new node will
> only take data ownership from the old node?
>
>
> With exactly three racks, yes. With more racks or fewer racks, no.
>
> I also ran cleanup after replacing the node using old token - 1, and the
> cleanup sstable count was not increasing. It looks like adding a node with
> old_token - 1 and then decommissioning the old node will not generate stale
> data on the rest of the cluster. Do you know of any edge cases in this
> replacement process that can generate stale data on other nodes of the
> cluster with the setup I mentioned?
>
>
> Should do exactly what you want. I’d still run cleanup but it should be a
> no-op.
>
>
> Thanks,
> Runtian
>
> On Mon, May 8, 2023 at 9:59 PM Runtian Liu <curly...@gmail.com> wrote:
>
>> I thought the joining node would not participate in quorum? How are we
>> counting things like how many replicas ACK a write when we are adding a new
>> node for expansion? The token ownership won't change until the new node is
>> fully joined, right?
>>
>> On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>
>>> You can't have two nodes with the same token (in the current metadata
>>> implementation) - it causes problems counting things like how many replicas
>>> ACK a write, and what happens if the one you're replacing ACKs a write but
>>> the joining host doesn't? It's harder than it seems to maintain consistency
>>> guarantees in that model, because you have 2 nodes where either may end up
>>> becoming the sole true owner of the token, and you have to handle both
>>> cases where one of them fails.
>>>
>>> An easier option is to add it with new token set to old token +1 (as an
>>> expansion), then decom the leaving node (shrink). That'll minimize
>>> streaming when you decommission that node.
>>>
>>>
>>>
>>> On Mon, May 8, 2023 at 7:19 PM Runtian Liu <curly...@gmail.com> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Sometimes we want to replace a node for various reasons. We can replace
>>>> a node by shutting down the old node and letting the new node stream data
>>>> from other replicas, but this approach may have availability or data
>>>> consistency issues if one more node in the same cluster goes down. Why
>>>> doesn't Cassandra support replacing a node without shutting down the old
>>>> one? Can we treat the new node as a normal node addition while giving it
>>>> exactly the same token ranges as the node to be replaced? After the new
>>>> node's joining process is complete, we just need to cut off the old node.
>>>> With this, we don't lose any availability and the token range is not
>>>> moved, so no cleanup is needed. Is there any downside to doing this?
>>>>
>>>> Thanks,
>>>> Runtian
>>>>
>>>
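
For readers following the thread, here is a minimal sketch of the "add at old_token - 1, then decom the old node" idea with RF 3 and exactly three racks. It is a toy model, not Cassandra's code, and everything in it is an assumption made for illustration: one token per node instead of 16 vnodes, a simplified rack-aware placement (walk the ring clockwise and take the first node seen in each distinct rack) standing in for NetworkTopologyStrategy, a tiny integer token space, and hypothetical node names and tokens.

```python
from bisect import bisect_left

RF = 3  # replication factor; the sketch assumes exactly RF racks


def replicas_for(key, ring, rf=RF):
    """Return the replicas for `key` on a ring sorted by token.

    `ring` is a list of (token, node, rack) tuples. Starting at the first
    token >= key (wrapping around), take the first node seen in each
    distinct rack until `rf` replicas have been chosen.
    """
    tokens = [token for token, _, _ in ring]
    start = bisect_left(tokens, key) % len(ring)
    chosen, racks_seen = [], set()
    for step in range(len(ring)):
        _, node, rack = ring[(start + step) % len(ring)]
        if rack not in racks_seen:
            racks_seen.add(rack)
            chosen.append(node)
            if len(chosen) == rf:
                break
    return chosen


def ownership(ring, keys, rf=RF):
    """Map each node to the set of keys it holds a replica of."""
    ring = sorted(ring)  # order by token
    owned = {node: set() for _, node, _ in ring}
    for key in keys:
        for node in replicas_for(key, ring, rf):
            owned[node].add(key)
    return owned


# Hypothetical six-node cluster, two nodes per rack.
KEYS = range(0, 700)
BASE = [
    (100, "a", "r1"), (200, "b", "r2"), (300, "c", "r3"),
    (400, "d", "r1"), (500, "e", "r2"), (600, "f", "r3"),
]
OLD = (400, "d", "r1")    # node being replaced
NEW = (399, "new", "r1")  # same rack, token = old token - 1

before = ownership(BASE, KEYS)
during = ownership(BASE + [NEW], KEYS)                         # after expansion
after = ownership([n for n in BASE if n != OLD] + [NEW], KEYS)  # after decom

for node in sorted(before):
    if node == "d":
        continue
    lost = before[node] - after[node]
    gained = after[node] - before[node]
    print(f"{node}: lost {len(lost)} keys, gained {len(gained)} keys")
print("old node holds while both are up:", sorted(during["d"]))
```

In this model no surviving node loses a key at either step, which is consistent with the remark that cleanup should be a no-op. The old node is left holding only its own token, and that single token moves to another node in the same rack on decommission. A real cluster with vnodes will differ in detail.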
