Hi Jeff,

Do you think this would be a good workaround to have in Cassandra itself
until CEP-21
<https://cwiki.apache.org/confluence/display/CASSANDRA/CEP-21%3A+Transactional+Cluster+Metadata>
and cleanup as part of compaction are available in Cassandra itself?
It could work as follows in Cassandra:
Step 1: Add a new bootstrap flag, say
*-Dcopy_tokens_from=<src_ip_address>*. If set, the newly joining node
copies the tokens from *src_ip_address* and subtracts 1 from each token.
Step 2: Continue with the rest of the bootstrap as is.
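
For illustration, the token arithmetic in Step 1 could look roughly like the
sketch below (Python for brevity; an actual change would live in Cassandra's
Java bootstrap path). The function names are hypothetical, the
-Dcopy_tokens_from flag is only the proposal above and does not exist today,
and the sketch assumes Murmur3Partitioner's signed 64-bit token range. It
rejects tokens that would collide or underflow rather than guessing at
wrap-around behavior.

```python
# Assumption: Murmur3Partitioner tokens are signed 64-bit integers.
MIN_TOKEN = -(2**63)


def derive_tokens(src_tokens, taken_tokens=()):
    """Return one token per source token, each set to source token - 1.

    src_tokens:   tokens owned by the node being replaced (hypothetical input,
                  e.g. what a copy_tokens_from step would fetch).
    taken_tokens: tokens owned by every other node in the ring.
    """
    taken = set(taken_tokens) | set(src_tokens)
    derived = []
    for token in sorted(src_tokens):
        if token == MIN_TOKEN:
            # Subtracting 1 would leave the token range; not handled here.
            raise ValueError(f"cannot derive a token below {token}")
        candidate = token - 1
        if candidate in taken:
            # Two nodes cannot share a token, so fail instead of reusing it.
            raise ValueError(f"token {candidate} is already owned")
        taken.add(candidate)
        derived.append(candidate)
    return derived


def as_initial_token(tokens):
    """Format tokens as a comma-separated list, as used by initial_token."""
    return ",".join(str(t) for t in tokens)
```

For example, derive_tokens([100, -5]) gives [-6, 99], which
as_initial_token renders as "-6,99".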

Thoughts?

Jaydeep

On Tue, May 16, 2023 at 10:23 AM Runtian Liu <curly...@gmail.com> wrote:

> cool, thank you. This looks like a very good setup for us and cleanup
> should be very fast for this case.
>
> On Tue, May 16, 2023 at 5:53 AM Jeff Jirsa <jji...@gmail.com> wrote:
>
>>
>> In-line
>>
>> On May 15, 2023, at 5:26 PM, Runtian Liu <curly...@gmail.com> wrote:
>>
>> 
>> Hi Jeff,
>>
>> I tried the setup with 16 vnodes, NetworkTopologyStrategy, replication
>> factor 3, and 3 racks in one cluster. When using the old node's token - 1
>> as the new node's token
>>
>>
>> I had said +1 but you’re right that it’s actually -1, sorry about that.
>> You want the new node to be lower than the existing host. The lower token
>> will take most of the data.
>>
>> I see the new node is streaming from the old node only. And the decom
>> phase of the old node is extremely fast. Does this mean the new node will
>> only take data ownership from the old node?
>>
>>
>> With exactly three racks, yes. With more racks or fewer racks, no.
>>
>> I also ran cleanup after replacing a node using old token - 1, and the
>> cleanup sstable count did not increase. It looks like adding a node with
>> old_token - 1 and then decommissioning the old node does not generate stale
>> data on the rest of the cluster. Do you know of any edge cases in this
>> replacement process that could generate stale data on other nodes of the
>> cluster with the setup I mentioned?
>>
>>
>> Should do exactly what you want. I’d still run cleanup but it should be a
>> no-op.
>>
>>
>> Thanks,
>> Runtian
>>
>> On Mon, May 8, 2023 at 9:59 PM Runtian Liu <curly...@gmail.com> wrote:
>>
>>> I thought the joining node would not participate in quorum? How are we
>>> counting things like how many replicas ACK a write when we are adding a new
>>> node for expansion? The token ownership won't change until the new node is
>>> fully joined, right?
>>>
>>> On Mon, May 8, 2023 at 8:58 PM Jeff Jirsa <jji...@gmail.com> wrote:
>>>
>>>> You can't have two nodes with the same token (in the current metadata
>>>> implementation) - it causes problems counting things like how many replicas
>>>> ACK a write, and what happens if the one you're replacing ACKs a write but
>>>> the joining host doesn't? It's harder than it seems to maintain consistency
>>>> guarantees in that model, because you have 2 nodes where either may end up
>>>> becoming the sole true owner of the token, and you have to handle both
>>>> cases where one of them fails.
>>>>
>>>> An easier option is to add it with new token set to old token +1 (as an
>>>> expansion), then decom the leaving node (shrink). That'll minimize
>>>> streaming when you decommission that node.
>>>>
>>>>
>>>>
>>>> On Mon, May 8, 2023 at 7:19 PM Runtian Liu <curly...@gmail.com> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> Sometimes we want to replace a node for various reasons. We can
>>>>> replace a node by shutting down the old node and letting the new node
>>>>> stream data from other replicas, but this approach may have availability
>>>>> or data consistency issues if one more node in the same cluster goes
>>>>> down. Why doesn't Cassandra support replacing a node without shutting
>>>>> down the old one? Can we treat the new node as a normal node addition
>>>>> while giving it exactly the same token ranges as the node to be replaced?
>>>>> After the new node's joining process is complete, we just need to cut off
>>>>> the old node. With this, we don't lose any availability, and the token
>>>>> ranges do not move, so no cleanup is needed. Is there any downside to
>>>>> doing this?
>>>>>
>>>>> Thanks,
>>>>> Runtian
>>>>>
>>>>
