> by default C* does prohibit concurrent bootstraps (behaviour which can be 
> overridden with the cassandra.consistent.rangemovement system property). But 
> there's nothing to stop you fully bootstrapping additional nodes in series, 
> then removing them in the same way.

I think there are several important areas in which CEP-21 might actually be 
helpful. Right now, in a 5-node cluster with RF=3, each node holds the range 
between its predecessor's token and its own, along with RF-1 (here, two) 
ranges replicated from its neighbours. 
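
To make the current layout concrete, here is a minimal sketch. It is an illustration only, assuming one token per node (no vnodes) and SimpleStrategy-style placement; the function name `replica_ranges` and the token values are made up for the example and are not Cassandra APIs.

```python
def replica_ranges(tokens, rf):
    """Map each node token to the (start, end] ranges it replicates.

    Assumption: one token per node, and a range is replicated on its
    primary owner plus the next rf - 1 nodes clockwise in the ring.
    """
    ring = sorted(tokens)
    n = len(ring)
    # Primary range of ring[i] is (ring[i - 1], ring[i]]; index -1 wraps around.
    primary = {ring[i]: (ring[i - 1], ring[i]) for i in range(n)}
    owned = {t: [] for t in ring}
    for i, owner in enumerate(ring):
        for k in range(rf):
            replica = ring[(i + k) % n]
            owned[replica].append(primary[owner])
    return owned


# Hypothetical 5-node ring with RF=3.
layout = replica_ranges([0, 20, 40, 60, 80], rf=3)
for token, ranges in layout.items():
    print(token, ranges)
```

Every node ends up with RF ranges in total: its own primary range and the primary ranges of its RF-1 predecessors.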

What CEP-21 will allow us to do is to make _some_ RF-sized subset of the 5 
nodes in the cluster the owners of an arbitrary range. That will _also_ mean 
that you can add a 6th node that owns nothing at first, bootstrap it as a 
participant in read/write quorums for the same ranges that node A is a 
read/write replica of and, in the next step, remove A as a read/write replica.
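
The two-step idea above can be sketched as plain set manipulation. This is a toy model under stated assumptions, not CEP-21's actual data structures: `placements`, `add_replica_alongside` and `remove_replica` are hypothetical names, and streaming is ignored entirely.

```python
def add_replica_alongside(placements, existing, new):
    """Step 1: make `new` a replica of every range `existing` replicates."""
    return {
        rng: (replicas | {new} if existing in replicas else replicas)
        for rng, replicas in placements.items()
    }


def remove_replica(placements, node):
    """Step 2: drop `node` from every replica set."""
    return {rng: replicas - {node} for rng, replicas in placements.items()}


# Hypothetical 5-node cluster (A..E), RF=3, ranges named r1..r5.
placements = {
    "r1": frozenset("ABC"),
    "r2": frozenset("BCD"),
    "r3": frozenset("CDE"),
    "r4": frozenset("DEA"),
    "r5": frozenset("EAB"),
}

with_f = add_replica_alongside(placements, "A", "F")  # those ranges now have 4 replicas
without_a = remove_replica(with_f, "A")               # back to 3, with F in place of A
```

Between the two steps the affected ranges temporarily have RF+1 replicas, so quorum sizes for those ranges grow until A is removed.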

I believe such an approach would still be incredibly costly (i.e. you would 
have to re-stream all of the data), but if other means are available for 
sharing disks or sstables that would lower the cost for you, this might even 
work as a lower-risk upgrade option, even though I think most operators won't 
be using it. What could be widely beneficial is the ability to test a new 
version as a canary in write-survey mode, and then add it as a read replica, 
but for a small subset of data (effectively decreasing availability of this 
particular range by extending its RF).

> What you will be able to do post CEP-21, is to run concurrent bootstraps of 
> nodes which don't share ranges

I think we can do even better: we can take an arbitrary range and split it 
into N parts, effectively making all N parts bootstrappable in parallel. I also 
think (though I haven't checked whether that's truly the case) that we can 
prepare a plan that allows executing `StartJoin` for all nodes while the 
range is locked, but blocks execution of `MidJoin` for any of the nodes until 
`StartJoin` has been executed for all of them and, similarly, holds back 
`FinishJoin` until `MidJoin` has been executed for all the nodes. In other 
words, I think there might be a bit of room for flexibility; the question is 
which way will be the most beneficial. 
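
A rough sketch of what such a plan could look like, with the same caveat as above: this is unverified against the actual CEP-21 design. Only the step names `StartJoin`, `MidJoin` and `FinishJoin` come from the proposal; `split_range`, `phased_plan` and `respects_barriers` are hypothetical helpers, and ranges are simplified to non-wrapping integer intervals.

```python
PHASES = ("StartJoin", "MidJoin", "FinishJoin")


def split_range(start, end, parts):
    """Split (start, end] into `parts` contiguous, non-wrapping subranges."""
    width = end - start
    bounds = [start + width * i // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def phased_plan(nodes):
    """Order steps so each phase completes on all nodes before the next begins."""
    return [(phase, node) for phase in PHASES for node in nodes]


def respects_barriers(plan, nodes):
    """True if no node starts a phase before all nodes finished the previous one."""
    done = {phase: set() for phase in PHASES}
    for phase, node in plan:
        idx = PHASES.index(phase)
        if idx > 0 and done[PHASES[idx - 1]] != set(nodes):
            return False
        done[phase].add(node)
    return True


# One joining node per subrange of a hypothetical range (0, 100].
subranges = split_range(0, 100, 4)
nodes = [f"n{i}" for i in range(1, len(subranges) + 1)]
plan = phased_plan(nodes)
```

A plan that runs all three steps for one node before touching the next would fail `respects_barriers`, which is the difference between this scheme and fully serial bootstraps.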

On Thu, Oct 20, 2022, at 3:33 PM, Sam Tunnicliffe wrote:
> > Add A' to the cluster with the same keyspace as A.
> 
> Can you clarify what you mean here?
> 
> > Currently these operations have to be performed in sequence.  My 
> > understanding is that you can't add more than one node at a time.  
> 
> To ensure consistency guarantees are honoured, by default C* does prohibit 
> concurrent bootstraps (behaviour which can be overridden with the 
> cassandra.consistent.rangemovement system property). But there's nothing to 
> stop you fully bootstrapping additional nodes in series, then removing them 
> in the same way.
> 
> Why you would want to do this, or to use bootstrap and remove for this at all 
> rather than upgrading in place isn't clear to me though, doing it this way 
> just adds a streaming overhead that doesn't otherwise exist.
> 
> What you will be able to do post CEP-21, is to run concurrent bootstraps of 
> nodes which don't share ranges. This is definitely an improvement on the 
> status quo, but it's only an initial step. CEP-21 is intended to lay the 
> foundations for further improvements down the line.
> 
> 
>> On 20 Oct 2022, at 14:04, Claude Warren, Jr via dev 
>> <dev@cassandra.apache.org> wrote:
>> 
>> My understanding of our process is (assuming we have 3 nodes A,B,C):
>>  * Add A' to the cluster with the same keyspace as A.
>>  * Remove A from the cluster.
>>  * Add B' to the cluster
>>  * Remove B from the cluster
>>  * Add C' to the cluster
>>  * Remove C from the cluster.
>> Currently these operations have to be performed in sequence.  My 
>> understanding is that you can't add more than one node at a time.  What we 
>> would like to do is do this in 3 steps:
>>  * Add A', B', C' to the cluster.
>>  * Wait for all 3 to be accepted and functioning.
>>  * Remove A, B, C from the cluster.
>> Does CEP-21 make this possible?
>> 
>> On Thu, Oct 20, 2022 at 1:43 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>> I'm not sure I 100% understand the question, but the things covered in 
>>> CEP-21 won't enable you, as an operator, to bootstrap all your new nodes 
>>> without fully joining, then perform an atomic CAS to replace the existing 
>>> members. CEP-21 alone also won't solve all cross-version streaming issues, 
>>> which is one reason performing topology-modifying operations like bootstrap 
>>> & decommission during an upgrade are not generally considered a good idea.
>>> 
>>> Transactional metadata will make the bootstrapping (and decommissioning) 
>>> experience a whole lot more stable and predictable, so in the short term I 
>>> would expect the recommended rolling approach to upgrades would improve 
>>> significantly. 
>>> 
>>> 
>>> > On 20 Oct 2022, at 12:24, Claude Warren, Jr via dev 
>>> > <dev@cassandra.apache.org> wrote:
>>> > 
>>> > After CEP-21 would it be possible to take a cluster of 6 nodes, spin up 6 
>>> > new nodes to duplicate the 6 existing nodes and then spin down the 
>>> > original 6 nodes.  Basically, I am thinking of the case where a cluster 
>>> > is running version x.y.z and want to run x.y.z+1, can they spin up an 
>>> > equal number of x.y.z+1 systems and replace the old ones without shutting 
>>> > down the cluster?
>>> > 
>>> > We currently try something like this where we spin up 1 system and then 
>>> > drop 1 system until all the old nodes are replaced.  This process 
>>> > frequently runs into streaming failures while bootstrapping.
>>> > 
>>> > Any insights would be appreciated.
>>> > 
>>> > Claude
>>> 
