Right, all of the things you describe will be possible post CEP-21, just not 
immediately. My point is that CEP-21 has a specific scope and a lot of the 
great planned improvements necessarily fall outside of that.

> On 20 Oct 2022, at 15:42, Alex Petrov <al...@coffeenco.de> wrote:
> 
> > by default C* does prohibit concurrent bootstraps (behaviour which can be 
> > overridden with the cassandra.consistent.rangemovement system property). 
> > But there's nothing to stop you fully bootstrapping additional nodes in 
> > series, then removing them in the same way.
> 
> I think there are several important ways in which CEP-21 actually might be 
> helpful. Right now, in a 5-node cluster with RF=3, each node holds the 
> range between its predecessor's token and its own, along with RF-1 ranges 
> replicated from its neighbours. 
> 
> What CEP-21 will allow us to do is make _some_ RF-sized subset of the 5 
> nodes we have in the cluster the owners of an arbitrary range. That will 
> _also_ mean that you can add a 6th node that owns nothing at first, 
> bootstrap it as a participant in the read/write quorums of the same ranges 
> node A is a read/write replica of, and then, in the next step, remove A as 
> a read/write replica.
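> 
> To make that concrete, here is a toy Java sketch of today's placement 
> model (the tokens, node names and RF are made up for illustration; this is 
> not the actual C* code):
> 
>     import java.util.*;
> 
>     // Toy model of today's token-ring placement: each node owns the range
>     // (predecessorToken, ownToken], and a key's replicas are the RF nodes
>     // found by walking the ring clockwise from the key's token.
>     public class RingPlacement {
>         public static void main(String[] args) {
>             TreeMap<Long, String> ring = new TreeMap<>(Map.of(
>                 0L, "A", 20L, "B", 40L, "C", 60L, "D", 80L, "E"));
>             int rf = 3;
>             long keyToken = 45L;
>             List<String> replicas = new ArrayList<>();
>             Long t = ring.ceilingKey(keyToken);
>             while (replicas.size() < rf) {
>                 if (t == null) t = ring.firstKey(); // wrap around the ring
>                 replicas.add(ring.get(t));
>                 t = ring.higherKey(t);
>             }
>             // The replica set is fully determined by the tokens: [D, E, A].
>             // Post CEP-21, a range's owners could instead be an arbitrary
>             // RF-sized subset of the nodes.
>             System.out.println(replicas);
>         }
>     }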
> 
> I believe such an approach would still be incredibly costly (i.e. you would 
> have to re-stream the entire dataset), but if other means are available for 
> sharing disks or sstables that would lower the cost for you, this might 
> even work as a lower-risk upgrade option, even though I think most 
> operators won't be using it. What could be widely beneficial is the ability 
> to test a new version as a canary in write-survey mode, and then add it as 
> a read replica, but only for a small subset of data (effectively decreasing 
> the availability of this particular range by extending its RF).
> 
> > What you will be able to do post CEP-21 is to run concurrent bootstraps 
> > of nodes which don't share ranges
> 
> I think we can do even better: we can take an arbitrary range and split it 
> into N parts, effectively making all N parts bootstrappable in parallel. I 
> also think (though I haven't checked whether that's truly the case) that we 
> can prepare a plan in which we allow executing `StartJoin` for all nodes 
> while the range is locked, but block execution of `MidJoin` for any of the 
> nodes until `StartJoin` has executed for all of them and, similarly, hold 
> back `FinishJoin` until `MidJoin` has executed for all the nodes. In other 
> words, I think there might be a bit of room for flexibility; the question 
> is which way will be the most beneficial. 
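> 
> A rough sketch of the lock-step sequencing I have in mind (hypothetical 
> names; `JoiningNode` and its methods are stand-ins, not the actual CEP-21 
> API), using a barrier so that no node advances to the next phase until 
> every node has completed the previous one:
> 
>     import java.util.List;
>     import java.util.concurrent.*;
> 
>     public class PhasedJoin {
>         interface JoiningNode {      // hypothetical stand-in for a node
>             void startJoin();
>             void midJoin();
>             void finishJoin();
>         }
> 
>         static void joinAll(List<JoiningNode> nodes) throws InterruptedException {
>             ExecutorService pool = Executors.newFixedThreadPool(nodes.size());
>             // One party per node; the Phaser acts as a reusable barrier
>             // separating the three join phases.
>             Phaser barrier = new Phaser(nodes.size());
>             for (JoiningNode node : nodes) {
>                 pool.execute(() -> {
>                     node.startJoin();
>                     barrier.arriveAndAwaitAdvance(); // all StartJoins done
>                     node.midJoin();
>                     barrier.arriveAndAwaitAdvance(); // all MidJoins done
>                     node.finishJoin();
>                 });
>             }
>             pool.shutdown();
>             pool.awaitTermination(1, TimeUnit.HOURS);
>         }
>     }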
> 
> On Thu, Oct 20, 2022, at 3:33 PM, Sam Tunnicliffe wrote:
>> > Add A' to the cluster with the same keyspace as A.
>> 
>> Can you clarify what you mean here?
>> 
>> > Currently these operations have to be performed in sequence.  My 
>> > understanding is that you can't add more than one node at a time.  
>> 
>> To ensure consistency guarantees are honoured, by default C* does prohibit 
>> concurrent bootstraps (behaviour which can be overridden with the 
>> cassandra.consistent.rangemovement system property). But there's nothing to 
>> stop you fully bootstrapping additional nodes in series, then removing them 
>> in the same way.
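>> 
>> For illustration, the guard has roughly this shape (a sketch, not the 
>> actual C* implementation; only the system property name is real):
>> 
>>     public final class RangeMovementGuard {
>>         // With strict consistency enabled (the default), a joining node
>>         // refuses to bootstrap while another range movement is in progress.
>>         static void checkConcurrentRangeMovement(boolean otherMovementInProgress) {
>>             boolean strict = Boolean.parseBoolean(
>>                 System.getProperty("cassandra.consistent.rangemovement", "true"));
>>             if (strict && otherMovementInProgress) {
>>                 throw new UnsupportedOperationException(
>>                     "Another node is bootstrapping/leaving/moving; " +
>>                     "consistent range movement allows only one at a time");
>>             }
>>         }
>>     }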
>> 
>> Why you would want to do this, or to use bootstrap and remove for this at 
>> all rather than upgrading in place, isn't clear to me though; doing it 
>> this way just adds a streaming overhead that doesn't otherwise exist.
>> 
>> What you will be able to do post CEP-21 is to run concurrent bootstraps of 
>> nodes which don't share ranges. This is a definite improvement on the 
>> status quo, but it's only an initial step. CEP-21 is intended to lay the 
>> foundations for further improvements down the line.
>> 
>> 
>>> On 20 Oct 2022, at 14:04, Claude Warren, Jr via dev 
>>> <dev@cassandra.apache.org> wrote:
>>> 
>>> My understanding of our process is (assuming we have 3 nodes A, B, C):
>>> 1. Add A' to the cluster with the same keyspace as A.
>>> 2. Remove A from the cluster.
>>> 3. Add B' to the cluster.
>>> 4. Remove B from the cluster.
>>> 5. Add C' to the cluster.
>>> 6. Remove C from the cluster.
>>> Currently these operations have to be performed in sequence.  My 
>>> understanding is that you can't add more than one node at a time.  What 
>>> we would like to do is do this in 3 steps:
>>> 1. Add A', B', C' to the cluster.
>>> 2. Wait for all 3 to be accepted and functioning.
>>> 3. Remove A, B, C from the cluster.
>>> Does CEP-21 make this possible?
>>> 
>>> On Thu, Oct 20, 2022 at 1:43 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>> I'm not sure I 100% understand the question, but the things covered in 
>>> CEP-21 won't enable you, as an operator, to bootstrap all your new nodes 
>>> without fully joining and then perform an atomic CAS to replace the 
>>> existing members. CEP-21 alone also won't solve all cross-version 
>>> streaming issues, which is one reason performing topology-modifying 
>>> operations like bootstrap & decommission during an upgrade is not 
>>> generally considered a good idea.
>>> 
>>> Transactional metadata will make the bootstrapping (and decommissioning) 
>>> experience a whole lot more stable and predictable, so in the short term 
>>> I would expect the recommended rolling approach to upgrades to improve 
>>> significantly. 
>>> 
>>> 
>>> > On 20 Oct 2022, at 12:24, Claude Warren, Jr via dev 
>>> > <dev@cassandra.apache.org> wrote:
>>> > 
>>> > After CEP-21, would it be possible to take a cluster of 6 nodes, spin 
>>> > up 6 new nodes to duplicate the 6 existing nodes, and then spin down 
>>> > the original 6 nodes?  Basically, I am thinking of the case where a 
>>> > cluster is running version x.y.z and wants to run x.y.z+1: can they 
>>> > spin up an equal number of x.y.z+1 systems and replace the old ones 
>>> > without shutting down the cluster?
>>> > 
>>> > We currently try something like this, spinning up 1 system and then 
>>> > dropping 1 system until all the old nodes are replaced.  This process 
>>> > frequently runs into streaming failures while bootstrapping.
>>> > 
>>> > Any insights would be appreciated.
>>> > 
>>> > Claude
