> by default C* does prohibit concurrent bootstraps (behaviour which can be
> overridden with the cassandra.consistent.rangemovement system property). But
> there's nothing to stop you fully bootstrapping additional nodes in series,
> then removing them in the same way.

I think there are several important ways in which CEP-21 might actually be helpful.

Right now, in a 5-node cluster with RF=3, each node holds the range between its own token and its predecessor's in the ring, along with RF-1 ranges replicated from its neighbours. What CEP-21 will allow us to do is make _some_ RF-sized subset of the 5 nodes in the cluster the owners of an arbitrary range. That will _also_ mean that you can add a 6th node that owns nothing at first, bootstrap it as a participant in read/write quorums for the same ranges node A is a read/write replica of and, in the next step, remove A as a read/write replica.

I believe such an approach would still be incredibly costly (i.e. you would have to re-stream the entire data set), but if other means of sharing disks or sstables are available that would lower the cost for you, this might even work as a lower-risk upgrade option, even though I think most operators won't be using it.

What could be widely beneficial is the ability to test a new version as a canary in write-survey mode, and then add it as a read replica, but only for a small subset of data (effectively decreasing availability of this particular range by extending its RF).

> What you will be able to do post CEP-21, is to run concurrent bootstraps of
> nodes which don't share ranges

I think we can do even better: we can take an arbitrary range and split it into N parts, effectively making all N parts bootstrappable in parallel. I also think (though I haven't checked whether that's truly the case) that we can prepare a plan that allows executing `StartJoin` for all nodes while the range is locked, but blocks execution of `MidJoin` for any node until `StartJoin` has been executed for all of them and, similarly, holds back `FinishJoin` until `MidJoin` has been executed for all nodes.

In other words, I think there might be a bit of room for flexibility; the question is which way will be the most beneficial.

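To make the phased idea above a bit more concrete, here is a rough sketch in Python. It is only an illustration, not how Cassandra or CEP-21 implements anything: the step names `StartJoin`, `MidJoin` and `FinishJoin` come from the discussion, while `Step`, `split_range`, `phased_plan`, `respects_barriers` and the integer `(start, end]` range representation are all hypothetical. It splits one non-wrapping range into N parts and orders the steps so that no node reaches a phase before every node has completed the previous one.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Step names taken from the discussion; their order defines the barriers.
PHASES = ("StartJoin", "MidJoin", "FinishJoin")


@dataclass(frozen=True)
class Step:
    """One step of a hypothetical join plan (not a Cassandra class)."""
    phase: str
    node: str
    token_range: Tuple[int, int]  # assumed (start, end] representation


def split_range(start: int, end: int, parts: int) -> List[Tuple[int, int]]:
    """Split the non-wrapping range (start, end] into contiguous subranges."""
    width = end - start
    if parts < 1 or width < parts:
        raise ValueError("range too small for the requested number of parts")
    bounds = [start + (width * i) // parts for i in range(parts + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def phased_plan(nodes: List[str], start: int, end: int) -> List[Step]:
    """Give each joining node one subrange and emit steps phase by phase."""
    subranges = split_range(start, end, len(nodes))
    return [
        Step(phase, node, subrange)
        for phase in PHASES  # outer loop: every node finishes a phase first
        for node, subrange in zip(nodes, subranges)
    ]


def respects_barriers(plan: List[Step]) -> bool:
    """True if no step of a later phase precedes a step of an earlier one."""
    order = [PHASES.index(step.phase) for step in plan]
    return order == sorted(order)


if __name__ == "__main__":
    for step in phased_plan(["n1", "n2", "n3"], 0, 90):
        print(step.phase, step.node, step.token_range)
```

Within a phase the steps are independent of each other in this sketch, so they could in principle run in parallel; only the phase boundaries act as barriers. Whether the real range locking permits that is exactly the part I have not checked.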
On Thu, Oct 20, 2022, at 3:33 PM, Sam Tunnicliffe wrote:
> > Add A' to the cluster with the same keyspace as A.
>
> Can you clarify what you mean here?
>
> > Currently these operations have to be performed in sequence. My
> > understanding is that you can't add more than one node at a time.
>
> To ensure consistency guarantees are honoured, by default C* does prohibit
> concurrent bootstraps (behaviour which can be overridden with the
> cassandra.consistent.rangemovement system property). But there's nothing to
> stop you fully bootstrapping additional nodes in series, then removing them
> in the same way.
>
> Why you would want to do this, or to use bootstrap and remove for this at all
> rather than upgrading in place isn't clear to me though, doing it this way
> just adds a streaming overhead that doesn't otherwise exist.
>
> What you will be able to do post CEP-21, is to run concurrent bootstraps of
> nodes which don't share ranges. This is a definite improvement on the
> status quo, but it's only an initial step. CEP-21 is intended to lay the
> foundations for further improvements down the line.
>
>
>> On 20 Oct 2022, at 14:04, Claude Warren, Jr via dev
>> <dev@cassandra.apache.org> wrote:
>>
>> My understanding of our process is (assuming we have 3 nodes A,B,C):
>> * Add A' to the cluster with the same keyspace as A.
>> * Remove A from the cluster.
>> * Add B' to the cluster
>> * Remove B from the cluster
>> * Add C' to the cluster
>> * Remove C from the cluster.
>> Currently these operations have to be performed in sequence. My
>> understanding is that you can't add more than one node at a time. What we
>> would like to do is do this in 3 steps:
>> * Add A', B', C' to the cluster.
>> * Wait for all 3 to be accepted and functioning.
>> * Remove A, B, C from the cluster.
>> Does CEP-21 make this possible?
>>
>> On Thu, Oct 20, 2022 at 1:43 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>> I'm not sure I 100% understand the question, but the things covered in
>>> CEP-21 won't enable you as an operator to bootstrap all your new nodes
>>> without fully joining, then perform an atomic CAS to replace the existing
>>> members. CEP-21 alone also won't solve all cross-version streaming issues,
>>> which is one reason performing topology-modifying operations like bootstrap
>>> & decommission during an upgrade is not generally considered a good idea.
>>>
>>> Transactional metadata will make the bootstrapping (and decommissioning)
>>> experience a whole lot more stable and predictable, so in the short term I
>>> would expect the recommended rolling approach to upgrades would improve
>>> significantly.
>>>
>>>
>>> > On 20 Oct 2022, at 12:24, Claude Warren, Jr via dev
>>> > <dev@cassandra.apache.org> wrote:
>>> >
>>> > After CEP-21 would it be possible to take a cluster of 6 nodes, spin up 6
>>> > new nodes to duplicate the 6 existing nodes and then spin down the
>>> > original 6 nodes. Basically, I am thinking of the case where a cluster
>>> > is running version x.y.z and want to run x.y.z+1, can they spin up an
>>> > equal number of x.y.z+1 systems and replace the old ones without shutting
>>> > down the cluster?
>>> >
>>> > We currently try something like this where we spin up 1 system and then
>>> > drop 1 system until all the old nodes are replaced. This process
>>> > frequently runs into streaming failures while bootstrapping.
>>> >
>>> > Any insights would be appreciated.
>>> >
>>> > Claude
>>>