Right, all of the things you describe will be possible post CEP-21, just not immediately. My point is that CEP-21 has a specific scope and a lot of the great planned improvements necessarily fall outside of that.
> On 20 Oct 2022, at 15:42, Alex Petrov <al...@coffeenco.de> wrote:
>
> > by default C* does prohibit concurrent bootstraps (behaviour which can be
> > overridden with the cassandra.consistent.rangemovement system property).
> > But there's nothing to stop you fully bootstrapping additional nodes in
> > series, then removing them in the same way.
>
> I think there are multiple important ways in which CEP-21 actually might be helpful. Right now, in a 5-node cluster with RF=3, each node holds the range between its own token and its predecessor's in the ring, along with RF-1 ranges replicated from its neighbours.
>
> What CEP-21 will allow us to do is make _some_ RF-sized subset of the 5 nodes we have in the cluster the owners of an arbitrary range. That will _also_ mean you can add a 6th node that owns nothing at first, bootstrap it as a participant in the read/write quorums of the same ranges node A is a read/write replica of, and then, in the next step, remove A as a read/write replica.
>
> I believe such an approach would still be incredibly costly (i.e. you would have to re-stream the entire data set), but if other means are available for sharing disks or sstables that would lower that cost, this might even work as a lower-risk upgrade option, even though I think most operators won't use it. What could be widely beneficial is the ability to test a new version as a canary in write-survey mode, and then add it as a read replica for a small subset of data (effectively decreasing the availability of this particular range by extending its RF).
>
> > What you will be able to do post CEP-21, is to run concurrent bootstraps of
> > nodes which don't share ranges
>
> I think we can do even better: we can take an arbitrary range and split it into N parts, effectively making all N parts bootstrappable in parallel.
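[Editor's note: the pre-CEP-21 placement model described above — each node owning the range up to its own token, with the next RF-1 successors holding replicas — can be sketched roughly as follows. Tokens and node names are made up for illustration; this is not Cassandra's actual replication-strategy code.]

```python
from bisect import bisect_left

def replicas(ring, key_token, rf):
    """Return the rf nodes replicating key_token on a simple token ring.

    ring: sorted list of (token, node) pairs. Each node owns the range
    (predecessor_token, own_token], and the next rf-1 successors on the
    ring hold the replicated copies.
    """
    tokens = [t for t, _ in ring]
    i = bisect_left(tokens, key_token) % len(ring)  # primary owner (wraps)
    return [ring[(i + k) % len(ring)][1] for k in range(rf)]

# 5-node cluster, RF=3: a key lands on its owner plus the next two nodes.
ring = [(0, "A"), (20, "B"), (40, "C"), (60, "D"), (80, "E")]
print(replicas(ring, 35, 3))  # -> ['C', 'D', 'E']
print(replicas(ring, 95, 3))  # wraps around -> ['A', 'B', 'C']
```

The point of the CEP-21 discussion is precisely that placement stops being pinned to token adjacency like this: an arbitrary RF-sized subset of nodes can own a range, and a range can be split into N parts bootstrapped in parallel.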
> I also think (though I haven't checked whether that's truly the case) that we can prepare a plan in which we allow executing StartJoin for all nodes while the range is locked, but block execution of `MidJoin` for any of the nodes until StartJoin has executed for all of them, and similarly throttle FinishJoin until MidJoin has executed for all the nodes. In other words, I think there might be a bit of room for flexibility; the question is which way will be the most beneficial.
>
> On Thu, Oct 20, 2022, at 3:33 PM, Sam Tunnicliffe wrote:
>>
>> > Add A' to the cluster with the same keyspace as A.
>>
>> Can you clarify what you mean here?
>>
>> > Currently these operations have to be performed in sequence. My
>> > understanding is that you can't add more than one node at a time.
>>
>> To ensure consistency guarantees are honoured, by default C* does prohibit concurrent bootstraps (behaviour which can be overridden with the cassandra.consistent.rangemovement system property). But there's nothing to stop you fully bootstrapping additional nodes in series, then removing them in the same way.
>>
>> Why you would want to do this, or use bootstrap and removal for this at all rather than upgrading in place, isn't clear to me though; doing it this way just adds a streaming overhead that doesn't otherwise exist.
>>
>> What you will be able to do post CEP-21 is run concurrent bootstraps of nodes which don't share ranges. This is a definite improvement on the status quo, but it's only an initial step. CEP-21 is intended to lay the foundations for further improvements down the line.
>>
>>> On 20 Oct 2022, at 14:04, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
>>>
>>> My understanding of our process is (assuming we have 3 nodes A, B, C):
>>> Add A' to the cluster with the same keyspace as A.
>>> Remove A from the cluster.
>>> Add B' to the cluster.
>>> Remove B from the cluster.
>>> Add C' to the cluster.
>>> Remove C from the cluster.
>>>
>>> Currently these operations have to be performed in sequence. My understanding is that you can't add more than one node at a time. What we would like to do is do this in 3 steps:
>>> Add A', B', C' to the cluster.
>>> Wait for all 3 to be accepted and functioning.
>>> Remove A, B, C from the cluster.
>>>
>>> Does CEP-21 make this possible?
>>>
>>> On Thu, Oct 20, 2022 at 1:43 PM Sam Tunnicliffe <s...@beobal.com> wrote:
>>> I'm not sure I 100% understand the question, but the things covered in CEP-21 won't enable you as an operator to bootstrap all your new nodes without fully joining, then perform an atomic CAS to replace the existing members. CEP-21 alone also won't solve all cross-version streaming issues, which is one reason performing topology-modifying operations like bootstrap & decommission during an upgrade is not generally considered a good idea.
>>>
>>> Transactional metadata will make the bootstrapping (and decommissioning) experience a whole lot more stable and predictable, so in the short term I would expect the recommended rolling approach to upgrades to improve significantly.
>>>
>>> > On 20 Oct 2022, at 12:24, Claude Warren, Jr via dev <dev@cassandra.apache.org> wrote:
>>> >
>>> > After CEP-21, would it be possible to take a cluster of 6 nodes, spin up 6 new nodes to duplicate the 6 existing nodes, and then spin down the original 6 nodes? Basically, I am thinking of the case where a cluster is running version x.y.z and wants to run x.y.z+1: can they spin up an equal number of x.y.z+1 systems and replace the old ones without shutting down the cluster?
>>> >
>>> > We currently try something like this, where we spin up 1 system and then drop 1 system until all the old nodes are replaced.
>>> > This process frequently runs into streaming failures while bootstrapping.
>>> >
>>> > Any insights would be appreciated.
>>> >
>>> > Claude
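[Editor's note: the StartJoin/MidJoin/FinishJoin sequencing floated earlier in the thread — every node must complete one join phase before any node begins the next — amounts to a phase barrier. A minimal sketch of that ordering guarantee, using the phase names from the thread but no actual Cassandra APIs:]

```python
import threading

NODES = ["n1", "n2", "n3"]          # nodes bootstrapping in parallel
PHASES = ["StartJoin", "MidJoin", "FinishJoin"]

barrier = threading.Barrier(len(NODES))
log, log_lock = [], threading.Lock()

def join(node):
    # No node may begin MidJoin until every node has completed StartJoin,
    # and no node may begin FinishJoin until every node has completed MidJoin.
    for phase in PHASES:
        with log_lock:
            log.append(phase)       # "execute" this node's phase
        barrier.wait()              # block until all nodes reach the boundary

threads = [threading.Thread(target=join, args=(n,)) for n in NODES]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every StartJoin entry precedes every MidJoin entry, which in turn
# precedes every FinishJoin entry, regardless of thread scheduling.
print(log)
```

In the real proposal the barrier would presumably be enforced through the linearized cluster-metadata log rather than in-process synchronization; the sketch only shows the ordering constraint being discussed.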