Hi,

One of the most important operational features of Cassandra is how easy it is (or should be) to do an in-place upgrade. The in-place upgrade procedure essentially consists of a rolling restart of the cluster while updating the jar to the new version and following any additional upgrade instructions from NEWS.txt.

In practice, as new features are added and existing features are extended, the upgrade procedure gets more complex, placing more burden on operators to ensure a smooth upgrade process.
For example, updating the storage_compatibility_mode from 4.0 to 5.0 requires 3 cluster-wide restarts[1]. Another example is that upgrading to Cassandra 6.0 prohibits operations like schema changes, node replacement, bootstrap, decommission, move and assassinate until all the nodes are migrated to CMS[2].

I don't want to focus on these particular examples; they just illustrate that a lot of manual steps and caution are required to perform in-place upgrades safely and smoothly.

In order to improve this, I would like to propose extending Cassandra to allow an operator to register an upgrade intent, with the goals of:
a) Tracking the upgrade progress in a system table
b) Verifying the correctness and improving the safety of the upgrade process
c) Performing capability limitation during an upgrade
d) Performing pre- and post-upgrade actions automatically, when registered in the upgrade plan by the operator

While there is upgrade awareness in the server, it is mostly reactive and scattered across different modules (at least as of the last time I looked). A potential side goal of this effort is to centralize the upgrade handling code of different features in the same module, allowing features to specify upgrade pre/post actions and conditions more uniformly. This would allow developers, for example, to specify upgrade constraints via testable code instead of notes in NEWS.txt that we can only hope a careful operator will read.

The upgrade plan would be registered in a system table and tracked by an upgrade manager module, which would prevent certain operations (e.g. range movements or schema changes) while an upgrade plan is active, or emit errors/warnings when anomalies are encountered.

A few safety/usability improvements can be enabled when the upgrade plan is registered in the server, among others:
a) A node could fail startup if it tries to start in a version different from the one specified in the currently active upgrade plan.
b) If a latency degradation or other SLO degradation is detected while an upgrade plan is active, warnings could be emitted, allowing operators to detect upgrade issues more easily.
c) When the upgrade is determined to have completed successfully, nodes can coordinate running upgrade-sstables or other post-operations according to a policy specified in the upgrade plan (e.g. by rack/dc).

To give an example of what the API could look like, a user wishing to upgrade a cluster from version 4.1 to 5.0 would register the upgrade intent via an API, e.g.:

    nodetool upgradeplan create --target 5.0.4 --disable-schema-changes --post-action upgrade-sstables --post-action upgrade-storage-compatibility-mode

It would not be possible to create another upgrade plan while one is already in progress.

The ultimate goal is that the upgrade process to any version will be as simple as registering an upgrade plan and performing a rolling restart of the cluster into the desired target version. Any additional actions would be autonomously coordinated by the servers based on the upgrade progress and according to the preferences specified in the upgrade plan.

A related, and probably broader, topic is upgrading features. A couple of examples that come to mind are upgrading Paxos[2] or migrating to incremental repair[3]. Like version upgrades, these feature upgrades require a series of steps to be executed in a specific order, and sometimes global coordination. While this suggestion focuses on version upgrades, it could potentially be extended to track feature upgrades.

I would appreciate your feedback on this draft suggestion, to check whether it makes sense before elaborating it into a more detailed proposal, as well as pointers to other efforts or past proposals that might be related to this.
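
To make the behaviour described above a bit more concrete, here is a rough sketch of the bookkeeping an upgrade manager might do. It is written in Python purely for brevity and is not a proposed implementation. Every name in it (UpgradePlan, UpgradeManager, on_node_startup, check_operation, complete_if_done, the "schema_change" operation label) is hypothetical and does not exist in Cassandra today, and the in-memory dictionaries stand in for whatever system table would actually hold this state.

```python
from dataclasses import dataclass


class UpgradePlanError(Exception):
    """Raised when an action conflicts with the active upgrade plan."""


@dataclass
class UpgradePlan:
    # All fields are illustrative; a real plan would live in a system table.
    target_version: str
    disabled_operations: frozenset = frozenset()
    post_actions: tuple = ()
    completed: bool = False


class UpgradeManager:
    def __init__(self):
        self.active_plan = None   # at most one plan at a time
        self.node_versions = {}   # node id -> version it last started with

    def _in_progress(self):
        return self.active_plan is not None and not self.active_plan.completed

    def create_plan(self, plan):
        # A second plan cannot be created while one is still in progress.
        if self._in_progress():
            raise UpgradePlanError("an upgrade plan is already in progress")
        self.active_plan = plan

    def on_node_startup(self, node, version):
        # Literal reading of the idea: while a plan is active, a node may
        # only start in the plan's target version.
        if self._in_progress() and version != self.active_plan.target_version:
            raise UpgradePlanError(
                f"{node} tried to start in {version}, but the active plan "
                f"targets {self.active_plan.target_version}")
        self.node_versions[node] = version

    def check_operation(self, operation):
        # Capability limitation: reject operations the plan has disabled.
        if self._in_progress() and operation in self.active_plan.disabled_operations:
            raise UpgradePlanError(f"{operation} is disabled during the upgrade")

    def complete_if_done(self, all_nodes):
        # Once every node runs the target version, mark the plan completed
        # and return the post-actions to run, in the order they were given.
        if not self._in_progress():
            return ()
        target = self.active_plan.target_version
        if all(self.node_versions.get(n) == target for n in all_nodes):
            self.active_plan.completed = True
            return self.active_plan.post_actions
        return ()
```

With this sketch, registering a plan with target "5.0.4" and "schema_change" in disabled_operations makes check_operation("schema_change") fail until complete_if_done reports that every node has restarted in 5.0.4, at which point it hands back the registered post-actions. How post-actions are actually scheduled (e.g. by rack/dc) is deliberately left out here.
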

Thanks,
Paulo

[1] - https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L15-L21
[2] - https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L142C1-L148C19
[3] - https://lists.apache.org/thread/06bl99mt502k7lowd5ont9jtnf5p0t05