Hi,

One of the most important operational features of Cassandra is how easy it
is (or should be) to do an in-place upgrade. The in-place upgrade procedure
essentially consists of a rolling restart of the cluster, updating the jar
to the new version on each node and following any additional upgrade
instructions from NEWS.txt. In practice, as new features are added and
existing features
are extended, the upgrade procedure gets more complex, placing more burden
on operators to ensure a smooth upgrade process.

For example, updating the storage_compatibility_mode from 4.0 to 5.0
requires 3 cluster-wide restarts[1]. Another example is that upgrading to
Cassandra 6.0 prohibits operations like schema changes, node replacement,
bootstrap, decommission, move, assassinate before all the nodes are
migrated to CMS[2]. I don't want to focus on these particular examples;
they just illustrate that a lot of manual steps and caution are
required to perform in-place upgrades safely and smoothly.

In order to improve this, I would like to propose extending Cassandra to
allow an operator to register an upgrade intent with the goals of:
a) Tracking the upgrade progress in a system table
b) Verifying the correctness and improving the safety of the upgrade process
c) Performing capability limitation during an upgrade
d) Performing pre- and post-upgrade actions automatically, when registered
in the upgrade plan by the operator

While there is upgrade awareness in the server, it is mostly reactive and
scattered across different modules (as far as I have seen). A potential
side goal of this effort is to centralize upgrade handling code from
different features in the same module, allowing different features to
specify upgrade pre/post actions/conditions more uniformly. This would
allow developers, for example, to specify upgrade constraints via testable
code instead of notes in NEWS.txt, which rely on being read by a
careful operator.
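
As a rough illustration of "upgrade constraints via testable code", here is
a minimal Java sketch. Every name in it (UpgradeConstraint,
UpgradeConstraints, the string operation names) is hypothetical and does not
exist in Cassandra today; it only shows the shape such a registration point
could take.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical interface: each feature declares its own upgrade constraint in code.
interface UpgradeConstraint {
    // Name of the feature that owns this constraint.
    String feature();

    // Returns a reason when the operation must be rejected during an upgrade
    // from fromVersion to toVersion, or empty when the operation is fine.
    Optional<String> violation(String operation, String fromVersion, String toVersion);
}

// Hypothetical central registry, so constraints live in one module.
final class UpgradeConstraints {
    private final List<UpgradeConstraint> constraints = new ArrayList<>();

    void register(UpgradeConstraint constraint) {
        constraints.add(constraint);
    }

    // Collects the reasons from every registered constraint that rejects the operation.
    List<String> violations(String operation, String fromVersion, String toVersion) {
        List<String> reasons = new ArrayList<>();
        for (UpgradeConstraint c : constraints) {
            c.violation(operation, fromVersion, toVersion)
             .ifPresent(reason -> reasons.add(c.feature() + ": " + reason));
        }
        return reasons;
    }
}
```

A feature would register one constraint object, and a unit test could then
assert which operations it rejects for a given pair of versions.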

The upgrade plan would be registered in a system table and tracked by an
upgrade manager module that would prevent certain operations (e.g. range
movements or schema changes) while an upgrade plan is active, or emit
errors/warnings when anomalies are encountered. A few safety/usability
improvements can be enabled when the upgrade plan is registered in the
server, among others:
a) A node could fail startup if it tries to start in a version different
from the one specified in the currently active upgrade plan.
b) If a latency degradation or other SLO degradation is detected while an
upgrade plan is active, then warnings could be emitted allowing operators
to more easily detect upgrade issues.
c) When the upgrade is determined to be completed successfully, nodes can
coordinate running upgrade-sstables or other post-operations according to a
policy specified in the upgrade plan (e.g. by rack/dc).
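
To make item (a) and the operation blocking more concrete, here is a minimal
Java sketch. The types Operation, UpgradePlan and UpgradeGuard are
hypothetical names invented for illustration, and the plan is held in memory
here rather than read from a system table.

```java
import java.util.EnumSet;
import java.util.Optional;
import java.util.Set;

// Hypothetical operation kinds; not an existing Cassandra enum.
enum Operation { SCHEMA_CHANGE, RANGE_MOVEMENT, READ, WRITE }

// Hypothetical in-memory view of the plan that would be stored in a system table.
final class UpgradePlan {
    final String targetVersion;
    final Set<Operation> blockedOperations;

    UpgradePlan(String targetVersion, Set<Operation> blockedOperations) {
        this.targetVersion = targetVersion;
        // Defensive copy; also works when the given set is empty.
        EnumSet<Operation> copy = EnumSet.noneOf(Operation.class);
        copy.addAll(blockedOperations);
        this.blockedOperations = copy;
    }
}

// Hypothetical guard consulted at startup and before restricted operations.
final class UpgradeGuard {
    private final Optional<UpgradePlan> activePlan;

    UpgradeGuard(Optional<UpgradePlan> activePlan) {
        this.activePlan = activePlan;
    }

    // Fails startup when a plan is active and the node version is not the plan's target.
    void checkStartup(String nodeVersion) {
        activePlan.ifPresent(plan -> {
            if (!plan.targetVersion.equals(nodeVersion)) {
                throw new IllegalStateException(
                    "Node version " + nodeVersion + " does not match upgrade plan target "
                    + plan.targetVersion);
            }
        });
    }

    // An operation is allowed unless the active plan explicitly blocks it.
    boolean isAllowed(Operation operation) {
        return activePlan.map(plan -> !plan.blockedOperations.contains(operation))
                         .orElse(true);
    }
}
```

With no active plan, both checks are no-ops, so behaviour outside of an
upgrade would stay unchanged.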

To give an example of what the API would look like, a user wishing to
upgrade a cluster from version 4.1 to 5.0 would register the upgrade
intent via an API, e.g.:

    nodetool upgradeplan create --target 5.0.4 --disable-schema-changes \
        --post-action upgrade-sstables \
        --post-action upgrade-storage-compatibility-mode

It would not be possible to create another upgrade plan while one is
already in progress.
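
The "only one plan at a time" rule could be sketched in Java roughly as
follows. UpgradePlanRegistry and its Plan class are hypothetical names, and
the in-memory AtomicReference is an assumption standing in for whatever
cluster-wide coordination a real implementation would need.

```java
import java.util.List;
import java.util.Optional;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical registry; a real one would persist the plan in a system table.
final class UpgradePlanRegistry {
    static final class Plan {
        final String targetVersion;
        final boolean disableSchemaChanges;
        final List<String> postActions;

        Plan(String targetVersion, boolean disableSchemaChanges, List<String> postActions) {
            this.targetVersion = targetVersion;
            this.disableSchemaChanges = disableSchemaChanges;
            this.postActions = List.copyOf(postActions); // immutable copy
        }
    }

    private final AtomicReference<Plan> active = new AtomicReference<>();

    // Registers a new plan, or throws if another plan is still in progress.
    Plan create(String targetVersion, boolean disableSchemaChanges, List<String> postActions) {
        Plan plan = new Plan(targetVersion, disableSchemaChanges, postActions);
        // Atomically install the plan only when no plan is active.
        Plan existing = active.compareAndExchange(null, plan);
        if (existing != null) {
            throw new IllegalStateException(
                "An upgrade plan targeting " + existing.targetVersion + " is already in progress");
        }
        return plan;
    }

    // Marks the current plan as finished so a new one can be created.
    void complete() {
        active.set(null);
    }

    Optional<Plan> current() {
        return Optional.ofNullable(active.get());
    }
}
```

The arguments to create mirror the flags in the nodetool example: the
target version, whether schema changes are disabled, and the ordered list of
post-actions.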

The ultimate goal is that the upgrade process to any version will be as
simple as registering an upgrade plan and performing a rolling restart of
the cluster on the desired target version. Any additional actions would be
autonomously coordinated by the servers based on the upgrade progress and
according to the preferences specified in the upgrade plan.

A related, and probably broader, topic is upgrading features. A couple of
examples that come to mind are upgrading Paxos[2] or migrating to
incremental repair[3]. Like version upgrades, these feature upgrades
require a series of steps to be executed in a specific order and
sometimes global coordination. While this suggestion focuses on version
upgrades, it can potentially be extended to track feature upgrades.

I would appreciate your feedback on this draft suggestion to check if it
makes sense before elaborating it into a more detailed proposal, as well as
pointers to other efforts or past proposals that might be related to this.

Thanks,

Paulo

[1] - https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L15-L21
[2] - https://github.com/apache/cassandra/blob/trunk/NEWS.txt#L142C1-L148C19
[3] - https://lists.apache.org/thread/06bl99mt502k7lowd5ont9jtnf5p0t05
