[jira] [Comment Edited] (CASSSIDECAR-274) Enable rolling restarts of Cassandra clusters via Sidecar

Jira Thu, 28 Aug 2025 14:46:05 -0700


    [ 
https://issues.apache.org/jira/browse/CASSSIDECAR-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18016891#comment-18016891
 ]


Andrés Beck-Ruiz edited comment on CASSSIDECAR-274 at 8/28/25 9:36 PM:
-----------------------------------------------------------------------

I agree that having a single centralized API for all operations is ideal from a 
user experience perspective.
{quote}Is there a particular benefit (or limitation with the existing 
framework) to keeping these operation types distinct at the API level that I 
might be missing?  
{quote}
These are the limitations I originally found with the existing framework, 
which might need to be enhanced to address them:
 * The framework does not persist jobs if a Sidecar instance crashes, which 
CASSSIDECAR-341 would address
 * Sidecar instances can't query the status of non-local jobs, which again 
could be addressed by CASSSIDECAR-341
 * There isn't an ability to update the state of a currently running job. The 
ability to pause or abort a restart would be important for ensuring operational 
safety.
 * The current 
[OperationalJobResponse|https://github.com/apache/cassandra-sidecar/blob/trunk/client-common/src/main/java/org/apache/cassandra/sidecar/common/response/OperationalJobResponse.java]
 is not verbose enough to give proper visibility into a restart job or other 
cluster-wide operations. For example, an operator needs to know which 
individual nodes restarted successfully and which failed.
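
To make the last point concrete, here is a minimal sketch of the kind of per-node detail a more verbose job response could carry. Every name in it (ClusterJobStatusSketch, NodeState, record, nodesIn, summary) is hypothetical and is not part of the existing OperationalJobResponse or any Sidecar API; it only illustrates tracking a state per node and deriving a job-level summary from those states.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in Sidecar today.
final class ClusterJobStatusSketch
{
    enum NodeState { PENDING, RESTARTING, SUCCEEDED, FAILED }

    // Insertion-ordered, so nodes are reported in the order they were recorded
    private final Map<String, NodeState> nodeStates = new LinkedHashMap<>();

    // Records (or overwrites) the latest known state for a node
    void record(String node, NodeState state)
    {
        nodeStates.put(node, state);
    }

    // Returns the nodes currently in the given state
    List<String> nodesIn(NodeState state)
    {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, NodeState> entry : nodeStates.entrySet())
        {
            if (entry.getValue() == state)
            {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    // Job-level summary derived from the per-node states
    String summary()
    {
        if (!nodesIn(NodeState.FAILED).isEmpty())
        {
            return "FAILED";
        }
        boolean allSucceeded = !nodeStates.isEmpty()
                               && nodesIn(NodeState.SUCCEEDED).size() == nodeStates.size();
        return allSucceeded ? "SUCCEEDED" : "RUNNING";
    }
}
```

With a structure like this, an operator could ask for nodesIn(NodeState.FAILED) instead of inferring failures from a single job-level status.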

I have a draft CEP for rolling restarts via Sidecar ready. It includes a design 
for durable, cluster-accessible operations that could address CASSSIDECAR-341, 
along with an extensible approach to cluster-wide operations. I am planning to 
open it so that the larger community can give feedback, and I am open to 
further discussion about how this API could be organized and whether we should 
extend the current job management framework.


> Enable rolling restarts of Cassandra clusters via Sidecar
> ---------------------------------------------------------
>
>                 Key: CASSSIDECAR-274
>                 URL: https://issues.apache.org/jira/browse/CASSSIDECAR-274
>             Project: Sidecar for Apache Cassandra
>          Issue Type: Improvement
>            Reporter: Isaac Reath
>            Priority: Major
>         Attachments: Screenshot 2025-08-13 at 12.34.43 PM.png
>
>
> Rolling restarts are frequently used in Cassandra to apply changes to a 
> cluster, such as configuration changes or version upgrades. In 
> CASSSIDECAR-266, we are adding functionality to safely start and stop a 
> single Cassandra node via Sidecar. This ticket will build on that work to 
> implement a coordinated rolling restart. 
> The scope of this effort includes:
>  * Adding API endpoints to enable operators to start, monitor, pause and stop 
> a rolling restart.
>  * Updating Sidecar to orchestrate start and stop operations across the 
> cluster, allowing for a configurable number of nodes to be offline 
> simultaneously.
>  * Creating safeguards to ensure that a rolling restart is safe to perform 
> and does not interfere with other operations ongoing in the cluster such as 
> node bootstraps or decommissions. 
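
The orchestration described in the scope above could be sketched roughly as follows, assuming the simplest possible policy: split the nodes into consecutive batches no larger than the configured limit, restart one batch at a time, and stop scheduling further batches once any node in a batch fails. RollingRestartSketch, batches, run, maxOffline and the restartNode callback are all hypothetical names and do not correspond to any Sidecar API. The sketch also omits pausing and the safety checks against bootstraps or decommissions, and it restarts the nodes of a batch one after another where a real implementation would likely do so concurrently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch only: not the Sidecar implementation.
final class RollingRestartSketch
{
    // Splits the nodes into consecutive batches of at most maxOffline nodes
    static List<List<String>> batches(List<String> nodes, int maxOffline)
    {
        if (maxOffline < 1)
        {
            throw new IllegalArgumentException("maxOffline must be at least 1");
        }
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i < nodes.size(); i += maxOffline)
        {
            int end = Math.min(i + maxOffline, nodes.size());
            result.add(new ArrayList<>(nodes.subList(i, end)));
        }
        return result;
    }

    // Restarts batch by batch. restartNode returns true when a node came back
    // healthy. Once a batch contains a failure, no further batch is started.
    // Returns the nodes that were restarted successfully.
    static List<String> run(List<String> nodes, int maxOffline, Predicate<String> restartNode)
    {
        List<String> restarted = new ArrayList<>();
        for (List<String> batch : batches(nodes, maxOffline))
        {
            boolean batchSucceeded = true;
            for (String node : batch)
            {
                if (restartNode.test(node))
                {
                    restarted.add(node);
                }
                else
                {
                    batchSucceeded = false;
                }
            }
            if (!batchSucceeded)
            {
                break; // abort the rolling restart; remaining nodes are untouched
            }
        }
        return restarted;
    }
}
```

For example, with five nodes and maxOffline set to 2, the nodes are processed in batches of 2, 2 and 1, so no more than two are offline at any point.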


