Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Michalis Kotsiouros (EXT) via user
Hello Cassandra community,
We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan 
to do this per data center.
During the upgrade, a cluster with mixed SW levels is expected. At this point 
is it possible to perform topology changes?
In case of an upgrade failure, would it be possible to remove the data center 
from the cluster, restore the datacenter to C*3 SW, and add it back to the 
cluster, which would then contain datacenters on both C*3 and C*4? 
Alternatively, could we remove the datacenter, perform the SW upgrade to C*4, 
and then add it back to the cluster? Are there any suggestions or experiences 
regarding this fallback scenario?

BR
MK



Re: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread C. Scott Andreas

The recommended approach to upgrading is to perform a replica-safe rolling 
restart of instances in each datacenter, one datacenter at a time.

> In case of an upgrade failure, would it be possible to remove the data center 
> from the cluster, restore the datacenter to C*3 SW and add it back to cluster 
> which will contain datacenters in both C* 3 and C*4?

Streaming and repair are not supported between 3.x and 4.x instances, so it 
will not be possible to bootstrap a datacenter as 3.x from nodes that are 
running 4.x. The approach above isn't an option and, in many topologies, may 
violate consistency or induce data loss.

> Alternatively, could we remove the datacenter, perform the SW upgrade to C*4 
> and then add it back to the cluster?

You *could* do this if the datacenter is added back on 4.x, but it's not quite 
clear what it would accomplish.

By far the safest and most tested upgrade path used by nearly everyone is a 
replica-safe rolling restart of instances in each datacenter, one datacenter 
at a time.

Could you say more about the concerns you have with this upgrade path, or the 
worries you are hoping to mitigate?

– Scott

On Oct 26, 2023, at 8:32 AM, "Michalis Kotsiouros (EXT) via user"  wrote:

> Hello Cassandra community,
> We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan 
> to do this per data center. During the upgrade, a cluster with mixed SW 
> levels is expected. At this point is it possible to perform topology changes? 
> In case of an upgrade failure, would it be possible to remove the data center 
> from the cluster, restore the datacenter to C*3 SW and add it back to cluster 
> which will contain datacenters in both C* 3 and C*4? Alternatively, could we 
> remove the datacenter, perform the SW upgrade to C*4 and then add it back to 
> the cluster? Are there any suggestions or experiences regarding this fallback 
> scenario?
>
> BR
> MK

RE: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Michalis Kotsiouros (EXT) via user
Hello Scott,
Thanks a lot for the immediate answer.
We use a semi-automated procedure, run per datacenter, to upgrade the SW on 
our systems.
Our limitation is that if we want to roll back, we need to roll back the 
Cassandra nodes of the whole datacenter.
May I return to the alternatives:
Would it be possible to add a new Datacenter of C*3 SW in a cluster with 
datacenters on c*3 and C*4 by limiting the data streaming only from the C*3 
datacenters?
Would it be possible to add a new Datacenter of C*4 SW in a cluster with 
datacenters on C*3 and C*4 by limiting the data streaming only from the C*4 
datacenters?
I am currently planning my tests so any initial statement would be really 
valuable for me to know what to expect and what to try out.

My purpose in investigating this is to reduce the downtime of a datacenter in 
case of an upgrade failure. If I have the possibility to add a datacenter of 
either C*3 or C*4 SW to a cluster containing datacenters of both C*3 and C*4, 
I will be able to handle the failure case independently and return the 
datacenter to service more quickly.
My other alternative in this case is to upgrade the remaining datacenters from 
C*3 to C*4, upgrade the "failed" datacenter to C*4 SW, and add it back to the 
cluster. This increases the time spent operating with fewer datacenters, and 
poses a risk when upgrade failures happen in multiple datacenters.

BR
MK

From: C. Scott Andreas 
Sent: October 26, 2023 10:43
To: user@cassandra.apache.org
Cc: user@cassandra.apache.org; Michalis Kotsiouros (EXT) 

Subject: Re: Upgrade from C* 3 to C* 4 per datacenter

The recommended approach to upgrading is to perform a replica-safe rolling 
restart of instances in each datacenter, one datacenter at a time.

> In case of an upgrade failure, would it be possible to remove the data center 
> from the cluster, restore the datacenter to C*3 SW and add it back to cluster 
> which will contain datacenters in both C* 3 and C*4?

Streaming and repair are not supported between 3.x and 4.x instances, so it 
will not be possible to bootstrap a datacenter as 3.x from nodes that are 
running 4.x. The approach above isn't an option and, in many topologies, may 
violate consistency or induce data loss.

> Alternatively, could we remove the datacenter, perform the SW upgrade to C*4 
> and then add it back to the cluster?

You *could* do this if the datacenter is added back on 4.x, but it's not quite 
clear what it would accomplish.

By far the safest and most tested upgrade path used by nearly everyone is a 
replica-safe rolling restart of instances in each datacenter, one datacenter at 
a time.

Could you say more about the concerns you have with this upgrade path, or the 
worries you are hoping to mitigate?

– Scott


On Oct 26, 2023, at 8:32 AM, "Michalis Kotsiouros (EXT) via user" 
<user@cassandra.apache.org> wrote:


Hello Cassandra community,
We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan 
to do this per data center.
During the upgrade, a cluster with mixed SW levels is expected. At this point 
is it possible to perform topology changes?
In case of an upgrade failure, would it be possible to remove the data center 
from the cluster, restore the datacenter to C*3 SW and add it back to cluster 
which will contain datacenters in both C* 3 and C*4? Alternatively, could we 
remove the datacenter, perform the SW upgrade to C*4 and then add it back to 
the cluster? Are there any suggestions or experiences regarding this fallback 
scenario?

BR
MK




Re: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Jeff Jirsa


> On Oct 26, 2023, at 12:32 AM, Michalis Kotsiouros (EXT) via user 
>  wrote:
> 
> 
> Hello Cassandra community,
> We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan 
> to do this per data center.
> During the upgrade, a cluster with mixed SW levels is expected. At this point 
> is it possible to perform topology changes?

You may find that this works at various stages when an entire dc is upgraded or 
not upgraded if you don’t do any cross-DC streaming (eg no SimpleStrategy 
keyspaces at all).  It’s not guaranteed to work, we don’t test it, but I expect 
that it probably will.

Schema changes will not. 

> In case of an upgrade failure, would it be possible to remove the data center 
> from the cluster, restore the datacenter to C*3 SW and add it back to cluster 
> which will contain datacenters in both C* 3 and C*4?

Definitely possible for the first DC you upgrade.
Untested for the second through the last.

> Alternatively, could we remove the datacenter, perform the SW upgrade to C*4 
> and then add it back to the cluster?

Not really. Probably technically possible but doesn’t make a lot of practical 
sense

> Are there any suggestions or experiences regarding this fallback scenario?

Doing one host, then one replica of each replica set (1/3rd of hosts / 1 AZ), 
then one DC, then repeat for all DCs. The point of no return, so to speak, is 
when you get into the second AZ of the second DC. Until that point you can just 
act as if the upgraded hosts failed all at once and re-stream that data via 
bootstrap 

Not clear what exactly worries you in the upgrade, but restore a backup to a 
lab and run the upgrade once or twice offline. Doesn’t have to be a full size 
cluster, just a few hosts in a few fake DCs. The 3-4 upgrade was pretty 
uneventful compared to past upgrades, especially if you use the later releases. 
Good to be cautious, though. 
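Jeff's sequence (one canary host, then one replica of each replica set / one 
AZ, then the rest of the DC, then the next DC) can be sketched as a rough 
planner. This is an illustration only, not an official tool; the topology, 
host names, and wave granularity below are hypothetical:

```python
def upgrade_waves(dcs):
    """Yield lists of hosts to upgrade together, in the suggested order:
    a single canary host, then the rest of its AZ (one replica of each
    replica set), then the remaining AZs, finishing one DC before the next."""
    for dc in dcs.values():
        azs = list(dc.values())
        yield azs[0][:1]        # one canary host
        yield azs[0][1:]        # rest of the first AZ (~1/3 of the DC)
        for az in azs[1:]:      # remaining AZs; past the second AZ of the
            yield az            # second DC there is no easy rollback

topology = {                    # hypothetical 2-DC, 3-AZ-per-DC layout
    "dc1": {"az1": ["dc1-a1", "dc1-a2"],
            "az2": ["dc1-b1", "dc1-b2"],
            "az3": ["dc1-c1", "dc1-c2"]},
    "dc2": {"az1": ["dc2-a1", "dc2-a2"],
            "az2": ["dc2-b1", "dc2-b2"],
            "az3": ["dc2-c1", "dc2-c2"]},
}

waves = [w for w in upgrade_waves(topology) if w]
```

Each wave completes (and is verified healthy) before the next starts; until 
the point of no return, the already-upgraded hosts can be treated as failed 
and re-streamed via bootstrap.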


RE: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Michalis Kotsiouros (EXT) via user
Hello Jeff et al,
Thanks a lot for your valuable info. Your comment covers all my queries.

BR
MK

From: Jeff Jirsa 
Sent: October 26, 2023 15:48
To: user@cassandra.apache.org
Cc: Michalis Kotsiouros (EXT) 
Subject: Re: Upgrade from C* 3 to C* 4 per datacenter




On Oct 26, 2023, at 12:32 AM, Michalis Kotsiouros (EXT) via user 
<user@cassandra.apache.org> wrote:

Hello Cassandra community,
We are trying to upgrade our systems from Cassandra 3 to Cassandra 4. We plan 
to do this per data center.
During the upgrade, a cluster with mixed SW levels is expected. At this point 
is it possible to perform topology changes?

You may find that this works at various stages when an entire dc is upgraded or 
not upgraded if you don’t do any cross-DC streaming (eg no SimpleStrategy 
keyspaces at all).  It’s not guaranteed to work, we don’t test it, but I expect 
that it probably will.

Schema changes will not.


In case of an upgrade failure, would it be possible to remove the data center 
from the cluster, restore the datacenter to C*3 SW and add it back to cluster 
which will contain datacenters in both C* 3 and C*4?

Definitely possible for the first DC you upgrade.
Untested for the second through the last.


Alternatively, could we remove the datacenter, perform the SW upgrade to C*4 
and then add it back to the cluster?

Not really. Probably technically possible but doesn’t make a lot of practical 
sense


Are there any suggestions or experiences regarding this fallback scenario?

Doing one host, then one replica of each replica set (1/3rd of hosts / 1 AZ), 
then one DC, then repeat for all DCs. The point of no return, so to speak, is 
when you get into the second AZ of the second DC. Until that point you can just 
act as if the upgraded hosts failed all at once and re-stream that data via 
bootstrap

Not clear what exactly worries you in the upgrade, but restore a backup to a 
lab and run the upgrade once or twice offline. Doesn’t have to be a full size 
cluster, just a few hosts in a few fake DCs. The 3-4 upgrade was pretty 
uneventful compared to past upgrades, especially if you use the later releases. 
Good to be cautious, though.


Re: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Sebastian Marsching
Hi,

as we are currently facing the same challenge (upgrading an existing cluster 
from C* 3 to C* 4), I wanted to share our strategy with you. It largely is what 
Scott already suggested, but I have some extra details, so I thought it might 
still be useful.

We duplicated our cluster using the strategy described at 
http://adamhutson.com/cloning-cassandra-clusters-the-fast-way/. Of course it is 
possible to figure out all the steps on your own, but I feel like this detailed 
guide saved me at least a few hours, if not days. Instead of restoring from a 
backup, we chose to create a snapshot on the live nodes and copy the data from 
there, but this does not really change the overall process.

We only run a single data-center cluster, but I think that this process easily 
translates to a multi data-center setup. In this case, you can choose to only 
clone a single data center or you can clone a few or all of them, if you deem 
this to be necessary for your tests. The only “limitation” is that for each 
data center that you clone, you need exactly the same number of nodes in your 
test cluster that you have in the respective data center of your production 
cluster.

Once the cluster is cloned, you can test whatever you like (e.g. upgrade to C* 
4, test operations in a mixed-version cluster, etc.).

Our experience with the upgrade from C* 3.11 to C* 4.1 on the test cluster was 
quite smooth. The only problem that we saw was that when later adding a second 
data center to the test cluster, we got a lot of CorruptSSTableExceptions on 
one of the nodes in the existing data center. We first attributed this to the 
upgrade, but later we found out that this also happens when running on C* 3.11.

We now believe that the hardware of one of the nodes that we used for the test 
cluster has a defect, because the exceptions were limited to this exact node, 
even after moving data around. It just took us a while to figure this out, 
because the hardware for the test cluster was brand new, so “broken hardware” 
wasn’t our first guess. We are still in the process of definitively proving 
that this specific piece of hardware is broken, but we are now sufficiently 
confident in the stability of C* 4 that we are soon going to move forward with 
upgrading the production cluster.

-Sebastian





Re: Upgrade from C* 3 to C* 4 per datacenter

2023-10-26 Thread Aaron Ploetz
Just a heads-up, but there have been issues (at least one) reported when
upgrading a multi-DC cluster from 3.x to 4.x when the cluster uses
node-to-node SSL/TLS encryption. This is largely attributed to the fact
that the secure port in 4.x changes to 9142, whereas in 3.x it continues to
run on 9042 (same as non-SSL/TLS).

On Thu, Oct 26, 2023 at 2:03 PM Sebastian Marsching 
wrote:

> Hi,
>
> as we are currently facing the same challenge (upgrading an existing
> cluster from C* 3 to C* 4), I wanted to share our strategy with you. It
> largely is what Scott already suggested, but I have some extra details, so
> I thought it might still be useful.
>
> We duplicated our cluster using the strategy described at
> http://adamhutson.com/cloning-cassandra-clusters-the-fast-way/. Of course
> it is possible to figure out all the steps on your own, but I feel like
> this detailed guide saved me at least a few hours, if not days. Instead of
> restoring from a backup, we chose to create a snapshot on the live nodes
> and copy the data from there, but this does not really change the overall
> process.
>
> We only run a single data-center cluster, but I think that this process
> easily translates to a multi data-center setup. In this case, you can
> choose to only clone a single data center or you can clone a few or all of
> them, if you deem this to be necessary for your tests. The only
> “limitation” is that for each data center that you clone, you need exactly
> the same number of nodes in your test cluster that you have in the
> respective data center of your production cluster.
>
> Once the cluster is cloned, you can test whatever you like (e.g. upgrade
> to C* 4, test operations in a mixed-version cluster, etc.).
>
> Our experience with the upgrade from C* 3.11 to C* 4.1 on the test cluster
> was quite smooth. The only problem that we saw was that when later adding a
> second data center to the test cluster, we got a lot of
> CorruptSSTableExceptions on one of the nodes in the existing data center.
> We first attributed this to the upgrade, but later we found out that this
> also happens when running on C* 3.11.
>
> We now believe that the hardware of one of the nodes that we used for the
> test cluster has a defect, because the exceptions were limited to this
> exact node, even after moving data around. It just took us a while to
> figure this out, because the hardware for the test cluster was brand new,
> so “broken hardware” wasn’t our first guess. We are still in the process of
> definitely proving that this specific piece of hardware is broken, but we
> are now sufficiently confident in the stability of C* 4, that we are soon
> going to move forward with upgrading the production cluster.
>
> -Sebastian
>
>


Backup and Restore Strategy and Tools

2023-10-26 Thread Bhavesh Prajapati via user
Hi,

I have a 48-node, single-DC Apache Cassandra cluster running in Prod – the 
version is 4.0.6.
Currently, we are using a home-grown backup script based on nodetool snapshot 
that uploads backups to S3, and a home-grown restore script to recover in case 
of disaster.

I am looking for guidance on a good backup and restore strategy for a 48-node 
cluster.
Is it possible to use DataStax OpsCenter with an Apache Cassandra cluster? Is 
it free to use?
Are there any other UI or command-line tools that you recommend?

Thanks,
Bhavesh
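For reference, a home-grown script of the kind described above might look 
roughly like this minimal sketch. The data directory, bucket name, and node 
name are hypothetical; DRY_RUN only prints the commands, and an actual run 
assumes nodetool and the aws CLI are on the PATH:

```python
import subprocess
from datetime import date
from pathlib import Path

DATA_DIR = Path("/var/lib/cassandra/data")   # default data_file_directories
BUCKET = "s3://example-bucket/cassandra"     # hypothetical bucket
DRY_RUN = True                               # print commands instead of running

def run(cmd):
    if DRY_RUN:
        print(" ".join(cmd))
    else:
        subprocess.run(cmd, check=True)

def backup_node(node_name):
    """Snapshot every table, sync the snapshot dirs to S3, then clean up."""
    tag = f"backup-{date.today():%Y%m%d}"
    cmds = [["nodetool", "snapshot", "-t", tag]]
    # nodetool snapshot hard-links SSTables under <ks>/<table>/snapshots/<tag>
    for snap in DATA_DIR.glob(f"*/*/snapshots/{tag}"):
        dest = f"{BUCKET}/{node_name}/{tag}/{snap.relative_to(DATA_DIR)}"
        cmds.append(["aws", "s3", "sync", str(snap), dest])
    cmds.append(["nodetool", "clearsnapshot", "-t", tag])
    for cmd in cmds:
        run(cmd)
    return cmds

commands = backup_node("node-01")
```

A real script would also need per-node scheduling across all 48 nodes and a 
way to capture the schema and token state needed for restore.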


Re: Backup and Restore Strategy and Tools

2023-10-26 Thread Miklosovic, Stefan via user
Hi Bhavesh,

have you gone through Cassandra tools here (1)?

Just search for "backup" – there are a couple of (CLI) solutions out there for 
your problem.

Feel free to ping me on Cassandra Slack or privately if you want.

Cheers

https://cassandra.apache.org/_/ecosystem.html


From: Bhavesh Prajapati via user 
Sent: Thursday, October 26, 2023 20:44
To: user@cassandra.apache.org
Cc: Bhavesh Prajapati
Subject: Backup and Restore Strategy and Tools

Hi,

I have a 48-node, single-DC Apache Cassandra cluster running in Prod – the 
version is 4.0.6.
Currently, we are using a home-grown backup script based on nodetool snapshot 
that uploads backups to S3, and a home-grown restore script to recover in case 
of disaster.

I am looking for guidance on a good backup and restore strategy for a 48-node 
cluster.
Is it possible to use DataStax OpsCenter with an Apache Cassandra cluster? Is 
it free to use?
Are there any other UI or command-line tools that you recommend?

Thanks,
Bhavesh


Re: Resources to understand rebalancing

2023-10-26 Thread Vikas Kumar
Thanks for sharing this clear explanation, Jeff. Cheers!

On Wed, Oct 25, 2023 at 5:58 PM Jeff Jirsa  wrote:

> Data ownership is defined by the token ring concept.
>
> Hosts in the cluster may have tokens - let's oversimplify to 5 hosts, each
> with 1 token A=0, B=1000, C=2000, D=3000, E=4000
>
> The partition key is hashed to calculate the token, and the next 3 hosts
> in the ring are the "owners" of that data - a key that hashes to 1234 would
> be found on hosts C, D, E
>
> Anytime hosts move tokens (joining/expansion, leaving/shrink,
> re-arranging/moves), the tokens go into a pending state.
>
> So if you were to add a 6th host here, let's say F=2500, when it first
> gossips, it'll have 2500 in a different JOINING (pending) state. In that
> state, it won't get any read traffic, but the quorum calculations will be
> augmented to send extra writes - instead of needing 2/3 nodes to ack any
> write, it'll require a total of 3 acks (of the 4 possible replicas, the 3
> natural replicas and the 1 pending replica).
>
> When the node finishes joining, and it gossips its state=NORMAL, it'll be
> removed from pending, and the reads will move to it instead.
>
> The gossip state transition from pending to normal isn't exact, it's
> propagated via gossip (so it's seconds of change where reads/writes can hit
> either replica), but the increase in writes (writing to both destinations)
> should make it safe in that transition. It's being rewritten to be
> transactional in an upcoming version of cassandra.
>
>
>
> On Tue, Oct 24, 2023 at 11:39 PM Vikas Kumar  wrote:
>
>> Hi folks,
>>
>> I am looking for some resources to understand the internals of
>> rebalancing in Cassandra. Specifically:
>>
>>  - How are read and write queries served during data migration?
>>  - How is the cutover from the current node to the new node performed?
>>
>> Any help is greatly appreciated.
>>
>> Thanks,
>> Vikas
>>
>
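The token-ownership and pending-replica rules Jeff describes can be sketched 
with a toy single-token ring. This is an illustration of the concept only, not 
Cassandra's actual replica-placement code (which also accounts for vnodes, 
racks, and the keyspace's replication strategy):

```python
from bisect import bisect_left

# Jeff's oversimplified ring: 5 hosts, one token each.
RING = {0: "A", 1000: "B", 2000: "C", 3000: "D", 4000: "E"}

def owners(token, ring=RING, rf=3):
    """The rf hosts at or after the token, wrapping around the ring."""
    tokens = sorted(ring)
    i = bisect_left(tokens, token) % len(tokens)
    return [ring[tokens[(i + k) % len(tokens)]] for k in range(rf)]

def required_acks(rf=3, pending=0):
    """Quorum over the natural replicas, plus one ack per pending replica."""
    return rf // 2 + 1 + pending

# A key hashing to token 1234 lands on the next 3 hosts in the ring: C, D, E.
# While a new host F (token 2500) is JOINING, quorum writes need 3 acks of
# the 4 possible replicas instead of the usual 2 of 3.
```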