Hello everyone,

@Sebastian, thanks a lot for the pointer about cloning the cluster for testing. That is really useful information.
@Aaron, thanks a lot for the heads-up.

Thanks a lot to the community for the valuable contributions in this case!
BR
MK

From: Aaron Ploetz <aaronplo...@gmail.com>
Sent: October 26, 2023 21:08
To: user@cassandra.apache.org
Subject: Re: Upgrade from C* 3 to C* 4 per datacenter

Just a heads-up, but there have been issues (at least one) reported when upgrading a multi-DC cluster from 3.x to 4.x when the cluster uses node-to-node SSL/TLS encryption. This is largely attributed to the fact that the secure port in 4.x changes to 9142, whereas in 3.x it continues to run on 9042 (same as non-SSL/TLS).

On Thu, Oct 26, 2023 at 2:03 PM Sebastian Marsching <sebast...@marsching.com> wrote:

Hi,

as we are currently facing the same challenge (upgrading an existing cluster from C* 3 to C* 4), I wanted to share our strategy with you. It largely is what Scott already suggested, but I have some extra details, so I thought it might still be useful.

We duplicated our cluster using the strategy described at http://adamhutson.com/cloning-cassandra-clusters-the-fast-way/. Of course it is possible to figure out all the steps on your own, but I feel like this detailed guide saved me at least a few hours, if not days. Instead of restoring from a backup, we chose to create a snapshot on the live nodes and copy the data from there, but this does not really change the overall process.

We only run a single data-center cluster, but I think that this process easily translates to a multi data-center setup. In this case, you can choose to only clone a single data center or you can clone a few or all of them, if you deem this to be necessary for your tests. The only “limitation” is that for each data center that you clone, you need exactly the same number of nodes in your test cluster that you have in the respective data center of your production cluster.
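
The node-count requirement above can be checked before any data is copied. The following is only a minimal sketch in Python, not part of the linked guide: the function names, the host names, and the (host, data center) input format are all hypothetical, and in practice the lists would have to be collected by hand or from your own inventory.

```python
from collections import Counter


def node_counts_by_dc(nodes):
    """Count nodes per data center from (host, data_center) pairs."""
    return Counter(dc for _host, dc in nodes)


def check_clone_topology(prod_nodes, test_nodes, cloned_dcs):
    """Return a list of problems for the data centers that are to be cloned.

    An empty list means every cloned data center has exactly as many
    nodes in the test cluster as in the production cluster.
    """
    prod = node_counts_by_dc(prod_nodes)
    test = node_counts_by_dc(test_nodes)
    problems = []
    for dc in cloned_dcs:
        if prod[dc] == 0:
            # Counter returns 0 for data centers it has never seen.
            problems.append(f"{dc}: not present in production")
        elif prod[dc] != test[dc]:
            problems.append(
                f"{dc}: production has {prod[dc]} nodes, test has {test[dc]}"
            )
    return problems


if __name__ == "__main__":
    # Hypothetical inventories; replace with your own host lists.
    production = [("prod-a1", "dc1"), ("prod-a2", "dc1"), ("prod-b1", "dc2")]
    test = [("test-a1", "dc1")]
    # Only dc1 is cloned, so dc2 is not checked.
    for problem in check_clone_topology(production, test, ["dc1"]):
        print(problem)
```

Data centers that are not listed as cloned are ignored on purpose, since only the cloned ones have to match.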

Once the cluster is cloned, you can test whatever you like (e.g. upgrade to C* 4, test operations in a mixed-version cluster, etc.).

Our experience with the upgrade from C* 3.11 to C* 4.1 on the test cluster was quite smooth. The only problem that we saw was that when later adding a second data center to the test cluster, we got a lot of CorruptSSTableExceptions on one of the nodes in the existing data center. We first attributed this to the upgrade, but later we found out that this also happens when running on C* 3.11. We now believe that the hardware of one of the nodes that we used for the test cluster has a defect, because the exceptions were limited to this exact node, even after moving data around. It just took us a while to figure this out, because the hardware for the test cluster was brand new, so “broken hardware” wasn’t our first guess.

We are still in the process of definitively proving that this specific piece of hardware is broken, but we are now sufficiently confident in the stability of C* 4 that we are soon going to move forward with upgrading the production cluster.

-Sebastian