We are finally about to embark on version upgrades for a lot of clusters, currently on 2.1.x and 2.2.x, with 3.11.x as the eventual target.
I have seen recipes that do the full binary upgrade plus upgradesstables on one node before moving on to the next. I have also seen a 2016 vote by Jon Haddad (a TLP guy) in favor of rolling the binary version upgrade through the whole cluster first, then rolling upgradesstables through it afterwards.

Under what cluster conditions are streaming and node replacement precluded? That is, when are we vulnerable to a cloud provider dumping one of our nodes out from under us, or to a hardware failure? We ain't Apple, but we do have 30+ node datacenters and 80-100 node clusters.

Are node replacement and streaming disabled only while the cluster is running heterogeneous Cassandra versions, or until all the sstables in the cluster have been upgraded?

My instinct is that the best approach is to get all the Cassandra nodes onto the same version first, without the upgradesstables step, and then roll through upgradesstables as needed. My assumption is that upgradesstables is a node-local concern that doesn't affect streaming, node replacement, or similar operations, since Cassandra can read old-version sstables and any newly written sstables would simply be in the new format.
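
To make the two orderings concrete, here is a rough Python sketch that just builds the step list for each recipe. It doesn't run anything against a cluster. The node names and the "install ... and restart" step are placeholders I made up (how you install and restart depends on your packaging); `nodetool drain` and `nodetool upgradesstables` are the only real commands named. The helper `last_binary_step` is also hypothetical and only shows at which step the cluster stops running mixed binaries in each plan.

```python
from typing import List


def per_node_plan(nodes: List[str], target: str) -> List[str]:
    """Recipe A: binary upgrade AND upgradesstables on a node before moving on."""
    steps = []
    for node in nodes:
        steps.append(f"{node}: nodetool drain")
        # Placeholder step: the actual install/restart procedure is site-specific.
        steps.append(f"{node}: install Cassandra {target} and restart")
        steps.append(f"{node}: nodetool upgradesstables")
    return steps


def two_pass_plan(nodes: List[str], target: str) -> List[str]:
    """Recipe B: roll the binaries through every node, then roll upgradesstables."""
    steps = []
    for node in nodes:
        steps.append(f"{node}: nodetool drain")
        # Placeholder step: the actual install/restart procedure is site-specific.
        steps.append(f"{node}: install Cassandra {target} and restart")
    for node in nodes:
        steps.append(f"{node}: nodetool upgradesstables")
    return steps


def last_binary_step(steps: List[str]) -> int:
    """Index of the final install/restart step, i.e. the end of the mixed-version window."""
    return max(i for i, step in enumerate(steps) if "install Cassandra" in step)


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3"]  # hypothetical node names
    for name, plan in (("per-node", per_node_plan(nodes, "3.11.x")),
                       ("two-pass", two_pass_plan(nodes, "3.11.x"))):
        print(name, "-> mixed binaries end at step index", last_binary_step(plan))
        for step in plan:
            print("   ", step)
```

Both plans contain the same steps; the only difference is that in the two-pass ordering no sstable rewrite sits between one node's binary upgrade and the next, so the period with mixed binaries ends earlier. Whether that is the window that actually matters for streaming and replacement is exactly what I'm asking.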