For such a small cluster, you should be able to get repairs working fine with a tool such as cassandra-reaper managing them for you. I would look into that before making major cluster topology changes, as these can be complex and risky. I definitely wouldn't go about it in the way you've described below, as you're likely to run into load issues.
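For a sense of scale on the repair side, here is a rough sketch of how many token ranges the ring holds at each setting. It assumes every node uses the same num_tokens, and the helper name is made up purely for illustration:

```python
def total_token_ranges(nodes: int, num_tokens: int) -> int:
    """Each token owned by a node bounds one range, so a ring has nodes * num_tokens ranges."""
    return nodes * num_tokens


# Values taken from the question: 6 nodes, num_tokens=256 today, 16 as the target.
current = total_token_ranges(6, 256)
target = total_token_ranges(6, 16)
print(f"num_tokens=256: {current} ranges; num_tokens=16: {target} ranges")
```

This only counts ranges. It says nothing about how long a repair of each range takes, which depends on data volume and load.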
I would also advise against upgrading to the 4.0 beta releases in production. Leave running betas in production to the experts. There has not been enough real-world testing to give any kind of guarantee about the stability of 4.0-beta2.

If cassandra-reaper doesn't work for you, you'll want to migrate in one of two ways, both of which have their own caveats:

1. A complete DC migration, ideally with the same number of nodes to handle the load. All new nodes would have the lower token count and should use the token allocation algorithm. The downside is that this is expensive, but it is the least risky option, as the new DC can be managed independently of the existing one.
2. Decommission and re-add one node at a time, utilising the token allocation algorithm when you re-add nodes. This is more risky, as both decommission and addition will put considerable load on the existing nodes, and you may not have the spare storage capacity for it to work. You could, however, solve this by adding a new node first.

Note that I've left a lot of details out here, as these processes can be quite involved. You should test each method before you do it.

raft.so - Cassandra consulting, support, and managed services

On Sun, Mar 21, 2021 at 2:09 AM Lapo Luchini <l...@lapo.it> wrote:

> I have a 6 nodes production cluster running 3.11.9 with the default
> num_tokens=256… which is fine but I later discovered is a bit of a
> hassle to do repairs and is probably better to lower that to 16.
>
> I'm adding two new nodes with much higher space storage and I was
> wondering which migration strategy is better.
>
> If I got it correct I was thinking about this:
> 1. add the 2 new nodes as a new "temporary DC", with num_token=16 RF=3
> 2. repair it all, then test it a bit
> 3. switch production applications to "DC-temp"
> 4. drop the old 6-node DC
> 5. re-create it from scratch with num_token=16 RF=3
> 6. switch production applications to "main DC" again
> 7. drop "DC-temp", eventually integrate nodes into "main DC"
>
> I'd also like to migrate from 3.11.9 to 4.0-beta2 (I'm running on
> FreeBSD so those are the options), does it make sense to do it during
> the mentioned "num_tokens migration" (at step 1, or 5) or does it make
> more sense to do it at step 8, as a in-place rolling upgrade of each of
> the 6 (or 8) nodes?
>
> Did I get it correctly?
> Can it be done "better"?
>
> Thanks in advance for any suggestion or correction!
>
> --
> Lapo Luchini
> l...@lapo.it