Hi all,

I'm new to Cassandra and trying to wrap my head around how dead nodes should be revived.
Specifically, we deployed our cluster in Kubernetes, which means that nodes that go down lose their IP address. When a node is restarted, it is possible that:

1. its IP address changes
2. its new IP address is that of another downed node

I spent the last two days looking for and reading possible solutions online, but I could not find any recent or working one (any link would be appreciated). I've seen plenty of hacks where people define one k8s service per node, but that sounds burdensome and fragile.

My current understanding is that a node should be able to be revived and get its missing data from hinted handoff if it wasn't down longer than max_hint_window_in_ms. If that window is exceeded, a repair would be needed. In either case, it's possible that the node's data is still available, and I'd like to avoid having to stream everything from scratch from the other nodes.

I also looked into -Dcassandra.replace_address, but I feel like that would trigger a new token assignment and, again, lots of streaming.

Finally, one thing is still unclear to me (setting aside the dynamic IP address and Kubernetes aspects): say I have several downed nodes in the "DN" state. When one of those nodes is restarted, will it go through the "UJ" state? In other words, can I restart all downed nodes at once, or should I still respect the 2 minute rule? And how would that work with dynamic IP addresses?

tl;dr: is there any up-to-date documentation on how to revive nodes consistently when static IP addresses can't be assigned?

Best regards,
Antoine