Hi all,

New to Cassandra, I'm trying to wrap my head around how dead nodes should be 
revived.

Specifically, we deployed our cluster in Kubernetes, which means that nodes 
that go down will lose their IP address. When restarted, it is possible that:

1. their IP address changes
2. their new IP address is that of another downed node.

I spent the last two days looking for, and reading, possible solutions online. 
However, I could not find any recent or working solution (any link would be 
appreciated). I've seen plenty of hacks where people would define one k8s 
service per node, but that sounds like a burdensome and fragile solution.

My current understanding is that a node should be able to be revived and get 
its missing data from hinted handoff if it wasn't down longer than 
max_hint_window_in_ms. If that window is exceeded, a repair would be 
needed. In either case, it's possible that the data is still available, and I'd 
like to avoid having to stream everything from scratch from the other nodes.
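
To make that concrete, here is a rough Python sketch of the decision as I 
picture it. The function name and return labels are made up for illustration, 
and the 3-hour value is what I believe is the default hint window 
(max_hint_window_in_ms in cassandra.yaml). This is only my mental model, not 
how Cassandra actually implements it.

```python
# Sketch of my mental model only; names and labels are hypothetical.
# Assumed default of max_hint_window_in_ms: 3 hours, expressed in milliseconds.
DEFAULT_MAX_HINT_WINDOW_MS = 3 * 60 * 60 * 1000


def recovery_path(downtime_ms, max_hint_window_ms=DEFAULT_MAX_HINT_WINDOW_MS):
    """Return how I expect a restarted node to catch up on missed writes."""
    if downtime_ms < 0:
        raise ValueError("downtime_ms must be non-negative")
    if downtime_ms <= max_hint_window_ms:
        # Other replicas should still hold hints for the whole outage.
        return "hinted_handoff"
    # Hints stopped being stored at some point, so a repair is needed.
    return "repair"
```

So recovery_path(60_000) would give "hinted_handoff", while a four-hour outage 
would give "repair". Please correct me if this picture is wrong.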

I also looked into -Dcassandra.replace_address, but I feel like that would 
trigger a new token assignment, and again lots of streaming.

Finally, there's one thing still unclear to me (setting aside the dynamic IP 
address and Kubernetes issues): say I have several downed nodes, in the "DN" 
state. When one of those nodes is restarted, will it go through the "UJ" state? 
In other words, can I restart all downed nodes at once, or should I still 
respect the two-minute rule?

And how would that work with dynamic IP addresses?

tl;dr: is there any updated documentation on how to revive nodes consistently 
when static IP addresses can't be assigned?

Best regards,
Antoine
