I'm curious whether anyone has seen this happen or has any idea how it could happen. I have a 10-node cluster, with 5 nodes in each data center, running 0.6 (we're working on the upgrade now). Several nodes had forgotten deletes, so I failed those nodes and bootstrapped them back into the cluster one at a time. Everything seemed fine, but now I'm noticing that all systems in one of my data centers see all 10 nodes, while all systems in the other data center see just 9. I figure now is the time to fail the node that only half the other nodes can see, but what would cause this to happen?
-Tim Smith

DC1
10.x.x.45   Up   162065212751151145161126595807335373      |<--|
10.x.x.44   Up   5452449782323250074504667089218893518     |   ^
10.x.x.43   Up   8114257989534302620064490155463988554     v   |
10.x.x.46   Up   21422567192334579300859480282267974118    |   ^
10.x.x.60   Up   54861697885175209049354960363878287097    v   |
10.x.x.69   Up   154328840302872203985032035664154382201   |   ^
10.x.x.62   Up   156995951391754654763374624484548356765   v   |
10.x.x.61   Up   158321671343439891722659169797597266747   |   ^
10.x.x.47   Up   159657096314745030420102789742477598562   |-->|

DC2
10.x.x.45   Up   162065212751151145161126595807335373      |<--|
10.x.x.44   Up   5452449782323250074504667089218893518     |   ^
10.x.x.43   Up   8114257989534302620064490155463988554     v   |
10.x.x.46   Up   21422567192334579300859480282267974118    |   ^
10.x.x.60   Up   54861697885175209049354960363878287097    v   |
10.x.x.71   Up   149032703168324939856639604911542585192   |   ^
10.x.x.69   Up   154328840302872203985032035664154382201   v   |
10.x.x.62   Up   156995951391754654763374624484548356765   |   ^
10.x.x.61   Up   158321671343439891722659169797597266747   v   |
10.x.x.47   Up   159657096314745030420102789742477598562   |-->|
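
For anyone who wants to compare the two views mechanically rather than by eye, here is a minimal Python sketch that diffs them. The helper names `ring_view` and `diff_views` are hypothetical, and the regular expression assumes only the address / status / token layout pasted above (the ring-art column is left out of the strings), so it may need adjusting for full tool output.

```python
import re

# Assumed layout: "<address> Up|Down <token>", as in the listings above.
# Addresses are masked in the post (10.x.x.NN), so \w is used for each octet.
NODE = re.compile(r"(\d+(?:\.\w+){3})\s+(?:Up|Down)\s+(\d+)")

def ring_view(text):
    """Hypothetical helper: map each address in a pasted ring listing to its token."""
    return {addr: int(token) for addr, token in NODE.findall(text)}

def diff_views(a, b):
    """Return (addresses only in a, addresses only in b), each sorted."""
    return sorted(set(a) - set(b)), sorted(set(b) - set(a))

dc1 = ring_view("""
10.x.x.45 Up 162065212751151145161126595807335373
10.x.x.44 Up 5452449782323250074504667089218893518
10.x.x.43 Up 8114257989534302620064490155463988554
10.x.x.46 Up 21422567192334579300859480282267974118
10.x.x.60 Up 54861697885175209049354960363878287097
10.x.x.69 Up 154328840302872203985032035664154382201
10.x.x.62 Up 156995951391754654763374624484548356765
10.x.x.61 Up 158321671343439891722659169797597266747
10.x.x.47 Up 159657096314745030420102789742477598562
""")

dc2 = ring_view("""
10.x.x.45 Up 162065212751151145161126595807335373
10.x.x.44 Up 5452449782323250074504667089218893518
10.x.x.43 Up 8114257989534302620064490155463988554
10.x.x.46 Up 21422567192334579300859480282267974118
10.x.x.60 Up 54861697885175209049354960363878287097
10.x.x.71 Up 149032703168324939856639604911542585192
10.x.x.69 Up 154328840302872203985032035664154382201
10.x.x.62 Up 156995951391754654763374624484548356765
10.x.x.61 Up 158321671343439891722659169797597266747
10.x.x.47 Up 159657096314745030420102789742477598562
""")

only_dc1, only_dc2 = diff_views(dc1, dc2)
print("seen only from DC1:", only_dc1)  # -> []
print("seen only from DC2:", only_dc2)  # -> ['10.x.x.71']

# Nodes visible from both sides should report the same token in each view.
mismatched = [a for a in dc1 if a in dc2 and dc1[a] != dc2[a]]
print("token mismatches:", mismatched)  # -> []
```

On the listings above, this reports 10.x.x.71 as the one node the DC1 systems do not see, and the tokens for the other nine nodes agree between the two views.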