Hello,

A client of mine has a problem adding a node to the cluster. After 4 days, the node is still in joining mode, it doesn't have the same level of load as the others, and there seems to be no streaming from or to the new node.

This node has a history:

1. In the beginning, it was a seed in the cluster.
2. Ops detected that clients had problems with it.
3. They tried to reset it but failed. In the process, they launched several repair and rebuild operations on the node.
4. Then they asked me to help them.
5. We stopped the node,
6. removed it from the list of seeds (more precisely, it was replaced by another node),
7. removed it from the cluster (I chose not to use decommission since the node's data was compromised),
8. deleted all files from the data, commitlog and savedcache directories.
9. After the leaving process ended, it was started as a fresh new node and began autobootstrap.

As I don't have direct access to the cluster, I don't have much information yet, but I will have more tomorrow (logs and the results of some commands). I can also ask people for any required information.

Does anyone have an idea of what could have happened and what I should investigate first? What would you do to unblock the situation?

Context: The cluster consists of two DCs, each with 15 nodes. Average load is around 3 TB per node. The joining node froze a little after 2 TB.

Thank you for your help.

Cheers,

--
Jérôme Mainaud
jer...@mainaud.com