Hello,

A client of mine has a problem adding a node to the cluster.
After 4 days, the node is still in joining mode, its load has not reached
the level of the other nodes, and there seems to be no streaming to or from
the new node.

This node has a history.

   1. At the beginning, it was a seed in the cluster.
   2. Ops detected that clients had problems with it.
   3. They tried to reset it but failed. In the process they launched
   several repair and rebuild operations on the node.
   4. Then they asked me to help them.
   5. We stopped the node,
   6. removed it from the list of seeds (more precisely it was replaced by
   another node),
   7. removed it from the cluster (I chose not to use decommission since
   the node's data was compromised),
   8. deleted all files from the data, commitlog and saved_caches directories,
   9. after the leaving process ended, it was started as a fresh new node
   and began autobootstrap.


As I don’t have direct access to the cluster, I don't have much
information yet, but I will have more tomorrow (logs and the output of some
commands). I can also ask people there for any information required.

Does anyone have any idea what could have happened and what I should
investigate first?
What would you do to unblock the situation?

Context: The cluster consists of two DCs, each with 15 nodes. Average load
is around 3 TB per node. The joining node stalled shortly after reaching 2 TB.

Thank you for your help.
Cheers,


-- 
Jérôme Mainaud
jer...@mainaud.com
