Hi,

I am running C* 1.2.2.

I removed 4 nodes with "nodetool decommission". 2 of them left with no
issue, while the other 2 nodes remained in the "leaving" state even after
streaming their data.

The only thing specific to these 2 nodes is that they had a lot of hints
pending. These were hints for a node that couldn't come back and that I had
removed earlier: the heavy load induced by Hinted Handoff while it was
coming back caused a lot of latency in our app, and since the node didn't
manage to come back after 10 minutes, I removed it.

So I faced 3 bugs (or problems):

1 - At first, one of my nodes went down for 5 min, and when it came back it
got flooded by Hinted Handoff so hard that it could not handle the real-time
queries properly. I haven't found a way to prioritize app queries over
Hinted Handoff.
2 - Nodes keep hints for a node that has been removed.
3 - Nodes with 500MB to 3GB of hints stored for a removed node can't be
decommissioned; they get stuck after streaming their data.


As solutions for these 3 issues I did the following:

Solution to 1 - I removed the down node (nodetool removenode)
Solution to 2 - Stop the node and remove the system hints
Solution to 3 - Stop the node and use removenode instead of decommission
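
For anyone who wants to script these workarounds, here is a minimal sketch.
It is not the exact set of commands I ran. The helper names (run,
remove_dead_node, purge_hints) are made up for illustration, and the hints
directory is an assumption based on the default data directory layout, so
check your own data_file_directories setting first. By default it only
prints the commands; set DRY_RUN=0 to really execute them.

```shell
#!/bin/sh
# Dry-run by default: print each command instead of executing it.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Solutions 1 and 3: run this from a LIVE node, once the node to remove
# is down (or has been stopped on purpose).
# $1 is the Host ID of the dead node, as listed by "nodetool status".
remove_dead_node() {
    if [ -z "$1" ]; then
        echo "usage: remove_dead_node <host-id>" >&2
        return 1
    fi
    run nodetool removenode "$1"
}

# Solution 2: run this on the node holding the hints, AFTER stopping
# Cassandra on it (how to stop it depends on your install).
# $1 is the directory of the system hints column family. The default
# below is an assumption; adjust it to your data directory.
purge_hints() {
    hints_dir="${1:-/var/lib/cassandra/data/system/hints}"
    run find "$hints_dir" -type f -delete
}
```

For example, "remove_dead_node <host-id>" on a live node covers 1 and 3,
and "purge_hints" on a stopped node covers 2. Keeping the dry-run default
makes it harder to delete files or remove a node by accident.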

Now I have no more issues, yet I felt I had to report this. Maybe my
experience can help users get out of tricky situations and help committers
detect some issues, especially around hinted handoff.

Alain
