My blind guess is: https://issues.apache.org/jira/browse/CASSANDRA-5179
In our case the only sensible solution was to pause hint delivery and
disable storing new hints (both done with nodetool: pausehandoff and
disablehandoff). Once the stored hints TTL'd (3 hours by default, I
believe?) I turned HH on again and started a repair. However, the problem
returned the next day, so I had to do a quick C* upgrade to a version
with this patch applied (we use a "self-built" 1.2.1 with a few
additional patches applied).
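
For reference, the sequence above can be sketched roughly as follows. This is only an illustration, not the exact commands we ran: the host address and the helper names are hypothetical, and the resumehandoff step is my assumption of what "turning HH on again" involves besides enablehandoff. By default it only prints the commands instead of running them.

```python
import subprocess


def handoff_workaround_steps(host="127.0.0.1"):
    """nodetool calls to stop the hint flood (hypothetical helper).

    pausehandoff stops delivery of already stored hints,
    disablehandoff stops storing new ones.
    """
    base = ["nodetool", "-h", host]
    return [
        base + ["pausehandoff"],
        base + ["disablehandoff"],
    ]


def handoff_recovery_steps(host="127.0.0.1"):
    """nodetool calls to run once the old hints have expired (hypothetical helper).

    Assumption: resumehandoff is needed to undo the earlier pausehandoff.
    """
    base = ["nodetool", "-h", host]
    return [
        base + ["enablehandoff"],
        base + ["resumehandoff"],
        base + ["repair"],
    ]


def run_steps(steps, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts the sequence if nodetool exits non-zero
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run against a placeholder address; nothing is executed.
    run_steps(handoff_workaround_steps("10.0.0.1"))
    run_steps(handoff_recovery_steps("10.0.0.1"))
```

Keeping the two phases as separate lists makes it easy to wait for the hints to expire between them, and the dry run lets you check the commands before touching a live node.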
M.
On 04.07.2013 18:41, Alain RODRIGUEZ wrote:
The point is that there is no way, afaik, to limit the speed of Hinted
Handoff delivery, since it is not a stream like repair or bootstrap, and
no way either to keep the node out of the ring while it is receiving
hints, since hints and "normal" traffic both go through the gossip
protocol on port 7000.
How can I avoid this Hinted Handoff flood on returning nodes?
Alain
2013/7/4 Alain RODRIGUEZ <arodr...@gmail.com>
Hi,
Using C* 1.2.2 on a cluster of 12 EC2 xLarge nodes.
When I restart a node that has been down for a few minutes, all the CPUs
are pegged at 100% once it comes back up, even with compactions disabled,
causing very high and intolerable latency in my app. I suspect Hinted
Handoff to be the cause of this. Disabling gossip fixes the problem;
enabling it again brings the latency back (with a lot of GC, dropped
messages...).
Is there a way to disable HH? Is it responsible for this issue?
I currently have this node down, so any quick insight would be appreciated.
Alain