Thanks for sharing. Here is some more information.

> 1 - At first, one of my nodes went down for 5 min, and when it came back it 
> got flooded by Hinted Handoff so hard that it could not handle the real-time 
> queries properly. I haven't found a way to prioritize app queries over 
> Hinted Handoff.
You can pause hint delivery with nodetool pausehandoff, or reduce the hint 
throughput in cassandra.yaml: 
https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L50
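
To get a feel for what lowering the throttle means, here is a rough back-of-the-envelope sketch in Python. The function name and the throttle values are illustrative assumptions, not values read from any cluster, and the estimate ignores the number of delivery threads and anything else Cassandra does to scale the rate.

```python
def replay_seconds(hint_bytes: int, throttle_kb_per_sec: int) -> float:
    """Estimate the seconds needed to deliver hint_bytes at a given throttle.

    Hypothetical helper: assumes a single delivery stream running at
    exactly throttle_kb_per_sec (1 KB = 1024 bytes).
    """
    if throttle_kb_per_sec <= 0:
        raise ValueError("throttle must be positive")
    return hint_bytes / (throttle_kb_per_sec * 1024)


three_gb = 3 * 1024**3  # upper end of the hint sizes reported in this thread

# Assumed throttle values, in KB per second
for throttle in (1024, 256):
    minutes = replay_seconds(three_gb, throttle) / 60
    print(f"{throttle} KB/s -> about {minutes:.0f} minutes")
```

A lower throttle spreads the same backlog over a longer period, which leaves more capacity for application queries while the hints are replayed.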
 
> 2 - Nodes keep hints for a node that has been removed.
The hints are stored with a TTL equal to the gc_grace_seconds of the CF at the 
time the hint is written, so they will eventually be purged by compaction. 
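
As a small illustration of that TTL, the sketch below computes when a hint stops being deliverable. The helper name, the write timestamp, and the gc_grace_seconds value of 864000 (10 days) are assumptions for the example; check the actual setting on your CF.

```python
from datetime import datetime, timedelta, timezone


def hint_expiry(written_at: datetime, gc_grace_seconds: int) -> datetime:
    """Return the time at which a hint's TTL runs out.

    Hypothetical helper: the TTL is taken to be the CF's gc_grace_seconds
    at the moment the hint was written.
    """
    return written_at + timedelta(seconds=gc_grace_seconds)


# Assumed values, for illustration only
written = datetime(2013, 7, 1, tzinfo=timezone.utc)
print(hint_expiry(written, 864000))  # 2013-07-11 00:00:00+00:00
```

Note that an expired hint still occupies disk space until compaction actually purges it.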

You can also delete the hints over JMX using the HintedHandOffManagerMBean: 
https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/HintedHandOffManagerMBean.java#L30

> 3 - Nodes with 500MB to 3GB of hints stored for a removed node can't be 
> decommissioned; they get stuck after streaming their data.
The hint KS is defined using LocalStrategy, so it is not replicated. Hints 
should not be involved in streaming. 

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 12:47 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi,
> 
> C*1.2.2.
> 
> I have removed 4 nodes with "nodetool decommission". 2 of them left with 
> no issue, while the 2 other nodes remained "leaving" even after streaming 
> their data.
> 
> The only thing specific to these 2 nodes is that they had a lot of hints 
> pending: hints for a node that couldn't come back and that I removed earlier 
> (because of the heavy load induced by Hinted Handoff while it was coming 
> back, which caused a lot of latency in our app; the node didn't manage to 
> come back after 10 minutes, so I removed it).
> 
> So I faced 3 bugs (or problems):
> 
> 1 - At first, one of my nodes went down for 5 min, and when it came back it 
> got flooded by Hinted Handoff so hard that it could not handle the real-time 
> queries properly. I haven't found a way to prioritize app queries over 
> Hinted Handoff.
> 2 - Nodes keep hints for a node that has been removed.
> 3 - Nodes with 500MB to 3GB of hints stored for a removed node can't be 
> decommissioned; they get stuck after streaming their data.
> 
> 
> As solutions for these 3 issues I did the following:
> 
> Solution to 1 - I removed the down node (nodetool removenode)
> Solution to 2 - Stop the node and remove the system hints
> Solution to 3 - Stop the node and use removenode instead of decommission
> 
> Now I have no more issues, yet I felt I had to report this. Maybe my 
> experience can help users get out of tricky situations and help committers 
> detect some issues, especially with hinted handoff.
> 
> Alain
> 
> 
