Thanks for sharing, here is some more information…

> 1 - At first, one of my node came down 5 min and when it came back it get
> flooded by Hinted Handoff so hard that it could not handle the real time
> queries properly. I haven't find a way to prioritize app queries rather than
> Hinted Handoff.

You can disable hint delivery with nodetool pausehandoff or reduce the hint throughput
https://github.com/apache/cassandra/blob/cassandra-1.2/conf/cassandra.yaml#L50

> 2 - Nodes keep hints for a node that has been removed.

The hints are stored with a TTL that is the gc_grace_seconds for the CF at the time the hint is written, so they will eventually be purged by compaction.
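To put rough numbers on the two points above, here is a back-of-the-envelope sketch. It is not Cassandra code: the function names are hypothetical, the 10-day gc_grace_seconds and the 1024 KB/s throttle are assumed values chosen for illustration, and it ignores everything except a fixed throttle, so treat the results as lower bounds only.

```python
# Rough estimates only; these helpers are hypothetical, not Cassandra APIs.

def hint_expiry(written_at_s: float, gc_grace_seconds: int) -> float:
    """Epoch second after which a hint's TTL has elapsed, so compaction may purge it."""
    return written_at_s + gc_grace_seconds


def min_replay_seconds(hint_bytes: int, throttle_kb_per_s: int) -> float:
    """Lower bound on the time to deliver hint_bytes at a fixed throttle (KB/s)."""
    if throttle_kb_per_s <= 0:
        raise ValueError("throttle_kb_per_s must be positive")
    return hint_bytes / (throttle_kb_per_s * 1024)


GC_GRACE_SECONDS = 864_000  # assumed: 10 days, check the CF's actual setting
THROTTLE_KB_PER_S = 1024    # assumed throttle value, for illustration only

MB = 1024 ** 2
GB = 1024 ** 3

# The hint volumes reported in this thread: 500MB to 3GB.
for size in (500 * MB, 3 * GB):
    seconds = min_replay_seconds(size, THROTTLE_KB_PER_S)
    print(f"{size / MB:.0f} MB of hints: at least {seconds:.0f} s to replay")

print(f"a hint written at t=0 expires at t={hint_expiry(0, GC_GRACE_SECONDS)} s")
```

Lowering the throttle stretches the replay time proportionally, which is the trade-off when you reduce hint throughput to protect application queries.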
You can also delete the hints using the Hinted Handoff bean
https://github.com/apache/cassandra/blob/cassandra-1.2/src/java/org/apache/cassandra/db/HintedHandOffManagerMBean.java#L30

> 3 - Nodes with 500MB to 3GB hints stored for a removed node can't be
> decommissioned, they stuck after streaming their data.

The hint KS is defined using the LocalStrategy and so it is not replicated. Hints should not be involved in streaming.

Cheers

-----------------
Aaron Morton
Freelance Cassandra Consultant
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 10/07/2013, at 12:47 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi,
>
> C*1.2.2.
>
> I have removed 4 nodes with "nodetool decommission". 2 of them have left with
> no issue, while the 2 others nodes remained "leaving" even after streaming
> their data.
>
> The only specific thing of these 2 nodes is that they had a lot of hints
> pending. Hints from a node that couldn't come back and that I removed earlier
> (because of the heavy load induced by Hinted Handoff while coming back, which
> induced a lot of latencies in our app. This node didn't manage to come back
> after 10 minutes, I removed it).
>
> So there I faced 3 bugs (or problems) :
>
> 1 - At first, one of my node came down 5 min and when it came back it get
> flooded by Hinted Handoff so hard that it could not handle the real time
> queries properly. I haven't find a way to prioritize app queries rather than
> Hinted Handoff.
> 2 - Nodes keep hints for a node that has been removed.
> 3 - Nodes with 500MB to 3GB hints stored for a removed node can't be
> decommissioned, they stuck after streaming their data.
>
>
> As solutions for this 3 issues I did the following:
>
> Solution to 1 - I removed this down node (nodetool removenode)
> Solution to 2 - Stop the node remove system hints
> Solution to 3 - Stop the node and removenode instead of decommission
>
> Now I have no more issue, yet I felt I had to report this.
> Maybe my experience can help users to get out of tricky situations and
> commiters to detect some issues, specially about hinted handoff.
>
> Alain