I had to face this too, but in my case "unsafeAssassinateEndpoint" was precisely what removed the "UNREACHABLE" nodes (from describe cluster in the CLI). After that, those ghost hosts were marked as "STATUS:LEFT" in gossipinfo (nodetool) and my truncate ran properly. But this is only my own experience, and you might want to listen to Brian, who probably has more experience than I do, and restart your cluster. I guess it also depends on how much you need truncate and whether you can afford downtime or not.
But I really think that, at this point, you can run a truncate.

Alain

2013/5/22 Brian Tarbox <tar...@cabotresearch.com>

> Have to disagree with the "does no harm" comment just a tiny bit. I had a
> similar situation recently and coincidentally needed to do a CF truncate.
> The system rejected the request saying that not all nodes were up.
> Nodetool ring said everyone was up but nodetool gossipinfo said there were
> vestiges of dead nodes still hanging around. I ended up restarting the
> entire cluster which cleared the issue.
>
> Brian
>
>
> On Wed, May 22, 2013 at 6:46 AM, Vasileios Vlachos <
> vasileiosvlac...@gmail.com> wrote:
>
>> Hello,
>>
>> Thanks for your fast response. That makes sense. I'll just keep an eye on
>> it then.
>>
>> Many thanks,
>>
>> Vasilis
>>
>>
>> On Wed, May 22, 2013 at 10:54 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> Hi.
>>>
>>> I think that the "unsafeAssassinateEndpoint" was the good solution here.
>>> I was going to lead you to this solution after reading the first part of
>>> your message.
>>>
>>> "Does anyone know why the dead nodes still appear when we run "nodetool
>>> gossipinfo" but they don't when we run "describe cluster" from the CLI?"
>>>
>>> That's a good thing. Gossiper just keep this information for a while (7
>>> or 10 days by default off the top off my head), but this doesn't harm your
>>> cluster in any ways, but having "UNREACHABLE" nodes could have been
>>> annoying. By the way gossipinfo shows you those nodes as "STATUS:LEFT"
>>> which is good. I am quite sure that this status changed when you used the
>>> jmx "unsafeAssassinateEndpoint".
>>>
>>> "do a full cluster restart (I presume that means a rolling restart - not
>>> shut-down the entire cluster right???). "
>>>
>>> A full restart => entire cluster down => down time. It is precisely
>>> *not* a rolling restart.
>>>
>>> To conclude I would say that your cluster seems healthy now (from what I
>>> can see), you have no more ghost nodes and nothing to do. Just wait a week
>>> or so and look for gossipinfo again.
>>>
>>>
>>> 2013/5/22 Vasileios Vlachos <vasileiosvlac...@gmail.com>
>>>
>>>> Hello All,
>>>>
>>>> A while ago we had 3 cassandra nodes on Amazon. At some point we
>>>> decided to buy some servers and deploy cassandra there. The problem is that
>>>> since then we have a list of dead IPs listed as UNREACHABLE nodes when we
>>>> run describe cluster on cassandra-cli.
>>>>
>>>> I have seen other posts which describe similar issues, and the bottom
>>>> line is "it's harmless but if you want to get rid of it do a full cluster
>>>> restart" (I presume that means a rolling restart - not shut-down the entire
>>>> cluster right???). Anyway...
>>>>
>>>> We also came across another solution: Install "libmx4j-java", uncomment
>>>> the respective line on "/etc/default/cassandra", restart the node, go to "
>>>> http://cassandra_node:8081/mbean?objectname=org.apache.cassandra.net%3Atype%3DGossiper",
>>>> type in the dead IP/IPs next to the "unsafeAssassinateEndpoint" and invoke
>>>> it. So we did that on one of the nodes for the list of dead IPs. After
>>>> running "describe cluster" on the CLI on every node, we noticed that there
>>>> were no UNREACHABLE nodes and everything looked OK.
>>>>
>>>> However, when we run "nodetool gossipinfo" we get the following output:
>>>>
>>>> /10.1.32.97
>>>>   RELEASE_VERSION:1.0.11
>>>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>>   LOAD:2.76851457173E11
>>>>   RPC_ADDRESS:0.0.0.0
>>>>   STATUS:NORMAL,56713727820156410577229101238628035243
>>>> /10.128.16.111
>>>>   REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
>>>>   STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
>>>> /10.128.16.110
>>>>   REMOVAL_COORDINATOR:REMOVER,1
>>>>   STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
>>>> /10.1.32.100
>>>>   RELEASE_VERSION:1.0.11
>>>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>>   LOAD:2.75649392881E11
>>>>   RPC_ADDRESS:0.0.0.0
>>>>   STATUS:NORMAL,85070591730234615865843651857942052863
>>>> /10.1.32.101
>>>>   RELEASE_VERSION:1.0.11
>>>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>>   LOAD:2.71158702006E11
>>>>   RPC_ADDRESS:0.0.0.0
>>>>   STATUS:NORMAL,141784319550391026443072753096570088105
>>>> /10.1.32.98
>>>>   RELEASE_VERSION:1.0.11
>>>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>>   LOAD:2.73163150773E11
>>>>   RPC_ADDRESS:0.0.0.0
>>>>   STATUS:NORMAL,113427455640312821154458202477256070486
>>>> /10.128.16.112
>>>>   REMOVAL_COORDINATOR:REMOVER,1
>>>>   STATUS:LEFT,42537092606577173116506557155915918934,1369471567719
>>>> /10.1.32.99
>>>>   RELEASE_VERSION:1.0.11
>>>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>>   LOAD:2.72271268395E11
>>>>   RPC_ADDRESS:0.0.0.0
>>>>   STATUS:NORMAL,28356863910078205288614550619314017621
>>>> /10.1.32.96
>>>>   RELEASE_VERSION:1.0.11
>>>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>>   LOAD:2.71494331357E11
>>>>   RPC_ADDRESS:0.0.0.0
>>>>   STATUS:NORMAL,0
>>>>
>>>> Does anyone know why the dead nodes still appear when we run "nodetool
>>>> gossipinfo" but they don't when we run "describe cluster" from the CLI?
>>>>
>>>> Thank you in advance for your help,
>>>>
>>>> Vasilis
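
For anyone who wants to check which endpoints gossip still remembers as LEFT, here is a minimal Python sketch that parses text shaped like the gossipinfo output quoted above. The function names (parse_gossipinfo, left_endpoints) are made up for illustration. Treating the last field of STATUS:LEFT as an epoch-milliseconds timestamp (presumably the time after which gossip forgets the node) is an assumption based on how the values look, not something confirmed in this thread.

```python
from datetime import datetime, timezone


def parse_gossipinfo(text):
    """Group gossipinfo-style lines into {endpoint: {attribute: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/10.1.32.97" starts a new endpoint block.
            current = nodes.setdefault(line[1:], {})
        elif current is not None and ":" in line:
            # Attribute lines look like "KEY:value"; split on the first colon.
            key, _, value = line.partition(":")
            current[key] = value
    return nodes


def left_endpoints(nodes):
    """Return {endpoint: UTC datetime} for entries whose STATUS is LEFT."""
    result = {}
    for endpoint, attrs in nodes.items():
        fields = attrs.get("STATUS", "").split(",")
        if fields[0] == "LEFT" and len(fields) >= 3:
            # Assumption: the third field is epoch milliseconds.
            seconds = int(fields[2]) // 1000
            result[endpoint] = datetime.fromtimestamp(seconds, tz=timezone.utc)
    return result


# Two entries copied from the output quoted in this thread.
SAMPLE = """\
/10.1.32.97
  RELEASE_VERSION:1.0.11
  STATUS:NORMAL,56713727820156410577229101238628035243
/10.128.16.111
  REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
  STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
"""

if __name__ == "__main__":
    for endpoint, when in left_endpoints(parse_gossipinfo(SAMPLE)).items():
        print(endpoint, when.isoformat())
```

Under that assumption, the timestamps on the three LEFT entries fall a few days after the date of this thread, so looking at gossipinfo again after that point should show whether the entries have gone.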