I ran into this too, but in my case "unsafeAssassinateEndpoint" was precisely
what removed the "UNREACHABLE" nodes (from describe cluster in the CLI). After
that, the ghost hosts were marked as "STATUS:LEFT" in gossipinfo (nodetool) and
my truncate ran properly. But this is only my own experience, and you
might want to listen to Brian, who probably has more experience than I do, and
restart your cluster. I guess it also depends on how much you need to run
truncate and whether you can afford downtime or not.

But I really think that, at this point, you can run a truncate.
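
If you want to check which endpoints gossip still reports as "STATUS:LEFT" before trying the truncate, a rough Python sketch like the one below could parse saved "nodetool gossipinfo" output. It is only an illustration: the helper names and the sample IP addresses are made up, and treating the third field of a STATUS:LEFT entry as an epoch timestamp in milliseconds is my assumption, not something confirmed in this thread.

```python
from datetime import datetime, timedelta, timezone


def parse_gossipinfo(text):
    """Parse `nodetool gossipinfo` text into {endpoint: {state_name: value}}."""
    endpoints = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line starting with "/" opens a new endpoint section.
            current = endpoints.setdefault(line, {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            key, _, value = line.partition(":")
            current[key] = value
    return endpoints


def left_endpoints(endpoints):
    """Return {endpoint: datetime or None} for entries whose STATUS is LEFT."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    result = {}
    for name, states in endpoints.items():
        fields = states.get("STATUS", "").split(",")
        if fields[0] != "LEFT":
            continue
        # ASSUMPTION: the third field is an epoch timestamp in milliseconds.
        when = None
        if len(fields) >= 3 and fields[2].isdigit():
            when = epoch + timedelta(milliseconds=int(fields[2]))
        result[name] = when
    return result


# Hypothetical sample in the gossipinfo layout; the IP addresses are invented.
SAMPLE = """\
/10.0.0.1
 RELEASE_VERSION:1.0.11
LOAD:2.76851457173E11
STATUS:NORMAL,56713727820156410577229101238628035243
/10.0.0.2
STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
"""

if __name__ == "__main__":
    for name, when in left_endpoints(parse_gossipinfo(SAMPLE)).items():
        print(name, "LEFT", when.isoformat() if when else "unknown")
```

Read as epoch milliseconds, the third field in the sample decodes to 2013-05-25 08:44:48 UTC. The sketch expects each endpoint line to carry an address after the "/", as in real nodetool output.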


2013/5/22 Brian Tarbox <tar...@cabotresearch.com>

> Have to disagree with the "does no harm" comment just a tiny bit.  I had a
> similar situation recently and coincidentally needed to do a CF truncate.
>  The system rejected the request saying that not all nodes were up.
>  Nodetool ring said everyone was up but nodetool gossipinfo said there were
> vestiges of dead nodes still hanging around.  I ended up restarting the
> entire cluster which cleared the issue.
> Brian
> On Wed, May 22, 2013 at 6:46 AM, Vasileios Vlachos <
> vasileiosvlac...@gmail.com> wrote:
>> Hello,
>> Thanks for your fast response. That makes sense. I'll just keep an eye on
>> it then.
>> Many thanks,
>> Vasilis
>> On Wed, May 22, 2013 at 10:54 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>> Hi.
>>> I think that "unsafeAssassinateEndpoint" was the right solution here.
>>> I was going to point you to this solution after reading the first part of
>>> your message.
>>> "Does anyone know why the dead nodes still appear when we run "nodetool
>>> gossipinfo" but they don't when we run "describe cluster" from the CLI?"
>>>  That's a good thing. The Gossiper just keeps this information for a while (7
>>> or 10 days by default, off the top of my head), and this doesn't harm your
>>> cluster in any way, whereas having "UNREACHABLE" nodes could have been
>>> annoying. By the way, gossipinfo shows you those nodes as "STATUS:LEFT",
>>> which is good. I am quite sure that this status changed when you used the
>>> JMX "unsafeAssassinateEndpoint".
>>> "do a full cluster restart (I presume that means a rolling restart - not
>>> shut-down the entire cluster right???). "
>>> A full restart => entire cluster down => down time. It is precisely
>>> *not* a rolling restart.
>>> To conclude, I would say that your cluster seems healthy now (from what I
>>> can see): you have no more ghost nodes and nothing to do. Just wait a week
>>> or so and check gossipinfo again.
>>> 2013/5/22 Vasileios Vlachos <vasileiosvlac...@gmail.com>
>>>> Hello All,
>>>> A while ago we had 3 cassandra nodes on Amazon. At some point we
>>>> decided to buy some servers and deploy cassandra there. The problem is that
>>>> since then we have a list of dead IPs listed as UNREACHABLE nodes when we
>>>> run describe cluster on cassandra-cli.
>>>> I have seen other posts which describe similar issues, and the bottom
>>>> line is "it's harmless but if you want to get rid of it do a full cluster
>>>> restart" (I presume that means a rolling restart - not shut-down the entire
>>>> cluster right???). Anyway...
>>>> We also came across another solution: Install "libmx4j-java", uncomment
>>>> the respective line on "/etc/default/cassandra", restart the node, go to
>>>> "http://cassandra_node:8081/mbean?objectname=org.apache.cassandra.net%3Atype%3DGossiper",
>>>> type in the dead IP/IPs next to the "unsafeAssassinateEndpoint" and invoke
>>>> it. So we did that on one of the nodes for the list of dead IPs. After
>>>> running "describe cluster" on the CLI on every node, we noticed that there
>>>> were no UNREACHABLE nodes and everything looked OK.
>>>> However, when we run "nodetool gossipinfo" we get the following output:
>>>> /
>>>>  RELEASE_VERSION:1.0.11
>>>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>> LOAD:2.76851457173E11
>>>> STATUS:NORMAL,56713727820156410577229101238628035243
>>>> /
>>>> REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
>>>> STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
>>>> /
>>>> STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
>>>> /
>>>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>> LOAD:2.75649392881E11
>>>> STATUS:NORMAL,85070591730234615865843651857942052863
>>>> /
>>>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>> LOAD:2.71158702006E11
>>>> STATUS:NORMAL,141784319550391026443072753096570088105
>>>> /
>>>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>> LOAD:2.73163150773E11
>>>> STATUS:NORMAL,113427455640312821154458202477256070486
>>>> /
>>>> STATUS:LEFT,42537092606577173116506557155915918934,1369471567719
>>>> /
>>>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>> LOAD:2.72271268395E11
>>>> STATUS:NORMAL,28356863910078205288614550619314017621
>>>> /
>>>> SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>>> LOAD:2.71494331357E11
>>>> Does anyone know why the dead nodes still appear when we run "nodetool
>>>> gossipinfo" but they don't when we run "describe cluster" from the CLI?
>>>> Thank you in advance for your help,
>>>> Vasilis
