Hello,

Thanks for your fast response. That makes sense. I'll just keep an eye on it then.

Many thanks,
Vasilis

On Wed, May 22, 2013 at 10:54 AM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> Hi.
>
> I think that "unsafeAssassinateEndpoint" was the right solution here. I
> was going to point you to it after reading the first part of your
> message.
>
> "Does anyone know why the dead nodes still appear when we run "nodetool
> gossipinfo" but they don't when we run "describe cluster" from the CLI?"
>
> That's a good thing. The Gossiper just keeps this information for a while
> (7 or 10 days by default, off the top of my head). This doesn't harm your
> cluster in any way, whereas having "UNREACHABLE" nodes could have been
> annoying. By the way, gossipinfo shows you those nodes as "STATUS:LEFT",
> which is good. I am quite sure that this status changed when you used the
> JMX "unsafeAssassinateEndpoint".
>
> "do a full cluster restart (I presume that means a rolling restart - not
> shut-down the entire cluster right???). "
>
> A full restart => entire cluster down => downtime. It is precisely *not*
> a rolling restart.
>
> To conclude, I would say that your cluster seems healthy now (from what I
> can see): you have no more ghost nodes and nothing to do. Just wait a week
> or so and look at gossipinfo again.
>
>
> 2013/5/22 Vasileios Vlachos <vasileiosvlac...@gmail.com>
>
>> Hello All,
>>
>> A while ago we had 3 cassandra nodes on Amazon. At some point we decided
>> to buy some servers and deploy cassandra there. The problem is that since
>> then we have a list of dead IPs listed as UNREACHABLE nodes when we run
>> describe cluster on cassandra-cli.
>>
>> I have seen other posts which describe similar issues, and the bottom
>> line is "it's harmless but if you want to get rid of it do a full cluster
>> restart" (I presume that means a rolling restart - not shut-down the entire
>> cluster right???). Anyway...
>>
>> We also came across another solution: install "libmx4j-java", uncomment
>> the respective line in "/etc/default/cassandra", restart the node, go to
>> "http://cassandra_node:8081/mbean?objectname=org.apache.cassandra.net%3Atype%3DGossiper",
>> type in the dead IP/IPs next to "unsafeAssassinateEndpoint" and invoke
>> it. So we did that on one of the nodes for the list of dead IPs. After
>> running "describe cluster" on the CLI on every node, we noticed that there
>> were no UNREACHABLE nodes and everything looked OK.
>>
>> However, when we run "nodetool gossipinfo" we get the following output:
>>
>> /10.1.32.97
>>   RELEASE_VERSION:1.0.11
>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>   LOAD:2.76851457173E11
>>   RPC_ADDRESS:0.0.0.0
>>   STATUS:NORMAL,56713727820156410577229101238628035243
>> /10.128.16.111
>>   REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
>>   STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
>> /10.128.16.110
>>   REMOVAL_COORDINATOR:REMOVER,1
>>   STATUS:LEFT,42537092606577173116506557155915918934,1369471275829
>> /10.1.32.100
>>   RELEASE_VERSION:1.0.11
>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>   LOAD:2.75649392881E11
>>   RPC_ADDRESS:0.0.0.0
>>   STATUS:NORMAL,85070591730234615865843651857942052863
>> /10.1.32.101
>>   RELEASE_VERSION:1.0.11
>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>   LOAD:2.71158702006E11
>>   RPC_ADDRESS:0.0.0.0
>>   STATUS:NORMAL,141784319550391026443072753096570088105
>> /10.1.32.98
>>   RELEASE_VERSION:1.0.11
>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>   LOAD:2.73163150773E11
>>   RPC_ADDRESS:0.0.0.0
>>   STATUS:NORMAL,113427455640312821154458202477256070486
>> /10.128.16.112
>>   REMOVAL_COORDINATOR:REMOVER,1
>>   STATUS:LEFT,42537092606577173116506557155915918934,1369471567719
>> /10.1.32.99
>>   RELEASE_VERSION:1.0.11
>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>   LOAD:2.72271268395E11
>>   RPC_ADDRESS:0.0.0.0
>>   STATUS:NORMAL,28356863910078205288614550619314017621
>> /10.1.32.96
>>   RELEASE_VERSION:1.0.11
>>   SCHEMA:b1116df0-b3dd-11e2-0000-16fe4da5dbff
>>   LOAD:2.71494331357E11
>>   RPC_ADDRESS:0.0.0.0
>>   STATUS:NORMAL,0
>>
>> Does anyone know why the dead nodes still appear when we run "nodetool
>> gossipinfo" but they don't when we run "describe cluster" from the CLI?
>>
>> Thank you in advance for your help,
>>
>> Vasilis
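
For anyone who wants to keep an eye on the STATUS:LEFT entries without reading the whole dump, here is a minimal Python sketch that parses saved "nodetool gossipinfo" output and lists the endpoints still marked LEFT. The function names (parse_gossipinfo, left_nodes) are made up for illustration. It also assumes that the third STATUS field is a millisecond epoch timestamp, which is a guess based on the values in the output above and is not confirmed anywhere in this thread. It expects the raw nodetool output, without the ">>" quote markers.

```python
from datetime import datetime, timezone


def parse_gossipinfo(text):
    """Group 'KEY:VALUE' lines under the '/ip' line that precedes them."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A new endpoint block starts, e.g. "/10.1.32.97".
            current = line[1:]
            nodes[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            nodes[current][key] = value
    return nodes


def left_nodes(nodes):
    """Map each STATUS:LEFT endpoint to its third STATUS field as a UTC datetime.

    The value is None when the STATUS entry has no third field.
    """
    result = {}
    for ip, state in nodes.items():
        fields = state.get("STATUS", "").split(",")
        if fields[0] != "LEFT":
            continue
        # Assumption: the third field is milliseconds since the Unix epoch.
        result[ip] = (
            datetime.fromtimestamp(int(fields[2]) / 1000, tz=timezone.utc)
            if len(fields) > 2
            else None
        )
    return result


# Two entries copied from the gossipinfo output quoted in this thread.
SAMPLE = """\
/10.1.32.97
RELEASE_VERSION:1.0.11
STATUS:NORMAL,56713727820156410577229101238628035243
/10.128.16.111
REMOVAL_COORDINATOR:REMOVER,113427455640312821154458202477256070486
STATUS:LEFT,42537039300520238181471502256297362072,1369471488145
"""

if __name__ == "__main__":
    for ip, when in left_nodes(parse_gossipinfo(SAMPLE)).items():
        print(ip, when)
```

To use it against a live node, save the output of "nodetool gossipinfo" to a file and pass the file contents to parse_gossipinfo. Once left_nodes returns an empty dict, the old endpoints have dropped out of gossip.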