Cool. If you get it again grab nodetool gossipinfo from a few machines.
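Something along these lines could grab it from several nodes in one go. This is only a rough sketch: it assumes nodetool is on the PATH of the machine you run it from and that JMX on each node is reachable from there. The helper names and the addresses in the usage comment are placeholders, not anything from your cluster.

```python
import subprocess


def build_command(host):
    """Return the nodetool invocation that asks `host` for its gossip state."""
    return ["nodetool", "-h", host, "gossipinfo"]


def collect_gossipinfo(hosts, runner=subprocess.run):
    """Run gossipinfo against each host and return {host: output text}.

    `runner` defaults to subprocess.run; it is a parameter only so the
    function can be exercised without a live cluster.
    """
    results = {}
    for host in hosts:
        proc = runner(build_command(host), capture_output=True, text=True)
        # Keep stderr on failure so a connection problem is visible too.
        results[host] = proc.stdout if proc.returncode == 0 else proc.stderr
    return results


# Usage (placeholder addresses, replace with the nodes you want to compare):
#   for host, output in collect_gossipinfo(["10.0.0.1", "10.0.0.2"]).items():
#       print("=== gossipinfo as seen by %s ===" % host)
#       print(output)
```

Saving the output per node makes it easy to compare what the node reporting the problem sees against what the others see.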
Cheers

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 19/10/2012, at 3:32 AM, Rene Kochen <rene.koc...@emea.schange.com> wrote:

> Thanks Aaron,
>
> Telnet works (in both directions).
>
> After a normal (i.e. without discarding ring state) restart of the node
> reporting the other one as down, the ring shows "up" again. So a node
> restart fixes the incorrect state.
>
> I see this error occasionally.
>
> I will investigate further and post more details when it happens again.
>
> 2012/10/18 aaron morton <aa...@thelastpickle.com>
> You can double check that the node reporting 9.109 as down can telnet to
> port 7000 on 9.109.
>
> Then I would restart 9.109 with -Dcassandra.load_ring_state=false added as a
> JVM param in cassandra-env.sh.
>
> If it still shows as down, can you post the output from nodetool gossipinfo
> from 9.109 and the node that sees 9.109 as down.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
>
> On 18/10/2012, at 8:45 PM, Rene Kochen <rene.koc...@schange.com> wrote:
>
>> I have a four node EC2 cluster.
>>
>> Three machines show via nodetool ring that all machines are UP.
>> One machine shows via nodetool ring that one machine is DOWN.
>>
>> If I take a closer look at the machine reporting the other machine as
>> down, I see the following:
>>
>> - StorageService.UnreachableNodes = 10.49.9.109
>> - FailureDetector.SimpleStates: 10.49.9.109 = UP
>>
>> So gossip is fine. Actually the whole 10.49.9.109 machine is fine. I see in
>> the logging that there is communication between 10.49.9.109 and the machine
>> reporting it as down.
>>
>> How or when is a node removed from the UnreachableNodes list and reported as
>> UP again via nodetool ring?
>>
>> I use Cassandra 1.0.11
>>
>> Thanks!
>>
>> Rene