In the end, that did not help.

So I enabled logging at debug level.
The log files show that the node being added does communicate with the other 
nodes (which are seed nodes), yet nothing seems to come back to that node.
The logs on the other nodes show that they receive the shadow request, but 
contain nothing else, such as an error about being unable to send a reply.
And as before, simply restarting that node once more does the trick, and the 
bootstrap then proceeds.

Maybe the problem has something to do with the gossip state?
My test case is: decommission node 10.164.8.93, then restart a clean node 
10.164.8.93 and let it bootstrap.

In my test case, I see that about 7 minutes before the node is added to the 
ring again, the other nodes detect the decommission of the node.

2014-02-26 10:23:21.443 60000 elapsed, /10.164.8.93 gossip quarantine over
2014-02-26 10:23:21.444 Ignoring state change for dead or unknown endpoint: /10.164.8.93
2014-02-26 10:23:55.636 Forcing conviction of /10.164.8.93
2014-02-26 10:24:00.230 Reseting version for /10.164.8.93
2014-02-26 10:24:00.230 Reseting version for /10.164.8.93


At the time node 10.164.8.93 is added, its log shows:
--------------------------------------------------------------------

2014-02-26 10:30:03.015 Cassandra version: 2.0.5-SNAPSHOT
2014-02-26 10:30:03.016 Thrift API version: 19.39.0
2014-02-26 10:30:03.018 CQL supported versions: 2.0.0,3.1.4 (default: 3.1.4)
2014-02-26 10:30:03.029 Loading persisted ring state
2014-02-26 10:30:03.034 Starting shadow gossip round to check for endpoint collision
2014-02-26 10:30:03.034 Starting Messaging Service on port 9804
2014-02-26 10:30:03.046 attempting to connect to /10.164.8.249
2014-02-26 10:30:03.047 attempting to connect to /10.164.8.250
2014-02-26 10:30:03.048 attempting to connect to /10.164.8.92
2014-02-26 10:30:03.051 Handshaking version with /10.164.8.250
2014-02-26 10:30:03.052 Handshaking version with /10.164.8.249
2014-02-26 10:30:03.052 Handshaking version with /10.164.8.92
2014-02-26 10:30:34.059 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
                at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1173) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:424) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:615) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.initServer(StorageService.java:583) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.initServer(StorageService.java:482) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:348) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:465) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:495) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-92200]
                at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:461) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-92200]
2014-02-26 10:30:34.069 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
                at org.apache.cassandra.gms.Gossiper.stop(Gossiper.java:1250) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:550) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.0.5-SNAPSHOT.jar:2.0.5-SNAPSHOT]
                at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_40]
2014-02-26 10:31:34.081 ShutDownHook requests shutdown on be.landc.framework.service.ipl.Boot@730e8516


And at that time, the 3 other nodes print this log information:
-----------------------------------------------------------------

2014-02-26 10:30:03.051 Connection version 7 from /10.164.8.93
2014-02-26 10:30:03.065 Upgrading incoming connection to be compressed
2014-02-26 10:30:03.130 Max version for /10.164.8.93 is 7
2014-02-26 10:30:03.130 Setting version 7 for /10.164.8.93
2014-02-26 10:30:03.131 set version for /10.164.8.93 to 7
2014-02-26 10:30:03.131 Shadow request received, adding all states


Is there any more information I can provide?
Regards,
Ignace

From: Desimpel, Ignace [mailto:ignace.desim...@nuance.com]
Sent: Monday, 24 February 2014 11:43
To: user@cassandra.apache.org
Subject: FW: Sporadic gossip exception on add node

I had a look at the code, and this might be a race-condition-like problem in the 
functions StorageService::checkForEndpointCollision and 
StorageService::prepareReplacementInfo.

For Gossiper.instance.doShadowRound() to work, 
MessagingService.instance().listen(FBUtilities.getLocalAddress()) must be FULLY 
running (accepting connections).
However, the listen function starts SocketThread threads but does not wait for 
them to be started. So I think, at least in theory, that the doShadowRound 
function will be sending messages, and thus expecting answers, while there is no 
guarantee that the listeners are actually up and running.

As a test I modified the MessagingService::listen code as follows:
SocketThread th = new SocketThread(ss, "ACCEPT-" + localEp);
   // Hold th's monitor before starting it, so its notifyAll() cannot fire before wait()
   synchronized( th ) {
     th.start();
     try { th.wait(); } catch(Throwable tt){}
   }

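And the SocketThread::run function, which now begins with:
public void run()
   {
     // Wake up the thread waiting in listen() before going on to accept connections
     synchronized( this ) {
       this.notifyAll();
     }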

That way there is little chance that the socket thread is not yet running (it 
should then be blocked in the server.accept() call).
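
The same start-up handshake can also be sketched with a java.util.concurrent.CountDownLatch instead of wait()/notifyAll(). This is a minimal, self-contained illustration under stated assumptions, not Cassandra code: the Acceptor class, its awaitReady method and the loopback demo are hypothetical stand-ins for SocketThread and MessagingService::listen.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class AcceptorReadyDemo {
    /** Hypothetical stand-in for an acceptor thread; not Cassandra's SocketThread. */
    static class Acceptor extends Thread {
        private final ServerSocket server;
        private final CountDownLatch ready = new CountDownLatch(1);
        volatile boolean accepted = false;

        Acceptor(ServerSocket server, String name) {
            super(name);
            this.server = server;
        }

        /** Blocks until run() has started, or until the timeout expires. */
        boolean awaitReady(long timeout, TimeUnit unit) throws InterruptedException {
            return ready.await(timeout, unit);
        }

        @Override
        public void run() {
            ready.countDown(); // signal: the thread is running and about to call accept()
            try (Socket s = server.accept()) {
                accepted = true;
            } catch (IOException e) {
                // server socket was closed or failed; nothing more to do in this demo
            }
        }
    }

    /** Starts an acceptor, waits until it is running, then connects to it once. */
    static boolean startAndConnect() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Acceptor acceptor = new Acceptor(server, "ACCEPT-demo");
            acceptor.start();
            // Do not send anything before the acceptor thread has actually started.
            if (!acceptor.awaitReady(5, TimeUnit.SECONDS)) {
                throw new IllegalStateException("acceptor thread did not start in time");
            }
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                acceptor.join(5000); // wait for the acceptor to take the connection
            }
            return acceptor.accepted;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("connection accepted: " + startAndConnect());
    }
}
```

Compared with a bare wait(), the latch cannot return early on a spurious wakeup, and the timeout keeps the caller from hanging forever if the thread never starts.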


Regards,
Ignace Desimpel

From: Desimpel, Ignace [mailto:ignace.desim...@nuance.com]
Sent: Thursday, 6 February 2014 12:15
To: user@cassandra.apache.org
Subject: Sporadic gossip exception on add node

Environment: Linux, Cassandra 2.0.4, 3 nodes, embedded, byte ordered, LCS

When I add a node to the existing 3-node cluster, I sometimes get the exception 
'Unable to gossip with any seeds' listed below. If I just restart the node 
without any change, it usually works. It must be some timing issue.

At that time, Cassandra is configured through the cassandra.yaml file,
with auto_bootstrap set to true
and initial_token set to something like: 00f35256, 041e692a, 0562d8b2, 
0930274a, 0b16ce96, 0c5b3e1e, 10cac47a, 12b16bc6, 13f5db4e, 186561aa, 1907996e, 
1c32b042, 1e19578e ......

The two seeds configured in this yaml are 10.164.8.250 and 10.164.8.249, and 
both are up and running.
The new node to add has IP 10.164.8.93.

When the exception occurs, I do not get the gossip message 'Handshaking 
version with /10.164.8.93' on the seeds.
When the exception does not occur, I do get that gossip message 
'Handshaking version with /10.164.8.93' on the seeds.

2014-01-31 13:40:36.380 Loading persisted ring state
2014-01-31 13:40:36.386 Starting Messaging Service on port 9804
2014-01-31 13:40:36.408 Handshaking version with /10.164.8.250
2014-01-31 13:40:36.408 Handshaking version with /10.164.8.249
2014-01-31 13:41:07.415 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
                at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:469) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-87937]
                at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:460) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-87937]
java.lang.RuntimeException: Unable to gossip with any seeds
                at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160)
                at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426)
                at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618)
                at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
                at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485)
                at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346)
                at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
                at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:469)
                at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:460)
Exception encountered during startup: Unable to gossip with any seeds
2014-01-31 13:41:07.419 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
                at org.apache.cassandra.service.StorageService.stopNativeTransport(StorageService.java:349) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:364) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService.access$3(StorageService.java:361) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:551) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
                at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_40]
2014-01-31 13:41:07.420 ShutDownHook requests shutdown on be.landc.services.cdi.server.cassandra.CDIServer@7c32d1a3
2014-01-31 13:41:07.421 Shutdown server request
