I had a look at the code, and this might be a race-condition-like problem in
the functions StorageService::checkForEndpointCollision and
StorageService::prepareReplacementInfo.
To do a Gossiper.instance.doShadowRound(), the listener started by
MessagingService.instance().listen(FBUtilities.getLocalAddress()) must be FULLY
running (accepting connections).
However, the listen function starts the SocketThread threads but does not
wait for them to be started. So I think, at least in theory, that the
doShadowRound function will be sending messages, and thus expecting answers,
while there is no guarantee that the listeners are actually up and running.
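
To make that window concrete, here is a small self-contained Java sketch. It is
not Cassandra code: the class StartRaceDemo, the `gate` latch and the
`accepting` flag are made up for illustration. The gate stands in for an
unlucky scheduling delay, so the outcome is deterministic: start() returns
while the thread has not yet reached the point where it would accept.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

public class StartRaceDemo {
    // Returns what the starter observed right after Thread.start() returned.
    static boolean observedAcceptingAfterStart() throws InterruptedException {
        final AtomicBoolean accepting = new AtomicBoolean(false);
        // Hypothetical gate: simulates the listener thread being scheduled late.
        final CountDownLatch gate = new CountDownLatch(1);

        Thread listener = new Thread(new Runnable() {
            public void run() {
                try {
                    gate.await();          // thread exists but has not progressed yet
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                accepting.set(true);       // stands in for reaching server.accept()
            }
        }, "ACCEPT-demo");

        listener.start();                  // returns immediately
        boolean seen = accepting.get();    // the starter moves on, e.g. sends gossip
        gate.countDown();                  // only now may the listener continue
        listener.join();
        return seen;
    }

    public static void main(String[] args) throws InterruptedException {
        // prints "accepting right after start(): false"
        System.out.println("accepting right after start(): " + observedAcceptingAfterStart());
    }
}
```

In a real run the delay is not forced like this, which would fit the
observation that the failure only shows up sporadically.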
As a test I modified the MessagingService::listen code as follows:
    SocketThread th = new SocketThread(ss, "ACCEPT-" + localEp);
    synchronized( th ) {
        th.start();
        // block until SocketThread::run calls notifyAll()
        try { th.wait(); } catch(Throwable tt){}
    }
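And the SocketThread::run function now starts with:
    public void run()
    {
        synchronized( this ) {
            // wake up the thread waiting in MessagingService::listen
            this.notifyAll();
        }
        // ... (rest of run() unchanged)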
That way there is little chance that the socket thread is not yet running (it
should be blocked in the server.accept() call).
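
The same start-up handshake can also be sketched with a
java.util.concurrent.CountDownLatch instead of wait()/notifyAll(). This is only
a standalone illustration under my own assumptions, not a patch against
MessagingService: AcceptThread, startAndAwait and ReadyListenerDemo are
hypothetical names, and the accept loop just closes whatever connects.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ReadyListenerDemo {
    // Hypothetical stand-in for SocketThread: counts down a latch once run() has begun.
    static class AcceptThread extends Thread {
        private final ServerSocket server;
        private final CountDownLatch started = new CountDownLatch(1);

        AcceptThread(ServerSocket server, String name) {
            super(name);
            this.server = server;
        }

        @Override
        public void run() {
            started.countDown();                // signal: the thread is running
            while (!server.isClosed()) {
                try {
                    Socket s = server.accept(); // blocks until a peer connects
                    s.close();                  // demo only: no message handling
                } catch (IOException e) {
                    break;                      // socket closed or accept failed
                }
            }
        }

        // Start the thread and wait, with a bound, until run() has begun.
        boolean startAndAwait(long timeoutMs) throws InterruptedException {
            start();
            return started.await(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }

    static boolean demo() throws Exception {
        ServerSocket ss = new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        AcceptThread th = new AcceptThread(ss, "ACCEPT-demo");
        boolean ready = th.startAndAwait(5000);
        // Connect once to show that the listener is reachable.
        new Socket(InetAddress.getLoopbackAddress(), ss.getLocalPort()).close();
        ss.close();                             // makes a blocked accept() throw
        th.join(5000);
        return ready && !th.isAlive();
    }

    public static void main(String[] args) throws Exception {
        System.out.println("listener ready: " + demo());
    }
}
```

Compared with a bare wait(), the latch is not affected by spurious wakeups, and
the timeout keeps startup from hanging if the thread never starts. Like my test
change, it only shows that run() has begun, not that accept() has already been
entered.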
Regards,
Ignace Desimpel
From: Desimpel, Ignace [mailto:[email protected]]
Sent: Thursday, 6 February 2014 12:15
To: [email protected]
Subject: Sporadic gossip exception on add node
Environment: Linux, Cassandra 2.0.4, 3 nodes, embedded, byte ordered, LCS
When I add a node to the existing 3-node cluster, I sometimes get the exception
'Unable to gossip with any seeds' listed below. If I just restart the node
without any change, it mostly works, so it must be some timing issue.
At that time Cassandra is configured through the cassandra.yaml file,
with auto_bootstrap set to true
and the initial_token set to something like : 00f35256, 041e692a, 0562d8b2,
0930274a, 0b16ce96, 0c5b3e1e, 10cac47a, 12b16bc6, 13f5db4e, 186561aa, 1907996e,
1c32b042, 1e19578e ......
The two seeds configured in this yaml are 10.164.8.250 and 10.164.8.249 and
these are up and running.
The new node to add has ip 10.164.8.93
At the time of the exception, I do not get the gossip message 'Handshaking
version with /10.164.8.93' on the seeds.
If the exception does not occur, then I do get that gossip message
'Handshaking version with /10.164.8.93' on the seeds.
2014-01-31 13:40:36.380 Loading persisted ring state
2014-01-31 13:40:36.386 Starting Messaging Service on port 9804
2014-01-31 13:40:36.408 Handshaking version with /10.164.8.250
2014-01-31 13:40:36.408 Handshaking version with /10.164.8.249
2014-01-31 13:41:07.415 Exception encountered during startup
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:469) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-87937]
    at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:460) [landc-services-search-1.2.0-SNAPSHOT.jar:1.2.0-SNAPSHOT-87937]
java.lang.RuntimeException: Unable to gossip with any seeds
    at org.apache.cassandra.gms.Gossiper.doShadowRound(Gossiper.java:1160)
    at org.apache.cassandra.service.StorageService.checkForEndpointCollision(StorageService.java:426)
    at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:618)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:586)
    at org.apache.cassandra.service.StorageService.initServer(StorageService.java:485)
    at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:346)
    at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:461)
    at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.startUpCassandra(CassandraStore.java:469)
    at be.landc.services.search.server.db.baseserver.indexsearch.store.cassandra.CassandraStore$CassThread.run(CassandraStore.java:460)
Exception encountered during startup: Unable to gossip with any seeds
2014-01-31 13:41:07.419 Exception in thread Thread[StorageServiceShutdownHook,5,main]
java.lang.NullPointerException: null
    at org.apache.cassandra.service.StorageService.stopNativeTransport(StorageService.java:349) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService.shutdownClientServers(StorageService.java:364) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService.access$3(StorageService.java:361) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.service.StorageService$1.runMayThrow(StorageService.java:551) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28) ~[apache-cassandra-2.0.4-SNAPSHOT.jar:2.0.4-SNAPSHOT]
    at java.lang.Thread.run(Thread.java:724) ~[na:1.7.0_40]
2014-01-31 13:41:07.420 ShutDownHook requests shutdown on be.landc.services.cdi.server.cassandra.CDIServer@7c32d1a3
2014-01-31 13:41:07.421 Shutdown server request