Hi,
I just wanted to share our experience of upgrading 1.0.10 to 1.2.3.
It happened that first we upgraded both of our two seeds to 1.2.3.
And basically after that old nodes couldn't communicate with new ones
anymore. Cluster was down until we upgraded all nodes to 1.2.3. We
don't have many nodes and that process didn't took long. Yet it
caused outage for ~10 mins.
Here are some logs:
On the new, freshly upgraded seed node (v1.2.3):
----------------
ERROR [OptionalTasks:1] 2013-03-31 08:48:19,370 CassandraDaemon.java
(line 164) Exception in thread Thread[OptionalTasks:1,5,main]
java.lang.NullPointerException
at
org.apache.cassandra.service.MigrationManager$1.run(MigrationManager.java:137)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
at
java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
WARN [MutationStage:20] 2013-03-31 08:48:23,613 StorageProxy.java
(line 577) Unable to store hint for host with missing ID, /10.0.1.8
(old node?)
....
ERROR [MutationStage:33] 2013-03-31 09:00:02,899 CassandraDaemon.java
(line 164) Exception in thread Thread[MutationStage:33,5,main]
java.lang.AssertionError: Missing host ID for 10.0.1.8
at
org.apache.cassandra.service.StorageProxy.writeHintForMutation(StorageProxy.java:580)
at
org.apache.cassandra.service.StorageProxy$5.runMayThrow(StorageProxy.java:555)
at
org.apache.cassandra.service.StorageProxy$HintRunnable.run(StorageProxy.java:1643)
at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
----------------
At the same time, old nodes (v1.0.10) were blinded:
----------------
ERROR [RequestResponseStage:441] 2013-03-31 09:04:07,955
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[RequestResponseStage:441,5,main]
java.io.IOError: java.io.EOFException
at
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:71)
at
org.apache.cassandra.service.ReadCallback.response(ReadCallback.java:132)
at
org.apache.cassandra.net.ResponseVerbHandler.doVerb(ResponseVerbHandler.java:45)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:100)
at
org.apache.cassandra.db.ReadResponseSerializer.deserialize(ReadResponse.java:81)
at
org.apache.cassandra.service.AbstractRowResolver.preprocess(AbstractRowResolver.java:64)
... 6 more
.....
INFO [GossipStage:3] 2013-03-31 09:06:08,885 Gossiper.java (line 804)
InetAddress /10.0.1.8 is now UP
ERROR [GossipStage:3] 2013-03-31 09:06:08,885
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[GossipStage:3,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
at java.util.UUID.timestamp(UUID.java:308)
at
org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
at
org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
at
org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
at
org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
ERROR [GossipStage:3] 2013-03-31 09:06:08,886
AbstractCassandraDaemon.java (line 139) Fatal exception in thread
Thread[GossipStage:3,5,main]
java.lang.UnsupportedOperationException: Not a time-based UUID
at java.util.UUID.timestamp(UUID.java:308)
at
org.apache.cassandra.service.MigrationManager.updateHighestKnown(MigrationManager.java:121)
at
org.apache.cassandra.service.MigrationManager.rectify(MigrationManager.java:99)
at
org.apache.cassandra.service.MigrationManager.onAlive(MigrationManager.java:83)
at org.apache.cassandra.gms.Gossiper.markAlive(Gossiper.java:806)
at org.apache.cassandra.gms.Gossiper.applyStateLocally(Gossiper.java:923)
at
org.apache.cassandra.gms.GossipDigestAckVerbHandler.doVerb(GossipDigestAckVerbHandler.java:68)
at
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:59)
at
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
----------
I've seen other people reporting similar problem with 1.1.x.
Not sure, but maybe we could avoid this by upgrading seeds last.
Best,
Rustam.