Hi Cassandra users,

We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8 (running on Mesos, but that's beside the point). We have two datacenters, so in order to preserve our data, we are trying to upgrade one datacenter at a time.
Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear down dc1 completely (delete all the data in it), bring it up with 3.0.8, let data replicate from dc2 to dc1, and then tear down dc2, bring it up with 3.0.8 and replicate data from dc1.

I am able to reproduce the problem on bare metal clusters running on 3 nodes. I am using Oracle's server-jre-8u74-linux-x64 JRE.

*Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its own IP address, changed listen_address and rpc_address to its own IP and changed endpoint_snitch to GossipingPropertyFileSnitch. I changed conf/cassandra-rackdc.properties to

    dc=dc2
    rack=rack2

This node started up fine and is UN in nodetool status in dc2.

I used CQL shell to create a table and insert 3 rows:

    verma@xxxxx:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
    Connected to Test Cluster at xxxxx:9042.
    [cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
    Use HELP for help.
    cqlsh> desc tmp

    CREATE KEYSPACE tmp WITH replication = {'class': 'NetworkTopologyStrategy', 'dc1': '1', 'dc2': '1'} AND durable_writes = true;

    CREATE TABLE tmp.map (
        key text PRIMARY KEY,
        value text
    )...;

    cqlsh> select * from tmp.map;

     key | value
    -----+-------
      k1 |    v1
      k3 |    v3
      k2 |    v2

*Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include itself and node A, changed listen_address and rpc_address to its own IP, changed endpoint_snitch to GossipingPropertyFileSnitch. I did not change conf/cassandra-rackdc.properties and its contents are

    dc=dc1
    rack=rack1

In the logs, I see:

    INFO [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting Messaging Service on /10.164.32.29:7000 (eth0)
    INFO [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node will not auto bootstrap because it is configured to be a seed node.
So I start a third node:

*Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node A and node B, changed listen_address and rpc_address to its own IP, changed endpoint_snitch to GossipingPropertyFileSnitch. I did not change conf/cassandra-rackdc.properties.

Now, nodetool status shows:

    verma@xxxxxxx:~/apache-cassandra-3.0.8$ bin/nodetool status
    Datacenter: dc1
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
    UJ  <Node C IP>  87.81 KB   256     ?                 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
    UN  <Node B IP>  107.72 KB  256     100.0%            28b1043f-115b-46a5-b6b6-8609829cde76  rack1
    Datacenter: dc2
    ===============
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
    UN  <Node A IP>  73.2 KB    256     100.0%            09cc542c-2299-45a5-a4d1-159c239ded37  rack2

Nodetool describe cluster shows:

    verma@xxxxxxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster
    Cluster Information:
        Name: Test Cluster
        Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
        Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
        Schema versions:
            c2a2bb4f-7d31-3fb8-a216-00b41a643650: [<Node B IP>, <Node C IP>]
            9770e3c5-3135-32e2-b761-65a0f6d8824e: [<Node A IP>]

Note that there are two schema versions and they don't match.
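For anyone who wants to check for this schema mismatch from a script rather than by eye, here is a minimal Python sketch. It only parses text in the describecluster layout shown above, with one "uuid: [endpoints]" entry per line under "Schema versions:". The names parse_schema_versions, schema_agrees and SAMPLE_OUTPUT are hypothetical helpers made up for illustration, not part of Cassandra or nodetool.

```python
import re
from typing import Dict, List

# Matches one "uuid: [endpoint, endpoint, ...]" entry (assumed layout).
_VERSION_LINE = re.compile(
    r"^([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}):\s*\[(.*)\]$"
)


def parse_schema_versions(describecluster_output: str) -> Dict[str, List[str]]:
    """Map each schema version UUID to the endpoints that report it."""
    versions: Dict[str, List[str]] = {}
    in_schema_section = False
    for line in describecluster_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Schema versions:"):
            in_schema_section = True
            continue
        if not in_schema_section:
            continue
        match = _VERSION_LINE.match(stripped)
        if match:
            endpoints = [e.strip() for e in match.group(2).split(",") if e.strip()]
            versions[match.group(1)] = endpoints
    return versions


def schema_agrees(versions: Dict[str, List[str]]) -> bool:
    """True only when every endpoint reports the same single schema version."""
    return len(versions) == 1


# Output copied from the post, with the same IP placeholders.
SAMPLE_OUTPUT = """\
Cluster Information:
    Name: Test Cluster
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        c2a2bb4f-7d31-3fb8-a216-00b41a643650: [<Node B IP>, <Node C IP>]
        9770e3c5-3135-32e2-b761-65a0f6d8824e: [<Node A IP>]
"""

if __name__ == "__main__":
    found = parse_schema_versions(SAMPLE_OUTPUT)
    for version, endpoints in found.items():
        print(version, endpoints)
    print("schema agreement:", schema_agrees(found))
```

In practice the text would come from running bin/nodetool describecluster on one of the nodes; the sketch deliberately leaves that part out and works on a captured string.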
I see the following in the system.log:

    INFO [InternalResponseStage:1] 2016-10-10 22:48:36,055 ColumnFamilyStore.java:390 - Initializing system_auth.roles
    INFO [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING: waiting for schema information to complete
    INFO [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING: schema complete, ready to bootstrap
    INFO [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING: waiting for pending range calculation
    INFO [main] 2016-10-10 22:48:36,317 StorageService.java:1149 - JOINING: calculation complete, ready to bootstrap
    INFO [main] 2016-10-10 22:48:36,319 StorageService.java:1149 - JOINING: getting bootstrap token
    INFO [main] 2016-10-10 22:48:36,357 StorageService.java:1149 - JOINING: sleeping 30000 ms for pending range setup
    INFO [main] 2016-10-10 22:49:06,358 StorageService.java:1149 - JOINING: Starting to bootstrap...
    INFO [main] 2016-10-10 22:49:06,494 StreamResultFuture.java:87 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Executing streaming plan for Bootstrap
    INFO [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,495 StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Starting streaming to /<Node A IP>
    INFO [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,495 StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Starting streaming to /<Node B IP>
    INFO [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,500 StreamCoordinator.java:213 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e, ID#0] Beginning stream session with /<Node B IP>
    INFO [STREAM-IN-/<Node B IP>] 2016-10-10 22:49:06,590 StreamResultFuture.java:183 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /<Node B IP> is complete
    INFO [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,635 StreamCoordinator.java:213 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e, ID#0] Beginning stream session with /<Node A IP>
    ERROR [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,639 StreamSession.java:528 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Streaming error occurred
    java.io.IOException: Connection reset by peer
        at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_102]
        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.8.0_102]
        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_102]
        at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_102]
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380) ~[na:1.8.0_102]
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206) ~[na:1.8.0_102]
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103) ~[na:1.8.0_102]
        at java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385) ~[na:1.8.0_102]
        at org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
    INFO [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,639 StreamResultFuture.java:183 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /<Node A IP> is complete
    WARN [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,640 StreamResultFuture.java:210 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Stream failed
    WARN [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,640 StorageService.java:1208 - Error during bootstrap.
    org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) [guava-18.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) [guava-18.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) [guava-18.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) [guava-18.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) [guava-18.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:534) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:305) [apache-cassandra-3.0.8.jar:3.0.8]
        at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
    ERROR [main] 2016-10-10 22:49:06,641 StorageService.java:1218 - Error while waiting on bootstrap to complete. Bootstrap will have to be restarted.
    java.util.concurrent.ExecutionException: org.apache.cassandra.streaming.StreamException: Stream failed
        at com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299) ~[guava-18.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286) ~[guava-18.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116) ~[guava-18.0.jar:na]
        at org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1213) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:889) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:663) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.service.StorageService.initServer(StorageService.java:528) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:339) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557) [apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685) [apache-cassandra-3.0.8.jar:3.0.8]
    Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
        at org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310) ~[guava-18.0.jar:na]
        at com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457) ~[guava-18.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156) ~[guava-18.0.jar:na]
        at com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145) ~[guava-18.0.jar:na]
        at com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202) ~[guava-18.0.jar:na]
        at org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:534) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:305) ~[apache-cassandra-3.0.8.jar:3.0.8]
        at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
    WARN [main] 2016-10-10 22:49:06,646 StorageService.java:944 - Some data streaming failed. Use nodetool to check bootstrap state and resume. For more, see `nodetool help bootstrap`. IN_PROGRESS
    INFO [main] 2016-10-10 22:49:06,647 CassandraDaemon.java:644 - Waiting for gossip to settle before accepting client requests...
    INFO [main] 2016-10-10 22:49:14,648 CassandraDaemon.java:675 - No gossip backlog; proceeding
    INFO [main] 2016-10-10 22:49:14,694 NativeTransportService.java:70 - Netty using native Epoll event loop
    INFO [main] 2016-10-10 22:49:14,726 Server.java:159 - Using Netty Version: [netty-buffer=netty-buffer-4.0.23.Final.208198c, netty-codec=netty-codec-4.0.23.Final.208198c, netty-codec-http=netty-codec-http-4.0.23.Final.208198c, netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c, netty-common=netty-common-4.0.23.Final.208198c, netty-handler=netty-handler-4.0.23.Final.208198c, netty-transport=netty-transport-4.0.23.Final.208198c, netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c, netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c, netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
    INFO [main] 2016-10-10 22:49:14,726 Server.java:160 - Starting listening for CQL clients on /<Node C IP>:9042 (unencrypted)...
    INFO [main] 2016-10-10 22:49:14,748 CassandraDaemon.java:477 - Not starting RPC server as requested. Use JMX (StorageService->startRPCServer()) or nodetool (enablethrift) to start it

I tried resuming bootstrap but it fails with the same streaming errors:

    verma@<Node C>:~/apache-cassandra-3.0.8$ bin/nodetool bootstrap resume
    Resuming bootstrap
    [2016-10-10 23:15:11,816] session with /<Node B IP> complete (progress: 0%)
    [2016-10-10 23:15:11,939] session with /<Node A IP> complete (progress: 0%)
    [2016-10-10 23:15:11,940] Stream failed

and I see the same error in the system.log:

    StreamSession.java:528 - [Stream #64b73a20-8f3f-11e6-b69a-1b451159408e] Streaming error occurred
    java.io.IOException: Connection reset by peer
    ...

Does Cassandra support upgrading from 2.2.5 to 3.0.8 in this way? Am I missing something?

Thanks for your time.

-Abhishek.