Hi Cassandra users,

We are trying to upgrade our Cassandra version from 2.2.5 to 3.0.8 (running
on Mesos, but that's beside the point). We have two datacenters, so in
order to preserve our data, we are trying to upgrade one datacenter at a
time.

Initially both DCs (dc1 and dc2) are running 2.2.5. The idea is to tear
down dc1 completely (delete all the data in it), bring it up with 3.0.8,
let data replicate from dc2 to dc1, and then tear down dc2, bring it up
with 3.0.8 and replicate data from dc1.

I can reproduce the problem (bootstrap streaming to the new node fails, see
below) on a bare-metal cluster of 3 nodes. I am using Oracle's
server-jre-8u74-linux-x64 JRE.

*Node A*: Downloaded 2.2.5-bin.tar.gz, changed the seeds to include its own
IP address, changed listen_address and rpc_address to its own IP and
changed endpoint_snitch to GossipingPropertyFileSnitch. I
changed conf/cassandra-rackdc.properties to
dc=dc2
rack=rack2
This node started up fine and is UN in nodetool status in dc2.

I used CQL shell to create a table and insert 3 rows:
verma@xxxxx:~/apache-cassandra-2.2.5$ bin/cqlsh $HOSTNAME
Connected to Test Cluster at xxxxx:9042.
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh> desc tmp

CREATE KEYSPACE tmp WITH replication = {'class': 'NetworkTopologyStrategy',
'dc1': '1', 'dc2': '1'}  AND durable_writes = true;

CREATE TABLE tmp.map (
    key text PRIMARY KEY,
    value text
)...;
cqlsh> select * from tmp.map;

 key | value
-----+-------
  k1 |    v1
  k3 |    v3
  k2 |    v2


*Node B:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include itself
and node A, changed listen_address and rpc_address to its own IP, changed
endpoint_snitch to GossipingPropertyFileSnitch. I did not change
conf/cassandra-rackdc.properties and its contents are
dc=dc1
rack=rack1

In the logs, I see:
INFO  [main] 2016-10-10 22:42:42,850 MessagingService.java:557 - Starting
Messaging Service on /10.164.32.29:7000 (eth0)
INFO  [main] 2016-10-10 22:42:42,864 StorageService.java:784 - This node
will not auto bootstrap because it is configured to be a seed node.

Since node B is a seed and therefore does not bootstrap, I started a third
node:
*Node C:* Downloaded 3.0.8-bin.tar.gz, changed the seeds to include node A
and node B, changed listen_address and rpc_address to its own IP, changed
endpoint_snitch to GossipingPropertyFileSnitch. I did not change
conf/cassandra-rackdc.properties.
Now, nodetool status shows:

verma@xxxxxxx:~/apache-cassandra-3.0.8$ bin/nodetool status
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UJ  <Node C IP>  87.81 KB   256     ?                 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
UN  <Node B IP>  107.72 KB  256     100.0%            28b1043f-115b-46a5-b6b6-8609829cde76  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address      Load       Tokens  Owns (effective)  Host ID                               Rack
UN  <Node A IP>  73.2 KB    256     100.0%            09cc542c-2299-45a5-a4d1-159c239ded37  rack2
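
For anyone who wants to script this check, here is a minimal Python sketch
that pulls the per-node state out of the nodetool status text. It is not
part of the setup above: the function names and the sample IPs are
hypothetical, and it assumes each node row sits on a single line, as
nodetool prints it (not wrapped as in this email).

```python
import re

# A node row starts with a two-letter code: Up/Down, then
# Normal/Leaving/Joining/Moving, followed by the node address.
NODE_ROW = re.compile(r"^([UD])([NLJM])\s+(\S+)")
DC_ROW = re.compile(r"^Datacenter:\s*(\S+)")


def node_states(status_output):
    """Return a list of (datacenter, address, state) tuples."""
    nodes = []
    datacenter = None
    for line in status_output.splitlines():
        dc = DC_ROW.match(line)
        if dc:
            datacenter = dc.group(1)
            continue
        row = NODE_ROW.match(line)
        if row:
            nodes.append((datacenter, row.group(3), row.group(1) + row.group(2)))
    return nodes


def not_normal(status_output):
    """Nodes whose state is anything other than UN (Up and Normal)."""
    return [n for n in node_states(status_output) if n[2] != "UN"]


# Sample shaped like the output above; the IPs are made-up placeholders.
SAMPLE = """\
Datacenter: dc1
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UJ  10.0.0.3  87.81 KB   256     ?                 9064832d-ed5c-4c42-ad5a-f754b52b670c  rack1
UN  10.0.0.2  107.72 KB  256     100.0%            28b1043f-115b-46a5-b6b6-8609829cde76  rack1
Datacenter: dc2
===============
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns (effective)  Host ID                               Rack
UN  10.0.0.1  73.2 KB    256     100.0%            09cc542c-2299-45a5-a4d1-159c239ded37  rack2
"""

if __name__ == "__main__":
    for datacenter, address, state in not_normal(SAMPLE):
        print(datacenter, address, state)
```

With the sample above, the only node reported is the joining one in dc1,
which corresponds to node C being stuck in UJ.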

nodetool describecluster shows:
verma@xxxxxxx:~/apache-cassandra-3.0.8$ bin/nodetool describecluster
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
c2a2bb4f-7d31-3fb8-a216-00b41a643650: [<Node B IP>, <Node C IP>]

9770e3c5-3135-32e2-b761-65a0f6d8824e: [<Node A IP>]

Note that there are two schema versions and they don't match: node A
(2.2.5) reports one, while nodes B and C (3.0.8) report another.
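
The schema disagreement can also be detected from a script. The following
is only a sketch in Python, assuming the describecluster text has the shape
shown above; the function names and the sample IPs are hypothetical.

```python
import re

# Matches a schema-version line from `nodetool describecluster`, e.g.
#   c2a2bb4f-7d31-3fb8-a216-00b41a643650: [10.0.0.2, 10.0.0.3]
SCHEMA_LINE = re.compile(r"^\s*([0-9a-fA-F-]{36}):\s*\[(.*)\]\s*$")


def schema_versions(describecluster_output):
    """Return {schema_version: [node addresses]} parsed from the text."""
    versions = {}
    for line in describecluster_output.splitlines():
        match = SCHEMA_LINE.match(line)
        if match:
            nodes = [n.strip() for n in match.group(2).split(",") if n.strip()]
            versions.setdefault(match.group(1), []).extend(nodes)
    return versions


def schema_agrees(describecluster_output):
    """True only when exactly one schema version is reported."""
    return len(schema_versions(describecluster_output)) == 1


# Sample shaped like the output above; the IPs are made-up placeholders.
SAMPLE = """\
Cluster Information:
Name: Test Cluster
Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
Schema versions:
c2a2bb4f-7d31-3fb8-a216-00b41a643650: [10.0.0.2, 10.0.0.3]

9770e3c5-3135-32e2-b761-65a0f6d8824e: [10.0.0.1]
"""

if __name__ == "__main__":
    for version, nodes in schema_versions(SAMPLE).items():
        print(version, nodes)
    print("schema agrees:", schema_agrees(SAMPLE))
```

The check only counts distinct versions; it does not say which side is
"right", which in this case is the open question.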

I see the following in the system.log:

INFO  [InternalResponseStage:1] 2016-10-10 22:48:36,055
ColumnFamilyStore.java:390 - Initializing system_auth.roles
INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
waiting for schema information to complete
INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
schema complete, ready to bootstrap
INFO  [main] 2016-10-10 22:48:36,316 StorageService.java:1149 - JOINING:
waiting for pending range calculation
INFO  [main] 2016-10-10 22:48:36,317 StorageService.java:1149 - JOINING:
calculation complete, ready to bootstrap
INFO  [main] 2016-10-10 22:48:36,319 StorageService.java:1149 - JOINING:
getting bootstrap token
INFO  [main] 2016-10-10 22:48:36,357 StorageService.java:1149 - JOINING:
sleeping 30000 ms for pending range setup
INFO  [main] 2016-10-10 22:49:06,358 StorageService.java:1149 - JOINING:
Starting to bootstrap...
INFO  [main] 2016-10-10 22:49:06,494 StreamResultFuture.java:87 - [Stream
#bfb5e470-8f3b-11e6-b69a-1b451159408e] Executing streaming plan for
Bootstrap
INFO  [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,495
StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e]
Starting streaming to /<Node A IP>
INFO  [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,495
StreamSession.java:242 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e]
Starting streaming to /<Node B IP>
INFO  [StreamConnectionEstablisher:2] 2016-10-10 22:49:06,500
StreamCoordinator.java:213 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e,
ID#0] Beginning stream session with /<Node B IP>
INFO  [STREAM-IN-/<Node B IP>] 2016-10-10 22:49:06,590
StreamResultFuture.java:183 - [Stream
#bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /<Node B IP> is complete
INFO  [StreamConnectionEstablisher:1] 2016-10-10 22:49:06,635
StreamCoordinator.java:213 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e,
ID#0] Beginning stream session with /<Node A IP>
ERROR [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,639
StreamSession.java:528 - [Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e]
Streaming error occurred
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_102]
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
~[na:1.8.0_102]
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_102]
at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_102]
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
~[na:1.8.0_102]
at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206)
~[na:1.8.0_102]
at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
~[na:1.8.0_102]
at
java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
~[na:1.8.0_102]
at
org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:54)
~[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:287)
~[apache-cassandra-3.0.8.jar:3.0.8]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
INFO  [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,639
StreamResultFuture.java:183 - [Stream
#bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /<Node A IP> is complete
WARN  [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,640
StreamResultFuture.java:210 - [Stream
#bfb5e470-8f3b-11e6-b69a-1b451159408e] Stream failed
WARN  [STREAM-IN-/<Node A IP>] 2016-10-10 22:49:06,640
StorageService.java:1208 - Error during bootstrap.
org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
~[apache-cassandra-3.0.8.jar:3.0.8]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
[guava-18.0.jar:na]
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:534)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:305)
[apache-cassandra-3.0.8.jar:3.0.8]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_102]
ERROR [main] 2016-10-10 22:49:06,641 StorageService.java:1218 - Error while
waiting on bootstrap to complete. Bootstrap will have to be restarted.
java.util.concurrent.ExecutionException:
org.apache.cassandra.streaming.StreamException: Stream failed
at
com.google.common.util.concurrent.AbstractFuture$Sync.getValue(AbstractFuture.java:299)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture$Sync.get(AbstractFuture.java:286)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:116)
~[guava-18.0.jar:na]
at
org.apache.cassandra.service.StorageService.bootstrap(StorageService.java:1213)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.service.StorageService.joinTokenRing(StorageService.java:889)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:663)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.service.StorageService.initServer(StorageService.java:528)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:339)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:557)
[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:685)
[apache-cassandra-3.0.8.jar:3.0.8]
Caused by: org.apache.cassandra.streaming.StreamException: Stream failed
at
org.apache.cassandra.streaming.management.StreamEventJMXNotifier.onFailure(StreamEventJMXNotifier.java:85)
~[apache-cassandra-3.0.8.jar:3.0.8]
at com.google.common.util.concurrent.Futures$6.run(Futures.java:1310)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.MoreExecutors$DirectExecutor.execute(MoreExecutors.java:457)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.executeListener(ExecutionList.java:156)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.ExecutionList.execute(ExecutionList.java:145)
~[guava-18.0.jar:na]
at
com.google.common.util.concurrent.AbstractFuture.setException(AbstractFuture.java:202)
~[guava-18.0.jar:na]
at
org.apache.cassandra.streaming.StreamResultFuture.maybeComplete(StreamResultFuture.java:211)
~[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.StreamResultFuture.handleSessionComplete(StreamResultFuture.java:187)
~[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.StreamSession.closeSession(StreamSession.java:429)
~[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.StreamSession.onError(StreamSession.java:534)
~[apache-cassandra-3.0.8.jar:3.0.8]
at
org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:305)
~[apache-cassandra-3.0.8.jar:3.0.8]
at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_102]
WARN  [main] 2016-10-10 22:49:06,646 StorageService.java:944 - Some data
streaming failed. Use nodetool to check bootstrap state and resume. For
more, see `nodetool help bootstrap`. IN_PROGRESS
INFO  [main] 2016-10-10 22:49:06,647 CassandraDaemon.java:644 - Waiting for
gossip to settle before accepting client requests...
INFO  [main] 2016-10-10 22:49:14,648 CassandraDaemon.java:675 - No gossip
backlog; proceeding
INFO  [main] 2016-10-10 22:49:14,694 NativeTransportService.java:70 - Netty
using native Epoll event loop
INFO  [main] 2016-10-10 22:49:14,726 Server.java:159 - Using Netty Version:
[netty-buffer=netty-buffer-4.0.23.Final.208198c,
netty-codec=netty-codec-4.0.23.Final.208198c,
netty-codec-http=netty-codec-http-4.0.23.Final.208198c,
netty-codec-socks=netty-codec-socks-4.0.23.Final.208198c,
netty-common=netty-common-4.0.23.Final.208198c,
netty-handler=netty-handler-4.0.23.Final.208198c,
netty-transport=netty-transport-4.0.23.Final.208198c,
netty-transport-rxtx=netty-transport-rxtx-4.0.23.Final.208198c,
netty-transport-sctp=netty-transport-sctp-4.0.23.Final.208198c,
netty-transport-udt=netty-transport-udt-4.0.23.Final.208198c]
INFO  [main] 2016-10-10 22:49:14,726 Server.java:160 - Starting listening
for CQL clients on /<Node C IP>:9042 (unencrypted)...
INFO  [main] 2016-10-10 22:49:14,748 CassandraDaemon.java:477 - Not
starting RPC server as requested. Use JMX
(StorageService->startRPCServer()) or nodetool (enablethrift) to start it

I tried resuming the bootstrap, but it failed with the same streaming error:

verma@<Node C>:~/apache-cassandra-3.0.8$ bin/nodetool bootstrap resume
Resuming bootstrap
[2016-10-10 23:15:11,816] session with /<Node B IP> complete (progress: 0%)
[2016-10-10 23:15:11,939] session with /<Node A IP> complete (progress: 0%)
[2016-10-10 23:15:11,940] Stream failed

and I see the same error in the system.log:

StreamSession.java:528 - [Stream #64b73a20-8f3f-11e6-b69a-1b451159408e]
Streaming error occurred
java.io.IOException: Connection reset by peer
...
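
To see which peer the failing stream sessions belong to, system.log can be
scanned for the error record. This is a rough Python sketch under some
assumptions: each log record is on one line in the actual file (this email
wraps them), the record format matches the 3.0.8 lines quoted above, and
the function name and sample IPs are hypothetical.

```python
import re

# ERROR records written by the STREAM-IN thread carry the peer address in
# the thread name and the stream id in square brackets.
STREAM_ERROR = re.compile(
    r"^ERROR\s+\[STREAM-IN-/(?P<peer>[^\]]+)\].*"
    r"\[Stream #(?P<stream>[0-9a-f-]+)\] Streaming error occurred"
)


def failed_streams(log_text):
    """Return {peer address: [stream ids]} for every streaming error record."""
    failures = {}
    for line in log_text.splitlines():
        match = STREAM_ERROR.match(line)
        if match:
            failures.setdefault(match.group("peer"), []).append(match.group("stream"))
    return failures


# Sample shaped like the records above, unwrapped; IPs are placeholders.
SAMPLE = (
    "INFO  [STREAM-IN-/10.0.0.2] 2016-10-10 22:49:06,590 StreamResultFuture.java:183 - "
    "[Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Session with /10.0.0.2 is complete\n"
    "ERROR [STREAM-IN-/10.0.0.1] 2016-10-10 22:49:06,639 StreamSession.java:528 - "
    "[Stream #bfb5e470-8f3b-11e6-b69a-1b451159408e] Streaming error occurred\n"
    "java.io.IOException: Connection reset by peer\n"
)

if __name__ == "__main__":
    for peer, streams in failed_streams(SAMPLE).items():
        print(peer, streams)
```

In the sample, the only peer with a streaming error is 10.0.0.1, standing
in for node A (the 2.2.5 node); the session with node B completes.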

Does Cassandra support upgrading from 2.2.5 to 3.0.8 in this way? Am I
missing something?

Thanks for your time.
-Abhishek.
