Hello,

I am trying to upgrade Apache Cassandra from 2.1.16 to 3.11.3, the regular
rolling upgrade process works fine without any issues.

However, I am running into an issue where if there is a node with older
version dies (hardware failure) and a new node comes up and tries to
bootstrap, it's failing.

I tried two combinations:

1. Joining replacement node with 2.1.16 version of cassandra
In this case nodes with 2.1.16 version are able to stream data to the new
node, but the nodes with 3.11.3 version are failing with the below error.


> ERROR [STREAM-INIT-/10.x.x.x:40296] 2019-07-26 17:45:17,775
> IncomingStreamingConnection.java:80 - Error while reading from socket from
> /10.y.y.y:40296.
> java.io.IOException: Received stream using protocol version 2 (my version
> 4). Terminating connection

2. Joining replacement node with 3.11.3 version of cassandra
In this case the nodes with 3.11.3 version of cassandra are able to stream
the data but it's not able to stream data from the 2.1.16 nodes and failing
with the below error.


> ERROR [STREAM-IN-/10.z.z.z:7000] 2019-07-26 18:08:10,380
> StreamSession.java:593 - [Stream #538c6900-afd0-11e9-a649-ab2e045ee53b]
> Streaming error occurred on session with peer 10.z.z.z
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.8.0_151]
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> ~[na:1.8.0_151]
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.8.0_151]
> at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[na:1.8.0_151]
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:380)
> ~[na:1.8.0_151]
> at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:206)
> ~[na:1.8.0_151]
> at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:103)
> ~[na:1.8.0_151]
> at
> java.nio.channels.Channels$ReadableByteChannelImpl.read(Channels.java:385)
> ~[na:1.8.0_151]
> at
> org.apache.cassandra.streaming.messages.StreamMessage.deserialize(StreamMessage.java:56)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at
> org.apache.cassandra.streaming.ConnectionHandler$IncomingMessageHandler.run(ConnectionHandler.java:311)
> ~[apache-cassandra-3.11.3.jar:3.11.3]
> at java.lang.Thread.run(Thread.java:748) [na:1.8.0_151]


Note: In both cases I am using replace_address to replace dead node, as I
am running into some issues with "nodetool removenode" . I use ephemeral
disk, so replacement node always comes up with empty data dir and bootstrap.

Any other work around to mitigate this problem? I am worried about any
nodes going down while we are in the process of upgrade, as it could take
several hours to upgrade depending on the cluster size.

Reply via email to