This is the log after enabling TRACE on org.apache.cassandra.net.OutboundTcpConnection:
DEBUG [WRITE-/54.215.70.YY] 2013-10-03 18:01:50,237 OutboundTcpConnection.java (line 338) Target max version is -2147483648; no version information yet, will retry
TRACE [HANDSHAKE-/10.177.14.XX] 2013-10-03 18:01:50,237 OutboundTcpConnection.java (line 406) Cannot handshake version with /10.177.14.XX
java.nio.channels.AsynchronousCloseException
        at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:272)
        at sun.nio.ch.SocketAdaptor$SocketInputStream.read(SocketAdaptor.java:176)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:86)
        at java.io.InputStream.read(InputStream.java:82)
        at sun.nio.ch.ChannelInputStream.read(ChannelInputStream.java:64)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at org.apache.cassandra.net.OutboundTcpConnection$1.run(OutboundTcpConnection.java:400)

2013/10/3 Paulo Motta <pauloricard...@gmail.com>

> Hello,
>
> During a rolling upgrade between 1.1.10 and 1.2.10, the newly upgraded
> nodes keep showing the following log messages:
>
> INFO [HANDSHAKE-/10.176.249.XX] 2013-10-03 17:36:16,948 OutboundTcpConnection.java (line 399) Handshaking version with /10.176.249.XX
> INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280 OutboundTcpConnection.java (line 408) Cannot handshake version with /10.176.182.YY
> INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280 OutboundTcpConnection.java (line 399) Handshaking version with /10.176.182.YY
> INFO [HANDSHAKE-/10.188.13.ZZ] 2013-10-03 17:36:17,510 OutboundTcpConnection.java (line 408) Cannot handshake version with /10.188.13.ZZ
> INFO [HANDSHAKE-/10.188.13.ZZ] 2013-10-03 17:36:17,511 OutboundTcpConnection.java (line 399) Handshaking version with /10.188.13.ZZ
>
> Nodes XX, YY and ZZ are on the previous version (1.1.10). Is it expected
> that they can't handshake, or is this a potential problem?
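To see whether these handshake failures persist or settle down after the peers reconnect, one option is to count them per peer address. A minimal sketch, assuming the package-default log location (the LOG path is an assumption; override it for your install):

```shell
# Count "Cannot handshake version" occurrences per peer address.
# LOG defaults to the Debian/RPM package location; override as needed.
LOG=${LOG:-/var/log/cassandra/system.log}

grep 'Cannot handshake version' "$LOG" 2>/dev/null \
  | sed 's/.*HANDSHAKE-\([^]]*\)].*/\1/' \
  | sort | uniq -c | sort -rn
```

If the counts stop growing once a 1.1.10 peer's connection is re-established, the messages were transient reconnect noise rather than a persistent incompatibility.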
>
> Reads to any cluster node normally succeed, but sometimes I get read
> timeout errors. Has anyone had a similar issue?
>
> Cheers,
>
> Paulo
>
>
> 2013/10/2 Paulo Motta <pauloricard...@gmail.com>
>
>> Nevermind the question. It was a firewall problem. Now the nodes on
>> different versions are able to see each other! =)
>>
>> Cheers,
>>
>> Paulo
>>
>>
>> 2013/10/2 Paulo Motta <pauloricard...@gmail.com>
>>
>>> Hello,
>>>
>>> I just started the rolling upgrade procedure from 1.1.10 to 1.2.10. Our
>>> strategy is to simultaneously upgrade one server from each replication
>>> group. So, if we have 6 nodes with RF=2, we will upgrade 3 nodes at a
>>> time (from distinct replication groups).
>>>
>>> My question is: should the newly upgraded nodes show as "Down" in the
>>> "nodetool ring" output of the old (1.1.10) nodes? I thought network
>>> compatibility meant nodes on a newer version would receive traffic
>>> (writes + reads) from the previous version without problems.
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> 2013/9/26 Paulo Motta <pauloricard...@gmail.com>
>>>
>>>> Hello Charles,
>>>>
>>>> Thank you very much for your detailed upgrade report. It'll be very
>>>> helpful during our upgrade operation (even though we'll do a rolling
>>>> production upgrade).
>>>>
>>>> I'll also share our findings during the upgrade here.
>>>>
>>>> Cheers,
>>>>
>>>> Paulo
>>>>
>>>>
>>>> 2013/9/24 Charles Brophy <cbro...@zulily.com>
>>>>
>>>>> Hi Paulo,
>>>>>
>>>>> I just completed a migration from 1.1.10 to 1.2.10 and it was
>>>>> surprisingly painless.
>>>>>
>>>>> The course of action that I took:
>>>>> 1) describe cluster - make sure all nodes are on the same schema
>>>>> 2) shut off all maintenance tasks, i.e. make sure no scheduled repair
>>>>> is going to kick off in the middle of what you're doing
>>>>> 3) snapshot - maybe not necessary, but it's so quick it makes no sense
>>>>> to skip this step
>>>>> 4) drain the nodes - I shut down the entire cluster rather than chance
>>>>> any incompatible gossip concerns that might come from a rolling
>>>>> upgrade. I have the luxury of controlling both the providers and
>>>>> consumers of our data, so this wasn't so disruptive for us.
>>>>> 5) Upgrade the nodes, turn them on one by one, and monitor the logs
>>>>> for funny business.
>>>>> 6) nodetool upgradesstables
>>>>> 7) Turn various maintenance tasks back on, etc.
>>>>>
>>>>> The worst part was managing the yaml/config changes between the
>>>>> versions. It wasn't horrible, but the diff was "noisier" than for a
>>>>> more incremental upgrade. A few things I recall that were special:
>>>>> 1) Since you have an existing cluster, you'll probably need to set the
>>>>> default partitioner back to RandomPartitioner in cassandra.yaml. I
>>>>> believe that is outlined in NEWS.
>>>>> 2) I set the initial tokens to be the same as what the nodes held
>>>>> previously.
>>>>> 3) The timeout is now divided into more atomic settings, and you get
>>>>> to decide how (or whether) to change each from its default.
>>>>>
>>>>> tl;dr: I did a standard upgrade and paid careful attention to the
>>>>> NEWS.txt upgrade notices. I did a full cluster restart and NOT a
>>>>> rolling upgrade. It went without a hitch.
>>>>>
>>>>> Charles
>>>>>
>>>>>
>>>>> On Tue, Sep 24, 2013 at 2:33 PM, Paulo Motta
>>>>> <pauloricard...@gmail.com> wrote:
>>>>>
>>>>>> Cool, sounds fair enough. Thanks for the help, Rob!
>>>>>>
>>>>>> If anyone has upgraded from 1.1.X to 1.2.X, please feel invited to
>>>>>> share any tips on issues you've encountered that are not yet
>>>>>> documented.
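Charles's per-node sequence above can be condensed into a few commands. This is a hedged sketch, not his exact procedure: the service name and snapshot tag are illustrative, and with DRY_RUN=1 the script only prints what it would run:

```shell
# Sketch of the full-restart upgrade sequence described above
# (service name and snapshot tag are illustrative assumptions).
DRY_RUN=1

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

run nodetool describecluster          # 1) all nodes on the same schema?
run nodetool snapshot -t pre-1.2.10   # 3) cheap insurance (hard links only)
run nodetool drain                    # 4) flush memtables, stop accepting writes
run sudo service cassandra stop
# ... install the 1.2.10 package and merge cassandra.yaml changes here ...
run sudo service cassandra start      # 5) then watch the log for funny business
run nodetool upgradesstables          # 6) rewrite SSTables in the new format
```

Note that `upgradesstables` only needs to run once per node, after the new version is up, and that drain deliberately leaves the commit log empty so the new version never has to replay old-format segments.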
>>>>>>
>>>>>> Cheers,
>>>>>>
>>>>>> Paulo
>>>>>>
>>>>>>
>>>>>> 2013/9/24 Robert Coli <rc...@eventbrite.com>
>>>>>>
>>>>>>> On Tue, Sep 24, 2013 at 1:41 PM, Paulo Motta
>>>>>>> <pauloricard...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Doesn't the probability of something going wrong increase as the
>>>>>>>> gap between the versions increases? By that reasoning, upgrading
>>>>>>>> from 1.1.10 to 1.2.6 would have less chance of something going
>>>>>>>> wrong than from 1.1.10 to 1.2.9 or 1.2.10.
>>>>>>>
>>>>>>> Sorta, but sorta not.
>>>>>>>
>>>>>>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>>>>>>>
>>>>>>> is the canonical source of concerns on upgrade. There are a few
>>>>>>> cases where upgrading to the "root" of X.Y.Z creates issues that do
>>>>>>> not exist if you upgrade to the "head" of that line. AFAIK there
>>>>>>> have been no cases where upgrading to the "head" of a line (where
>>>>>>> that line is mature, like 1.2.10) has created problems which would
>>>>>>> have been avoided by upgrading to the "root" first.
>>>>>>>
>>>>>>>> I'm hoping this reasoning is wrong and I can upgrade directly from
>>>>>>>> 1.1.10 to 1.2.10. :-)
>>>>>>>
>>>>>>> That's what I plan to do when we move to 1.2.X, FWIW.
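The cassandra.yaml deltas called out earlier in the thread (partitioner default, initial tokens, split timeouts) might look roughly like this for a node coming from 1.1.x. This is a sketch under stated assumptions: the token value is purely illustrative, and NEWS.txt remains the authoritative list of changes:

```yaml
# cassandra.yaml fragment for a node upgraded from 1.1.x (illustrative).
# 1.2 changed the default partitioner to Murmur3Partitioner; an existing
# cluster must keep the partitioner it was created with.
partitioner: org.apache.cassandra.dht.RandomPartitioner

# Keep the token this node already owned under 1.1.x (example value only).
initial_token: 85070591730234615865843651857942052864

# The single rpc_timeout_in_ms is split into per-operation timeouts in 1.2;
# each can be tuned (or left at its default) independently.
read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 10000
```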
>>>>>>>
>>>>>>> =Rob

--
Paulo Ricardo

--
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com