Hello,

During a rolling upgrade from 1.1.10 to 1.2.10, the newly upgraded nodes
keep showing the following log messages:

  INFO [HANDSHAKE-/10.176.249.XX] 2013-10-03 17:36:16,948 OutboundTcpConnection.java (line 399) Handshaking version with /10.176.249.XX
  INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280 OutboundTcpConnection.java (line 408) Cannot handshake version with /10.176.182.YY
  INFO [HANDSHAKE-/10.176.182.YY] 2013-10-03 17:36:17,280 OutboundTcpConnection.java (line 399) Handshaking version with /10.176.182.YY
  INFO [HANDSHAKE-/10.188.13.ZZ] 2013-10-03 17:36:17,510 OutboundTcpConnection.java (line 408) Cannot handshake version with /10.188.13.ZZ
  INFO [HANDSHAKE-/10.188.13.ZZ] 2013-10-03 17:36:17,511 OutboundTcpConnection.java (line 399) Handshaking version with /10.188.13.ZZ

Nodes XX, YY and ZZ are still on the previous version (1.1.10). Is it
expected that they can't handshake, or is this a potential problem? Reads
against any node in the cluster normally succeed, but I sometimes get read
timeout errors. Has anyone had a similar issue?

Cheers,

Paulo
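One way to cross-check what the old and new nodes are actually advertising
during a mixed-version window like this (a sketch; the addresses are the
masked ones from the log excerpt above, and the log path is an example):

  # release version reported by individual nodes (run against old and new):
  nodetool -h 10.176.249.XX version
  nodetool -h 10.176.182.YY version

  # what each endpoint advertises via gossip, as seen from the local node:
  nodetool gossipinfo | grep -E '^/|RELEASE_VERSION'

  # how often the handshake retry is being logged (default log location):
  grep -c "Cannot handshake version" /var/log/cassandra/system.log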
2013/10/2 Paulo Motta <pauloricard...@gmail.com>

> Nevermind the question. It was a firewall problem. Now the nodes on
> different versions are able to see each other! =)
>
> Cheers,
>
> Paulo
>
>
> 2013/10/2 Paulo Motta <pauloricard...@gmail.com>
>
>> Hello,
>>
>> I just started the rolling upgrade procedure from 1.1.10 to 1.2.10. Our
>> strategy is to simultaneously upgrade one server from each replication
>> group. So, if we have 6 nodes with RF=2, we will upgrade 3 nodes at a
>> time (from distinct replication groups).
>>
>> My question is: do the newly upgraded nodes show as "Down" in the
>> "nodetool ring" output of the old (1.1.10) nodes? I thought that network
>> compatibility meant nodes on a newer version would receive traffic
>> (writes + reads) from the previous version without problems.
>>
>> Cheers,
>>
>> Paulo
>>
>>
>> 2013/9/26 Paulo Motta <pauloricard...@gmail.com>
>>
>>> Hello Charles,
>>>
>>> Thank you very much for your detailed upgrade report. It'll be very
>>> helpful during our upgrade operation (even though we'll do a rolling
>>> production upgrade).
>>>
>>> I'll also share our findings during the upgrade here.
>>>
>>> Cheers,
>>>
>>> Paulo
>>>
>>>
>>> 2013/9/24 Charles Brophy <cbro...@zulily.com>
>>>
>>>> Hi Paulo,
>>>>
>>>> I just completed a migration from 1.1.10 to 1.2.10 and it was
>>>> surprisingly painless.
>>>>
>>>> The course of action that I took:
>>>> 1) describe cluster - make sure all nodes are on the same schema
>>>> 2) shut off all maintenance tasks; i.e. make sure no scheduled repair
>>>> is going to kick off in the middle of what you're doing
>>>> 3) snapshot - maybe not necessary, but it's so quick it makes no
>>>> sense to skip this step
>>>> 4) drain the nodes - I shut down the entire cluster rather than
>>>> chance any incompatible gossip concerns that might come from a
>>>> rolling upgrade. I have the luxury of controlling both the providers
>>>> and consumers of our data, so this wasn't so disruptive for us.
>>>> 5) Upgrade the nodes, turn them on one by one, monitor the logs for
>>>> funny business.
>>>> 6) nodetool upgradesstables
>>>> 7) Turn various maintenance tasks back on, etc.
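The step list above maps onto roughly the following per-node command
sequence (a sketch only; the service/package commands and log path are
assumptions for a typical Linux install, not what Charles literally ran):

  # step 1: confirm schema agreement first -- in cassandra-cli, run:
  #   describe cluster;

  # steps 3-4: snapshot, then flush memtables and stop accepting writes
  nodetool snapshot -t pre-1.2.10-upgrade
  nodetool drain
  sudo service cassandra stop

  # step 5: install the 1.2.10 binaries, merge cassandra.yaml changes, then:
  sudo service cassandra start
  tail -f /var/log/cassandra/system.log   # watch startup for funny business

  # step 6: rewrite SSTables into the new on-disk format
  nodetool upgradesstables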
>>>>
>>>> The worst part was managing the yaml/config changes between the
>>>> versions. It wasn't horrible, but the diff was "noisier" than a more
>>>> incremental upgrade typically produces. A few things I recall that
>>>> were special:
>>>> 1) Since you have an existing cluster, you'll probably need to set
>>>> the default partitioner back to RandomPartitioner in cassandra.yaml.
>>>> I believe that is outlined in NEWS.
>>>> 2) I set the initial tokens to be the same as what the nodes held
>>>> previously.
>>>> 3) The timeout is now divided into more atomic settings, and you get
>>>> to decide how (or if) to adjust each from its default.
>>>>
>>>> tldr; I did a standard upgrade and paid careful attention to the
>>>> NEWS.txt upgrade notices. I did a full cluster restart and NOT a
>>>> rolling upgrade. It went without a hitch.
>>>>
>>>> Charles
>>>>
>>>>
>>>> On Tue, Sep 24, 2013 at 2:33 PM, Paulo Motta
>>>> <pauloricard...@gmail.com> wrote:
>>>>
>>>>> Cool, sounds fair enough. Thanks for the help, Rob!
>>>>>
>>>>> If anyone has upgraded from 1.1.X to 1.2.X, please feel invited to
>>>>> share any tips on issues you've encountered that are not yet
>>>>> documented.
>>>>>
>>>>> Cheers,
>>>>>
>>>>> Paulo
>>>>>
>>>>>
>>>>> 2013/9/24 Robert Coli <rc...@eventbrite.com>
>>>>>
>>>>>> On Tue, Sep 24, 2013 at 1:41 PM, Paulo Motta
>>>>>> <pauloricard...@gmail.com> wrote:
>>>>>>
>>>>>>> Doesn't the probability of something going wrong increase as the
>>>>>>> gap between the versions increases? By that reasoning, upgrading
>>>>>>> from 1.1.10 to 1.2.6 would have less chance of something going
>>>>>>> wrong than from 1.1.10 to 1.2.9 or 1.2.10.
>>>>>>>
>>>>>>
>>>>>> Sorta, but sorta not.
>>>>>>
>>>>>> https://github.com/apache/cassandra/blob/trunk/NEWS.txt
>>>>>>
>>>>>> is the canonical source of concerns on upgrade. There are a few
>>>>>> cases where upgrading to the "root" of X.Y.Z creates issues that
>>>>>> do not exist if you upgrade to the "head" of that line. AFAIK
>>>>>> there have been no cases where upgrading to the "head" of a line
>>>>>> (where that line is mature, like 1.2.10) has created problems
>>>>>> which would have been avoided by upgrading to the "root" first.
>>>>>>
>>>>>>> I'm hoping this reasoning is wrong and I can upgrade directly
>>>>>>> from 1.1.10 to 1.2.10. :-)
>>>>>>>
>>>>>>
>>>>>> That's what I plan to do when we move to 1.2.X, FWIW.
>>>>>>
>>>>>> =Rob

--
Paulo Ricardo
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com
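On Charles's three config points: a quick way to eyeball the relevant
cassandra.yaml keys after merging the 1.2 template (a sketch; the file
path is an example, and the key names are the 1.2-era ones):

  # 1) keep RandomPartitioner for a cluster created before 1.2
  #    (1.2 changed the default for *new* clusters to Murmur3Partitioner):
  grep '^partitioner' /etc/cassandra/cassandra.yaml

  # 2) each node keeps the token it held on 1.1.10:
  grep '^initial_token' /etc/cassandra/cassandra.yaml

  # 3) the old single rpc_timeout is split into per-operation timeouts:
  grep '_timeout_in_ms' /etc/cassandra/cassandra.yaml
  #    e.g. read_request_timeout_in_ms, write_request_timeout_in_ms,
  #         range_request_timeout_in_ms, truncate_request_timeout_in_ms,
  #         request_timeout_in_ms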