There has been a little misunderstanding. When all nodes are on 1.2.2, they are fine. But during the rolling upgrade, the 1.2.2 nodes see the 1.1.10 nodes as Down in nodetool ring, despite gossipinfo reporting them as NORMAL. I will give your suggestion a try and will report back.
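For reference, here is roughly the change I will be trying, assuming the stock cassandra-env.sh layout (the exact placement should not matter much, as long as it runs after JVM_OPTS is initialised):

    # Appended near the bottom of conf/cassandra-env.sh on each node
    # before its restart. The flag makes the node ignore its locally
    # saved ring state at startup and rebuild its view of the ring
    # from live gossip with the other nodes.
    JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"

The plan is to restart one node at a time, wait until nodetool ring looks sane from that node, then move on to the next, and remove the flag again once the cluster view is consistent.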
On Sat, Mar 23, 2013 at 10:37 AM, aaron morton <aa...@thelastpickle.com> wrote:

> So all nodes are 1.2 and some are still being marked as down?
>
> I would try a rolling restart with -Dcassandra.load_ring_state=false added
> as a JVM_OPTS entry in cassandra-env.sh. There is no guarantee it will fix
> it, but it's a simple thing to try.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 22/03/2013, at 10:30 AM, Arya Goudarzi <gouda...@gmail.com> wrote:
>
> I took Brandon's suggestion in CASSANDRA-5332 and upgraded to 1.1.10
> before upgrading to 1.2.2, but the issue with nodetool ring reporting
> machines as down did not resolve.
>
> On Fri, Mar 15, 2013 at 6:35 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>
>> Thank you very much, Aaron. I recall that the logs of this node upgraded
>> to 1.2.2 reported seeing others as dead. Brandon suggested in
>> https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should at
>> least upgrade from 1.1.7. So I decided to try upgrading to 1.1.10 first,
>> before upgrading to 1.2.2. I am in the middle of troubleshooting some
>> other issues I had with that upgrade (posted separately); once I am done,
>> I will give your suggestion a try.
>>
>> On Mon, Mar 11, 2013 at 10:34 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>
>>> > Is this just a display bug in nodetool or does this upgraded node
>>> > really see the other ones as dead?
>>> Is the 1.2.2 node which sees all the others as down processing requests?
>>> Is it showing the others as down in the log?
>>>
>>> I'm not really sure what's happening. But you can try starting the 1.2.2
>>> node with the
>>>
>>> -Dcassandra.load_ring_state=false
>>>
>>> parameter, appended at the bottom of the cassandra-env.sh file. It will
>>> force the node to get the ring state from the others.
>>>
>>> Cheers
>>>
>>> -----------------
>>> Aaron Morton
>>> Freelance Cassandra Consultant
>>> New Zealand
>>>
>>> @aaronmorton
>>> http://www.thelastpickle.com
>>>
>>> On 8/03/2013, at 10:24 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>>
>>> > OK. I upgraded one node from 1.1.6 to 1.2.2 today. Despite some new
>>> > problems that I had and posted in a separate email, this issue still
>>> > exists, but now it is only on the 1.2.2 node. That is, the nodes
>>> > running 1.1.6 see all other nodes, including the 1.2.2 one, as Up.
>>> > Here are the nodetool ring and gossipinfo outputs from a 1.1.6 node,
>>> > for example.
>>> > (The upgraded node is XX.231.121.)
>>> >
>>> > Address      DC       Rack  Status  State   Load      Effective-Ownership  Token
>>> >                                                                            141784319550391026443072753098378663700
>>> > XX.180.36    us-east  1b    Up      Normal  49.47 GB  25.00%               1808575600
>>> > XX.231.121   us-east  1c    Up      Normal  47.08 GB  25.00%               7089215977519551322153637656637080005
>>> > XX.177.177   us-east  1d    Up      Normal  33.64 GB  25.00%               14178431955039102644307275311465584410
>>> > XX.7.148     us-east  1b    Up      Normal  41.27 GB  25.00%               42535295865117307932921825930779602030
>>> > XX.20.9      us-east  1c    Up      Normal  38.51 GB  25.00%               49624511842636859255075463585608106435
>>> > XX.86.255    us-east  1d    Up      Normal  34.78 GB  25.00%               56713727820156410577229101240436610840
>>> > XX.63.230    us-east  1b    Up      Normal  38.11 GB  25.00%               85070591730234615865843651859750628460
>>> > XX.163.36    us-east  1c    Up      Normal  44.25 GB  25.00%               92159807707754167187997289514579132865
>>> > XX.31.234    us-east  1d    Up      Normal  44.66 GB  25.00%               99249023685273718510150927169407637270
>>> > XX.132.169   us-east  1b    Up      Normal  44.2 GB   25.00%               127605887595351923798765477788721654890
>>> > XX.71.63     us-east  1c    Up      Normal  38.74 GB  25.00%               134695103572871475120919115443550159295
>>> > XX.197.209   us-east  1d    Up      Normal  41.5 GB   25.00%               141784319550391026443072753098378663700
>>> >
>>> > /XX.71.63
>>> > RACK:1c
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.1598705272E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.194.92
>>> > STATUS:NORMAL,134695103572871475120919115443550159295
>>> > RPC_ADDRESS:XX.194.92
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.86.255
>>> > RACK:1d
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:3.734334162E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.6.195
>>> > STATUS:NORMAL,56713727820156410577229101240436610840
>>> > RPC_ADDRESS:XX.6.195
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.7.148
>>> > RACK:1b
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.4316975808E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.47.250
>>> > STATUS:NORMAL,42535295865117307932921825930779602030
>>> > RPC_ADDRESS:XX.47.250
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.63.230
>>> > RACK:1b
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.0918593305E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.89.127
>>> > STATUS:NORMAL,85070591730234615865843651859750628460
>>> > RPC_ADDRESS:XX.89.127
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.132.169
>>> > RACK:1b
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.745883458E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.94.161
>>> > STATUS:NORMAL,127605887595351923798765477788721654890
>>> > RPC_ADDRESS:XX.94.161
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.180.36
>>> > RACK:1b
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:5.311963027E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.123.112
>>> > STATUS:NORMAL,1808575600
>>> > RPC_ADDRESS:XX.123.112
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.163.36
>>> > RACK:1c
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.7516755022E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.163.180
>>> > STATUS:NORMAL,92159807707754167187997289514579132865
>>> > RPC_ADDRESS:XX.163.180
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.31.234
>>> > RACK:1d
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.7954372912E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.192.159
>>> > STATUS:NORMAL,99249023685273718510150927169407637270
>>> > RPC_ADDRESS:XX.192.159
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.197.209
>>> > RACK:1d
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.4558968005E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.66.205
>>> > STATUS:NORMAL,141784319550391026443072753098378663700
>>> > RPC_ADDRESS:XX.66.205
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.177.177
>>> > RACK:1d
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:3.6115572697E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.65.57
>>> > STATUS:NORMAL,14178431955039102644307275311465584410
>>> > RPC_ADDRESS:XX.65.57
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.20.9
>>> > RACK:1c
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > LOAD:4.1352503882E10
>>> > DC:us-east
>>> > INTERNAL_IP:XX.33.229
>>> > STATUS:NORMAL,49624511842636859255075463585608106435
>>> > RPC_ADDRESS:XX.33.229
>>> > RELEASE_VERSION:1.1.6
>>> > /XX.231.121
>>> > RACK:1c
>>> > SCHEMA:09487aa5-3380-33ab-b9a5-bcc8476066b0
>>> > X4:9c765678-d058-4d85-a588-638ce10ff984
>>> > X3:7
>>> > DC:us-east
>>> > INTERNAL_IP:XX.223.241
>>> > RPC_ADDRESS:XX.223.241
>>> > RELEASE_VERSION:1.2.2
>>> >
>>> > Now nodetool on the 1.2.2 node shows all nodes as Down except itself.
>>> > Gossipinfo looks good, though:
>>> >
>>> > Datacenter: us-east
>>> > ==========
>>> > Replicas: 3
>>> >
>>> > Address      Rack  Status  State   Load      Owns    Token
>>> >                                                      56713727820156410577229101240436610840
>>> > XX.132.169   1b    Down    Normal  44.2 GB   25.00%  127605887595351923798765477788721654890
>>> > XX.7.148     1b    Down    Normal  41.27 GB  25.00%  42535295865117307932921825930779602030
>>> > XX.180.36    1b    Down    Normal  49.47 GB  25.00%  1808575600
>>> > XX.63.230    1b    Down    Normal  38.11 GB  25.00%  85070591730234615865843651859750628460
>>> > XX.231.121   1c    Up      Normal  47.25 GB  25.00%  7089215977519551322153637656637080005
>>> > XX.71.63     1c    Down    Normal  38.74 GB  25.00%  134695103572871475120919115443550159295
>>> > XX.177.177   1d    Down    Normal  33.64 GB  25.00%  14178431955039102644307275311465584410
>>> > XX.31.234    1d    Down    Normal  44.66 GB  25.00%  99249023685273718510150927169407637270
>>> > XX.20.9      1c    Down    Normal  38.51 GB  25.00%  49624511842636859255075463585608106435
>>> > XX.163.36    1c    Down    Normal  44.25 GB  25.00%  92159807707754167187997289514579132865
>>> > XX.197.209   1d    Down    Normal  41.5 GB   25.00%  141784319550391026443072753098378663700
>>> > XX.86.255    1d    Down    Normal  34.78 GB  25.00%  56713727820156410577229101240436610840
>>> >
>>> > /XX.71.63
>>> > RACK:1c
>>> > RPC_ADDRESS:XX.194.92
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.194.92
>>> > STATUS:NORMAL,134695103572871475120919115443550159295
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.1598705272E10
>>> > /XX.86.255
>>> > RACK:1d
>>> > RPC_ADDRESS:XX.6.195
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.6.195
>>> > STATUS:NORMAL,56713727820156410577229101240436610840
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:3.7343205002E10
>>> > /XX.7.148
>>> > RACK:1b
>>> > RPC_ADDRESS:XX.47.250
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.47.250
>>> > STATUS:NORMAL,42535295865117307932921825930779602030
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.4316975808E10
>>> > /XX.63.230
>>> > RACK:1b
>>> > RPC_ADDRESS:XX.89.127
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.89.127
>>> > STATUS:NORMAL,85070591730234615865843651859750628460
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.0918456687E10
>>> > /XX.132.169
>>> > RACK:1b
>>> > RPC_ADDRESS:XX.94.161
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.94.161
>>> > STATUS:NORMAL,127605887595351923798765477788721654890
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.745883458E10
>>> > /XX.180.36
>>> > RACK:1b
>>> > RPC_ADDRESS:XX.123.112
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.123.112
>>> > STATUS:NORMAL,1808575600
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:5.311963027E10
>>> > /XX.163.36
>>> > RACK:1c
>>> > RPC_ADDRESS:XX.163.180
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.163.180
>>> > STATUS:NORMAL,92159807707754167187997289514579132865
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.7516755022E10
>>> > /XX.31.234
>>> > RACK:1d
>>> > RPC_ADDRESS:XX.192.159
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.192.159
>>> > STATUS:NORMAL,99249023685273718510150927169407637270
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.7954372912E10
>>> > /XX.197.209
>>> > RACK:1d
>>> > RPC_ADDRESS:XX.66.205
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.66.205
>>> > STATUS:NORMAL,141784319550391026443072753098378663700
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.4559013211E10
>>> > /XX.177.177
>>> > RACK:1d
>>> > RPC_ADDRESS:XX.65.57
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.65.57
>>> > STATUS:NORMAL,14178431955039102644307275311465584410
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:3.6115572697E10
>>> > /XX.20.9
>>> > RACK:1c
>>> > RPC_ADDRESS:XX.33.229
>>> > RELEASE_VERSION:1.1.6
>>> > INTERNAL_IP:XX.33.229
>>> > STATUS:NORMAL,49624511842636859255075463585608106435
>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>> > DC:us-east
>>> > LOAD:4.1352367264E10
>>> > /XX.231.121
>>> > HOST_ID:9c765678-d058-4d85-a588-638ce10ff984
>>> > RACK:1c
>>> > RPC_ADDRESS:XX.223.241
>>> > RELEASE_VERSION:1.2.2
>>> > INTERNAL_IP:XX.223.241
>>> > STATUS:NORMAL,7089215977519551322153637656637080005
>>> > NET_VERSION:7
>>> > SCHEMA:8b8948f5-d56f-3a96-8005-b9452e42cd67
>>> > SEVERITY:0.0
>>> > DC:us-east
>>> > LOAD:5.0710624207E10
>>> >
>>> > Is this just a display bug in nodetool, or does this upgraded node
>>> > really see the other ones as dead?
>>> >
>>> > -Arya
>>> >
>>> > On Mon, Feb 25, 2013 at 8:10 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>> > No, I did not look at nodetool gossipinfo, but from the ring output on
>>> > both the pre-upgrade nodes and the nodes upgraded to 1.2.1, what I
>>> > observed was the behavior described.
>>> >
>>> > On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>>> > This was a bug with 1.2.0 but resolved in 1.2.1. Did you take a
>>> > capture of nodetool gossipinfo and nodetool ring by chance?
>>> >
>>> > On Feb 23, 2013, at 12:26 AM, "Arya Goudarzi" <gouda...@gmail.com> wrote:
>>> >
>>> > > Hi C* users,
>>> > >
>>> > > I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I
>>> > > noticed from nodetool ring was that the newly upgraded nodes only
>>> > > saw each other as Normal, and the rest of the cluster, which was on
>>> > > 1.1.6, as Down. Vice versa was true for the nodes running 1.1.6:
>>> > > they saw each other as Normal but the 1.2.1 nodes as Down. I don't
>>> > > see a note in the upgrade docs that this would be an issue. Has
>>> > > anyone else observed this problem?
>>> > >
>>> > > In the debug logs I could see messages saying attempting to connect
>>> > > to a node's IP and then saying it is down.
>>> > >
>>> > > Cheers,
>>> > > -Arya