Aaron, I added -Dcassandra.load_ring_state=false to cassandra-env.sh and did a rolling restart. With one node on 1.2.3 and the other 11 nodes on 1.1.10, the 1.1.10 nodes saw the 1.2.3 node, but gossip on the 1.2.3 node now only sees itself.
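For reference, the change I made amounts to something like this sketch (the conf path here is a temp-file stand-in; a real install would edit its own cassandra-env.sh):

```shell
# Stand-in for conf/cassandra-env.sh; substitute your install's real path.
CONF=$(mktemp)

# Append the JVM option at the bottom of the file, as suggested.
echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.load_ring_state=false"' >> "$CONF"

# Sanity-check the flag is in place before restarting the node.
added=$(grep -c 'load_ring_state=false' "$CONF")
echo "flag lines added: $added"

# Then, one node at a time: nodetool drain; restart Cassandra; confirm
# with "nodetool ring" / "nodetool gossipinfo" before the next node.
```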
Cheers,
-Arya

On Thu, Mar 28, 2013 at 1:02 PM, Arya Goudarzi <gouda...@gmail.com> wrote:

> There has been a little misunderstanding. When all nodes are 1.2.2, they
> are fine. But during the rolling upgrade, the 1.2.2 nodes see the 1.1.10
> nodes as down in nodetool despite gossip reporting NORMAL. I will give
> your suggestion a try and will report back.
>
> On Sat, Mar 23, 2013 at 10:37 AM, aaron morton <aa...@thelastpickle.com> wrote:
>
>> So all nodes are 1.2 and some are still being marked as down?
>>
>> I would try a rolling restart with -Dcassandra.load_ring_state=false
>> added to JVM_OPTS in cassandra-env.sh. There is no guarantee it will
>> fix it, but it's a simple thing to try.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 22/03/2013, at 10:30 AM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>
>> I took Brandon's suggestion in CASSANDRA-5332 and upgraded to 1.1.10
>> before upgrading to 1.2.2, but the issue with nodetool ring reporting
>> machines as down was not resolved.
>>
>> On Fri, Mar 15, 2013 at 6:35 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>
>>> Thank you very much, Aaron. I recall the logs of the node upgraded to
>>> 1.2.2 reported seeing others as dead. Brandon suggested in
>>> https://issues.apache.org/jira/browse/CASSANDRA-5332 that I should at
>>> least upgrade from 1.1.7, so I decided to try upgrading to 1.1.10
>>> first before upgrading to 1.2.2. I am in the middle of troubleshooting
>>> some other issues I had with that upgrade (posted separately); once I
>>> am done, I will give your suggestion a try.
>>>
>>> On Mon, Mar 11, 2013 at 10:34 PM, aaron morton <aa...@thelastpickle.com> wrote:
>>>
>>>> > Is this just a display bug in nodetool, or does this upgraded node
>>>> > really see the other ones as dead?
>>>> Is the 1.2.2 node that sees all the others as down still processing
>>>> requests?
>>>> Is it showing the others as down in the log?
>>>>
>>>> I'm not really sure what's happening, but you can try starting the
>>>> 1.2.2 node with the
>>>>
>>>> -Dcassandra.load_ring_state=false
>>>>
>>>> parameter; append it at the bottom of the cassandra-env.sh file. It
>>>> will force the node to get the ring state from the others.
>>>>
>>>> Cheers
>>>>
>>>> -----------------
>>>> Aaron Morton
>>>> Freelance Cassandra Consultant
>>>> New Zealand
>>>>
>>>> @aaronmorton
>>>> http://www.thelastpickle.com
>>>>
>>>> On 8/03/2013, at 10:24 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>>>
>>>> > OK. I upgraded one node from 1.1.6 to 1.2.2 today. Despite some new
>>>> > problems (posted in a separate email), this issue still exists, but
>>>> > now only on the 1.2.2 node. That is, the nodes running 1.1.6 see
>>>> > all other nodes, including the 1.2.2 one, as Up. Here are the ring
>>>> > and gossip outputs from one of the 1.1.6 nodes (XX.231.121 is the
>>>> > upgraded node):
>>>> >
>>>> > Address     DC       Rack  Status  State   Load      Effective-Ownership  Token
>>>> >                                                                           141784319550391026443072753098378663700
>>>> > XX.180.36   us-east  1b    Up      Normal  49.47 GB  25.00%               1808575600
>>>> > XX.231.121  us-east  1c    Up      Normal  47.08 GB  25.00%               7089215977519551322153637656637080005
>>>> > XX.177.177  us-east  1d    Up      Normal  33.64 GB  25.00%               14178431955039102644307275311465584410
>>>> > XX.7.148    us-east  1b    Up      Normal  41.27 GB  25.00%               42535295865117307932921825930779602030
>>>> > XX.20.9     us-east  1c    Up      Normal  38.51 GB  25.00%               49624511842636859255075463585608106435
>>>> > XX.86.255   us-east  1d    Up      Normal  34.78 GB  25.00%               56713727820156410577229101240436610840
>>>> > XX.63.230   us-east  1b    Up      Normal  38.11 GB  25.00%               85070591730234615865843651859750628460
>>>> > XX.163.36   us-east  1c    Up      Normal  44.25 GB  25.00%               92159807707754167187997289514579132865
>>>> > XX.31.234   us-east  1d    Up      Normal  44.66 GB  25.00%               99249023685273718510150927169407637270
>>>> > XX.132.169  us-east  1b    Up      Normal  44.2 GB   25.00%               127605887595351923798765477788721654890
>>>> > XX.71.63    us-east  1c    Up      Normal  38.74 GB  25.00%               134695103572871475120919115443550159295
>>>> > XX.197.209  us-east  1d    Up      Normal  41.5 GB   25.00%               141784319550391026443072753098378663700
>>>> >
>>>> > /XX.71.63
>>>> > RACK:1c
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.1598705272E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.194.92
>>>> > STATUS:NORMAL,134695103572871475120919115443550159295
>>>> > RPC_ADDRESS:XX.194.92
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.86.255
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:3.734334162E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.6.195
>>>> > STATUS:NORMAL,56713727820156410577229101240436610840
>>>> > RPC_ADDRESS:XX.6.195
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.7.148
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.4316975808E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.47.250
>>>> > STATUS:NORMAL,42535295865117307932921825930779602030
>>>> > RPC_ADDRESS:XX.47.250
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.63.230
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.0918593305E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.89.127
>>>> > STATUS:NORMAL,85070591730234615865843651859750628460
>>>> > RPC_ADDRESS:XX.89.127
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.132.169
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.745883458E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.94.161
>>>> > STATUS:NORMAL,127605887595351923798765477788721654890
>>>> > RPC_ADDRESS:XX.94.161
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.180.36
>>>> > RACK:1b
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:5.311963027E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.123.112
>>>> > STATUS:NORMAL,1808575600
>>>> > RPC_ADDRESS:XX.123.112
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.163.36
>>>> > RACK:1c
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.7516755022E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.163.180
>>>> > STATUS:NORMAL,92159807707754167187997289514579132865
>>>> > RPC_ADDRESS:XX.163.180
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.31.234
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.7954372912E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.192.159
>>>> > STATUS:NORMAL,99249023685273718510150927169407637270
>>>> > RPC_ADDRESS:XX.192.159
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.197.209
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.4558968005E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.66.205
>>>> > STATUS:NORMAL,141784319550391026443072753098378663700
>>>> > RPC_ADDRESS:XX.66.205
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.177.177
>>>> > RACK:1d
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:3.6115572697E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.65.57
>>>> > STATUS:NORMAL,14178431955039102644307275311465584410
>>>> > RPC_ADDRESS:XX.65.57
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.20.9
>>>> > RACK:1c
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > LOAD:4.1352503882E10
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.33.229
>>>> > STATUS:NORMAL,49624511842636859255075463585608106435
>>>> > RPC_ADDRESS:XX.33.229
>>>> > RELEASE_VERSION:1.1.6
>>>> > /XX.231.121
>>>> > RACK:1c
>>>> > SCHEMA:09487aa5-3380-33ab-b9a5-bcc8476066b0
>>>> > X4:9c765678-d058-4d85-a588-638ce10ff984
>>>> > X3:7
>>>> > DC:us-east
>>>> > INTERNAL_IP:XX.223.241
>>>> > RPC_ADDRESS:XX.223.241
>>>> > RELEASE_VERSION:1.2.2
>>>> >
>>>> > Now nodetool on the 1.2.2 node shows all nodes as Down except
>>>> > itself.
>>>> > Gossipinfo looks good, though:
>>>> >
>>>> > Datacenter: us-east
>>>> > ==========
>>>> > Replicas: 3
>>>> >
>>>> > Address     Rack  Status  State   Load      Owns    Token
>>>> >                                                     56713727820156410577229101240436610840
>>>> > XX.132.169  1b    Down    Normal  44.2 GB   25.00%  127605887595351923798765477788721654890
>>>> > XX.7.148    1b    Down    Normal  41.27 GB  25.00%  42535295865117307932921825930779602030
>>>> > XX.180.36   1b    Down    Normal  49.47 GB  25.00%  1808575600
>>>> > XX.63.230   1b    Down    Normal  38.11 GB  25.00%  85070591730234615865843651859750628460
>>>> > XX.231.121  1c    Up      Normal  47.25 GB  25.00%  7089215977519551322153637656637080005
>>>> > XX.71.63    1c    Down    Normal  38.74 GB  25.00%  134695103572871475120919115443550159295
>>>> > XX.177.177  1d    Down    Normal  33.64 GB  25.00%  14178431955039102644307275311465584410
>>>> > XX.31.234   1d    Down    Normal  44.66 GB  25.00%  99249023685273718510150927169407637270
>>>> > XX.20.9     1c    Down    Normal  38.51 GB  25.00%  49624511842636859255075463585608106435
>>>> > XX.163.36   1c    Down    Normal  44.25 GB  25.00%  92159807707754167187997289514579132865
>>>> > XX.197.209  1d    Down    Normal  41.5 GB   25.00%  141784319550391026443072753098378663700
>>>> > XX.86.255   1d    Down    Normal  34.78 GB  25.00%  56713727820156410577229101240436610840
>>>> >
>>>> > /XX.71.63
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.194.92
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.194.92
>>>> > STATUS:NORMAL,134695103572871475120919115443550159295
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.1598705272E10
>>>> > /XX.86.255
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.6.195
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.6.195
>>>> > STATUS:NORMAL,56713727820156410577229101240436610840
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:3.7343205002E10
>>>> > /XX.7.148
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.47.250
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.47.250
>>>> > STATUS:NORMAL,42535295865117307932921825930779602030
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.4316975808E10
>>>> > /XX.63.230
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.89.127
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.89.127
>>>> > STATUS:NORMAL,85070591730234615865843651859750628460
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.0918456687E10
>>>> > /XX.132.169
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.94.161
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.94.161
>>>> > STATUS:NORMAL,127605887595351923798765477788721654890
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.745883458E10
>>>> > /XX.180.36
>>>> > RACK:1b
>>>> > RPC_ADDRESS:XX.123.112
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.123.112
>>>> > STATUS:NORMAL,1808575600
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:5.311963027E10
>>>> > /XX.163.36
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.163.180
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.163.180
>>>> > STATUS:NORMAL,92159807707754167187997289514579132865
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.7516755022E10
>>>> > /XX.31.234
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.192.159
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.192.159
>>>> > STATUS:NORMAL,99249023685273718510150927169407637270
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.7954372912E10
>>>> > /XX.197.209
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.66.205
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.66.205
>>>> > STATUS:NORMAL,141784319550391026443072753098378663700
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.4559013211E10
>>>> > /XX.177.177
>>>> > RACK:1d
>>>> > RPC_ADDRESS:XX.65.57
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.65.57
>>>> > STATUS:NORMAL,14178431955039102644307275311465584410
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:3.6115572697E10
>>>> > /XX.20.9
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.33.229
>>>> > RELEASE_VERSION:1.1.6
>>>> > INTERNAL_IP:XX.33.229
>>>> > STATUS:NORMAL,49624511842636859255075463585608106435
>>>> > SCHEMA:99dce53b-487e-3e7b-a958-a1cc48d9f575
>>>> > DC:us-east
>>>> > LOAD:4.1352367264E10
>>>> > /XX.231.121
>>>> > HOST_ID:9c765678-d058-4d85-a588-638ce10ff984
>>>> > RACK:1c
>>>> > RPC_ADDRESS:XX.223.241
>>>> > RELEASE_VERSION:1.2.2
>>>> > INTERNAL_IP:XX.223.241
>>>> > STATUS:NORMAL,7089215977519551322153637656637080005
>>>> > NET_VERSION:7
>>>> > SCHEMA:8b8948f5-d56f-3a96-8005-b9452e42cd67
>>>> > SEVERITY:0.0
>>>> > DC:us-east
>>>> > LOAD:5.0710624207E10
>>>> >
>>>> > Is this just a display bug in nodetool, or does this upgraded node
>>>> > really see the other ones as dead?
>>>> >
>>>> > -Arya
>>>> >
>>>> > On Mon, Feb 25, 2013 at 8:10 PM, Arya Goudarzi <gouda...@gmail.com> wrote:
>>>> >
>>>> > No, I did not look at nodetool gossipinfo, but from the ring on both
>>>> > the pre-upgrade nodes and the nodes upgraded to 1.2.1, what I
>>>> > observed was the described behavior.
>>>> >
>>>> > On Sat, Feb 23, 2013 at 1:26 AM, Michael Kjellman <mkjell...@barracuda.com> wrote:
>>>> >
>>>> > This was a bug with 1.2.0 but resolved in 1.2.1. Did you take a
>>>> > capture of nodetool gossipinfo and nodetool ring by chance?
>>>> >
>>>> > On Feb 23, 2013, at 12:26 AM, "Arya Goudarzi" <gouda...@gmail.com> wrote:
>>>> >
>>>> > > Hi C* users,
>>>> > >
>>>> > > I just upgraded a 12-node test cluster from 1.1.6 to 1.2.1. What I
>>>> > > noticed from nodetool ring was that the newly upgraded nodes only
>>>> > > saw each other as Normal and the rest of the cluster, still on
>>>> > > 1.1.6, as Down. Vice versa was true for the nodes running 1.1.6:
>>>> > > they saw each other as Normal but the 1.2.1 nodes as down. I don't
>>>> > > see a note in the upgrade docs that this would be an issue. Has
>>>> > > anyone else observed this problem?
>>>> > >
>>>> > > In the debug logs I could see messages saying the node was
>>>> > > attempting to connect to another node's IP and then declaring it
>>>> > > down.
>>>> > >
>>>> > > Cheers,
>>>> > > -Arya
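For anyone hitting the same mixed-version symptom, a quick sanity check on a saved nodetool ring capture is to tally the Status column per node's view; this is a minimal sketch using a few abridged, anonymized lines like the ones pasted above:

```shell
# A few lines saved from "nodetool ring" as seen by the 1.2.2 node
# (abridged; addresses anonymized as in the thread).
ring_capture='XX.132.169 1b Down Normal
XX.7.148 1b Down Normal
XX.231.121 1c Up Normal'

# Tally the Status column: a 1.2.x node reporting every 1.1.x peer as Down
# while gossipinfo still looks healthy matches the symptom in this thread.
summary=$(printf '%s\n' "$ring_capture" | awk '{c[$3]++} END {for (s in c) print s, c[s]}' | sort)
printf '%s\n' "$summary"
```

Running the same tally on each node's own capture makes it easy to see whether only the upgraded nodes disagree with the rest of the ring.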