Hello Paul,

The behavior looks similar to what we experienced and reported:
https://issues.apache.org/jira/browse/CASSANDRA-15138
In our testing, "service cassandra stop" sometimes leaves the cluster in a
wrong state. How about doing kill -9?

Thanks,
Hiro

On Sun, Dec 8, 2019 at 7:47 PM Hossein Ghiyasi Mehr <ghiyasim...@gmail.com> wrote:
>
> Which version of Cassandra did you install? deb or tar?
> If it's deb, its script should be used for start/stop.
> If it's tar, kill the Cassandra pid to stop and use bin/cassandra to start.
>
> Stopping doesn't need any other actions (drain, disable gossip, etc.).
>
> Where do you use Cassandra?
> -------------------------------------------------------
> VafaTech : A Total Solution for Data Gathering & Analysis
> -------------------------------------------------------
>
> On Fri, Dec 6, 2019 at 11:20 PM Paul Mena <pm...@whoi.edu> wrote:
>>
>> As we are still without a functional Cassandra cluster in our development
>> environment, I thought I’d try restarting the same node (one of 4 in the
>> cluster) with the following command:
>>
>> ip=$(cat /etc/hostname); nodetool disablethrift && nodetool disablebinary &&
>> sleep 5 && nodetool disablegossip && nodetool drain && sleep 10 && sudo
>> service cassandra restart && until echo "SELECT * FROM system.peers LIMIT
>> 1;" | cqlsh $ip > /dev/null 2>&1; do echo "Node $ip is still DOWN"; sleep
>> 10; done && echo "Node $ip is now UP"
>>
>> The above command returned “Node is now UP” after about 40 seconds,
>> confirmed on “node001” via “nodetool status”:
>>
>> user@node001=> nodetool status
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> UN  192.168.187.121  539.43 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  633.92 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  576.31 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  628.5 GB   256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> As was the case before, running “nodetool status” on any of the other nodes
>> shows that “node001” is still down:
>>
>> user@node002=> nodetool status
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  634.04 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  576.42 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  628.56 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> Is it inadvisable to continue with the rolling restart?
>>
>> Paul Mena
>> Senior Application Administrator
>> WHOI - Information Services
>> 508-289-3539
>>
>> From: Shalom Sagges <shalomsag...@gmail.com>
>> Sent: Tuesday, November 26, 2019 12:59 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Cassandra is not showing a node up hours after restart
>>
>> Hi Paul,
>>
>> From the gossipinfo output, it looks like the node's IP address and
>> rpc_address are different.
>>
>> /192.168.187.121 vs RPC_ADDRESS:192.168.185.121
>>
>> You can also see that there's a schema disagreement between nodes, e.g.
>> schema_id on node001 is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801 and on node002
>> it is fd2dcb4b-ca62-30df-b8f2-d3fd774f2801.
>>
>> You can run nodetool describecluster to see it as well.
>>
>> So I suggest changing the rpc_address to the IP address of the node, or
>> setting it to 0.0.0.0, which should resolve the issue.
>>
>> Hope this helps!
>>
>> On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <inquial...@gmail.com> wrote:
>>
>> Hello,
>>
>> Check and compare all parameters:
>>
>> 1. The Java version should ideally match across all nodes in the cluster.
>> 2. Check if port 7000 is open between the nodes. Use the telnet or nc commands.
>> 3. You should see some clues in the system logs as to why gossip is failing.
>>
>> Please confirm the above.
>>
>> Thanks
>>
>> On Tue, 26 Nov, 2019, 2:50 AM Paul Mena, <pm...@whoi.edu> wrote:
>>
>> NTP was restarted on the Cassandra nodes, but unfortunately I’m still
>> getting the same result: the restarted node does not appear to be rejoining
>> the cluster.
>>
>> Here’s another data point: “nodetool gossipinfo”, when run from the
>> restarted node (“node001”), shows a status of “normal”:
>>
>> user@node001=> nodetool -u gossipinfo
>> /192.168.187.121
>>   generation:1574364410
>>   heartbeat:209150
>>   NET_VERSION:8
>>   RACK:rack1
>>   STATUS:NORMAL,-104847506331695918
>>   RELEASE_VERSION:2.1.9
>>   SEVERITY:0.0
>>   LOAD:5.78684155614E11
>>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>>   DC:datacenter1
>>   RPC_ADDRESS:192.168.185.121
>>
>> When run from one of the other nodes, however, node001’s status is shown as
>> “shutdown”:
>>
>> user@node002=> nodetool gossipinfo
>> /192.168.187.121
>>   generation:1491825076
>>   heartbeat:2147483647
>>   STATUS:shutdown,true
>>   RACK:rack1
>>   NET_VERSION:8
>>   LOAD:5.78679987693E11
>>   RELEASE_VERSION:2.1.9
>>   DC:datacenter1
>>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>>   RPC_ADDRESS:192.168.185.121
>>   SEVERITY:0.0
>>
>> Paul Mena
>>
>> From: Paul Mena
>> Sent: Monday, November 25, 2019 9:29 AM
>> To: user@cassandra.apache.org
>> Subject: RE: Cassandra is not showing a node up hours after restart
>>
>> I’ve just discovered that NTP is not running on any of these Cassandra
>> nodes, and that the timestamps are all over the map.
>> Could this be causing my issue?
>>
>> user@remote=> ansible pre-prod-cassandra -a date
>>
>> node001.intra.myorg.org | CHANGED | rc=0 >>
>> Mon Nov 25 13:58:17 UTC 2019
>>
>> node004.intra.myorg.org | CHANGED | rc=0 >>
>> Mon Nov 25 14:07:20 UTC 2019
>>
>> node003.intra.myorg.org | CHANGED | rc=0 >>
>> Mon Nov 25 13:57:06 UTC 2019
>>
>> node001.intra.myorg.org | CHANGED | rc=0 >>
>> Mon Nov 25 14:07:22 UTC 2019
>>
>> Paul Mena
>>
>> From: Inquistive allen <inquial...@gmail.com>
>> Sent: Monday, November 25, 2019 2:46 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Cassandra is not showing a node up hours after restart
>>
>> Hello team,
>>
>> Just to add to the discussion, one may run nodetool disablebinary, followed
>> by nodetool disablethrift, followed by nodetool drain.
>>
>> nodetool drain also does the work of nodetool flush, plus announcing to the
>> cluster that the node is down and not accepting traffic.
>>
>> Thanks
>>
>> On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta, <surbhi.gupt...@gmail.com> wrote:
>>
>> Before Cassandra shutdown, nodetool drain should be executed first. As soon
>> as you run nodetool drain, the other nodes will see this node as down and no
>> new traffic will come to this node.
>>
>> I generally leave a 10-second gap between nodetool drain and the Cassandra
>> stop.
>>
>> On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <pm...@whoi.edu> wrote:
>>
>> Thank you for the replies. I had made no changes to the config before the
>> rolling restart.
>>
>> I can try another restart but was wondering if I should do it differently. I
>> had simply done "service cassandra stop" followed by "service cassandra
>> start". Since then I've seen some suggestions to precede the shutdown with
>> "nodetool disablegossip" and/or "nodetool drain". Are these commands
>> advisable?
>> Are any other commands recommended either before the shutdown or
>> after the startup?
>>
>> Thanks again!
>>
>> Paul
>>
>> ________________________________
>>
>> From: Naman Gupta <naman.gu...@girnarsoft.com>
>> Sent: Sunday, November 24, 2019 11:18:14 AM
>> To: user@cassandra.apache.org
>> Subject: Re: Cassandra is not showing a node up hours after restart
>>
>> Did you change the datacenter name or make any other config changes before
>> the rolling restart?
>>
>> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <pm...@whoi.edu> wrote:
>>
>> I am in the process of doing a rolling restart on a 4-node cluster running
>> Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service
>> cassandra stop/start", and noted nothing unusual in either system.log or
>> cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes up:
>>
>> user@node001=> nodetool status
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> But running the same command from any of the other 3 nodes shows node 1
>> still down:
>>
>> user@node002=> nodetool status
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address          Load       Tokens  Owns  Host ID                               Rack
>> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>>
>> Is there something I can do to remedy this current situation - so that I can
>> continue with the rolling restart?
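
The symptom discussed in this thread (node001 reports itself as UN while its peers report it as DN) can be spotted by comparing "nodetool status" as seen from each node. Below is a minimal shell sketch, assuming the column layout shown in the outputs above. The function name down_nodes is made up for illustration and is not part of nodetool, and the ssh access in the usage comment is an assumption about the environment.

```shell
#!/bin/sh
# down_nodes (hypothetical helper, not a nodetool command):
# reads `nodetool status` output on stdin and prints the address of every
# node whose first column is a Down status (DN, DL, DJ or DM).
down_nodes() {
    awk '$1 ~ /^D[NLJM]$/ { print $2 }'
}

# Usage sketch (assumes ssh access to a peer; host name taken from the thread):
#   ssh node002 nodetool status | down_nodes
```

If the restarted node's address is printed when the check is run against a peer but not when it is run locally, the two views disagree, which is the situation described in this thread.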