Which version of Cassandra did you install, deb or tar? If it's the deb
package, its service script should be used to start and stop Cassandra. If
it's the tarball, kill Cassandra's PID to stop it and run bin/cassandra to
start it.
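Concretely, the two flows look roughly like this (a sketch; the tarball
location /opt/cassandra and the pidfile path are assumptions, so adjust for
your layout):

    # deb package install: use the service script
    sudo service cassandra stop
    sudo service cassandra start

    # tarball install (assuming it was unpacked to /opt/cassandra)
    kill $(cat /opt/cassandra/cassandra.pid)                       # stop: signal the JVM
    /opt/cassandra/bin/cassandra -p /opt/cassandra/cassandra.pid   # start; -p records the new PID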
Stopping doesn't need any other actions (drain, disabling gossip, etc.).
Where do you use Cassandra?

*-------------------------------------------------------*
*VafaTech <http://www.vafatech.com> : A Total Solution for Data Gathering & Analysis*
*-------------------------------------------------------*

On Fri, Dec 6, 2019 at 11:20 PM Paul Mena <pm...@whoi.edu> wrote:

> As we are still without a functional Cassandra cluster in our development
> environment, I thought I’d try restarting the same node (one of 4 in the
> cluster) with the following command:

> ip=$(cat /etc/hostname); nodetool disablethrift && nodetool disablebinary
> && sleep 5 && nodetool disablegossip && nodetool drain && sleep 10 && sudo
> service cassandra restart && until echo "SELECT * FROM system.peers LIMIT
> 1;" | cqlsh $ip > /dev/null 2>&1; do echo "Node $ip is still DOWN"; sleep
> 10; done && echo "Node $ip is now UP"

> The above command returned “Node is now UP” after about 40 seconds,
> confirmed on “node001” via “nodetool status”:

> user@node001=> nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> UN  192.168.187.121  539.43 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  633.92 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  576.31 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  628.5 GB   256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

> As was the case before, running “nodetool status” on any of the other
> nodes shows that “node001” is still down:

> user@node002=> nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  634.04 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  576.42 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  628.56 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

> Is it inadvisable to continue with the rolling restart?

> *Paul Mena*
> Senior Application Administrator
> WHOI - Information Services
> 508-289-3539

> *From:* Shalom Sagges <shalomsag...@gmail.com>
> *Sent:* Tuesday, November 26, 2019 12:59 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra is not showing a node up hours after restart

> Hi Paul,

> From the gossipinfo output, it looks like the node's IP address and
> rpc_address are different:
> /192.168.*187*.121 vs RPC_ADDRESS:192.168.*185*.121

> It's also worth checking for a schema disagreement between nodes by
> comparing their schema IDs (on node001 it is
> fd2dcb4b-ca62-30df-b8f2-d3fd774f2801); you can run nodetool
> describecluster to see this as well.

> So I suggest changing the rpc_address to the node's IP address, or
> setting it to 0.0.0.0, and it should resolve the issue.

> Hope this helps!
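A quick way to check for that mismatch across the cluster (a sketch; the
ansible group name pre-prod-cassandra is borrowed from later in this
thread, and the yaml path assumes a package install):

    # rpc_address should match the node's IP, or be 0.0.0.0
    ansible pre-prod-cassandra -a "grep -E '^(listen_address|rpc_address):' /etc/cassandra/cassandra.yaml"

    # all nodes should appear under a single schema version
    nodetool describecluster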
> On Tue, Nov 26, 2019 at 4:05 AM Inquistive allen <inquial...@gmail.com> wrote:

> Hello,

> Check and compare all parameters:

> 1. The Java version should ideally match across all nodes in the cluster.
> 2. Check that port 7000 is open between the nodes; use telnet or nc.
> 3. Look for clues in the system logs as to why gossip is failing.

> Do confirm the above things.

> Thanks

> On Tue, 26 Nov, 2019, 2:50 AM Paul Mena, <pm...@whoi.edu> wrote:

> NTP was restarted on the Cassandra nodes, but unfortunately I’m still
> getting the same result: the restarted node does not appear to be
> rejoining the cluster.

> Here’s another data point: “nodetool gossipinfo”, when run from the
> restarted node (“node001”), shows a status of “normal”:

> user@node001=> nodetool gossipinfo
> /192.168.187.121
>   generation:1574364410
>   heartbeat:209150
>   NET_VERSION:8
>   RACK:rack1
>   STATUS:NORMAL,-104847506331695918
>   RELEASE_VERSION:2.1.9
>   SEVERITY:0.0
>   LOAD:5.78684155614E11
>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>   DC:datacenter1
>   RPC_ADDRESS:192.168.185.121

> When run from one of the other nodes, however, node001’s status is shown
> as “shutdown”:

> user@node002=> nodetool gossipinfo
> /192.168.187.121
>   generation:1491825076
>   heartbeat:2147483647
>   STATUS:shutdown,true
>   RACK:rack1
>   NET_VERSION:8
>   LOAD:5.78679987693E11
>   RELEASE_VERSION:2.1.9
>   DC:datacenter1
>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>   RPC_ADDRESS:192.168.185.121
>   SEVERITY:0.0

> *Paul Mena*
> Senior Application Administrator
> WHOI - Information Services
> 508-289-3539

> *From:* Paul Mena
> *Sent:* Monday, November 25, 2019 9:29 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Cassandra is not showing a node up hours after restart

> I’ve just discovered that NTP is not running on any of these Cassandra
> nodes, and that the timestamps are all over the map. Could this be
> causing my issue?

> user@remote=> ansible pre-prod-cassandra -a date
> node001.intra.myorg.org | CHANGED | rc=0 >>
> Mon Nov 25 13:58:17 UTC 2019

> node004.intra.myorg.org | CHANGED | rc=0 >>
> Mon Nov 25 14:07:20 UTC 2019

> node003.intra.myorg.org | CHANGED | rc=0 >>
> Mon Nov 25 13:57:06 UTC 2019

> node002.intra.myorg.org | CHANGED | rc=0 >>
> Mon Nov 25 14:07:22 UTC 2019

> *Paul Mena*
> Senior Application Administrator
> WHOI - Information Services
> 508-289-3539

> *From:* Inquistive allen <inquial...@gmail.com>
> *Sent:* Monday, November 25, 2019 2:46 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra is not showing a node up hours after restart

> Hello team,

> Just to add on to the discussion: one may run nodetool disablebinary,
> followed by nodetool disablethrift, followed by nodetool drain. nodetool
> drain also does the work of nodetool flush, plus it declares to the
> cluster that the node is down and not accepting traffic.

> Thanks

> On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta, <surbhi.gupt...@gmail.com> wrote:

> Before a Cassandra shutdown, nodetool drain should be executed first. As
> soon as you do nodetool drain, the other nodes will see this node as down
> and no new traffic will come to it. I generally give a 10-second gap
> between nodetool drain and the Cassandra stop.
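Putting allen's and Surbhi's suggestions together, a graceful stop of a
single node would look roughly like this (a sketch; the 10-second pause is
Surbhi's suggestion, and "service cassandra" assumes a package install):

    nodetool disablebinary   # stop accepting native-protocol (CQL) clients
    nodetool disablethrift   # stop accepting Thrift clients
    nodetool drain           # flush memtables and announce shutdown to peers
    sleep 10                 # give the other nodes time to see the drain
    sudo service cassandra stop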
> On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <pm...@whoi.edu> wrote:

> Thank you for the replies. I had made no changes to the config before the
> rolling restart.

> I can try another restart, but was wondering if I should do it
> differently. I had simply done "service cassandra stop" followed by
> "service cassandra start". Since then I've seen some suggestions to
> precede the shutdown with "nodetool disablegossip" and/or "nodetool
> drain". Are these commands advisable? Are any other commands recommended,
> either before the shutdown or after the startup?

> Thanks again!

> Paul
> ------------------------------

> *From:* Naman Gupta <naman.gu...@girnarsoft.com>
> *Sent:* Sunday, November 24, 2019 11:18:14 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra is not showing a node up hours after restart

> Did you change the datacenter name, or make any other config changes,
> before the rolling restart?

> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <pm...@whoi.edu> wrote:

> I am in the process of doing a rolling restart on a 4-node cluster
> running Cassandra 2.1.9. I stopped and started Cassandra on node 1 via
> "service cassandra stop/start", and noted nothing unusual in either
> system.log or cassandra.log. Doing a "nodetool status" from node 1 shows
> all four nodes up:

> user@node001=> nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> UN  192.168.187.121  538.95 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  625.05 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

> But running the same command from any of the other 3 nodes shows node 1
> still down:

> user@node002=> nodetool status
> Datacenter: datacenter1
> =======================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address          Load       Tokens  Owns  Host ID                               Rack
> DN  192.168.187.121  538.94 GB  256     ?     c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
> UN  192.168.187.122  630.72 GB  256     ?     bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
> UN  192.168.187.123  572.73 GB  256     ?     273df9f3-e496-4c65-a1f2-325ed288a992  rack1
> UN  192.168.187.124  625.04 GB  256     ?     b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1

> Is there something I can do to remedy this current situation, so that I
> can continue with the rolling restart?
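For reference, the per-node flow the thread converges on (drain, restart,
then trust a peer's view of the node rather than the restarted node's own)
can be sketched as follows; the hostnames, the ssh-based peer check, and
the service name are assumptions, not something prescribed in the thread:

    #!/usr/bin/env bash
    # Sketch: restart one node, then wait until another node reports it UN.
    set -euo pipefail

    NODE_IP=$(hostname -I | awk '{print $1}')   # this node's address
    PEER=node002.intra.myorg.org                # any *other* cluster node

    nodetool disablebinary
    nodetool disablethrift
    nodetool disablegossip
    nodetool drain
    sleep 10
    sudo service cassandra restart

    # Poll the peer: the restarted node's own status can read UN while the
    # rest of the cluster still sees it as DN, which is exactly the symptom
    # reported above.
    until ssh "$PEER" nodetool status | grep -E "^UN +$NODE_IP " >/dev/null; do
        echo "Peer still sees $NODE_IP as down; waiting..."
        sleep 10
    done
    echo "Peer reports $NODE_IP as UN; safe to move on to the next node."

The key difference from the one-liner earlier in the thread is the health
check: it asks a peer for its view of the restarted node instead of running
cqlsh against the restarted node itself.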