Hello,

Check and compare the following parameters:

1. The Java version should ideally match across all nodes in the cluster.
2. Check whether port 7000 is open between the nodes, using telnet or nc.
3. The system logs should contain some clues as to why gossip is failing.
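
For the port check, if telnet or nc is not installed on the nodes, a minimal Python sketch along these lines does the same job. The function name port_is_open and the 3-second timeout are only illustrative, and 7000 is assumed to be the storage_port in use (it is the default; adjust it if cassandra.yaml overrides it).

```python
import socket


def port_is_open(host: str, port: int = 7000, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError (refused, unreachable, timed out) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run from each node against every other node, for example port_is_open("192.168.187.121") from node002. A False result only says the TCP connection could not be established; it does not say whether a firewall or a stopped process is the cause.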

Please confirm the above.

Thanks


On Tue, 26 Nov, 2019, 2:50 AM Paul Mena, <pm...@whoi.edu> wrote:

> NTP was restarted on the Cassandra nodes, but unfortunately I’m still
> getting the same result: the restarted node does not appear to be rejoining
> the cluster.
>
>
>
> Here’s another data point: “nodetool gossipinfo”, when run from the
> restarted node (“node001”) shows a status of “normal”:
>
>
>
> user@node001=> nodetool -u gossipinfo
>
> /192.168.187.121
>
>   generation:1574364410
>
>   heartbeat:209150
>
>   NET_VERSION:8
>
>   RACK:rack1
>
>   STATUS:NORMAL,-104847506331695918
>
>   RELEASE_VERSION:2.1.9
>
>   SEVERITY:0.0
>
>   LOAD:5.78684155614E11
>
>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>
>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>
>   DC:datacenter1
>
>   RPC_ADDRESS:192.168.185.121
>
>
>
> When run from one of the other nodes, however, node001’s status is shown
> as “shutdown”:
>
>
>
> user@node002=> nodetool gossipinfo
>
> /192.168.187.121
>
>   generation:1491825076
>
>   heartbeat:2147483647
>
>   STATUS:shutdown,true
>
>   RACK:rack1
>
>   NET_VERSION:8
>
>   LOAD:5.78679987693E11
>
>   RELEASE_VERSION:2.1.9
>
>   DC:datacenter1
>
>   SCHEMA:fd2dcb4b-ca62-30df-b8f2-d3fd774f2801
>
>   HOST_ID:c99cf581-f4ae-4aa9-ab37-1a114ab2429b
>
>   RPC_ADDRESS:192.168.185.121
>
>   SEVERITY:0.0
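
The two gossipinfo views above differ in more than STATUS: node002 still holds generation 1491825076 with heartbeat 2147483647 (the largest 32-bit signed integer), while node001 advertises generation 1574364410. A small sketch like the following can list such differences automatically. The function names are hypothetical, and the parser assumes the "/address" plus "KEY:value" layout shown in the output above; only the three differing fields are copied into the sample strings.

```python
def parse_gossipinfo(text: str) -> dict:
    """Parse `nodetool gossipinfo` text into {endpoint: {key: value}}."""
    nodes = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("/"):
            # A line such as "/192.168.187.121" starts a new endpoint section.
            current = nodes.setdefault(line.lstrip("/"), {})
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return nodes


def diff_views(view_a: dict, view_b: dict, endpoint: str) -> dict:
    """Return {key: (value_a, value_b)} for keys on which the two views disagree."""
    a = view_a.get(endpoint, {})
    b = view_b.get(endpoint, {})
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


# Fields copied from the two outputs quoted above.
NODE001_VIEW = """
/192.168.187.121
  generation:1574364410
  heartbeat:209150
  STATUS:NORMAL,-104847506331695918
"""
NODE002_VIEW = """
/192.168.187.121
  generation:1491825076
  heartbeat:2147483647
  STATUS:shutdown,true
"""

differences = diff_views(
    parse_gossipinfo(NODE001_VIEW),
    parse_gossipinfo(NODE002_VIEW),
    "192.168.187.121",
)
for key, (seen_by_node001, seen_by_node002) in differences.items():
    print(f"{key}: node001={seen_by_node001} node002={seen_by_node002}")
```

The generation values differ by well over a year's worth of seconds. Whether that gap is why the other nodes keep the old state is an assumption to check against system.log on node002.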
>
>
>
>
>
> *Paul Mena*
>
> Senior Application Administrator
>
> WHOI - Information Services
>
> 508-289-3539
>
>
>
> *From:* Paul Mena
> *Sent:* Monday, November 25, 2019 9:29 AM
> *To:* user@cassandra.apache.org
> *Subject:* RE: Cassandra is not showing a node up hours after restart
>
>
>
> I’ve just discovered that NTP is not running on any of these Cassandra
> nodes, and that the timestamps are all over the map. Could this be causing
> my issue?
>
>
>
> user@remote=> ansible pre-prod-cassandra -a date
>
> node001.intra.myorg.org | CHANGED | rc=0 >>
>
> Mon Nov 25 13:58:17 UTC 2019
>
>
>
> node004.intra.myorg.org | CHANGED | rc=0 >>
>
> Mon Nov 25 14:07:20 UTC 2019
>
>
>
> node003.intra.myorg.org | CHANGED | rc=0 >>
>
> Mon Nov 25 13:57:06 UTC 2019
>
>
>
> node001.intra.myorg.org | CHANGED | rc=0 >>
>
> Mon Nov 25 14:07:22 UTC 2019
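
To put a number on "all over the map", the ansible output above can be parsed and the spread between the earliest and latest clock computed. This is a rough sketch: the helper names are made up, the sample text is the output above with the quoting removed, and the result is only approximate because ansible does not run the command on every host at exactly the same instant. The parser returns a list rather than a dict because node001 appears twice in the pasted output.

```python
from datetime import datetime

ANSIBLE_DATE_OUTPUT = """\
node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:58:17 UTC 2019

node004.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:20 UTC 2019

node003.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 13:57:06 UTC 2019

node001.intra.myorg.org | CHANGED | rc=0 >>
Mon Nov 25 14:07:22 UTC 2019
"""


def parse_host_times(text: str) -> list:
    """Return [(host, datetime)] pairs from `ansible ... -a date` output."""
    results = []
    host = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if "|" in line and line.endswith(">>"):
            # Header line: "<host> | CHANGED | rc=0 >>"
            host = line.split("|", 1)[0].strip()
        elif host is not None:
            # "Mon Nov 25 13:58:17 UTC 2019": drop the weekday and zone tokens
            # (all hosts report UTC here) and parse the rest.
            parts = line.split()
            stamp = " ".join(parts[1:4] + parts[5:6])
            results.append((host, datetime.strptime(stamp, "%b %d %H:%M:%S %Y")))
            host = None
    return results


def max_skew_seconds(host_times: list) -> float:
    """Seconds between the earliest and the latest reported clock."""
    stamps = [stamp for _, stamp in host_times]
    return (max(stamps) - min(stamps)).total_seconds()


print(max_skew_seconds(parse_host_times(ANSIBLE_DATE_OUTPUT)))
```

For the four readings above the spread comes out at a little over ten minutes.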
>
>
>
> *Paul Mena*
>
> Senior Application Administrator
>
> WHOI - Information Services
>
> 508-289-3539
>
>
>
> *From:* Inquistive allen <inquial...@gmail.com>
> *Sent:* Monday, November 25, 2019 2:46 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra is not showing a node up hours after restart
>
>
>
> Hello team,
>
>
>
> Just to add to the discussion, one may run
>
> nodetool disablebinary, followed by nodetool disablethrift, followed by
> nodetool drain.
>
> nodetool drain also does the work of nodetool flush, and additionally
> announces to the cluster that the node is down and not accepting traffic.
>
>
>
> Thanks
>
>
>
>
>
> On Mon, 25 Nov, 2019, 12:55 AM Surbhi Gupta, <surbhi.gupt...@gmail.com>
> wrote:
>
> Before Cassandra shutdown, nodetool drain should be executed first. As
> soon as you run nodetool drain, the other nodes will see this node as down
> and no new traffic will come to it.
>
> I generally leave a 10-second gap between nodetool drain and Cassandra
> stop.
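
Putting the suggestions above together (disablebinary, disablethrift, drain, then a short pause before stopping the service), the per-node stop sequence could be scripted roughly as follows. This is only a sketch: graceful_stop is a hypothetical helper, "service cassandra stop" is taken from the original post and may differ on other installs, and the 10-second pause is the rule of thumb quoted above rather than a required value.

```python
import subprocess
import time

# Commands named earlier in this thread, in the suggested order.
SHUTDOWN_STEPS = [
    ["nodetool", "disablebinary"],
    ["nodetool", "disablethrift"],
    ["nodetool", "drain"],
]
STOP_COMMAND = ["service", "cassandra", "stop"]


def graceful_stop(run=subprocess.run, sleep=time.sleep, pause_seconds=10):
    """Run the drain sequence, pause, then stop the service.

    `run` and `sleep` are injectable so the sequence can be dry-run;
    with the defaults, a failing step raises CalledProcessError and
    the remaining steps are skipped.
    """
    executed = []
    for command in SHUTDOWN_STEPS:
        run(command, check=True)
        executed.append(command)
    sleep(pause_seconds)
    run(STOP_COMMAND, check=True)
    executed.append(STOP_COMMAND)
    return executed
```

A dry run that only prints the commands is graceful_stop(run=lambda cmd, check: print(" ".join(cmd)), sleep=lambda s: None).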
>
>
>
> On Sun, Nov 24, 2019 at 9:52 AM Paul Mena <pm...@whoi.edu> wrote:
>
> Thank you for the replies. I had made no changes to the config before the
> rolling restart.
>
>
>
> I can try another restart but was wondering if I should do it differently.
> I had simply done "service cassandra stop" followed by "service cassandra
> start".  Since then I've seen some suggestions to precede the shutdown with
> "nodetool disablegossip" and/or "nodetool drain". Are these commands
> advisable? Are any other commands recommended either before the shutdown or
> after the startup?
>
>
>
> Thanks again!
>
>
>
> Paul
> ------------------------------
>
> *From:* Naman Gupta <naman.gu...@girnarsoft.com>
> *Sent:* Sunday, November 24, 2019 11:18:14 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: Cassandra is not showing a node up hours after restart
>
>
>
> Did you change the datacenter name or make any other config changes before
> the rolling restart?
>
>
>
> On Sun, Nov 24, 2019 at 8:49 PM Paul Mena <pm...@whoi.edu> wrote:
>
> I am in the process of doing a rolling restart on a 4-node cluster running
> Cassandra 2.1.9. I stopped and started Cassandra on node 1 via "service
> cassandra stop/start", and noted nothing unusual in either system.log or
> cassandra.log. Doing a "nodetool status" from node 1 shows all four nodes
> up:
>
>
>
> user@node001=> nodetool status
>
> Datacenter: datacenter1
>
> =======================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address          Load       Tokens  Owns    Host ID                               Rack
>
> UN  192.168.187.121  538.95 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>
> UN  192.168.187.122  630.72 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>
> UN  192.168.187.123  572.73 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>
> UN  192.168.187.124  625.05 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>
> But doing the same command from any other of the 3 nodes shows node 1
> still down.
>
>
>
> user@node002=> nodetool status
>
> Datacenter: datacenter1
>
> =======================
>
> Status=Up/Down
>
> |/ State=Normal/Leaving/Joining/Moving
>
> --  Address          Load       Tokens  Owns    Host ID                               Rack
>
> DN  192.168.187.121  538.94 GB  256     ?       c99cf581-f4ae-4aa9-ab37-1a114ab2429b  rack1
>
> UN  192.168.187.122  630.72 GB  256     ?       bfa07f47-7e37-42b4-9c0b-024b3c02e93f  rack1
>
> UN  192.168.187.123  572.73 GB  256     ?       273df9f3-e496-4c65-a1f2-325ed288a992  rack1
>
> UN  192.168.187.124  625.04 GB  256     ?       b8639cf1-5413-4ece-b882-2161bbb8a9c3  rack1
>
> Is there something I can do to remedy this current situation - so that I
> can continue with the rolling restart?
>
>
>
>
