I can reproduce the issue. I drained the Cassandra node, then stopped and started the Cassandra instance. The instance comes back up, but other nodes remain in DN state for around 10 minutes.
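For reference, the restart procedure described above looks roughly like this (the systemd unit name is an assumption; adjust for your install):

```shell
# Flush memtables and stop accepting client/gossip traffic cleanly
nodetool drain

# Stop and start the Cassandra service (unit name is an assumption)
sudo systemctl stop cassandra
sudo systemctl start cassandra

# After the restart, check the ring state as seen from this node
nodetool status
```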
I don't see errors in system.log. When I run nodetool status, 3 nodes are still shown as down:

DN  xx.xx.xx.59   420.85 MiB  256  48.2%  id  2
UN  xx.xx.xx.30   432.14 MiB  256  50.0%  id  0
UN  xx.xx.xx.79   447.33 MiB  256  51.1%  id  4
DN  xx.xx.xx.144  452.59 MiB  256  51.6%  id  1
DN  xx.xx.xx.19   431.7 MiB   256  50.1%  id  5
UN  xx.xx.xx.6    421.79 MiB  256  48.9%

After about 10 minutes it shows the other nodes as up as well:

INFO [HANDSHAKE-/10.72.100.156] 2019-11-05 15:05:09,133 OutboundTcpConnection.java:561 - Handshaking version with /<stop-and-started node>
INFO [RequestResponseStage-7] 2019-11-05 15:16:27,166 Gossiper.java:1019 - InetAddress /<node which was showing down> is now UP

What is causing the 10-minute delay before the node is reported as reachable?

On Wed, Oct 30, 2019, 8:37 AM Rahul Reddy <rahulreddy1...@gmail.com> wrote:

> Also, an AWS EC2 stop and start brings up a new instance with the same IP,
> and all our file systems are on EBS and mounted fine. Does a new instance
> coming up with the same IP cause any gossip issues?
>
> On Tue, Oct 29, 2019, 6:16 PM Rahul Reddy <rahulreddy1...@gmail.com>
> wrote:
>
>> Thanks Alex. We have 6 nodes in each DC with RF=3 and CL LOCAL_QUORUM,
>> and we stopped and started only one instance at a time. Though nodetool
>> status says all nodes are UN, and system.log says Cassandra started and
>> began listening, the JMX exporter shows the instance stayed down longer.
>> How do we determine what made Cassandra unavailable even though the log
>> says it started and is listening?
>>
>> On Tue, Oct 29, 2019, 4:44 PM Oleksandr Shulgin <
>> oleksandr.shul...@zalando.de> wrote:
>>
>>> On Tue, Oct 29, 2019 at 9:34 PM Rahul Reddy <rahulreddy1...@gmail.com>
>>> wrote:
>>>
>>>> We have our infrastructure on AWS and we use EBS storage, and AWS was
>>>> retiring one of the nodes. Since our storage is persistent, we ran
>>>> nodetool drain and then stopped and started the instance. This caused
>>>> 500 errors in the service. We have LOCAL_QUORUM and RF=3; why does
>>>> stopping one instance cause the application to have issues?
>>>
>>> Can you still look up what the underlying error from the Cassandra
>>> driver was in the application logs? Was it a request timeout or not
>>> enough replicas?
>>>
>>> For example, if you only had 3 Cassandra nodes, restarting one of them
>>> temporarily reduces your cluster capacity by 33%.
>>>
>>> Cheers,
>>> --
>>> Alex
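To make the LOCAL_QUORUM arithmetic from the thread concrete, here is a small Python sketch (RF=3 is from the thread; the helper names are mine, for illustration only). It suggests that with one replica cleanly restarting, "not enough replicas" errors should not occur, but they can if the coordinator's gossip view wrongly marks additional replicas as down:

```python
def quorum(rf: int) -> int:
    # LOCAL_QUORUM requires floor(RF/2) + 1 replicas in the local DC
    return rf // 2 + 1

def can_serve(rf: int, replicas_down: int) -> bool:
    # A request succeeds only if enough of the RF replicas for the
    # token range are still seen as up by the coordinator
    return rf - replicas_down >= quorum(rf)

# RF=3, LOCAL_QUORUM => 2 replicas must respond
assert quorum(3) == 2

# One replica restarting: requests should still succeed
assert can_serve(3, 1) is True

# Two replicas believed down (e.g. the restarted node plus a peer the
# coordinator's gossip still marks DN) => unavailable errors
assert can_serve(3, 2) is False
```

This is why the 10 minutes of stale DN state matters: even though the nodes are actually healthy, a coordinator that believes two of three replicas are down will fail LOCAL_QUORUM requests.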