Thanks Ben. Before stopping the EC2 instance I did run nodetool drain, so I ruled that out, and system.log also doesn't show commit logs being replayed.
On Tue, Nov 5, 2019, 7:51 PM Ben Slater <ben.sla...@instaclustr.com> wrote:

> The logs between first start and handshaking should give you a clue, but my
> first guess would be replaying commit logs.
>
> Cheers
> Ben
>
> ---
>
> Ben Slater
> Chief Product Officer
>
> On Wed, 6 Nov 2019 at 04:36, Rahul Reddy <rahulreddy1...@gmail.com> wrote:
>
>> I can reproduce the issue.
>>
>> I drained the Cassandra node, then stopped and started the Cassandra
>> instance. The Cassandra instance comes up, but the other nodes stay in DN
>> state for around 10 minutes.
>>
>> I don't see any errors in system.log.
>>
>> DN xx.xx.xx.59 420.85 MiB 256 48.2% id 2
>> UN xx.xx.xx.30 432.14 MiB 256 50.0% id 0
>> UN xx.xx.xx.79 447.33 MiB 256 51.1% id 4
>> DN xx.xx.xx.144 452.59 MiB 256 51.6% id 1
>> DN xx.xx.xx.19 431.7 MiB 256 50.1% id 5
>> UN xx.xx.xx.6 421.79 MiB 256 48.9%
>>
>> When I run nodetool status, 3 nodes still show as down, and I don't see
>> errors in system.log.
>>
>> After 10 minutes it shows the other node is up as well.
>>
>> INFO [HANDSHAKE-/10.72.100.156] 2019-11-05 15:05:09,133
>> OutboundTcpConnection.java:561 - Handshaking version with /stopandstarted
>> node
>> INFO [RequestResponseStage-7] 2019-11-05 15:16:27,166 Gossiper.java:1019
>> - InetAddress /nodewhichitwasshowing down is now UP
>>
>> What is causing the 10-minute delay before the node is reported as
>> reachable?
>>
>> On Wed, Oct 30, 2019, 8:37 AM Rahul Reddy <rahulreddy1...@gmail.com>
>> wrote:
>>
>>> Also, an AWS EC2 stop and start brings up a new instance with the same IP,
>>> and all our file systems are on EBS and mounted fine. Does a new instance
>>> coming up with the same IP cause any gossip issues?
>>>
>>> On Tue, Oct 29, 2019, 6:16 PM Rahul Reddy <rahulreddy1...@gmail.com>
>>> wrote:
>>>
>>>> Thanks Alex. We have 6 nodes in each DC with RF=3 and CL LOCAL_QUORUM,
>>>> and we stopped and started only one instance at a time. Though nodetool
>>>> status says all nodes are UN and system.log says Cassandra started and
>>>> is listening, the JMX exporter shows the instance stayed down longer.
>>>> How do we determine what caused Cassandra to be unavailable even though
>>>> the log says it started and is listening?
>>>>
>>>> On Tue, Oct 29, 2019, 4:44 PM Oleksandr Shulgin <
>>>> oleksandr.shul...@zalando.de> wrote:
>>>>
>>>>> On Tue, Oct 29, 2019 at 9:34 PM Rahul Reddy <rahulreddy1...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> We have our infrastructure on AWS and we use EBS storage. AWS was
>>>>>> retiring one of the nodes. Since our storage was persistent, we ran
>>>>>> nodetool drain and then stopped and started the instance. This caused
>>>>>> 500 errors in the service. We have local_quorum and rf=3, so why does
>>>>>> stopping one instance cause the application to have issues?
>>>>>>
>>>>>
>>>>> Can you still look up what the underlying error from the Cassandra
>>>>> driver was in the application logs? Was it a request timeout or not
>>>>> enough replicas?
>>>>>
>>>>> For example, if you only had 3 Cassandra nodes, restarting one of them
>>>>> reduces your cluster capacity by 33% temporarily.
>>>>>
>>>>> Cheers,
>>>>> --
>>>>> Alex
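
For reference, the quorum arithmetic behind the RF=3 / LOCAL_QUORUM question in this thread can be sketched as below. This is only an illustration, not something any poster ran: the function names are made up for the example, and it assumes the usual quorum rule of floor(RF / 2) + 1 replicas, evaluated per range within the local DC.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for QUORUM / LOCAL_QUORUM: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def can_satisfy_quorum(replication_factor: int, replicas_down: int) -> bool:
    """True if enough replicas of a range remain up to reach quorum."""
    replicas_up = replication_factor - replicas_down
    return replicas_up >= quorum(replication_factor)


if __name__ == "__main__":
    rf = 3  # RF per DC, as stated in the thread
    print("replicas needed for quorum:", quorum(rf))
    for down in range(rf + 1):
        # "down" here means down as seen by the coordinator handling the request
        print(down, "replica(s) down -> quorum reachable:", can_satisfy_quorum(rf, down))
```

With RF=3, one replica being down should still leave quorum reachable. If a coordinator believes two of the three replicas for a range are down, LOCAL_QUORUM requests for that range would fail, which is one reason the driver-side error (timeout vs. not enough replicas) is worth checking.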