Thanks Ben, Jeff. Sorry that my explanation confused you.
Only node1 is the seed node. Node2, whose C* data was deleted, is NOT a seed.
I restarted the failed node (node2) after restarting the seed node (node1).
Node2 then restarted successfully, without the exception. (As expected, I
couldn't restart node2 before restarting node1.)

Regards,

On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:

> The unstated "problem" here is that node1 is a seed, which implies
> auto_bootstrap=false (you can't bootstrap a seed, so it was almost certainly
> set up to start without bootstrapping).
>
> That means once the data dir is wiped, it's going to start again without a
> bootstrap, and make a single-node cluster or join an existing cluster if
> the seed list is valid.
>
> --
> Jeff Jirsa
>
> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>
> OK, sorry - I think I understand what you are asking now.
>
> However, I'm still a little confused by your description. I think your
> scenario is:
> 1) Stop C* on all nodes in a cluster (Nodes A, B, C)
> 2) Delete all data from Node A
> 3) Restart Node A
> 4) Restart Nodes B, C
>
> Is this correct?
>
> If so, this isn't a scenario I've tested/seen, but I'm not surprised Node A
> starts successfully, as there are no running nodes to tell it via gossip
> that it shouldn't start up without the "replaces" flag.
>
> I think that the right way to recover in this scenario is to run a nodetool
> rebuild on Node A after the other two nodes are running. You could
> theoretically also run a repair (which would be good practice after a weird
> failure scenario like this), but rebuild will probably be quicker given you
> know all the data needs to be re-streamed.
>
> Cheers
> Ben
>
> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Thank you Ben, Yabin
>>
>> I understood the rejoin was illegal.
>> I expected this rejoin would fail with the exception,
>> but I could add the failed node to the cluster without the
>> exception after 2) and 3).
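Ben's suggested recovery above (nodetool rebuild on the wiped node once the others are back up) can be sketched as a small script. This is only an illustration: the datacenter name `datacenter1` is taken from the nodetool status output later in the thread, and the `DRY_RUN` switch exists purely so the sketch prints commands instead of assuming a live cluster.

```shell
# Sketch of the rebuild-based recovery Ben describes. DRY_RUN=1 makes the
# script echo each command rather than execute it, since this sketch cannot
# assume a running cluster; unset it on a real node.
DRY_RUN=1
run() {
  if [ -n "$DRY_RUN" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# 1) Confirm the surviving replicas are up before streaming from them.
run nodetool status

# 2) Re-stream all ranges this node owns from the named datacenter.
#    rebuild skips Merkle-tree comparison, so it is usually faster than
#    repair when you already know every row must be re-fetched.
run nodetool rebuild datacenter1

# 3) Optionally follow with a repair as a consistency check.
run nodetool repair
```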
>> I want to know why the rejoin succeeds. Should the exception happen?
>>
>> Regards,
>>
>> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>>
>> The exception you run into is expected behavior. This is because, as Ben
>> pointed out, when you delete everything (including system schemas), the C*
>> cluster thinks you're bootstrapping a new node. However, node2's IP is
>> still in gossip, and this is why you see the exception.
>>
>> I'm not clear on the reasoning why you need to delete the C* data
>> directory. That is a dangerous action, especially considering that you
>> delete system schemas. If the failed node has been gone for a while, what
>> you need to do is remove the node first before doing the "rejoin".
>>
>> Cheers,
>>
>> Yabin
>>
>> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> To Cassandra, the node where you deleted the files looks like a brand new
>> machine. It doesn't automatically rebuild machines, to prevent accidental
>> replacement. You need to tell it to build the "new" machine as a
>> replacement for the "old" machine with that IP by setting
>> -Dcassandra.replace_address_first_boot=<dead_node_ip>. See
>> http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
>>
>> Cheers
>> Ben
>>
>> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Hi all,
>>
>> A failed node can rejoin a cluster.
>> On that node, all data in /var/lib/cassandra was deleted.
>> Is this normal?
>>
>> I can reproduce it as below.
>>
>> cluster:
>> - C* 2.2.7
>> - the cluster has node1, node2, node3
>> - node1 is a seed
>> - replication_factor: 3
>>
>> how to:
>> 1) stop the C* process and delete all data in /var/lib/cassandra on node2
>>    ($ sudo rm -rf /var/lib/cassandra/*)
>> 2) stop the C* process on node1 and node3
>> 3) restart C* on node1
>> 4) restart C* on node2
>>
>> nodetool status after 4):
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
>> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
>> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
>> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>>
>> If I restart C* on node2 while C* on node1 and node3 is running (without
>> steps 2) and 3)), a runtime exception happens:
>> RuntimeException: "A node with address [node2 IP] already exists,
>> cancelling join..."
>>
>> I'm not sure whether this causes data loss. All data can be read properly
>> just after this rejoin.
>> But some rows are lost when I kill & restart C* for destructive tests
>> after this rejoin.
>>
>> Thanks.
>>
>> --
>> ————————
>> Ben Slater
>> Chief Product Officer
>> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
>> +61 437 929 798
>
> --
> ————————
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
>
> ____________________________________________________________________
> CONFIDENTIALITY NOTE: This e-mail and any attachments are confidential and
> may be legally privileged. If you are not the intended recipient, do not
> disclose, copy, distribute, or use this email or any attachments. If you
> have received this in error please let the sender know and then delete the
> email and all attachments.
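For completeness, the replace-address path Ben points to earlier in the thread (the documented route when the rest of the cluster is still running, rather than the all-nodes-stopped sequence above) can be sketched as follows. The dead node's IP, the service name, and the `DRY_RUN` switch are illustrative assumptions, not part of the original thread.

```shell
# Sketch of the "replace" recovery from Ben's earlier message: start the
# wiped node with cassandra.replace_address_first_boot so gossip treats it
# as a replacement instead of rejecting the duplicate IP. DRY_RUN=1 prints
# commands instead of executing them; the IP below is a placeholder.
DRY_RUN=1
DEAD_NODE_IP="10.0.0.2"   # hypothetical IP of the wiped node (node2)

run() {
  if [ -n "$DRY_RUN" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# Add this flag to JVM_OPTS in cassandra-env.sh (the file's path varies by
# package), then start the node. The _first_boot variant only takes effect
# on the node's first boot, so it is safer than plain replace_address,
# which must be removed manually after the replacement completes.
FLAG="-Dcassandra.replace_address_first_boot=${DEAD_NODE_IP}"
echo "add to cassandra-env.sh: JVM_OPTS=\"\$JVM_OPTS ${FLAG}\""

run sudo service cassandra start

# Once the node has finished streaming, confirm it shows as UN in the ring.
run nodetool status
```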