Thanks Ben,

> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at the end of operation (2) or
> after operation (3)?

After operation 3), at operation 4), which reads all rows by cqlsh with CL.SERIAL.

> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?

- create keyspace testkeyspace WITH REPLICATION =
  {'class':'SimpleStrategy','replication_factor':3};
- consistency level is SERIAL
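For illustration, the operation 4) check amounts to something like the
following (the table and column names here are placeholders, not the actual
schema used by the test):

  $ cqlsh [node1 IP] -k testkeyspace
  cqlsh:testkeyspace> CONSISTENCY SERIAL;
  cqlsh:testkeyspace> SELECT count(*) FROM testtable;   -- expect 1000 rows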
On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:

> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at the end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Thanks Ben,
>>
>> I tried to run a rebuild and repair after the failure node rejoined the
>> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
>> The failure node could rejoin and I could read all rows successfully.
>> (Sometimes a repair failed because the node could not access another node.
>> If it failed, I retried the repair.)
>>
>> But some rows were lost after my destructive test was repeated (after about
>> 5-6 hours). After the test inserted 1000 rows, there were only 953 rows at
>> the end of the test.
>>
>> My destructive test:
>> - each C* node is killed & restarted at a random interval (within about
>>   5 min) throughout this test (a rough sketch of this loop follows below)
>> 1) truncate all tables
>> 2) insert initial rows (check that all rows are inserted successfully)
>> 3) request a lot of reads/writes to random rows for about 30 min
>> 4) check all rows
>> If operation 1), 2) or 4) fails due to a C* failure, the test retries the
>> operation.
>>
>> Does anyone have a similar problem?
>> What causes the data loss?
>> Does the test need any extra operation when a C* node is restarted?
>> (Currently, I just restart the C* process.)
>>
>> Regards,
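>> (For illustration only, not the actual test harness - the kill & restart
>> part of such a test can be a per-node loop like the one below; the service
>> name and the use of kill -9 are assumptions:)
>>
>>   while true; do
>>     sleep $((RANDOM % 300))                    # random interval within ~5 min
>>     sudo kill -9 $(pgrep -f CassandraDaemon)   # kill the C* process
>>     sudo service cassandra start               # restart it
>>   done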
>> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> OK, that's a bit more unexpected (to me at least) but I think the
>> solution of running a rebuild or repair still applies.
>>
>> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thanks Ben, Jeff
>>
>> Sorry that my explanation confused you.
>>
>> Only node1 is the seed node.
>> Node2, whose C* data is deleted, is NOT a seed.
>>
>> I restarted the failure node (node2) after restarting the seed node (node1).
>> Restarting node2 succeeded without the exception.
>> (I couldn't restart node2 before restarting node1, as expected.)
>>
>> Regards,
>>
>> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>
>> The unstated "problem" here is that node1 is a seed, which implies
>> auto_bootstrap=false (you can't bootstrap a seed, so it was almost certainly
>> set up to start without bootstrapping).
>>
>> That means once the data dir is wiped, it's going to start again without
>> a bootstrap, and make a single-node cluster or join an existing cluster if
>> the seed list is valid.
>>
>> --
>> Jeff Jirsa
>>
>> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> OK, sorry - I think I understand what you are asking now.
>>
>> However, I'm still a little confused by your description. I think your
>> scenario is:
>> 1) Stop C* on all nodes in a cluster (Nodes A, B, C)
>> 2) Delete all data from Node A
>> 3) Restart Node A
>> 4) Restart Nodes B, C
>>
>> Is this correct?
>>
>> If so, this isn't a scenario I've tested/seen but I'm not surprised Node
>> A starts successfully, as there are no running nodes to tell it via gossip
>> that it shouldn't start up without the "replaces" flag.
>>
>> I think the right way to recover in this scenario is to run a nodetool
>> rebuild on Node A after the other two nodes are running. You could
>> theoretically also run a repair (which would be good practice after a weird
>> failure scenario like this) but rebuild will probably be quicker given you
>> know all the data needs to be re-streamed.
>>
>> Cheers
>> Ben
>>
>> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thank you Ben, Yabin
>>
>> I understood the rejoin was illegal.
>> I expected this rejoin to fail with the exception.
>> But I could add the failure node to the cluster without the exception
>> after 2) and 3).
>> I want to know why the rejoin succeeds. Should the exception happen?
>>
>> Regards,
>>
>> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>>
>> The exception you run into is expected behavior. This is because, as Ben
>> pointed out, when you delete everything (including system schemas), the C*
>> cluster thinks you're bootstrapping a new node. However, node2's IP is
>> still in gossip and this is why you see the exception.
>>
>> I'm not clear on the reasoning why you need to delete the C* data directory.
>> That is a dangerous action, especially considering that you delete the
>> system schemas. If in any case the failure node is gone for a while, what
>> you need to do is remove the node first before doing the "rejoin".
>>
>> Cheers,
>>
>> Yabin
>>
>> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> To Cassandra, the node where you deleted the files looks like a brand new
>> machine. It doesn't automatically rebuild machines to prevent accidental
>> replacement. You need to tell it to build the "new" machine as a
>> replacement for the "old" machine with that IP by setting
>> -Dcassandra.replace_address_first_boot=<dead_node_ip>. See
>> http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
>>
>> Cheers
>> Ben
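>> (One common way to set that flag - illustrative only, and the config file
>> path depends on how C* was installed:)
>>
>>   # on the replacement node, before starting C*
>>   echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"' \
>>       | sudo tee -a /etc/cassandra/cassandra-env.sh
>>   sudo service cassandra start
>>   # remove that line again after the replacement has completed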
>> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Hi all,
>>
>> A failed node can rejoin a cluster.
>> On that node, all data in /var/lib/cassandra was deleted.
>> Is this normal?
>>
>> I can reproduce it as below.
>>
>> cluster:
>> - C* 2.2.7
>> - the cluster has node1, node2, node3
>> - node1 is a seed
>> - replication_factor: 3
>>
>> how to:
>> 1) stop the C* process and delete all data in /var/lib/cassandra on node2
>>    ($ sudo rm -rf /var/lib/cassandra/*)
>> 2) stop the C* process on node1 and node3
>> 3) restart C* on node1
>> 4) restart C* on node2
>>
>> nodetool status after 4):
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
>> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
>> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
>> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>>
>> If I restart C* on node2 while C* on node1 and node3 is running (i.e.
>> without 2) and 3)), a runtime exception happens:
>> RuntimeException: "A node with address [node2 IP] already exists,
>> cancelling join..."
>>
>> I'm not sure whether this causes the data loss. All data can be read
>> properly just after this rejoin.
>> But some rows are lost when I kill & restart C* for destructive tests
>> after this rejoin.
>>
>> Thanks.

> --
> ————————
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
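(For quick reference, the recovery commands mentioned in this thread; the
keyspace name and host ID below are placeholders:)

  nodetool rebuild                      # on the wiped node, once the other nodes are up
  nodetool repair testkeyspace          # and/or a repair, as discussed above
  nodetool removenode <node2 Host ID>   # alternative: remove the dead node first,
                                        # then bootstrap it as a brand-new node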