Thanks Ben,

> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at the end of operation (2) or
> after operation (3)?

After operation 3), at operation 4), which reads all rows by cqlsh with CL.SERIAL.

> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?

- create keyspace testkeyspace WITH REPLICATION =
  {'class':'SimpleStrategy','replication_factor':3};
- consistency level is SERIAL
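For illustration, the operation 4) check amounts to something like the
following (the table and column names here are placeholders, not the actual
schema used by the test):

  $ cqlsh [node1 IP] -k testkeyspace
  cqlsh:testkeyspace> CONSISTENCY SERIAL;
  cqlsh:testkeyspace> SELECT count(*) FROM testtable;   -- expect 1000 rows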
On Fri, Oct 21, 2016 at 12:04 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:

> A couple of questions:
> 1) At what stage did you have (or expect to have) 1000 rows (and have the
> mismatch between actual and expected) - at the end of operation (2) or
> after operation (3)?
> 2) What replication factor and replication strategy is used by the test
> keyspace? What consistency level is used by your operations?
>
> Cheers
> Ben
>
> On Fri, 21 Oct 2016 at 13:57 Yuji Ito <y...@imagine-orb.com> wrote:
>
>> Thanks Ben,
>>
>> I tried to run a rebuild and repair after the failure node rejoined the
>> cluster as a "new" node with -Dcassandra.replace_address_first_boot.
>> The failure node could rejoin and I could read all rows successfully.
>> (Sometimes a repair failed because the node could not access another node.
>> If it failed, I retried the repair.)
>>
>> But some rows were lost after my destructive test was repeated (after about
>> 5-6 hours). After the test inserted 1000 rows, there were only 953 rows at
>> the end of the test.
>>
>> My destructive test:
>> - each C* node is killed & restarted at a random interval (within about
>>   5 min) throughout this test (a rough sketch of this loop follows below)
>> 1) truncate all tables
>> 2) insert initial rows (check that all rows are inserted successfully)
>> 3) request a lot of reads/writes to random rows for about 30 min
>> 4) check all rows
>> If operation 1), 2) or 4) fails due to a C* failure, the test retries the
>> operation.
>>
>> Does anyone have a similar problem?
>> What causes the data loss?
>> Does the test need any extra operation when a C* node is restarted?
>> (Currently, I just restart the C* process.)
>>
>> Regards,
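>> (For illustration only, not the actual test harness - the kill & restart
>> part of such a test can be a per-node loop like the one below; the service
>> name and the use of kill -9 are assumptions:)
>>
>>   while true; do
>>     sleep $((RANDOM % 300))                    # random interval within ~5 min
>>     sudo kill -9 $(pgrep -f CassandraDaemon)   # kill the C* process
>>     sudo service cassandra start               # restart it
>>   done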
>> On Tue, Oct 18, 2016 at 2:18 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> OK, that's a bit more unexpected (to me at least) but I think the
>> solution of running a rebuild or repair still applies.
>>
>> On Tue, 18 Oct 2016 at 15:45 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thanks Ben, Jeff
>>
>> Sorry that my explanation confused you.
>>
>> Only node1 is the seed node.
>> Node2, whose C* data is deleted, is NOT a seed.
>>
>> I restarted the failure node (node2) after restarting the seed node (node1).
>> Restarting node2 succeeded without the exception.
>> (I couldn't restart node2 before restarting node1, as expected.)
>>
>> Regards,
>>
>> On Tue, Oct 18, 2016 at 1:06 PM, Jeff Jirsa <jeff.ji...@crowdstrike.com> wrote:
>>
>> The unstated "problem" here is that node1 is a seed, which implies
>> auto_bootstrap=false (you can't bootstrap a seed, so it was almost certainly
>> set up to start without bootstrapping).
>>
>> That means once the data dir is wiped, it's going to start again without
>> a bootstrap, and make a single-node cluster or join an existing cluster if
>> the seed list is valid.
>>
>> --
>> Jeff Jirsa
>>
>> On Oct 17, 2016, at 8:51 PM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> OK, sorry - I think I understand what you are asking now.
>>
>> However, I'm still a little confused by your description. I think your
>> scenario is:
>> 1) Stop C* on all nodes in a cluster (Nodes A, B, C)
>> 2) Delete all data from Node A
>> 3) Restart Node A
>> 4) Restart Nodes B, C
>>
>> Is this correct?
>>
>> If so, this isn't a scenario I've tested/seen but I'm not surprised Node
>> A starts successfully, as there are no running nodes to tell it via gossip
>> that it shouldn't start up without the "replaces" flag.
>>
>> I think the right way to recover in this scenario is to run a nodetool
>> rebuild on Node A after the other two nodes are running. You could
>> theoretically also run a repair (which would be good practice after a weird
>> failure scenario like this) but rebuild will probably be quicker given you
>> know all the data needs to be re-streamed.
>>
>> Cheers
>> Ben
>>
>> On Tue, 18 Oct 2016 at 14:03 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Thank you Ben, Yabin
>>
>> I understood the rejoin was illegal.
>> I expected this rejoin to fail with the exception.
>> But I could add the failure node to the cluster without the exception
>> after 2) and 3).
>> I want to know why the rejoin succeeds. Should the exception happen?
>>
>> Regards,
>>
>> On Tue, Oct 18, 2016 at 1:51 AM, Yabin Meng <yabinm...@gmail.com> wrote:
>>
>> The exception you run into is expected behavior. This is because, as Ben
>> pointed out, when you delete everything (including system schemas), the C*
>> cluster thinks you're bootstrapping a new node. However, node2's IP is
>> still in gossip and this is why you see the exception.
>>
>> I'm not clear on the reasoning why you need to delete the C* data directory.
>> That is a dangerous action, especially considering that you delete the
>> system schemas. If in any case the failure node is gone for a while, what
>> you need to do is remove the node first before doing the "rejoin".
>>
>> Cheers,
>>
>> Yabin
>>
>> On Mon, Oct 17, 2016 at 1:48 AM, Ben Slater <ben.sla...@instaclustr.com> wrote:
>>
>> To Cassandra, the node where you deleted the files looks like a brand new
>> machine. It doesn't automatically rebuild machines to prevent accidental
>> replacement. You need to tell it to build the "new" machine as a
>> replacement for the "old" machine with that IP by setting
>> -Dcassandra.replace_address_first_boot=<dead_node_ip>. See
>> http://cassandra.apache.org/doc/latest/operating/topo_changes.html.
>>
>> Cheers
>> Ben
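>> (One common way to set that flag - illustrative only, and the config file
>> path depends on how C* was installed:)
>>
>>   # on the replacement node, before starting C*
>>   echo 'JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address_first_boot=<dead_node_ip>"' \
>>       | sudo tee -a /etc/cassandra/cassandra-env.sh
>>   sudo service cassandra start
>>   # remove that line again after the replacement has completed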
>> On Mon, 17 Oct 2016 at 16:41 Yuji Ito <y...@imagine-orb.com> wrote:
>>
>> Hi all,
>>
>> A failed node can rejoin a cluster.
>> On that node, all data in /var/lib/cassandra was deleted.
>> Is this normal?
>>
>> I can reproduce it as below.
>>
>> cluster:
>> - C* 2.2.7
>> - the cluster has node1, node2, node3
>> - node1 is a seed
>> - replication_factor: 3
>>
>> how to:
>> 1) stop the C* process and delete all data in /var/lib/cassandra on node2
>>    ($ sudo rm -rf /var/lib/cassandra/*)
>> 2) stop the C* process on node1 and node3
>> 3) restart C* on node1
>> 4) restart C* on node2
>>
>> nodetool status after 4):
>> Datacenter: datacenter1
>> =======================
>> Status=Up/Down
>> |/ State=Normal/Leaving/Joining/Moving
>> --  Address     Load       Tokens  Owns (effective)  Host ID                               Rack
>> DN  [node3 IP]  ?          256     100.0%            325553c6-3e05-41f6-a1f7-47436743816f  rack1
>> UN  [node2 IP]  7.76 MB    256     100.0%            05bdb1d4-c39b-48f1-8248-911d61935925  rack1
>> UN  [node1 IP]  416.13 MB  256     100.0%            a8ec0a31-cb92-44b0-b156-5bcd4f6f2c7b  rack1
>>
>> If I restart C* on node2 while C* on node1 and node3 is running (i.e.
>> without 2) and 3)), a runtime exception happens:
>> RuntimeException: "A node with address [node2 IP] already exists,
>> cancelling join..."
>>
>> I'm not sure whether this causes the data loss. All data can be read
>> properly just after this rejoin.
>> But some rows are lost when I kill & restart C* for destructive tests
>> after this rejoin.
>>
>> Thanks.

> --
> ————————
> Ben Slater
> Chief Product Officer
> Instaclustr: Cassandra + Spark - Managed | Consulting | Support
> +61 437 929 798
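(For quick reference, the recovery commands mentioned in this thread; the
keyspace name and host ID below are placeholders:)

  nodetool rebuild                      # on the wiped node, once the other nodes are up
  nodetool repair testkeyspace          # and/or a repair, as discussed above
  nodetool removenode <node2 Host ID>   # alternative: remove the dead node first,
                                        # then bootstrap it as a brand-new node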