> On Aug 2, 2019, at 12:21 AM, Martin Xue <martin...@gmail.com> wrote:
> 
> Hello,
> 
> I am currently running into a production issue and am seeking help from the 
> community.
> 
> Can anyone help with the following questions regarding a down Cassandra node 
> in a cluster?
> 
> Case:
> Cassandra 3.0.14
> 3 nodes (A, B, C) in DC1, 3 nodes (D, E, F) in DC2 forming one cluster
> 
> keyspace_m: replication factor is 2 in both DC1 and DC2
> 
> application_z read and write consistency are both LOCAL_QUORUM
> 

RF=2 with LOCAL_QUORUM basically guarantees an outage in a given DC if any 
single host dies, so it's only recommended if you can fail out of a DC safely. 
That implies an eventually consistent data model: when you fail out, the remote 
DC is in an undefined state, since you're only writing at LOCAL_QUORUM.
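
(Quorum is floor(RF/2) + 1, so at RF=2 a LOCAL_QUORUM request needs both local 
replicas. A throwaway Python sketch of just that arithmetic, nothing 
Cassandra-specific:)

    # Quorum arithmetic only; illustrative, not Cassandra source code.
    def quorum(replicas: int) -> int:
        return replicas // 2 + 1

    print(quorum(2))  # 2 -> needs both local replicas, so one dead node = outage
    print(quorum(3))  # 2 -> tolerates one dead local replica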

> 
> Issue:
> node A in DC1 has crashed and has been down for more than 24 hours (outside 
> the default 3-hour hint window).
> 
> Questions:
> 1. for old data on node A, will the data be re-synced to node B or C after 
> node A went down?

Both, but any given piece of data only has one of B or C as its surviving replica

With RF=2, each piece of data is on one of:
AB
BC
AC

So if A crashes, bringing it back or replacing it will sync from its only 
surviving replica for each piece of data
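
(If it helps to verify, nodetool getendpoints <keyspace> <table> <key> prints 
the replica nodes for a given key, so you can see which keys are down to a 
single live replica.)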

> 2. for new data, if application_z tries to write, will the data always be 
> written to the two remaining running nodes (B and C) in DC1, or will it fail 
> if it still tries to write to node A?

It will fail. Ownership doesn't change just because one host goes down. For any 
piece of data owned by A and one other node, writes will fail while A is down 
with this replication factor and consistency level.
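
To make that concrete, here's a rough sketch with the DataStax Python driver. 
The contact point, keyspace/table and columns are placeholders, and depending 
on the retry policy the error may surface slightly differently, but the root 
cause is the coordinator's unavailable response:

    from cassandra import ConsistencyLevel, Unavailable
    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement

    cluster = Cluster(['10.0.0.11'])        # any live node in DC1 (placeholder)
    session = cluster.connect('keyspace_m')

    write = SimpleStatement(
        "INSERT INTO some_table (id, val) VALUES (%s, %s)",   # made-up schema
        consistency_level=ConsistencyLevel.LOCAL_QUORUM)

    try:
        session.execute(write, (42, 'x'))
    except Unavailable as e:
        # RF=2 + LOCAL_QUORUM needs 2 live local replicas; for keys A owns,
        # only 1 is alive, so the coordinator rejects the request up front.
        print("required %d, alive %d" % (e.required_replicas, e.alive_replicas))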

> 3. if application_z tries to read, will it fail (for old data written before 
> node A crashed and for new data written after)? will the data be replicated 
> from A to B or C?
It will fail; the coordinator will throw an UnavailableException.


> 4. what is the best strategy under this scenario? 

Go to RF=3, or read and write at QUORUM so you need 3 of the 4 total replicas 
instead of 2 of the 2 local ones (but then you'll fail if the WAN link goes 
down, and your reads and writes will cross the WAN, adding latency).
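
(For reference, the RF change itself is just ALTER KEYSPACE keyspace_m WITH 
replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3}, 
assuming you're already on NetworkTopologyStrategy, which per-DC RFs imply. The 
new third replicas only get existing data once you run a full repair of that 
keyspace, so plan for that too.)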

> 5. Shall I bring up node A and run repair on all the nodes (A, B, C, D, E, F)? 
> (A potential issue: repair may cause a crash similar to the one on node A, 
> and there is a big 1 TB keyspace to repair.)

Since you're past the hint window, you're going to have a lot of data to 
repair, and your chance of resurrecting data due to exceeding gc_grace_seconds 
is nonzero, so it may make sense to replace. Replace will take longer, though, 
so bringing the node back online may be an easier way to end the outage, 
depending on the business cost of data resurrection (unless you have "only 
purge repaired tombstones" enabled, which prevents resurrection but potentially 
introduces other issues with incremental repair).
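
(Side note: the 3-hour window is max_hint_window_in_ms in cassandra.yaml; you 
can raise it for more headroom next time, but hints are not a substitute for 
repairing within gc_grace_seconds.)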

> 6. Shall I simply decommission node A and add a new node F to DC1 in the 
> cluster?

That may be easier than trying to run repair. In this scenario only, you can 
replace without running repair and without violating consistency.
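
(If you go that route, note that a node that is down can't be decommissioned; 
the usual options are nodetool removenode with A's host ID, or, simpler here, 
starting the replacement node with the JVM option 
-Dcassandra.replace_address_first_boot=<A's IP> so it streams A's ranges 
directly from the surviving replicas.)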

> 
> 
> Your help would be appreciated.
> 
> Thanks
> Regards
> Martin
> 
