Thanks Jake!! But I think most people have 2.0.x in production right now, as 
2.1.6 was only very recently declared production-ready. I think the bug is too 
important to be left open in 2.0.x, as it leads to data loss. Should I open a JIRA?

Thanks
Anuj Wadehra


     On Thursday, 25 June 2015 2:47 AM, Jake Luciani <jak...@gmail.com> wrote:
   

This is no longer an issue in 2.1:
https://issues.apache.org/jira/browse/CASSANDRA-2434
We now make sure the replica we bootstrap from is the one that will no longer 
own that range.
On Wed, Jun 24, 2015 at 4:58 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

It looks to me like this can indeed happen in theory (I might be wrong).
However:
- Hinted Handoff tends to remove this issue; if this is a big worry, you might 
want to make sure HH is enabled and well tuned.
- Read Repairs (synchronous or not) might have mitigated things too, if you 
read fresh data. You can set the read repair chance to higher values.
- After an outage, you should always run a nodetool repair on the node that 
went down - following the best practices, or because you understand the 
reasons - or just trust HH if it is enough for you.
So I would say that you can always "shoot yourself in the foot", whatever you 
do, yet following best practices or understanding the internals is the key, imho.
It is a good question though.
Alain.


2015-06-24 19:43 GMT+02:00 Anuj Wadehra <anujw_2...@yahoo.co.in>:


Hi,
We faced a scenario where we lost a little data after adding 2 nodes to the 
cluster. There were intermittent dropped mutations in the cluster. I need to 
verify my understanding of how this may have happened in order to do a root 
cause analysis.
Scenario: 3 nodes, RF=3, read/write CL=QUORUM
1. Due to an overloaded cluster, some writes only happened on 2 nodes, node 1 
and node 2, while the asynchronous mutations were dropped on node 3. So say 
key K with token T was not written to node 3.
2. I added node 4, and suppose that as per the newly calculated ranges, token T 
is now supposed to have replicas on node 1, node 3, and node 4. Unfortunately, 
node 4 started bootstrapping from node 3, where key K was missing.
3. After the recommended 2-minute gap, I added node 5, and as per the new 
token distribution suppose token T is now supposed to have replicas on node 3, 
node 4 and node 5. Again, node 5 bootstrapped from node 3, where the data was 
missing.
So now key K is lost, and that's how we lost a very small number of rows.
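If I've followed the steps, the replica-count arithmetic can be sketched as below. This is a toy model checking only the set arithmetic of the scenario, not Cassandra code; the node numbers and replica sets are taken from the steps above:

```python
# Toy model of the scenario: RF=3, write/read CL=QUORUM.
# It only tracks which replicas of token T hold key K at each step.

RF = 3
QUORUM = RF // 2 + 1   # 2 acks needed when RF=3

# Step 1: the write for key K is acked by nodes 1 and 2 only;
# the mutation on node 3 is dropped.
replicas_with_K = {1, 2}
assert len(replicas_with_K) >= QUORUM   # the QUORUM write still succeeds

# Step 2: node 4 joins; the replica set for token T becomes {1, 3, 4}.
# Node 4 streams its range from node 3, which lacks K, so it misses K too.
replicas_with_K &= {1, 3, 4}            # node 2 no longer owns T
assert replicas_with_K == {1}           # only node 1 still holds K

# Step 3: node 5 joins; the replica set becomes {3, 4, 5}.
# Node 5 also streams from node 3, so it also misses K.
replicas_with_K &= {3, 4, 5}
assert replicas_with_K == set()         # no current replica holds K

# A QUORUM read of K now finds nothing: the row is effectively lost.
assert len(replicas_with_K) < QUORUM
```

Under this model the QUORUM write in step 1 genuinely succeeded, yet after two bootstraps from the one stale replica, key K is gone from every node that still owns token T.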
Moreover, in step 1 the situation could be worse: we can also have a scenario 
where some writes only happened on one of the three replicas, and Cassandra 
chooses replicas where this data is missing when streaming ranges to the 2 new 
nodes.
Am I making sense?
We are using C* 2.0.3.
Thanks
Anuj


Sent from Yahoo Mail on Android







-- 
http://twitter.com/tjake

  
