This is no longer an issue in 2.1.
https://issues.apache.org/jira/browse/CASSANDRA-2434

We now make sure the replica we bootstrap from is the one that will no
longer own that range.
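
For reference, 2.1 exposes this as a JVM property, on by default. A rough
sketch; check cassandra-env.sh on your version:

    # cassandra-env.sh (2.1+): bootstrap streams from the replica that is
    # giving up the range. Defaults to true; false restores the old behavior.
    JVM_OPTS="$JVM_OPTS -Dcassandra.consistent.rangemovement=true"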

On Wed, Jun 24, 2015 at 4:58 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:

> It looks to me like this can indeed happen, theoretically (I might be wrong).
>
> However,
>
> - Hinted Handoff tends to remove this issue; if this is a big worry, you
> might want to make sure HH is enabled and well tuned.
> - Read Repair (synchronous or not) might have mitigated things as well,
> if you read the fresh data. You can set the read repair chance to a
> higher value.
> - After an outage, you should always run a nodetool repair on the node
> that went down - following the best practices, or because you understand
> the reasons - or just trust HH if that is enough for you (rough sketch
> below).
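>
> A rough sketch of the knobs involved; the setting names are from a
> 2.0-era cassandra.yaml and cqlsh, and my_ks / my_table are placeholders,
> so double-check them against your version:
>
>     # cassandra.yaml: hinted handoff
>     hinted_handoff_enabled: true
>     max_hint_window_in_ms: 10800000    # 3 hours; hints stop after this
>     hinted_handoff_throttle_in_kb: 1024
>
>     -- cqlsh: raise the chance that a read triggers a background repair
>     ALTER TABLE my_ks.my_table WITH read_repair_chance = 0.2;
>
>     # on the node that was down, once it is back up:
>     nodetool repair my_ks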
>
> So I would say that you can always "shoot yourself in the foot" whatever
> you do; following best practices and understanding the internals is the
> key, imho.
>
> I would say it is a good question though.
>
> Alain.
>
>
>
> 2015-06-24 19:43 GMT+02:00 Anuj Wadehra <anujw_2...@yahoo.co.in>:
>
>> Hi,
>>
>> We faced a scenario where we lost a small amount of data after adding 2
>> nodes to the cluster. There were intermittent dropped mutations in the
>> cluster. I need to verify my understanding of how this may have happened
>> in order to do a Root Cause Analysis:
>>
>> Scenario: 3 nodes, RF=3, Read / Write CL= Quorum
>>
>> 1. Due to the overloaded cluster, some writes only happened on 2 nodes:
>> node 1 & node 2, while the asynchronous mutations were dropped on node 3.
>> So say key K with token T was not written to node 3.
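>>
>> Quorum arithmetic for this step, for reference: with RF=3,
>>
>>     quorum = floor(RF/2) + 1 = floor(3/2) + 1 = 2
>>
>> so the write is acked as soon as node 1 and node 2 respond, and node 3's
>> dropped mutation is invisible to the client. The drops should show up on
>> node 3 under MUTATION in the dropped-messages section of:
>>
>>     nodetool tpstats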
>>
>> 2. I added node 4, and suppose that as per the newly calculated ranges,
>> token T now has replicas on node 1, node 3, and node 4. Unfortunately,
>> node 4 started bootstrapping from node 3, where key K was missing.
>>
>> 3. After the recommended 2-minute gap, I added node 5, and as per the
>> new token distribution suppose token T now has replicas on node 3,
>> node 4, and node 5. Again, node 5 bootstrapped from node 3, where the
>> data was missing.
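>>
>> While a node is bootstrapping, you can see which replica it is streaming
>> from (which would have confirmed the node 3 suspicion) by running, on the
>> joining node, something like:
>>
>>     nodetool netstats    # shows the streaming source(s) during bootstrap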
>>
>> So now key K is lost, and that is how we lost a few rows.
>>
>> Moreover, in step 1 the situation could be even worse: some writes may
>> have succeeded on only one of the three replicas (at CL=QUORUM the client
>> would see a failure, but the lone mutation is not rolled back), and
>> Cassandra could choose the replicas where this data is missing as the
>> streaming sources for both new nodes.
>>
>> Am I making sense?
>>
>> We are using C* 2.0.3.
>>
>> Thanks
>> Anuj
>>
>>
>>
>>
>
>


-- 
http://twitter.com/tjake
