Fwd: Reading with "r = all" always succeeds in Riak 1.2 even when one of the primary nodes is down?

Tatsuya Kawano Wed, 26 Sep 2012 21:11:59 -0700

I forgot to cc riak-users.

---------- Forwarded message ----------
From: Tatsuya Kawano <t650...@gmail.com>
Date: 2012/9/27
Subject: Re: Reading with "r = all" always succeeds in Riak 1.2 even
when one of the primary nodes is down?
To: Mike Oxford <moxf...@gmail.com>



Hi Mike,

Thanks for the detailed info. I'm currently running all Riak nodes on
one box, so I'll try to get more boxes and then try pulling the network
cable out.

> Did you "shut down" the node or kill it by brutally powering the box down or
> yanking the network cable?

I only killed the Erlang process.

> It's possible that Riak noticed the node_down and had already done the
> recovery.  While net_ticktime can be as long as 60 seconds by default, it's
> possible that you're hitting the case where you kill it and before you
> re-run the read it's already noticed and 'fixed itself?'

OK. I don't think this "fixes itself" behavior is documented in the
Riak wiki. Now I understand why r=all didn't fail.
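
To answer the "time between node down and read" question on the retest,
the reads could be timed with something like the sketch below. It is
only a sketch under assumptions: it assumes Riak's HTTP interface at
/buckets/<bucket>/keys/<key> with an "r" query parameter, and the host,
port, bucket and key names are hypothetical placeholders. read_url,
read_status and poll are helper names made up for this example.

```python
import time
import urllib.error
import urllib.parse
import urllib.request


def read_url(host, port, bucket, key, r="all"):
    """Build the GET URL for one key with the given R value.

    Assumption: the node serves HTTP at /buckets/<bucket>/keys/<key>.
    """
    path = "/buckets/%s/keys/%s" % (
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    query = urllib.parse.urlencode({"r": r})
    return "http://%s:%d%s?%s" % (host, port, path, query)


def read_status(url, timeout=5.0):
    """Return the HTTP status of one read, or None if the node is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example 404 for a missing object) land here.
        return err.code
    except urllib.error.URLError:
        # Connection refused, timeout, and similar transport failures.
        return None


def poll(url, attempts=10, interval=1.0, fetch=read_status):
    """Read repeatedly; return (seconds since first read, status) pairs."""
    start = time.monotonic()
    results = []
    for _ in range(attempts):
        results.append((time.monotonic() - start, fetch(url)))
        time.sleep(interval)
    return results


if __name__ == "__main__":
    # Hypothetical values: point this at a node that is still up,
    # right after taking one primary down.
    url = read_url("127.0.0.1", 8098, "test", "key1", r="all")
    for elapsed, status in poll(url, attempts=5):
        print("%6.2fs  status=%s" % (elapsed, status))
```

Starting the loop immediately after the node goes down should show
whether the very first read already succeeds or whether there is a
window in which it fails.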

Thanks!
Tatsuya


2012/9/27 Mike Oxford <moxf...@gmail.com>:
> What is the time between "node down" and "read with R=3" ?
> Did you "shut down" the node or kill it by brutally powering the box down or
> yanking the network cable?
>
> It's possible that Riak noticed the node_down and had already done the
> recovery.  While net_ticktime can be as long as 60 seconds by default, it's
> possible that you're hitting the case where you kill it and before you
> re-run the read it's already noticed and 'fixed itself?'
>
> Also, if you do a shutdown, the Erlang VM is probably linked/monitoring and
> being notified that the node is shutting down so it's triggering the
> rebalance immediately.
>
> Try by pulling the network cable out of that node.  "/sbin/ifconfig eth0
> down" **may** give you the same effect.
>
> -mox
>
> On Wed, Sep 26, 2012 at 2:51 PM, Tatsuya Kawano <t650...@gmail.com> wrote:
>>
>> Hi,
>>
>> I'm having a hard time reproducing this behavior, described on the
>> Riak wiki, in my Riak 1.2 test environment. Can anybody help me
>> figure out what is happening?
>>
>>
>> http://wiki.basho.com/Eventual-Consistency.html#Failure-Scenarios
>>
>> > Reading When One Primary Fails
>> > ------------------------------
>> >
>> > 1. Data is written to a key with W=3
>> > 2. One node goes down, it happens to be a primary for that key
>> > 3. Data is read from that key with R=3
>> > 4. Riak returns not_found on first request
>> > 5. Read repair ensures data is replicated to a secondary node.
>> >    Read repair will always occur, regardless of the R value.
>> >    Even with an R of 2, read repair will kick in and ensure that
>> >    all nodes responsible for this particular data are consistent.
>> > 6. Subsequent reads return correct value with R=3, two values
>> >    coming from primary and one from secondary nodes
>>
>>
>> At the first read (step 4), Riak should return not_found, but it
>> actually returns the correct value. I wonder when read repair kicks
>> in with Riak 1.2. (Even before the first read?)
>>
>>
>> I followed the screencast "Tuning CAP Controls in Riak" on this page.
>> http://wiki.basho.com/Tunable-CAP-Controls-in-Riak.html
>>
>> I used riak_core_ring:preflist/2 to ensure that I had taken down one
>> of the correct primary nodes for the key.
>>
>> Thanks,
>> Tatsuya
>>
>> --
>> Tatsuya Kawano (Mr.)
>> Tokyo, Japan
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>
>
