Re: Whole cluster times out if one node is gone

Dan Reverri Mon, 29 Nov 2010 10:02:26 -0800

Hi Jay,

I'm not able to reproduce the behavior you are seeing. Here is what I am
doing to try to reproduce the issue:
1. Set up a 4-node cluster
2. Continuously write a new object to Riak every 0.5 seconds
3. Continuously read a known object (GET riak/test/1) from Riak every 0.5
seconds
4. Reboot one of the nodes


The reads and writes continue working normally when rebooting the node.
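
A minimal sketch of a test loop along the lines of steps 2 and 3, in case it
helps to compare setups. It is not the exact script used here. The host and
port (127.0.0.1:8098), the "new-N" key names, the one-byte payload and the
5 second client timeout are all assumptions; only the riak/test/1 path comes
from the steps above.

```python
import time
import urllib.error
import urllib.request

BASE = "http://127.0.0.1:8098/riak"  # assumed host/port; point at your node


def object_url(bucket, key, **params):
    """Build an object URL such as .../riak/test/1?r=1."""
    query = "&".join(f"{k}={v}" for k, v in sorted(params.items()))
    url = f"{BASE}/{bucket}/{key}"
    return f"{url}?{query}" if query else url


def timed_request(url, data=None, timeout=5.0):
    """Issue a GET (or a PUT when data is given); return (status, seconds)."""
    method = "PUT" if data is not None else "GET"
    req = urllib.request.Request(url, data=data, method=method,
                                 headers={"Content-Type": "text/plain"})
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code   # HTTP-level failure, e.g. a 503
    except (urllib.error.URLError, OSError):
        status = None       # connection refused, reset, or client timeout
    return status, time.monotonic() - start


def run(iterations, interval=0.5, request=timed_request, sleep=time.sleep):
    """Each round: write one new key, read one known key, then pause."""
    for i in range(iterations):
        write = request(object_url("test", f"new-{i}"), data=b"x")
        read = request(object_url("test", "1"))
        yield write, read
        sleep(interval)


if __name__ == "__main__":
    # Reboot a node while this runs and watch for None/503 or long latencies.
    for write, read in run(iterations=120):
        print("write", write, "read", read)
```

Passing r=1 or w=1 to object_url (for example object_url("test", "1", r=1))
adds the corresponding query parameter, which should make the loop closer to
the R=1/W=1 test described further down the thread.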

Do you see timeouts while writing objects to Riak?
Can you try reading other objects from Riak during the reboot (i.e.
different keys)?

Thanks,
Dan

Daniel Reverri
Developer Advocate
Basho Technologies, Inc.
d...@basho.com


On Mon, Nov 29, 2010 at 9:39 AM, Jay Adkisson <j4yf...@gmail.com> wrote:

> Hey Dan/Sean,
>
> Thanks for the response.  sasl-error.log on node A is completely empty, and
> I see this pattern in erlang.log:
>
> ===== ALIVE Tue Nov 23 12:46:57 PST 2010
>
> ===== Tue Nov 23 12:57:36 PST 2010
>
> =ERROR REPORT==== 23-Nov-2010::12:57:36 ===
>  ** Node 'riak@<node D>' not responding **
> ** Removing (timedout) connection **
>
> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
> Starting handoff of partition riak_kv_vnode
> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
>
> =INFO REPORT==== 23-Nov-2010::12:58:41 ===
> Handoff of partition riak_kv_vnode
> 251195593916248939066258330623111144003363405824 to 'riak@<node D>'
> completed: sent 1 objects in 0.02 seconds
> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
> Starting handoff of partition riak_kv_vnode
> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
>
> =INFO REPORT==== 23-Nov-2010::12:59:18 ===
> Handoff of partition riak_kv_vnode
> 707914855582156101004909840846949587645842325504 to 'riak@<node D>'
> completed: sent 5 objects in 0.03 seconds
> =INFO REPORT==== 23-Nov-2010::12:59:20 ===
> Starting handoff of partition riak_kv_vnode
> 525227150915793236229449236757414210188850757632 to 'riak@<node D>'
>
> <handoffs, etc...>
>
> This is my testing process: I'm doing an initial load into riak of small
> image files between 1 and 150K, throttled to two images per second, with
> W=1.  In a different terminal, I'm running a wget every second against node
> A of one particular image I already know to be in the cluster, again with
> R=1.  I'm using R,W=1 because I figured that would reduce the chance of
> timing out, and with my data pattern, nothing I write to the cluster will
> ever change, so I really don't need to wait for a quorum.
>
> In response to Sean,
>
>> 1) Riak detects node outage the same way any Erlang system does - when a
>> message fails to deliver, or the heartbeat maintained by epmd fails.  The
>> default timeout in epmd is 1 minute, which is probably why you're seeing it
>> take 1 minute to be detected.
>>
> Thanks, this is enlightening.
>
>> 2) If it takes too long (the vnode is overloaded, perhaps, or is just
>> starting up as a hint partition) to retrieve from any node, the request can
>> time out.
>>
> That makes sense, but I still wonder why this happens even when the quorum
> is already met by the machines that are responding normally?
>
>
>> 3) You could probably configure epmd to timeout sooner, but then you
>> become more vulnerable to temporary partitions. YMMV
>>
> I may try that - it might be a good fit with my data pattern.
>
> Thanks again,
> --Jay
>
>
> On Mon, Nov 29, 2010 at 4:44 AM, David Smith <diz...@basho.com> wrote:
>
>> On Tue, Nov 23, 2010 at 3:33 PM, Jay Adkisson <j4yf...@gmail.com> wrote:
>> > (many profuse apologies to Dan - hit "reply" instead of "reply all")
>> > Alrighty, I've done a little more digging.  When I throttle the writes
>> > heavily (2/sec) and set R and W to 1 all around, the cluster works just
>> > fine after I restart the node for about 15-20 seconds.  Then the read
>> > request hangs for about a minute, until node D disappears from
>> > connected_nodes in riak-admin status, at which point it returns the
>> > desired value (although sometimes I get a 503):
>>
>> Are you seeing any error messages in log/erlang.log.* or
>> log/sasl-error.log?
>>
>> Can you expound on your use case a little -- are you doing a large
>> insert, or just a random read/write mix? Did you pre-populate the
>> dataset? Why are you using r=1, instead of relying on quorum for
>> reads?
>>
>> How are you running the riak-admin status to measure the 15-20 seconds?
>>
>> Thanks.
>>
>> D.
>>
>
>

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
