Re: riak_core question when a node dies

Jon Brisbin Wed, 28 Mar 2012 10:09:27 -0700

I'm using get_primary_apl to get my preflist, but the problem is how to handle 
a failed dispatch to a node that is just now going down and hasn't had time to 
notify the caller yet. I don't want to lose the web request currently in 
progress. Maybe I need to get a list of indexes to possibly dispatch to and 
iterate over them, stopping at the first one that doesn't blow up.
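
A minimal sketch of that "iterate and stop at the first one that doesn't blow 
up" idea. The module and function names (dispatch_retry, first_ok) are made up 
for illustration and are not part of riak_core; the helper only assumes it is 
handed a list of candidates and a fun that either returns or raises.

```erlang
-module(dispatch_retry).
-export([first_ok/2]).

%% Try Fun on each candidate in order. Return {ok, Candidate, Result} for the
%% first call that does not raise, or {error, all_failed} if every one did.
%% Any error/exit/throw from Fun is treated as "this candidate is unusable".
first_ok([], _Fun) ->
    {error, all_failed};
first_ok([Candidate | Rest], Fun) ->
    try Fun(Candidate) of
        Result -> {ok, Candidate, Result}
    catch
        _Class:_Reason -> first_ok(Rest, Fun)
    end.

%% Hypothetical use with a preflist (names below are assumptions):
%%   Dispatch = fun({IndexNode, _Type}) ->
%%       riak_core_vnode_master:sync_spawn_command(
%%           IndexNode, Msg, myapp_vnode_master)
%%   end,
%%   dispatch_retry:first_ok(Preflist, Dispatch).
```

Retrying like this is only safe when the command is idempotent, since a call 
can exit on timeout after the vnode has already done the work.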



On Mar 28, 2012, at 12:00 PM, Sean Cribbs <s...@basho.com> wrote:

> Jon,
> 
> Generally I would use the riak_core_apl module to calculate the preflist for 
> your request. It takes into account node visibility and service availability. 
> Use riak_core_node_watcher:service_up to announce that your app is available 
> after registering with riak_core.
> 
> When doing some "split brain" testing/simulation for gen_leader we would do 
> something like the following on a node we wanted to partition:
> 
> 1> erlang:set_cookie(node(), riak2).
> 2> erlang:disconnect_node('dev3@127.0.0.1'), erlang:disconnect_node('dev4@127.0.0.1').
> 
> Basically, set the cookie so it can't connect to the other nodes, then 
> manually disconnect. That might help you simulate node-outage.
> 
> On Wed, Mar 28, 2012 at 12:49 PM, Jon Brisbin <j...@jbrisbin.com> wrote:
> I'm testing the example code that dispatches a web request from misultin into 
> a riak_core ring of vnodes. It works fantastic when all nodes are up! :)
> 
> Doing "ab -k -c 200 -n 10000 http://localhost:3000/" yields not-too-shabby 
> performance (dispatching at random into all available vnodes on two separate 
> riak_core processes):
> 
> Concurrency Level:      200
> Time taken for tests:   1.446 seconds
> Complete requests:      10000
> Failed requests:        0
> Write errors:           0
> Keep-Alive requests:    10000
> Total transferred:      1600480 bytes
> HTML transferred:       120036 bytes
> Requests per second:    6914.04 [#/sec] (mean)
> Time per request:       28.927 [ms] (mean)
> Time per request:       0.145 [ms] (mean, across all concurrent requests)
> Transfer rate:          1080.64 [Kbytes/sec] received
> 
> Connection Times (ms)
>               min  mean[+/-sd] median   max
> Connect:        0    0   1.0      0      12
> Processing:     4   28   9.8     27      78
> Waiting:        4   28   9.8     27      78
> Total:          4   28  10.1     27      83
> 
> Percentage of the requests served within a certain time (ms)
>   50%     27
>   66%     31
>   75%     34
>   80%     36
>   90%     41
>   95%     47
>   98%     53
>   99%     58
>  100%     83 (longest request)
> 
> If I were really zealous, I'd set up haproxy to load balance between these 
> two misultin servers and get double failover.
> 
> I'm trying to handle the case where I go into the console of one of my 
> nodes and hit Ctrl-C to kill that process. I'm not sure what the best way 
> to handle this is. Should I check that the node is up before I dispatch? 
> Or keep some other kind of watch that, when it sees a node go down while 
> I'm trying to dispatch to it, finds another one?
> 
> Essentially, I'm trying to prevent misultin from completely bailing on the 
> request because the sync_spawn_command blows up trying to do a 
> gen_server:call to a non-existent node. I'd like to retry the dispatch on a 
> different node if one happens to have crashed while I'm serving requests (I 
> don't want to lose a request).
> 
> Thanks!
> 
> Jon Brisbin
> http://about.me/jonbrisbin
> 
> 
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
> 
> 
> 
> 
> -- 
> Sean Cribbs <s...@basho.com>
> Software Engineer
> Basho Technologies, Inc.
> http://basho.com/
>

