I have a nasty problem in production. We built a connection pool on top of the 
official Erlang PB client, and at first everything works fine. To organize the 
pool we use hottub (we tried several libraries, but it is the simplest). Each 
connection is used at least once every 3-5 minutes (production is not fully 
loaded yet).

After several days the Riak server disconnects us. The socket process does not 
die, though; it answers every request with {error, disconnected}. So I wrote a 
checker for the pool workers: if is_connected(Pid) returns anything other than 
true, we kill the worker and the pool creates a new one. I run it every ten 
minutes, but it did not help: is_connected(Pid) returns true, yet when I make 
a request I get {error, disconnected}. The only solution that works so far is 
to reinitialize the whole pool when any worker returns {error, disconnected}. 
That is very barbaric and may crash the whole app.
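
For reference, the checker described above can be sketched roughly as follows. 
This is only an illustration, not the production code: the module name 
pool_checker, the function check_workers/2 and the Probe argument are 
hypothetical, and the list of worker pids is assumed to be obtained from the 
pool by some other means. In the setup described here Probe would be 
fun riakc_pb_socket:is_connected/1.

```erlang
-module(pool_checker).
-export([check_workers/2]).

%% Sketch only. Probe is a fun(Pid) -> true | term(); any result other
%% than true marks the worker as unhealthy. Unhealthy workers are killed
%% and their pids returned. The pool is assumed to start replacements.
check_workers(Pids, Probe) ->
    [kill(Pid) || Pid <- Pids, not healthy(Pid, Probe)].

%% A probe that crashes or exits (for example, because the worker is
%% already dead) is treated the same as an unhealthy answer.
healthy(Pid, Probe) ->
    try Probe(Pid) =:= true
    catch _:_ -> false
    end.

kill(Pid) ->
    exit(Pid, kill),
    Pid.
```

Run every ten minutes, this reproduces the behaviour above: as long as the 
probe returns true, nothing is killed. Because the probe is a parameter, it 
could also be swapped for one that makes a real request, for example 
fun(P) -> riakc_pb_socket:ping(P) =:= pong end, though I have not verified 
that this catches the stale connections.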

When I checked the server logs, I found many errors like these two:

2012-09-20 00:10:10.976 [error] <0.803.0>@riak_core_vnode:handle_info:510 296867520082839655260123481645494988367611297792 riak_kv_vnode worker pool crashed {timeout,{gen_server,call,[<0.819.0>,{work,<0.806.0>,{fold,#Fun,#Fun},{raw,59205031,<0.28969.11>}}]}}

2012-09-20 00:10:10.976 [error] <0.862.0>@riak_core_vnode:handle_info:510 365375409332725729550921208179070754913983135744 riak_kv_vnode worker pool crashed {timeout,{gen_fsm,sync_send_event,[<0.866.0>,{checkout,false,5000},5000]}}

I guess that is the real problem, but I think the client connection should at 
least log something, report a list of connection failures, or die. Instead I 
got is_connected(Pid) = true.

How do you organize connection pools that work 24/7? How do you check pool 
workers or refresh them?



_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com