Re: TCP recv timeout and handoffs almost all the time

2013-07-20 Thread Simon Effenberg
No more system_limit crashes, but now I'm seeing things like this:
2013-07-20 08:23:10 UTC =CRASH REPORT
  crasher:
    initial call: mochiweb_acceptor:init/3
    pid: <0.232.0>
    registered_name: []
    exception error: {function_clause,[{webmachine_request,peer_from_peername,[{error,enotconn},

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
Only after restarting the Riak instance on this node were the waiting handoffs processed... this is weird :(
On Fri, 19 Jul 2013 15:55:43 +0200 Simon Effenberg wrote:
> It looked good for some hours but now again we got
>
> 2013-07-19 13:27:07.800 UTC [error]
> <0.18747.29>@riak_core_handoff

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
I'm getting crash reports about system_limit again:
2013-07-19 14:30:58 UTC =CRASH REPORT
  crasher:
    initial call: riak_kv_exchange_fsm:init/1
    pid: <0.25883.24>
    registered_name: []
    exception exit: {{{system_limit,[{erlang,spawn,[riak_kv_get_put_monitor,spawned,[gets,<0.11717.

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
It looked good for a few hours, but now we got this again:
2013-07-19 13:27:07.800 UTC [error] <0.18747.29>@riak_core_handoff_sender:start_fold:216 hinted_handoff transfer of riak_kv_vnode from 'riak@10.46.109.207' 1136089163393944065322395631681798128560666312704 to 'riak@10.47.109.202' 113608916339

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
Once again with the list included... argh.
Hey Christian, so could it also be an Erlang limit? I found out why my Riak instances all have different process limits: my mcollectived daemons have different limits, and when I triggered a Puppet run through mcollective, they got this process limit

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Christian Dahlqvist
Hi Simon,
If you have objects that can be as big as 15 MB, it is probably wise to increase +zdbbl to avoid filling up buffers when these large objects need to be transferred between nodes. What an appropriate level is depends a lot on the size distribution of your data and
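For what it's worth: +zdbbl sets the Erlang distribution buffer busy limit in kilobytes (default 1024, roughly 1 MB), so a single 15 MB object is already bigger than the default buffer. A minimal Python sketch that turns a largest-object size into a +zdbbl value, assuming a purely hypothetical rule of thumb of "room for a couple of the largest objects in flight" (the helper name and the rule are illustrative only, not an official Basho recommendation; MB is treated as 1024 KB):

```python
import math

# erl's +zdbbl takes kilobytes: valid range 1..2097151, default 1024 (about 1 MB).
ZDBBL_DEFAULT_KB = 1024
ZDBBL_MAX_KB = 2097151


def suggest_zdbbl_kb(largest_object_mb: float, objects_in_flight: int = 2) -> int:
    """Hypothetical rule of thumb: leave room for a few of the largest objects.

    Not an official recommendation; tune it against your own size distribution.
    """
    if largest_object_mb <= 0 or objects_in_flight < 1:
        raise ValueError("object size and count must be positive")
    wanted_kb = math.ceil(largest_object_mb * 1024 * objects_in_flight)
    # Never go below the VM default, never above the documented maximum.
    return max(ZDBBL_DEFAULT_KB, min(ZDBBL_MAX_KB, wanted_kb))
```

For example, suggest_zdbbl_kb(15) returns 30720, which would go into vm.args as `+zdbbl 30720`. Small objects never push the value below the VM default.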

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
Wow... now I have something to search for:
riak46-1  Max processes  unlimited  unlimited  processes
riak46-2  Max processes  unlimited  unlimited  processes
riak46-3  Max processes  unlimited  unlimited
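The rows above look like the "Max processes" line of Linux's /proc/<pid>/limits. A small Python sketch for pulling that row out of each node so the limits of the running Riak process can be compared, assuming Linux and that you already know the PID of the beam process (e.g. from pgrep); the function names are illustrative:

```python
from pathlib import Path
from typing import Optional, Tuple


def _parse_limit(value: str) -> Optional[int]:
    # /proc/<pid>/limits prints the literal word "unlimited" for an infinite limit.
    return None if value == "unlimited" else int(value)


def max_processes(limits_text: str) -> Tuple[Optional[int], Optional[int]]:
    """Return the (soft, hard) 'Max processes' limits; None means unlimited."""
    for line in limits_text.splitlines():
        if line.startswith("Max processes"):
            # Columns: 'Max' 'processes' <soft> <hard> <units>
            soft, hard = line.split()[2:4]
            return _parse_limit(soft), _parse_limit(hard)
    raise ValueError("no 'Max processes' row found")


def read_max_processes(pid: int) -> Tuple[Optional[int], Optional[int]]:
    """Read the limits of a running process (Linux only)."""
    return max_processes(Path(f"/proc/{pid}/limits").read_text())
```

Because limits are inherited from the parent process, checking the running process rather than a login shell shows what the node actually got.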

Re: TCP recv timeout and handoffs almost all the time

2013-07-19 Thread Simon Effenberg
The +zdbbl parameter helped a lot, but the hinted handoffs didn't disappear completely. I have no more busy dist port errors in console.log (why aren't they in error.log? To me it looks like a serious problem... at least our cluster wasn't behaving very nicely). I'll try to increa

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Guido Medina
Follow the white rabbit: http://docs.basho.com/riak/latest/cookbooks/Linux-Performance-Tuning/
Most of the recommended parameters are at that link.
HTH, Guido.
On 18/07/13 19:48, Simon Effenberg wrote: Sounds like zdbbl.. I'm running 1.3.1 but it started after added 6 more nodes to the previously 1

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
Sounds like zdbbl... I'm running 1.3.1, but it started after adding 6 more nodes to the previously 12-node cluster. So maybe it's because it's now an 18-node cluster? I'll try the zdbbl setting. Any other hints would be welcome (if the new kernel parameters are also good for 1.3.1, could you provide them?). Ch

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
More information from console.log:
2013-07-18 18:30:18.768 UTC [info] <0.76.0>@riak_core_sysmon_handler:handle_event:92 monitor busy_dist_port <0.21558.67> {#Port<0.7283>,'riak@10.47.109.203'}
2013-07-18 18:30:33.760 UTC [info] <0.76.0>@riak_core_sysmon_handler:handle_event:92 monitor busy_
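To see which peer connections are congested, the busy_dist_port messages can be tallied per remote node. A rough Python sketch, assuming every such line follows the format of the first complete line above (the second one is cut off); the function name and the log path in the usage note are placeholders:

```python
import re
from collections import Counter
from typing import Iterable

# Matches e.g. "monitor busy_dist_port <0.21558.67> {#Port<0.7283>,'riak@10.47.109.203'}"
BUSY_DIST_PORT = re.compile(
    r"monitor busy_dist_port <[\d.]+> \{#Port<[\d.]+>,'([^']+)'\}"
)


def busy_dist_port_by_node(lines: Iterable[str]) -> Counter:
    """Count busy_dist_port events per remote Erlang node."""
    counts: Counter = Counter()
    for line in lines:
        # findall also handles several log entries flattened onto one line.
        for node in BUSY_DIST_PORT.findall(line):
            counts[node] += 1
    return counts
```

Usage could look like `with open("console.log") as fh: print(busy_dist_port_by_node(fh).most_common())`, with the path adjusted to wherever your node writes its console log.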

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Guido Medina
If what you're describing is happening on 1.4, run riak-admin diag and look at the new recommended kernel parameters. Also, in vm.args, uncomment the +zdbbl 32768 parameter, since what you're describing is similar to what happened to us when we upgraded to 1.4.
HTH, Guido.
On 18/07/13 19:21,
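When many nodes need the same vm.args change, a small script can help. A Python sketch that activates a +zdbbl line, assuming the shipped file contains it commented out with leading '#' characters (e.g. `## +zdbbl 32768`; the exact comment style in your vm.args may differ). The helper name is hypothetical; vm.args comments do start with '#':

```python
import re

# Matches an active or commented-out '+zdbbl <number>' line, e.g. '## +zdbbl 32768'.
ZDBBL_LINE = re.compile(r"^\s*#*\s*\+zdbbl\s+\d+\s*$")


def enable_zdbbl(vm_args: str, kb: int = 32768) -> str:
    """Return vm.args text with exactly one active '+zdbbl <kb>' line."""
    out = []
    replaced = False
    for line in vm_args.splitlines():
        if ZDBBL_LINE.match(line):
            if not replaced:
                # The first match (commented or not) becomes the active setting.
                out.append(f"+zdbbl {kb}")
                replaced = True
            # Any later +zdbbl lines are dropped so the flag is set only once.
            continue
        out.append(line)
    if not replaced:
        # No existing line at all: append the flag at the end.
        out.append(f"+zdbbl {kb}")
    return "\n".join(out) + "\n"
```

You would read the file, pass its text through enable_zdbbl, write it back, and restart the node for the flag to take effect.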

Re: TCP recv timeout and handoffs almost all the time

2013-07-18 Thread Simon Effenberg
It's more than 30 handoffs sometimes:
Attempting to restart script through sudo -H -u riak
'riak@10.47.109.209' waiting to handoff 6 partitions
'riak@10.47.109.208' waiting to handoff 2 partitions
'riak@10.47.109.207' waiting to handoff 1 partitions
'riak@10.47.109.206' waiting to handoff 14 parti
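This listing appears to be riak-admin transfers output. To track the total over time, the per-node counts can be summed. A Python sketch, assuming lines of the form shown above (`'node' waiting to handoff N partitions`). Cut-off lines like the last one simply don't match. The function name is illustrative:

```python
import re
from typing import Dict, Iterable

# One entry per node, e.g. "'riak@10.47.109.209' waiting to handoff 6 partitions"
WAITING = re.compile(r"'([^']+)' waiting to handoff (\d+) partitions?")


def pending_handoffs(lines: Iterable[str]) -> Dict[str, int]:
    """Map node name -> number of partitions it is waiting to hand off."""
    pending: Dict[str, int] = {}
    for line in lines:
        # findall also copes with several entries flattened onto one line.
        for node, count in WAITING.findall(line):
            pending[node] = pending.get(node, 0) + int(count)
    return pending
```

Passing it `sys.stdin` lets you pipe the transfers output straight in and print `sum(pending_handoffs(sys.stdin).values())`.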