No system_limit crash anymore, but now I'm seeing stuff like this:
2013-07-20 08:23:10 UTC =CRASH REPORT
crasher:
initial call: mochiweb_acceptor:init/3
pid: <0.232.0>
registered_name: []
exception error:
{function_clause,[{webmachine_request,peer_from_peername,[{error,enotconn},
Only after restarting the Riak instance on this node were the awaiting
handoffs processed.. this is weird :(
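Next time, before restarting, I'll first try kicking the stalled
transfers from `riak attach` (just an idea, no guarantee it helps
here):

    riak_core_vnode_manager:force_handoffs().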
On Fri, 19 Jul 2013 15:55:43 +0200
Simon Effenberg wrote:
> It looked good for some hours but now again we got
>
> 2013-07-19 13:27:07.800 UTC [error]
> <0.18747.29>@riak_core_handoff
I'm again getting crash reports about system_limit:
2013-07-19 14:30:58 UTC =CRASH REPORT
crasher:
initial call: riak_kv_exchange_fsm:init/1
pid: <0.25883.24>
registered_name: []
exception exit:
{{{system_limit,[{erlang,spawn,[riak_kv_get_put_monitor,spawned,[gets,<0.11717.
It looked good for some hours but now again we got
2013-07-19 13:27:07.800 UTC [error]
<0.18747.29>@riak_core_handoff_sender:start_fold:216 hinted_handoff transfer of
riak_kv_vnode from 'riak@10.46.109.207'
1136089163393944065322395631681798128560666312704 to 'riak@10.47.109.202'
113608916339
once again with the list included... argh
Hey Christian,
so it could also be an Erlang limit? I found out why my riak instances
all have different process limits: my mcollectived daemons have
different limits, and when I triggered a puppet run through
mcollective, the Riak processes inherited that process limit.
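To double-check the Erlang side as well, I'll look at this from
`riak attach` (just a sketch):

    erlang:system_info(process_limit).   %% Erlang process ceiling, set via +P in vm.args
    erlang:system_info(process_count).   %% processes currently alive

If process_count stays far below process_limit, then it really is only
the OS limit that mcollective is changing.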
Hi Simon,
If you have objects that can be as big as 15MB, it is probably wise to
increase the size of +zdbbl in order to avoid filling up buffers when these
large objects need to be transferred between nodes. What an appropriate level
is depends a lot on the size distribution of your data and
wow.. now I have something to search for..
riak46-1  Max processes  unlimited  unlimited  processes
riak46-2  Max processes  unlimited  unlimited  processes
riak46-3  Max processes  unlimited  unlimited  processes
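(For reference, that is the "Max processes" row as it appears in
/proc/<pid>/limits of the beam process; on each node it can be checked
with something like the following, where the beam.smp pattern is an
assumption about the install:

    grep "Max processes" /proc/$(pgrep -f beam.smp | head -1)/limits
)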
The +zdbbl parameter helped a lot, but the hinted handoffs didn't
disappear completely. I have no more busy_dist_port errors in the
_console.log_ (why aren't they in the error.log? it looks like a
serious problem to me.. at least our cluster wasn't behaving that
nicely).
I'll try to increa
Follow the white rabbit:
http://docs.basho.com/riak/latest/cookbooks/Linux-Performance-Tuning/
Most of the recommended parameters are at that link.
HTH,
Guido.
On 18/07/13 19:48, Simon Effenberg wrote:
Sounds like zdbbl.. I'm running 1.3.1, but it started after adding 6
more nodes to the previously 12-node cluster. So maybe it is because
of an 18-node cluster?
I'll try the zdbbl stuff. Any other hint would be cool (if the new
kernel parameters are also good for 1.3.1.. could you provide them?).
Ch
More information in the console.log:
2013-07-18 18:30:18.768 UTC [info]
<0.76.0>@riak_core_sysmon_handler:handle_event:92 monitor busy_dist_port
<0.21558.67> {#Port<0.7283>,'riak@10.47.109.203'}
2013-07-18 18:30:33.760 UTC [info]
<0.76.0>@riak_core_sysmon_handler:handle_event:92 monitor busy_
If what you are describing is happening on 1.4, run riak-admin diag
and see the new recommended kernel parameters. Also, in vm.args,
uncomment the +zdbbl 32768 parameter, since what you are describing is
similar to what happened to us when we upgraded to 1.4.
HTH,
Guido.
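For reference, the vm.args part is this single line (32768 is in
kilobytes, i.e. roughly a 32 MB busy limit per distribution port
buffer; /etc/riak/vm.args is the default package location and may
differ on your install):

    ## /etc/riak/vm.args
    +zdbbl 32768

A node restart is needed for the flag to take effect.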
On 18/07/13 19:21,
It's more than 30 handoffs sometimes:
Attempting to restart script through sudo -H -u riak
'riak@10.47.109.209' waiting to handoff 6 partitions
'riak@10.47.109.208' waiting to handoff 2 partitions
'riak@10.47.109.207' waiting to handoff 1 partitions
'riak@10.47.109.206' waiting to handoff 14 partitions
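For what it's worth, that list is the riak-admin transfers output; if
the transfers themselves are the bottleneck, the outbound handoff
concurrency can also be raised via app.config (the value 4 below is
only an example, the default is 2):

    %% app.config, riak_core section
    {riak_core, [
        {handoff_concurrency, 4}
    ]}

On 1.4 there is also a riak-admin transfer-limit command to change
this at runtime.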