Today the problem did *not* occur. This is the first time in a few weeks.
It turns out that one of our internally written modules, containing a fix
for how connections were used, was deployed.
Hopefully this has solved the problem.
Thanks for the help, Evan!
--
Dave Brady
----- Original Message -----
From: "Evan Vigil-McClanahan" <emcclana...@basho.com>
To: "Dave Brady" <dbr...@weborama.com>
Cc: riak-users@lists.basho.com
Sent: Tuesday, April 2, 2013 4:47:31 PM GMT +01:00 Amsterdam / Berlin / Bern /
Rome / Stockholm / Vienna
Subject: Re: Having to raise VM number-of-processes limit
If your n_val is still three, then three sad nodes is a suspicious
number. My first guess would be a very large value being put in and
other requests backing up behind it. That would explain the
health-check failures (especially if you're normally doing a lot of
small/fast reads and writes).
However, even that explanation doesn't get us anywhere near 500000
processes. It'd be really nice to see that top output. Maybe leave
it running and spooling to a file to see if you can capture the
output? What does a frame of it look like now, without the problem
happening?
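
A rough sketch of the "leave it running and spooling to a file" idea, in
Python. It is only an illustration: the spool() helper, its parameters and
the log layout are made up for this example, and it assumes the command you
hand it prints one snapshot and exits. An interactive display would need
some batch or one-shot mode, which this sketch does not assume exists; the
timeout is there so a command that never returns cannot hang the loop.

```python
import subprocess
import time
from datetime import datetime


def spool(cmd, path, interval=5.0, iterations=None, timeout=30.0):
    """Run cmd repeatedly and append each timestamped snapshot to path.

    cmd is an argument list, e.g. ["some-tool", "--flag"] (hypothetical).
    iterations=None means loop until interrupted.
    Returns the number of snapshots attempted.
    """
    count = 0
    with open(path, "a", encoding="utf-8") as out:
        while iterations is None or count < iterations:
            stamp = datetime.now().isoformat()
            try:
                result = subprocess.run(
                    cmd,
                    stdout=subprocess.PIPE,
                    stderr=subprocess.STDOUT,  # keep errors in the same log
                    universal_newlines=True,
                    timeout=timeout,
                )
                out.write("=== {} (exit {}) ===\n".format(stamp, result.returncode))
                out.write(result.stdout)
                if not result.stdout.endswith("\n"):
                    out.write("\n")
            except subprocess.TimeoutExpired:
                # The command did not finish in time; note it and move on.
                out.write("=== {} (timed out) ===\n".format(stamp))
            out.flush()  # make each frame visible to tail -f right away
            count += 1
            if iterations is None or count < iterations:
                time.sleep(interval)
    return count
```

Called with iterations=None this keeps appending frames until it is
interrupted, so the frames from just before and during an incident are still
in the file afterwards.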
On Tue, Apr 2, 2013 at 7:31 AM, Dave Brady <dbr...@weborama.com> wrote:
> It happened again today, though I was not available to watch it at the time.
>
> Three nodes each showed riak_kv being stopped for one minute:
>
> 2013-04-02 11:10:57.923 [info] <0.2833.1447>@riak_kv_app:check_kv_health:239
> Disabling riak_kv due to large message queues. Offending vnodes:
> [{319703483166135013357056057156686910549735243776,5798}]
> 2013-04-02 11:11:57.924 [info] <0.3589.1447>@riak_kv_app:check_kv_health:242
> Re-enabling riak_kv after successful health check
>
> --
> Dave Brady
>
> ----- Original Message -----
> From: "Dave Brady" <dbr...@weborama.com>
> To: "Evan Vigil-McClanahan" <emcclana...@basho.com>
> Cc: riak-users@lists.basho.com
> Sent: Monday, April 1, 2013 11:15:47 AM GMT +01:00 Amsterdam / Berlin / Bern
> / Rome / Stockholm / Vienna
> Subject: Re: Having to raise VM number-of-processes limit
>
> Hi Evan,
>
> Thanks for the suggestions!
>
> I did not think that raising that limit was normal. Glad to have
> confirmation.
>
> I'll go through the logs again, and run 'riak-admin top ...' the next time it
> happens.
>
> --
> Dave Brady
>
> ----- Original Message -----
> From: "Evan Vigil-McClanahan" <emcclana...@basho.com>
> To: "Dave Brady" <dbr...@weborama.com>
> Cc: riak-users@lists.basho.com
> Sent: Saturday, March 30, 2013 11:03:30 PM GMT +01:00 Amsterdam / Berlin /
> Bern / Rome / Stockholm / Vienna
> Subject: Re: Having to raise VM number-of-processes limit
>
> Dave,
>
> If you're seeing the process count go that high, it suggests to me
> that something else is wrong. Typically, even for heavily loaded
> clusters, hundreds of thousands of processes isn't normal. Is there
> anything else in the logs?
>
> When a node sees this sort of behavior start, what does riak-admin top
> -sort msg_q look like?
>
> On Sat, Mar 30, 2013 at 2:07 PM, Dave Brady <dbr...@weborama.com> wrote:
>> Hello,
>>
>> I have run into a situation whereby I started seeing:
>>
>> [error] emulator Too many processes
>>
>> when some of our new jobs ran. These jobs are in Perl using Net::Riak,
>> communicating with the cluster via PBC. They fire tens of thousands of fetches
>> and stores over the course of about 20 minutes.
>>
>> Our cluster has five nodes running Riak 1.3, using eLevelDB.
>>
>> I have been raising the limit (+P in vm.args) in increments from the default
>> of 32768. Currently at 524288, and that is still not high enough.
>>
>> Have any of you had to increase this limit?
>>
>> Thanks!
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
>