Re: Warning "Can not start proc_lib:init_p"

2013-04-08 Thread Ingo Rockel
I already tried the HEAD request, which resulted in the same 503 after some time, and I don't have the vector clock. So what would be the other options? I already patched our software and this object is ignored now. It only contains data no one needs (it is an archive of references to archived

Re: Warning "Can not start proc_lib:init_p"

2013-04-08 Thread Evan Vigil-McClanahan
You could try to read it by doing a HEAD request, which, while cluster-impacting, wouldn't try to pass everything over the wire. This isn't necessary if you already have the latest vclock from some previous attempt or write. Then you could overwrite the object with the latest vclock (again, more imp
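The read-then-overwrite idea above could be sketched roughly as follows against Riak's HTTP interface. This is a minimal sketch, not a tested procedure: the helper names are hypothetical, the host, port, bucket and key are placeholders you would fill in, and it assumes the HEAD response carries the vclock in the X-Riak-Vclock header and that sending the same header on a PUT makes the small body supersede the large object.

```python
import http.client
from urllib.parse import quote


def object_path(bucket, key):
    """Build the /riak/<bucket>/<key> path, percent-encoding characters like '|'."""
    return "/riak/{}/{}".format(quote(bucket, safe=""), quote(key, safe=""))


def overwrite_headers(vclock, content_type="text/plain"):
    """Headers for a PUT that should descend from the fetched vclock."""
    return {"Content-Type": content_type, "X-Riak-Vclock": vclock}


def fetch_vclock(host, port, bucket, key, timeout=60):
    """HEAD the object and return its X-Riak-Vclock header (None if absent)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", object_path(bucket, key))
        resp = conn.getresponse()
        resp.read()  # a HEAD response has no body; this just finishes the exchange
        return resp.getheader("X-Riak-Vclock")
    finally:
        conn.close()


def overwrite_small(host, port, bucket, key, vclock, body=b"", timeout=60):
    """PUT a tiny replacement body using the fetched vclock; returns the HTTP status."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("PUT", object_path(bucket, key), body=body,
                     headers=overwrite_headers(vclock))
        resp = conn.getresponse()
        resp.read()
        return resp.status
    finally:
        conn.close()
```

If `fetch_vclock` returns None (or the HEAD itself times out), there is nothing to pass to `overwrite_small`, and a different approach is needed.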

Re: Warning "Can not start proc_lib:init_p"

2013-04-08 Thread Ingo Rockel
Hi, I've finally been able to identify the big object (it was a tough one), but unfortunately Riak fails to delete it: irockel@bighead:~$ curl -v -X DELETE "http://172.22.3.22:8091/riak/m/Oa|1" * About to connect() to 172.22.3.22 port 8091 (#0) * Trying 172.22.3.22... connected > DELETE /riak/

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Ingo Rockel
Thanks a lot for pointing me in the right direction (the huge object); it would have taken me a lot longer to find out myself! On 04.04.2013 17:51, Evan Vigil-McClanahan wrote: Possible, but would need more information to make a guess. I'd keep a close eye on that node. On Thu, Apr 4, 2013

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Evan Vigil-McClanahan
Possible, but would need more information to make a guess. I'd keep a close eye on that node. On Thu, Apr 4, 2013 at 10:34 AM, Ingo Rockel wrote: > thanks, but it was a very obvious c&p error :) and we already have the > ERL_MAX_ETS_TABLES set to 8192 as it is in the default vm.args. > > The onl

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Ingo Rockel
thanks, but it was a very obvious c&p error :) and we already have the ERL_MAX_ETS_TABLES set to 8192 as it is in the default vm.args. The only other messages were about a lot of handoff going on. Maybe the node was getting some data concerning the 2GB object? Ingo Am 04.04.2013 17:25, schrie

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Evan Vigil-McClanahan
Major error on my part here! > your vm.args: > -env ERL_MAX_ETS_TABLES 819 This should be -env ERL_MAX_ETS_TABLES 8192 Sorry for the sloppy cut and paste. Please do not do the former thing, or it will be very bad. > This is a good idea for all systems but is especially important for > people
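Since a one-digit slip in this setting is easy to miss, a quick check of vm.args can help. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the setting is assumed to sit on a single `-env ERL_MAX_ETS_TABLES <n>` line, and `#` is assumed to start a comment in vm.args.

```python
def ets_table_limit(vm_args_text):
    """Return the ERL_MAX_ETS_TABLES value set via -env, or None if it is missing."""
    for raw in vm_args_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        parts = line.split()
        if len(parts) == 3 and parts[0] == "-env" and parts[1] == "ERL_MAX_ETS_TABLES":
            return int(parts[2])
    return None


def ets_limit_ok(vm_args_text, expected=8192):
    """True only if the configured limit equals the expected value (8192 per this thread)."""
    return ets_table_limit(vm_args_text) == expected
```

For example, `ets_limit_ok(open("/etc/riak/vm.args").read())` would flag the mistyped value; the path is an assumption and depends on the installation.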

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Evan Vigil-McClanahan
One last note for 1.3. Please make sure that the following line is in your vm.args: -env ERL_MAX_ETS_TABLES 819 This is a good idea for all systems but is especially important for people with large rings. Were there any other messages? Riak constantly spawns new processes, but they don't tend t

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Ingo Rockel
A grep for "too many processes" didn't reveal anything. The process got killed by the OOM killer. On 04.04.2013 16:12, Evan Vigil-McClanahan wrote: That's odd. It was getting killed by the OOM killer, or crashing because it couldn't allocate more memory? That's suggestive of something else

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Ingo Rockel
The node crashes seem to be caused by the raised +P param; after the last crash I commented out the param and now the node runs just fine. On 04.04.2013 15:43, Ingo Rockel wrote: Hi Evan, we added monitoring of the object sizes and there was one object on one of the three nodes mentioned which was

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Ingo Rockel
Hi Evan, we added monitoring of the object sizes and there was one object on one of the three nodes mentioned which was > 2GB!! We just changed the application code to get the id of this object to be able to delete it. But it does happen only about once a day. Right now we have another node

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Evan Vigil-McClanahan
If it's always the same three nodes, it could well be the same very large object being updated each day. Is there anything else that looks suspicious in your logs? Another sign of large objects is large_heap (or long_gc) messages from riak_sysmon. On Thu, Apr 4, 2013 at 3:58 AM, Ingo Rockel wrote: >

Re: Warning "Can not start proc_lib:init_p"

2013-04-04 Thread Ingo Rockel
Hi Evan, thanks for all the info! I adjusted the leveldb config as suggested, except the cache, which I reduced to 16MB; keeping this above the default helped a lot, at least during load testing. And I added +P 130072 to the vm.args. This will be applied to the Riak nodes in the next few hours. We have

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Evan Vigil-McClanahan
Another engineer mentions that you posted your eleveldb section and I totally missed it: The eleveldb section: %% eLevelDB Config {eleveldb, [ {data_root, "/var/lib/riak/leveldb"}, {cache_size, 33554432}, {write_buffer_size_min, 67108864}, %% 64 MB in byte

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Evan Vigil-McClanahan
Again, all of these things are signs of large objects, so if you could track the object_size stats on the cluster, I think that we might see something. Even if you have no monitoring, a simple shell script curling /stats/ on each node once a minute should do the job for a day or two. On Wed, Apr
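The once-a-minute polling suggested above could also be done with a short script instead of a shell loop. This is a minimal sketch under stated assumptions: the function names and node URLs are hypothetical, and it assumes the object-size stats appear in the /stats JSON under keys containing "objsize" (for example node_get_fsm_objsize_100).

```python
import json
import time
import urllib.request


def objsize_stats(stats):
    """Keep only the entries whose key mentions 'objsize'."""
    return {k: v for k, v in stats.items() if "objsize" in k}


def fetch_stats(base_url, timeout=10):
    """GET <base_url>/stats and decode the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/stats", timeout=timeout) as resp:
        return json.load(resp)


def poll(nodes, interval_s=60, rounds=1440):
    """Print one line per node per round; 1440 rounds at 60 s covers about a day."""
    for _ in range(rounds):
        for node in nodes:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            try:
                sizes = objsize_stats(fetch_stats(node))
            except OSError as exc:  # connection errors and timeouts
                print(stamp, node, "error:", exc)
                continue
            print(stamp, node, json.dumps(sizes, sort_keys=True))
        time.sleep(interval_s)
```

Calling `poll(["http://node1:8098", "http://node2:8098"])` with your own node addresses and redirecting the output to a file gives a simple log to scan for spikes in the maximum object size.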

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Ingo Rockel
We just had it again (around this time of the day we have our highest user activity). I will set +P to 131072 tomorrow; anything else I should check or change? What about this memory-high-watermark which I get sporadically? Ingo On 03.04.2013 17:57, Evan Vigil-McClanahan wrote: As for +P i

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Evan Vigil-McClanahan
As for +P, the default has been raised in R16 (which is what the current man page describes); on R15 it's only 32k. The behavior that you're describing does sound like a very large object getting put into the cluster (which may cause backups and push you up against the process limit, could have caused scheduler collapse o

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Ingo Rockel
Evan, sys_process_count is somewhere between 5k and 11k on the nodes right now. Concerning your suggested +P config: according to the Erlang docs, the default for this param already is 262144, so setting it to 65536 would in fact lower it? We chose the ring size to be able to handle growth

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Evan Vigil-McClanahan
Ingo, riak-admin status | grep sys_process_count will tell you how many processes are running. The default process limit on Erlang is a little low, and we'd suggest raising it (especially with your extra-large ring_size). Erlang processes are cheap, so 65535 or even double that will be fine.
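To see how close a node is to its process limit, the count can be compared against the configured +P value. The following is a minimal sketch: the function names are hypothetical, and it assumes `riak-admin status` prints the stat as a `sys_process_count : <n>` line.

```python
import re


def parse_process_count(status_text):
    """Extract sys_process_count from `riak-admin status` output, or None if absent."""
    match = re.search(r"^sys_process_count\s*:\s*(\d+)", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def process_limit_usage(count, limit):
    """Fraction of the +P process limit currently in use."""
    return count / limit
```

With the numbers from this thread, 11000 processes against a 32768 limit is about a third of the limit, so a node only hits the ceiling when something (such as a backlog caused by a very large object) makes the count spike.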

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Ingo Rockel
I forgot to mention, we also sometimes see this one: 2013-04-03 17:04:08.551 [info] <0.56.0> alarm_handler: {set,{system_memory_high_watermark,[]}} (since the update to 1.3) Ingo On 03.04.2013 17:03, Ingo Rockel wrote: Hi Evan, I set swt very_low and zdbbl to 64MB; setting these params helped

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Ingo Rockel
Hi Evan, I set swt very_low and zdbbl to 64MB; setting these params helped a lot in reducing the busy_dist_port and "Monitor got {suppressed,..." messages. But when the performance of the cluster suddenly drops we still see these messages. The cluster was updated to 1.3 in the meantime. The eleve

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Evan Vigil-McClanahan
For your prior mail, I thought that someone had answered. Our initial suggestion was to add +swt very_low to your vm.args, as well as setting the +zdbbl setting that Jon recommended in the list post you pointed to. If those help, moving to 1.3 should help more. Other limits in vm.args that can c

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Ingo Rockel
Hi Evan, I'm not sure, I find a lot of these: 2013-03-30 23:27:52.992 [error] <0.8036.323>@riak_api_pb_server:handle_info:141 Unrecognized message {22243034,{error,timeout}} and some of these at the same time one of the kind below gets logged (although the one has a different time stamp):

Re: Warning "Can not start proc_lib:init_p"

2013-04-03 Thread Evan Vigil-McClanahan
Resending to the list: Ingo, That is an indication that the protocol buffers server can't spawn a put fsm, which means that a put cannot be done for some reason or another. Are there any other messages that appear around this time that might indicate why? On Wed, Apr 3, 2013 at 12:09 AM