Hello group

We are experimenting with an app running over 3 nodes. The app is under a
fairly constant write load from 2 or 3 writing threads for 6 hours per day,
plus a moderate amount of read requests. We have experienced several crashes
that we would like some explanation of. We are hoping it is a configuration
issue with an easy fix that will give us some stability :)

After a node crashes, it can often take several attempts to restart it, and
IO and CPU usage go high. We have found that if we halt all reads and writes
during the restart, the node is more likely to come back and work through
the high-load process of being brought back to life.

The two errors that show up in the logs are: *riak_kv_vnode worker pool
crashed... timeout*

*2012-11-11 06:37:16.105 UTC [error]
<0.5037.0>@riak_core_vnode:handle_info:510
645115957103093866238345258191171801645001474048 riak_kv_vnode worker pool
crashed
{timeout,{gen_fsm,sync_send_event,[<0.5040.0>,{checkout,false,5000},5000]}}
2012-11-11 06:37:16.106 UTC [error]
<0.5083.0>@riak_core_vnode:handle_info:510*


Followed or accompanied by: *Resource temporarily unavailable*

*2012-11-11 06:37:18.015 UTC [error] <0.19495.424>@riak_kv_vnode:init:265
Failed to start riak_kv_eleveldb_backend Reason: {db_open,"IO error: lock
/var/lib/riak/leveldb/576608067853207791947547531657596035098629636096/LOCK:
Resource temporarily unavailable"}*
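
For anyone matching these log lines to vnodes: the long integers in both
excerpts are partition identifiers, which are positions in Riak's 160-bit
ring. A minimal Python sketch, assuming our ring_creation_size of 256, turns
them into partition indexes (the function name is made up for illustration):

```python
# The Riak ring covers a 160-bit (SHA-1 sized) hash space.
RING_TOP = 2 ** 160


def partition_index(partition_id: int, ring_size: int) -> int:
    """Map a partition identifier from the logs to its 0-based ring index.

    Assumes the ring is split into `ring_size` equal partitions, so every
    valid identifier is an exact multiple of the partition span.
    """
    span = RING_TOP // ring_size
    if partition_id % span != 0:
        raise ValueError("identifier is not a partition boundary for this ring size")
    return partition_id // span


if __name__ == "__main__":
    # Identifiers copied from the two log excerpts above.
    worker_pool_crash = 645115957103093866238345258191171801645001474048
    lock_failure = 576608067853207791947547531657596035098629636096
    print(partition_index(worker_pool_crash, 256))
    print(partition_index(lock_failure, 256))
```

With a ring of 256 these come out as partitions 113 and 101, so the two
errors shown refer to different vnodes.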

I have read that the above error is a sign that ulimit should be increased.
What should it be set to if our limit of 4096 is too low? Is there any
formula based on the number of vnodes, etc.?

More log file context and our app.config are available here:

https://docs.google.com/folder/d/0B5dwJ114R8NzQTZEZ2VnMWJxQWM/edit

Our configuration:
ring_creation_size: 256
Physical Nodes: 3
ulimit -n 4096
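
To make the ulimit question concrete with these figures, here is a rough
back-of-the-envelope sketch in Python. It assumes the descriptor demand on a
node is roughly (vnodes on that node) x (files each LevelDB vnode may keep
open) plus some overhead for sockets and logs. That formula is our guess and
not something taken from the Riak documentation, and files_per_vnode is a
placeholder value, not a Riak default (the real figure would presumably
follow from the eleveldb max_open_files setting in app.config):

```python
import math


def vnodes_per_node(ring_size: int, nodes: int) -> int:
    """Worst-case number of vnodes one node hosts with an even split."""
    return math.ceil(ring_size / nodes)


def estimated_fds(ring_size: int, nodes: int, files_per_vnode: int,
                  overhead: int = 0) -> int:
    """Rough file descriptor estimate for one node (assumed formula)."""
    return vnodes_per_node(ring_size, nodes) * files_per_vnode + overhead


if __name__ == "__main__":
    RING_SIZE = 256          # ring_creation_size from our configuration
    ULIMIT = 4096            # current `ulimit -n`
    FILES_PER_VNODE = 50     # hypothetical placeholder, NOT a known default

    # Three healthy nodes, then two nodes covering for a crashed third.
    for nodes in (3, 2):
        need = estimated_fds(RING_SIZE, nodes, FILES_PER_VNODE)
        status = "over" if need > ULIMIT else "within"
        print(f"{nodes} nodes: {vnodes_per_node(RING_SIZE, nodes)} vnodes/node, "
              f"~{need} descriptors ({status} ulimit {ULIMIT})")
```

The second case is included because, if one node is down, the remaining two
may end up hosting its vnodes as well, which is exactly when we see the
trouble.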

Ubuntu 10.04
vm.swappiness = 0
Disk is on a SAN, formatted as ext4 and mounted with
"noatime,barrier=0,data=writeback" options,
using the deadline I/O scheduler

Best regards

Marcus
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
