Hello group,

We are experimenting with an app running across 3 nodes. The app is under fairly constant write load from 2 or 3 writing threads for 6 hours per day, plus a moderate amount of read requests. We have experienced several crashes that we would like some explanation of. We are hoping it is a configuration issue with an easy fix that will give us some stability :)
After a node crashes, it can often take several attempts to restart it, and IO and CPU go high. We have found that if we halt all reads/writes during the restart, the node is more likely to come back and work through the high-load process of being brought back to life.

The two errors that show up in the logs are:

*riak_kv_vnode worker pool crashed... timeout*

2012-11-11 06:37:16.105 UTC [error] <0.5037.0>@riak_core_vnode:handle_info:510 645115957103093866238345258191171801645001474048 riak_kv_vnode worker pool crashed {timeout,{gen_fsm,sync_send_event,[<0.5040.0>,{checkout,false,5000},5000]}}
2012-11-11 06:37:16.106 UTC [error] <0.5083.0>@riak_core_vnode:handle_info:510

Followed or accompanied by:

*Resource temporarily unavailable*

2012-11-11 06:37:18.015 UTC [error] <0.19495.424>@riak_kv_vnode:init:265 Failed to start riak_kv_eleveldb_backend Reason: {db_open,"IO error: lock /var/lib/riak/leveldb/576608067853207791947547531657596035098629636096/LOCK: Resource temporarily unavailable"}

I have read that ulimit should be increased in response to the above error. What should it be set to if our limit of 4096 is too low? (Is there any formula based on the number of vnodes etc.?)

More log file context and our app.config are here: https://docs.google.com/folder/d/0B5dwJ114R8NzQTZEZ2VnMWJxQWM/edit

Our configuration:

ring_creation_size: 256
Physical Nodes: 3
ulimit -n 4096
ubuntu 10.04
vm.swappiness = 0
Disk is on SAN, formatted as ext4, mounted with "noatime,barrier=0,data=writeback" options, using the deadline scheduler

Best regards
Marcus
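
A rough sketch of the arithmetic behind the ulimit question, using the figures above (ring_creation_size 256, 3 physical nodes). It only works out how many vnodes one node would own if the ring is spread evenly, and compares a back-of-envelope file-handle estimate with the current limit. The names files_per_vnode and headroom and their values are placeholders made up for illustration. They are not Riak or eleveldb defaults, and this is not an official sizing formula.

```python
import resource  # Unix-only standard library module


def vnodes_per_node(ring_size: int, nodes: int) -> int:
    """Most partitions a single node owns if the ring is spread evenly."""
    return -(-ring_size // nodes)  # ceiling division


def estimated_open_files(ring_size: int, nodes: int,
                         files_per_vnode: int, headroom: int = 512) -> int:
    """Back-of-envelope estimate: vnodes on one node times an ASSUMED
    number of open files per vnode, plus ASSUMED headroom for sockets,
    logs and so on. Both numbers are illustrative placeholders."""
    return vnodes_per_node(ring_size, nodes) * files_per_vnode + headroom


if __name__ == "__main__":
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # nodes=3 is the healthy cluster; nodes=2 models one node being down,
    # with the remaining two covering the whole ring.
    for nodes in (3, 2):
        need = estimated_open_files(256, nodes, files_per_vnode=20)
        print(f"{nodes} nodes up: {vnodes_per_node(256, nodes)} vnodes per node, "
              f"estimate {need} handles, current soft limit {soft}")
```

With these placeholder values the estimate stays below 4096 in both cases, so the real per-vnode file count under load is the number worth measuring before picking a new limit.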
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com