On Tue, Sep 20, 2011 at 7:53 AM, Ryan Zezeski <rzeze...@basho.com> wrote:
> Elias,
>
> It's hard to say from just this one stacktrace but it seems that the
> vnode/leveldb backend might be failing under load causing the R value to go
> unmet. The Search hook has to perform a read of a special object it stores
> in the backend and that's what is failing here. However, the root cause
> seems to be vnodes failing. I say this because of the presence of the
> `{r_val_unsatisfied,2,1}` msg. Could you check the error and crash log
> files and see if you can't find other traces that might shed more light on
> this?

Alas, I upgraded to 1.0.0pre4 and no longer observe that behavior. Before upgrading I verified that the problem also occurred when using Bitcask, so it did not appear to be related to the backend in use.

That said, I am now seeing a different error in 1.0.0pre4, with the same setup (3 nodes, 1 client with 12 concurrent PB connections spread across the nodes, inserting data as fast as it can). This error is a lot rarer: I have to insert several hundred or million objects before it manifests itself, although I've seen it happen once soon after starting the load script. The error occurs within one of the nodes and sends that node into a tight loop. The node will not respond to a "riak stop" command; I usually have to kill the riak processes. The node loops, generating the following error:

2011-09-20 01:06:45.819 [error] <0.107.0> {mochiweb_socket_server,310,{acceptor_error,{error,accept_failed}}}
2011-09-20 01:06:45.820 [error] <0.13978.273> application: mochiweb, "Accept failed error", "{error,emfile}"

The long preamble of errors leading up to the loop can be seen at http://pastebin.com/4Eu2UMYf

I found the following particularly puzzling:

2011-09-20 00:43:39.144 [error] <0.13420.273> CRASH REPORT Process [] with 0 neighbours crashed with reason: {error,{badmatch,{error,emfile}}}

Notice that the process list is empty.

Now, from what I've been able to find, {error,emfile} usually means you are out of file descriptors. Yes? If so, the system is running with an fd ulimit of 4096. Is that not considered sufficient? Again, this is a single client with only 12 concurrent connections. I am using the leveldb backend, if that makes a difference.

Could there be an fd leak somewhere within Riak? Maybe in the new eleveldb backend? Is there some command to show how many file descriptors are in use while the node is running?
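I am guessing something along these lines would at least let me watch the count from the Erlang shell after a `riak attach` (just a sketch on my part, assuming Linux and a /proc filesystem; I have not confirmed this is the recommended way):

    %% Count the open file descriptors of the beam process by listing /proc/<ospid>/fd.
    %% os:getpid() returns the OS pid of the node as a string.
    OsPid = os:getpid(),
    {ok, Fds} = file:list_dir("/proc/" ++ OsPid ++ "/fd"),
    io:format("open fds: ~p~n", [length(Fds)]).

    %% For comparison, the per-process limit as seen by the VM (spawns a shell):
    io:format("ulimit -n: ~s", [os:cmd("ulimit -n")]).

From the OS side, I assume something like `lsof -p <beam pid> | wc -l` would show roughly the same number.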