Fyodor,

I can't tell you exactly what caused this to happen, but I can tell you how to move past it.

Search uses two data structures to store the index: buffers and segments. A buffer is an in-memory structure backed by a file on disk. Over time, buffers are converted to segments. All segments live on disk, but there is an in-memory offset table to perform lookups. During a request, if the vnode that handles that request is not already running, it will be started. During its initialization the vnode reads all buffers and segment tables into memory. In your case, each time the vnode is started it crashes while trying to read the buffer file.

Looking at the binary in your trace, it looks like the data somehow became corrupted. First off, I'm confused by the syntax of the binary in your stack trace: why are there brackets surrounding that binary data? That aside, I see two terms in that data, i.e. there are two occurrences of the byte 131, which marks the start of a term (it is the version byte of Erlang's external term format). The second term is valid:
[{{<<"logs">>,<<"text">>,<<"SEQ=1">>}, <<"ae2b12ae-a155-11e0-9e33-00219bfc3293">>, -1309244813808575, [{p,[14]}]}]

However, the first term seems to have been truncated or corrupted somehow. Why? I'm not sure. My immediate guess is that a write failed at some point and wrote bad data to the buffer file, the vnode crashed, and when it started back up it couldn't read the buffer file back. The code that reads the buffer data expects correct data; otherwise it simply crashes, as you see. This causes a perpetual series of crashes until the problem is manually resolved.

In this case, you can move the buffer files for the crashing vnodes out of the way, one at a time, until the problem goes away. This will cause you to lose some of your indexed data. For example, in your case the crashing vnode is for partition 433883298582611803841718934712646521460354973696. You can cd to riak_search/data/merge_index/433883298582611803841718934712646521460354973696 and then mv your buffer.* files to something like corrupt-buffer.*.

TL;DR: For one reason or another, a buffer file became corrupted. As a workaround, you can move your buffer files out of the way.
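If you want to repeat the "find the 131 bytes and see which term decodes" analysis on another crashing partition, here is a minimal Python sketch. It is not part of Riak or merge_index: `decode_term`, `scan_terms` and `Atom` are hypothetical names, and it only handles the external-term-format tags that appear in this trace (ATOM_EXT, SMALL_TUPLE_EXT, NIL_EXT, STRING_EXT, LIST_EXT, BINARY_EXT, SMALL_BIG_EXT). The scan is a heuristic, because a 131 byte can also occur inside a binary or an integer. To feed it, copy the comma-separated numbers between `<<[` and `]>>` in the trace.

```python
import struct

VERSION_MAGIC = 131  # first byte of every term_to_binary/1 result


class Atom(str):
    """Marks a decoded Erlang atom so it is not confused with a binary."""


def _take(data: bytes, pos: int, n: int):
    """Return (data[pos:pos+n], pos+n), failing if the input is too short."""
    end = pos + n
    if end > len(data):
        raise ValueError(f"truncated term: need {n} bytes at offset {pos}")
    return data[pos:end], end


def _u16(data: bytes, pos: int):
    raw, pos = _take(data, pos, 2)
    return struct.unpack(">H", raw)[0], pos


def _u32(data: bytes, pos: int):
    raw, pos = _take(data, pos, 4)
    return struct.unpack(">I", raw)[0], pos


def decode_term(data: bytes, pos: int):
    """Decode the tagged term at data[pos]; return (value, next_pos)."""
    tag_byte, pos = _take(data, pos, 1)
    tag = tag_byte[0]
    if tag == 100:  # ATOM_EXT: 16-bit length, then Latin-1 name
        n, pos = _u16(data, pos)
        name, pos = _take(data, pos, n)
        return Atom(name.decode("latin-1")), pos
    if tag == 104:  # SMALL_TUPLE_EXT: 8-bit arity, then elements
        arity, pos = _take(data, pos, 1)
        items = []
        for _ in range(arity[0]):
            item, pos = decode_term(data, pos)
            items.append(item)
        return tuple(items), pos
    if tag == 106:  # NIL_EXT: the empty list []
        return [], pos
    if tag == 107:  # STRING_EXT: 16-bit length, then bytes (a list of ints)
        n, pos = _u16(data, pos)
        chars, pos = _take(data, pos, n)
        return list(chars), pos
    if tag == 108:  # LIST_EXT: 32-bit length, elements, then a tail
        n, pos = _u32(data, pos)
        items = []
        for _ in range(n):
            item, pos = decode_term(data, pos)
            items.append(item)
        tail, pos = _take(data, pos, 1)
        if tail[0] != 106:  # this sketch only accepts proper lists
            raise ValueError(
                f"expected NIL_EXT list tail at offset {pos - 1}, found tag {tail[0]}")
        return items, pos
    if tag == 109:  # BINARY_EXT: 32-bit length, then raw bytes
        n, pos = _u32(data, pos)
        return _take(data, pos, n)
    if tag == 110:  # SMALL_BIG_EXT: digit count, sign, little-endian digits
        header, pos = _take(data, pos, 2)
        digits, pos = _take(data, pos, header[0])
        value = int.from_bytes(digits, "little")
        return (-value if header[1] else value), pos
    raise ValueError(f"unsupported or invalid tag {tag} at offset {pos - 1}")


def scan_terms(data: bytes):
    """Try a decode at every byte equal to VERSION_MAGIC.

    Heuristic: 131 can also appear inside binaries or integers, so each hit
    is only a candidate. Returns (offset, ok, value_or_error) tuples.
    """
    results = []
    for offset, byte in enumerate(data):
        if byte != VERSION_MAGIC:
            continue
        try:
            value, _ = decode_term(data, offset + 1)
            results.append((offset, True, value))
        except ValueError as err:
            results.append((offset, False, str(err)))
    return results
```

Run against the 216 bytes from the trace above, this should report the candidate at offset 0 as a failure (a tag of 0 where a list tail should be, at offset 107) and decode the candidate at offset 110 to the same term shown above, with the binaries as Python `bytes` and `p` as an `Atom`.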
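The cd/mv workaround can also be scripted. This is a minimal Python sketch assuming the riak_search/data/merge_index/<partition> layout described above; `quarantine_buffers` is a hypothetical helper, not a Riak tool. It renames only the `buffer.*` files in a single partition directory to `corrupt-buffer.*`, leaves every other file alone, and refuses to overwrite an existing file. As with the manual mv, whatever was in those buffers is dropped from the index.

```python
from pathlib import Path


def quarantine_buffers(partition_dir: str, prefix: str = "corrupt-") -> list:
    """Rename every buffer.* file in partition_dir to <prefix>buffer.*.

    Returns the new paths. Files not matching buffer.* are left untouched.
    """
    moved = []
    # sorted() collects the matches before any rename touches the directory
    for path in sorted(Path(partition_dir).glob("buffer.*")):
        target = path.with_name(prefix + path.name)
        if target.exists():
            raise FileExistsError(f"refusing to overwrite {target}")
        path.rename(target)
        moved.append(target)
    return moved
```

For the partition in your trace, that would be `quarantine_buffers("riak_search/data/merge_index/433883298582611803841718934712646521460354973696")`, repeated for each crashing partition.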
-Ryan

On Sat, Jul 2, 2011 at 6:40 AM, Fyodor Yarochkin <fyodo...@armorize.com> wrote:
> Greetings,
>
> I've been running a single-node riaksearch instance and came
> across this problem: after inserting roughly 200Mb of data, every
> subsequent insert (into any bucket) would start to time out with a
> sequence of error logs that point to a riak_search_vnode_master crash:
>
> =SUPERVISOR REPORT==== 2-Jul-2011::06:04:57 ===
> Supervisor: {local,riak_search_sup}
> Context: child_terminated
> Reason:
>
> {{badmatch,{error,{{badmatch,{error,{badarg,[{erlang,binary_to_term,[<<[131,108,0,0,0,2,104,4,104,3,109,0,0,0,4,108,111,103,115,109,0,0,0,4,116,101,120,116,109,0,0,0,16,91,49,50,49,49,49,56,48,46,55,49,54,51,55,52,93,109,0,0,0,36,97,97,54,55,53,52,53,99,45,97,49,53,53,45,49,49,101,48,45,57,101,51,51,45,48,48,50,49,57,98,102,99,51,50,57,51,110,7,1,112,21,181,79,192,166,4,108,0,0,0,1,104,0,0,0,106,131,108,0,0,0,1,104,4,104,3,109,0,0,0,4,108,111,103,115,109,0,0,0,4,116,101,120,116,109,0,0,0,5,83,69,81,61,49,109,0,0,0,36,97,101,50,98,49,50,97,101,45,97,49,53,53,45,49,49,101,48,45,57,101,51,51,45,48,48,50,49,57,98,102,99,51,50,57,51,110,7,1,191,19,13,80,192,166,4,108,0,0,0,1,104,2,100,0,1,112,107,0,1,14,106,106]>>]},{mi_buffer,read_value,1},{mi_buffer,open_inner,2},{mi_buffer,new,1},{mi_server,read_buffers,4},{mi_server,read_buf_and_seg,1},{mi_server,init,1},{gen_server,init_it,6}]}}},[{merge_index_backend,start,2},{riak_search_vnode,init,1},{riak_core_vnode,init,1},{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}}},[{riak_core_vnode_master,get_vnode,2},{riak_core_vnode_master,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
> Offender:
>
> [{pid,<0.754.0>},{name,riak_search_vnode_master},{mfa,{riak_core_vnode_master,start_link,[riak_search_vnode]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
>
>
> (the full paste of the error dump log is here http://pastebin.com/0Bj5cJAQ)
>
> Reads still work, and I am slightly confused about the reason for the crash.
> The availability of RAM is one of the things I suspect here:
> "mem_total":1059192832,"mem_allocated":893632512,". There is no
> shortage of disk space or other resources on the system. I am
> a bit stuck as to where to start troubleshooting this issue. Any
> pointers or hints would be greatly appreciated! :)
>
>
> regards,
> -F
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>