Hi Fyodor,

Following up: to help us troubleshoot, would you mind answering a few questions about your environment:
- What platform are you running?
- What version of Riak Search are you using?
- Did you install Riak Search from our pre-built binaries, or did you compile it from source?
- If you compiled from source, what version of Erlang are you running?
- What interface are you using to index the files (Solr or KV)?
- Are you using the default schema? If not, can you send a copy of your schema file?
- Can you send us a sample of your data, anonymized if necessary?

Best,
Rusty

On Tue, Jul 5, 2011 at 9:15 AM, Ryan Zezeski <rzeze...@basho.com> wrote:
> Fyodor,
>
> I can't tell you exactly what caused this to happen, but I can tell you how
> to move past it. Search uses two data structures to store the index:
> buffers and segments. A buffer is an in-memory structure backed by a file
> on disk. Over time, buffers are converted to segments. All segments live on
> disk, but there is an in-memory offset table to perform lookups. During a
> request, if the vnode that handles that request is not already up, it is
> started. During the vnode's initialization it reads all buffers and
> segment tables into memory. In your case, each time the vnode is started it
> crashes while trying to read the buffer file. Looking at the binary in your
> trace, it looks like the data somehow became corrupted. First off, I'm
> confused by the syntax of the binary in your stack trace: what's up
> with the brackets surrounding that binary data? That aside, I see two terms
> in that data: the byte 131, which indicates the start of a term, occurs
> twice. The second term is valid:
>
> [{{<<"logs">>,<<"text">>,<<"SEQ=1">>},
>   <<"ae2b12ae-a155-11e0-9e33-00219bfc3293">>,
>   -1309244813808575,
>   [{p,[14]}]}]
>
> However, the first term seems to have been truncated or corrupted somehow.
> Why? I'm not sure.
> My immediate guess is that a write failed at some
> point and wrote bad data to the buffer file, the vnode crashed, and then when
> it started back up it couldn't read the buffer file. The code that reads
> the buffer data expects well-formed data and will simply crash otherwise, as you see.
> This causes a perpetual series of crashes until the problem is manually
> resolved. In this case you can move the buffer files of the
> crashing vnodes out of the way, one at a time, until the problem goes away.
> This will cause you to lose some of your indexed data. For example, in your
> case the crashing vnode is for partition
> 433883298582611803841718934712646521460354973696.
> You can cd to
> riak_search/data/merge_index/433883298582611803841718934712646521460354973696
> and then mv your buffer.* files to something like corrupt-buffer.*.
>
> TL;DR: For one reason or another a buffer file became corrupted. As a
> workaround you can move your buffer files out of the way.
>
> -Ryan
>
> On Sat, Jul 2, 2011 at 6:40 AM, Fyodor Yarochkin <fyodo...@armorize.com> wrote:
>
>> Greetings,
>>
>> I've been running a single-node riaksearch instance and came
>> across this problem: after inserting roughly 200Mb of data, every
>> subsequent insert (into any bucket) would start to time out with a
>> sequence of error logs that point to a riak_search_vnode_master crash:
>>
>> =SUPERVISOR REPORT==== 2-Jul-2011::06:04:57 ===
>> Supervisor: {local,riak_search_sup}
>> Context: child_terminated
>> Reason:
>>
>> {{badmatch,{error,{{badmatch,{error,{badarg,[{erlang,binary_to_term,[<<[131,108,0,0,0,2,104,4,104,3,109,0,0,0,4,108,111,103,115,109,0,0,0,4,116,101,120,116,109,0,0,0,16,91,49,50,49,49,49,56,48,46,55,49,54,51,55,52,93,109,0,0,0,36,97,97,54,55,53,52,53,99,45,97,49,53,53,45,49,49,101,48,45,57,101,51,51,45,48,48,50,49,57,98,102,99,51,50,57,51,110,7,1,112,21,181,79,192,166,4,108,0,0,0,1,104,0,0,0,106,131,108,0,0,0,1,104,4,104,3,109,0,0,0,4,108,111,103,115,109,0,0,0,4,116,101,120,116,109,0,0,0,5,83,69,81,61,49,109,0,0,0,36,97,101,50,98,49,50,97,101,45,97,49,53,53,45,49,49,101,48,45,57,101,51,51,45,48,48,50,49,57,98,102,99,51,50,57,51,110,7,1,191,19,13,80,192,166,4,108,0,0,0,1,104,2,100,0,1,112,107,0,1,14,106,106]>>]},{mi_buffer,read_value,1},{mi_buffer,open_inner,2},{mi_buffer,new,1},{mi_server,read_buffers,4},{mi_server,read_buf_and_seg,1},{mi_server,init,1},{gen_server,init_it,6}]}}},[{merge_index_backend,start,2},{riak_search_vnode,init,1},{riak_core_vnode,init,1},{gen_fsm,init_it,6},{proc_lib,init_p_do_apply,3}]}}},[{riak_core_vnode_master,get_vnode,2},{riak_core_vnode_master,handle_call,3},{gen_server,handle_msg,5},{proc_lib,init_p_do_apply,3}]}
>> Offender:
>>
>> [{pid,<0.754.0>},{name,riak_search_vnode_master},{mfa,{riak_core_vnode_master,start_link,[riak_search_vnode]}},{restart_type,permanent},{shutdown,5000},{child_type,worker}]
>>
>>
>> (The full paste of the error dump log is here: http://pastebin.com/0Bj5cJAQ)
>>
>> Reads still work, and I am slightly confused about the reason for the crash.
>> Available RAM is one of the things I suspect here:
>> "mem_total":1059192832,"mem_allocated":893632512. There is no
>> shortage of disk space or other resources on the system. I am
>> a bit stuck as to where to start troubleshooting this issue. Any
>> pointers or hints would be greatly appreciated!
>> :)
>>
>>
>> regards,
>> -F
>>
>> _______________________________________________
>> riak-users mailing list
>> riak-users@lists.basho.com
>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>
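Ryan's reading of the binary can be checked mechanically. Below is a minimal, partial decoder for Erlang's external term format, written in Python purely for illustration: it handles only the tags that occur in the quoted buffer data, it is not the mi_buffer reader that crashed, and the names decode_term, find_version_bytes, DecodeError and Atom are hypothetical. Tuples decode to Python tuples, binaries to bytes, strings to lists of ints, and atoms to a str subclass.

```python
import struct

# Sketch only: tags from Erlang's external term format, limited to the
# ones present in the buffer data quoted in this thread.
VERSION = 131
SMALL_INTEGER_EXT = 97
INTEGER_EXT = 98
ATOM_EXT = 100
SMALL_TUPLE_EXT = 104
NIL_EXT = 106
STRING_EXT = 107
LIST_EXT = 108
BINARY_EXT = 109
SMALL_BIG_EXT = 110


class DecodeError(Exception):
    """Raised when this sketch cannot decode the input."""


class Atom(str):
    """Marks an Erlang atom; compares equal to a plain str."""


def find_version_bytes(data):
    """Offsets of every 131 byte. 131 can also occur inside payload
    data, so these are only candidate term starts."""
    return [i for i, b in enumerate(data) if b == VERSION]


def decode_term(data, offset=0):
    """Decode one term that starts with the 131 version byte.
    Returns (term, next_offset)."""
    if offset >= len(data) or data[offset] != VERSION:
        raise DecodeError(f"no version byte at offset {offset}")
    return _decode(data, offset + 1)


def _take(data, pos, n):
    # Slice n bytes, failing cleanly on truncated input.
    if pos + n > len(data):
        raise DecodeError(f"truncated at offset {pos}")
    return data[pos:pos + n], pos + n


def _decode(data, pos):
    head, pos = _take(data, pos, 1)
    tag = head[0]
    if tag == SMALL_INTEGER_EXT:
        raw, pos = _take(data, pos, 1)
        return raw[0], pos
    if tag == INTEGER_EXT:
        raw, pos = _take(data, pos, 4)
        return struct.unpack(">i", raw)[0], pos
    if tag == ATOM_EXT:
        raw, pos = _take(data, pos, 2)
        name, pos = _take(data, pos, struct.unpack(">H", raw)[0])
        return Atom(name.decode("latin-1")), pos
    if tag == SMALL_TUPLE_EXT:
        raw, pos = _take(data, pos, 1)
        items = []
        for _ in range(raw[0]):
            item, pos = _decode(data, pos)
            items.append(item)
        return tuple(items), pos
    if tag == NIL_EXT:
        return [], pos
    if tag == STRING_EXT:
        # A list of small integers, e.g. the [14] in {p,[14]}.
        raw, pos = _take(data, pos, 2)
        chars, pos = _take(data, pos, struct.unpack(">H", raw)[0])
        return list(chars), pos
    if tag == LIST_EXT:
        raw, pos = _take(data, pos, 4)
        items = []
        for _ in range(struct.unpack(">I", raw)[0]):
            item, pos = _decode(data, pos)
            items.append(item)
        tail, pos = _decode(data, pos)  # proper lists end with NIL_EXT
        if tail != []:
            raise DecodeError("improper lists are not supported")
        return items, pos
    if tag == BINARY_EXT:
        raw, pos = _take(data, pos, 4)
        return _take(data, pos, struct.unpack(">I", raw)[0])
    if tag == SMALL_BIG_EXT:
        raw, pos = _take(data, pos, 2)  # digit count, sign byte
        digits, pos = _take(data, pos, raw[0])
        value = int.from_bytes(digits, "little")
        return (-value if raw[1] else value), pos
    raise DecodeError(f"unknown tag {tag} at offset {pos - 1}")
```

Given the byte list from the stack trace as a bytes object, find_version_bytes returns two offsets. decode_term at offset 0 raises DecodeError on an invalid tag byte (0) inside the first term's property list, while decoding from the second offset consumes the rest of the data and yields the term Ryan quoted, including the big integer -1309244813808575.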
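Ryan's workaround, renaming buffer.* to corrupt-buffer.* inside the crashing partition's directory, can also be scripted. This is a minimal Python sketch: move_buffer_files is a hypothetical helper, and the riak_search/data/merge_index/<partition> layout is taken from Ryan's message (where that directory sits depends on your installation). It only performs the rename; it does not stop or restart the node or work out which vnode is crashing.

```python
from pathlib import Path


def move_buffer_files(partition_dir):
    """Rename every buffer.* file in one merge_index partition directory
    to corrupt-buffer.*, mirroring `mv buffer.* corrupt-buffer.*`.

    Other files (e.g. segments) are left in place. Returns the new paths.
    """
    # Materialize the glob before renaming so iteration is not affected.
    buffers = sorted(Path(partition_dir).glob("buffer.*"))
    moved = []
    for path in buffers:
        target = path.with_name("corrupt-" + path.name)
        path.rename(target)
        moved.append(target)
    return moved
```

For the partition in this thread the call would be move_buffer_files("riak_search/data/merge_index/433883298582611803841718934712646521460354973696"). Because the pattern only matches names that start with "buffer.", running it a second time renames nothing. As Ryan notes, moving the buffers loses some indexed data.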