Hi Glory,

On Tue, Feb 11, 2014 at 1:29 AM, Glory Lo <gloryl...@gmail.com> wrote:

>
>
> While indexing it seem to run fine part way.. then I noticed it hangs (it
> freezed my machine on a couple of attempts on linux mint 13). Then it
> crashes. I have 3 nodes running and I only tried indexing one of them
> doing a search-cmd mybucket dev1/data/leveldb
>

What was the process for indexing? How much data were you indexing? What
content-type? How big is each object? What is your schema?

> My crash log has multiple errors of different sorts which I haven't
> discern yet. However, the last errors w/ a close timestamp are as follows
> which mentions some timeouts (likely with the freeze):
>

It's hard to distinguish ripple-effect errors from the original error. I
see some stuff that is indicative of disk corruption, but there's a good
chance that only happened because some other error caused merge_index to
hard crash. Could you attach a tar.gz of all your logs?

>
> 2014-02-08 23:15:53 =ERROR REPORT====
> Error in process <0.2799.1> on node 'dev1@127.0.0.1' with exit value:
> {badarg,[{ets,lookup,[145752322,{1118962191081472546749696200048404186924073353216,'
> dev2@127.0.0.1
> '}],[]},{riak_search_client,'-process_terms_1/4-fun-2-',3,[{file,"src/riak_search_client.erl"},{line,295}]},{riak_search_utils,'-ptransform/2-fun-0-',2,[{file,"src/riak_search_utils....
>

This is an error finding the temporary ETS table for building the postings
list. That's a really interesting error to have and makes me wonder if you
somehow hit the ETS system limit. I'm not even sure that is possible given
how high we've raised the default limit.

>
> 2014-02-08 23:18:46 =ERROR REPORT====
> Error in process <0.2350.1> on node 'dev1@127.0.0.1' with exit value:
> {terminated,[{io,format,[<17869.23.0>,"DEBUG: ~p:~p - ~p~n~n
> ~p~n~n",[riak_search_dir_indexer,194,"{ error , Type , Error , erlang :
> get_stacktrace ( )
> }",{error,error,{case_clause,{error,timeout}},[{riak_search_client,'-index_docs/1-fun-0-'...
>

I'm actually a bit baffled as to exactly what this trace is saying. I think
more detail might be in the error.log.
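
In case it helps with gathering the logs, here is a minimal shell sketch for
bundling them into one archive. The devN/log paths are an assumption based on
a devrel-style layout like the dev1/data/leveldb path mentioned above, and
bundle_riak_logs is just a hypothetical helper name, so adjust both to your
setup.

```shell
#!/bin/sh
# Bundle Riak log directories into a single tar.gz for attaching to a reply.
# ASSUMPTION: each node keeps its logs in devN/log (devrel-style layout).
# bundle_riak_logs is a hypothetical helper, not part of Riak.
bundle_riak_logs() {
    out="$1"; shift
    found=""
    for dir in "$@"; do
        # Skip nodes whose log directory does not exist.
        if [ -d "$dir" ]; then
            found="$found $dir"
        fi
    done
    if [ -z "$found" ]; then
        echo "no log directories found" >&2
        return 1
    fi
    # Word splitting of $found is intentional (paths assumed to have no spaces).
    tar czf "$out" $found
}

# Example: bundle_riak_logs riak-logs.tar.gz dev1/log dev2/log dev3/log
```

Missing directories are skipped rather than treated as fatal, so the same
call works even if one node has not written any logs yet.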

>
> 2014-02-08 23:20:00 =ERROR REPORT====
> Error in process <0.4231.1> on node 'dev1@127.0.0.1' with exit value:
> {{case_clause,{data,4711}},[{cpu_sup,get_uint32_measurement,2,[{file,"cpu_sup.erl"},{line,227}]},{cpu_sup,measurement_server_loop,1,[{file,"cpu_sup.erl"},{line,585}]}]}
>

Yikes, this looks really bad and makes me wonder if this is an environment
issue, as this error should not be related to search.

>
> 2014-02-08 23:23:37 =ERROR REPORT====
> Error in process <0.6359.1> on node 'dev1@127.0.0.1' with exit value:
> {badarg,[{erlang,binary_to_term,[<<31359
> bytes>>],[]},{mi_segment,iterate_all_bytes,2,[{file,"src/mi_segment.erl"},{line,167}]},{mi_segment_writer,from_iterator,4,[{file,"src/mi_segment_writer.erl"},{line,102}]},{mi_segment_writer,from_iterator...
>

This is typically what you see when data corruption occurs, but it's hard to
say if data corruption caused the other errors or the other errors caused
corruption.

>
>
>
> 2014-02-08 23:24:58 =ERROR REPORT====
> ** State machine <0.3211.0> terminating
> ** Last message in was {'EXIT',<0.168.0>,shutdown}
> ** When State == active
> ** Data ==
> {state,1438665674247607560106752257205091097473808596992,riak_search_vnode,{vstate,1438665674247607560106752257205091097473808596992,merge_index_backend,{state,1438665674247607560106752257205091097473808596992,<0.3212.0>}},undefined,none,undefined,undefined,<0.3221.0>,{pool,riak_search_worker,2,[]},undefined,86616}
> ** Reason for termination =
> ** {timeout,{gen_server,call,[<0.3212.0>,stop]}}
> 2014-02-08 23:24:58 =CRASH REPORT====
> crasher:
> initial call: riak_core_vnode:init/1
> pid: <0.3211.0>
> registered_name: []
> exception exit:
> {{timeout,{gen_server,call,[<0.3212.0>,stop]}},[{gen_fsm,terminate,7,[{file,"gen_fsm.erl"},{line,589}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
> ancestors: [riak_core_vnode_sup,riak_core_sup,<0.162.0>]
> messages:
> [{'EXIT',<0.3221.0>,shutdown},{#Ref<0.0.1.215952>,ok},{'EXIT',<0.3212.0>,normal}]
> links: []
> dictionary: [{random_seed,{27839,21123,25074}}]
> trap_exit: true
> status: running
> heap_size: 46368
> stack_size: 24
> reductions: 24758
> neighbours:
>

This is one of the riak_search vnodes crashing because its merge index
process crashed, which is expected given the circumstances.

> 2014-02-08 23:24:58 =ERROR REPORT====
> ** State machine <0.5392.1> terminating
> ** Last message in was
> {'$gen_sync_all_state_event',{<0.5390.1>,#Ref<0.0.1.215861>},{shutdown,60000}}
> ** When State == ready
> ** Data == {state,{[],[]},<0.5393.1>,[],undefined}
> ** Reason for termination =
> ** {timeout,{gen_fsm,sync_send_all_state_event,[<0.5393.1>,stop]}}
> 2014-02-08 23:24:58 =CRASH REPORT====
> crasher:
> initial call: riak_core_vnode_worker_pool:init/1
> pid: <0.5392.1>
> registered_name: []
> exception exit:
> {{timeout,{gen_fsm,sync_send_all_state_event,[<0.5393.1>,stop]}},[{gen_fsm,handle_msg,7,[{file,"gen_fsm.erl"},{line,511}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
> ancestors: [<0.5390.1>,riak_core_vnode_sup,riak_core_sup,<0.162.0>]
> messages: []
> links: [<0.5390.1>,<0.5393.1>]
> dictionary: []
> trap_exit: false
> status: running
> heap_size: 233
> stack_size: 24
> reductions: 225
> neighbours:
>

This is the worker pool crashing, probably because its vnode crashed.

> 2014-02-08 23:25:01 =SUPERVISOR REPORT====
> Supervisor: {local,riak_core_vnode_sup}
> Context: shutdown_error
> Reason: {timeout,{gen_server,call,[<0.5436.1>,stop]}}
> Offender:
> [{nb_children,1},{name,undefined},{mfargs,{riak_core_vnode,start_link,[]}},{restart_type,temporary},{shutdown,300000},{child_type,worker}]
>

These supervisor reports just indicate that vnodes have crashed because of a
timeout. Expected given the circumstances.

>
> Can someone provide some guidance as to where to troubleshoot the issue?
> If it's timing out which is a mere symptom of it being in hang state.
> What however is the root cause of it being stuck with those other errors
> like bad arg and bad match.
>

Attach your logs and I should be able to take a closer look.

-Z
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com