I'm running a 4-"node" cluster on one machine, riak-1.2.0. The configuration is very close to the default development environment setup, except that I've turned on riak search in app.config for each node and added the indexing pre-commit hook and a schema for one bucket (I've tested the hook on individual documents and it indexes them correctly). I added ~5 million documents to this bucket before I turned search on, and from what I've been told the best way to re-index existing documents is to re-add each one (search:index_doc doesn't seem to do anything for me).
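For reference (paraphrasing from memory rather than pasting the real files), the search side of the setup amounts to flipping the enabled flag in the riak_search section of each node's app.config:

    %% etc/app.config (riak_search section), per node
    {riak_search, [
        {enabled, true}
    ]},

and the pre-commit hook was put on the "user" bucket from an attached console with, if I remember the module name correctly, something like:

    riak_search_kv_hook:install(<<"user">>).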
I'm trying to do the re-adding from the local console of one of the nodes in the cluster as follows:

    {ok, C} = riak:local_client().
    {ok, Keys} = C:list_keys(<<"user">>).
    plists:foreach(fun(Key) ->
                       {ok, Doc} = C:get(<<"user">>, Key),
                       C:put(Doc)
                   end, Keys, {processes, 8}).

Eight processes reading from and writing to one cluster in parallel shouldn't be a problem, and it hopefully cuts down the time wasted waiting on IO. Each key is read and re-written by only a single process, so there should be no conflicting concurrent writes to worry about, right?

This has failed every time I've tried it, for one reason or another. On this run, the first bad thing that happened was node 4 going down. From dev/dev4/log/crash.log:

    2012-09-17 20:08:15 =CRASH REPORT====
      crasher:
        initial call: application_master:init/4
        pid: <0.486.0>
        registered_name: []
        exception exit: {{bad_return,{{riak_search_app,start,[normal,[]]},{'EXIT',{badarg,[{ets,lookup,[riak_core_node_watcher,{by_node,'dev4@127.0.0.1'}],[]},{riak_core_node_watcher,internal_get_services,1,[{file,"src/riak_core_node_watcher.erl"},{line,412}]},{riak_core,wait_for_service,2,[{file,"src/riak_core.erl"},{line,435}]},{riak_search_app,start,2,[{file,"src/riak_search_app.erl"},{line,22}]},{application_master,start_it_old,4,[{file,"application_master.erl"},{line,274}]}]}}}},[{application_master,init,4,[{file,"application_master.erl"},{line,138}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,227}]}]}
        ancestors: [<0.485.0>]
        messages: [{'EXIT',<0.487.0>,normal}]
        links: [<0.485.0>,<0.7.0>]
        dictionary: []
        trap_exit: true
        status: running
        heap_size: 987
        stack_size: 24
        reductions: 184
      neighbours:

    2012-09-17 20:10:02 =ERROR REPORT====
    Error in process <0.629.0> on node 'dev4@127.0.0.1' with exit value: {function_clause,[{proplists,get_value,[one,{error,{riak_api,pbc_connects},nonexistent_metric},undefined],[{file,"proplists.erl"},{line,222}]},{riak_kv_stat,backwards_compat,3,[{file,"src/riak_kv_stat.erl"},{line,337}]},{riak_kv_stat...

Watching the console on node 1, I saw a lot of errors like:

    20:05:02.546 [error] Supervisor riak_kv_put_fsm_sup had child undefined started with {riak_kv_put_fsm,start_link,undefined} at <0.5165.19> exit with reason {{nodedown,'dev4@127.0.0.1'},{gen_server,call,[{riak_search_vnode_master,'dev4@127.0.0.1'},{riak_vnode_req_v1,890602560248518965780370444936484965102833893376,{server,undefined,undefined},{index_v1,[{<<"user">>,<<"user_profile_user_app_stat_fsh">>,<<"8140">>,<<"503ea1bb81340f1ff4b0dcdd">>,[{p,[0]}],1347926701849954},{<<"user">>,<<"_id">>,<<"503ea1bb81340f1ff4b0dcdd">>,<<"503ea1bb81340f1ff4b0dcdd">>,[{p,[0]}],1347926701849954}]}},infinity]}} in context child_terminated

I started node 4 back up and allowed the operation to continue. Later, node 1, where I was using the console to re-add documents, went down. console.log shows many of these:

    20:13:03.159 [info] Starting hinted_handoff transfer of riak_kv_vnode from 'dev1@127.0.0.1' 1164634117248063262943561351070788031288321245184 to 'dev4@127.0.0.1' 1164634117248063262943561351070788031288321245184
    ...
    20:13:21.660 [info] An outbound handoff of partition riak_search_vnode 251195593916248939066258330623111144003363405824 was terminated for reason: {shutdown,max_concurrency}

along with successful compaction entries. error.log has no entries contemporaneous to the crash.
crash.log has no entries contemporaneous to the crash, just older entries from when node 4 went down:

    2012-09-17 20:05:02 =SUPERVISOR REPORT====
      Supervisor: {local,riak_kv_put_fsm_sup}
      Context:    child_terminated
      Reason:     {{nodedown,'dev4@127.0.0.1'},{gen_server,call,[{riak_search_vnode_master,'dev4@127.0.0.1'},{riak_vnode_req_v1,890602560248518965780370444936484965102833893376,{server,undefined,undefined},{index_v1,[{<<"user">>,<<"user_profile_user_app_stat_fsh">>,<<"8140">>,<<"503ea1bb81340f1ff4b0dcdd">>,[{p,[0]}],1347926701849954},{<<"user">>,<<"_id">>,<<"503ea1bb81340f1ff4b0dcdd">>,<<"503ea1bb81340f1ff4b0dcdd">>,[{p,[0]}],1347926701849954}]}},infinity]}}
      Offender:   [{pid,<0.5165.19>},{name,undefined},{mfargs,{riak_kv_put_fsm,start_link,undefined}},{restart_type,temporary},{shutdown,5000},{child_type,worker}]

Actual console output ends like this:

    20:13:21.755 [info] An outbound handoff of partition riak_search_vnode 1255977969581244695331291653115555720016817029120 was terminated for reason: {shutdown,max_concurrency}
    20:13:24.090 [info] hinted_handoff transfer of riak_search_vnode from 'dev1@127.0.0.1' 342539446249430371453988632667878832731859189760 to 'dev4@127.0.0.1' 342539446249430371453988632667878832731859189760 completed: sent 12953 objects in 2.42 seconds
    20:13:27.511 [info] Pid <0.1421.0> compacted 3 segments for 4446992 bytes in 3.454234 seconds, 1.23 MB/sec
    20:13:37.676 [info] Pid <0.1315.0> compacted 3 segments for 3540694 bytes in 2.156376 seconds, 1.57 MB/sec
    20:13:39.907 [info] Pid <0.1408.0> compacted 3 segments for 3645337 bytes in 2.230437 seconds, 1.56 MB/sec
    20:13:42.860 [info] Pid <0.1439.0> compacted 3 segments for 3242485 bytes in 1.951775 seconds, 1.58 MB/sec
    20:13:47.046 [info] Pid <0.1246.0> compacted 3 segments for 2196290 bytes in 1.181681 seconds, 1.77 MB/sec
    Segmentation fault: 11

This isn't in keeping with the stability other people seem to have with riak, so I'm guessing my cluster is misconfigured. I can attach full logs, app.config, and anything else if needed.

Thanks,
Ted
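P.S. In case it's relevant, the next thing I was planning to try is a more defensive version of the re-add loop: fewer worker processes, and logging/skipping keys whose get or put fails instead of letting a badmatch kill the worker. Roughly this (untested as written, same C and Keys as above):

    ReAdd = fun(Key) ->
                case C:get(<<"user">>, Key) of
                    {ok, Doc} ->
                        case C:put(Doc) of
                            ok       -> ok;
                            PutError -> io:format("put failed for ~p: ~p~n", [Key, PutError])
                        end;
                    GetError ->
                        io:format("get failed for ~p: ~p~n", [Key, GetError])
                end
            end.
    plists:foreach(ReAdd, Keys, {processes, 2}).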
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com