Hi guys, I am evaluating Riak as a key-value store. My requirement is to store a data set larger than RAM, so Riak was set up with LevelDB as the backend.
The benchmark involved 25 agents doing put/store against a single node for 100M records. It runs well up to about 3M records, but then the whole cluster crashes and all nodes go down. The system and Riak configurations follow, along with the error and crash logs. Please help me find what I am missing; I need to test Riak and use it in production as soon as possible.

Nodes: 2 (I know a cluster of 5 is best, but this is just a test setup)
OS: Ubuntu 12.04 32bit
CPU: Core i3
RAM: 4GB
HDD: 500GB

app.config [changes only]

%% eLevelDB Config
{eleveldb, [
    {data_root, "/data/riak/leveldb"},
    {block_size, 262144},               %% 256k
    {cache_size, 104857600},            %% 100MB - default cache size 8MB per-partition
    {write_buffer_size, 524288000},     %% 500MB in bytes
    {write_buffer_size_min, 524288000}, %% 500MB in bytes
    {write_buffer_size_max, 524288000}, %% 500MB in bytes
    {max_open_files, 100}               %% Maximum number of files open at once per partition - Default: 20 - Minimum: 20
]},

vm.args [changes only]

## Enable kernel poll and a few async threads
+K true
+A 128

Bucket "riaktest" properties:

{"props":{"allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"riaktest","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"one","rw":"one","small_vclock":50,"w":"one","young_vclock":20}}

relatime is set in /etc/fstab on all drives.
OS open files limit: sysctl fs.file-max is set to 800000.

Following are the error.log, crash.log and console.log files.

error.log
---------------
2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process <0.20970.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup had child undefined started with {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]} in context child_terminated
2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process <0.20974.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}

crash.log
--------------
2012-06-15 19:09:31 =ERROR REPORT====
** Generic server <0.20970.188> terminating
** Last message in was {tcp,#Port<0.6076011>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
** When Server state == {state,#Port<0.6076011>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
** Reason for termination ==
** {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32 =CRASH REPORT====
  crasher:
    initial call: gen:init_it/6
    pid: <0.20970.188>
    registered_name: []
    exception exit: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
      in function gen_server2:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
    messages: []
    links: [#Port<0.6076023>,<0.284.0>,#Port<0.6076011>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 974
  neighbours:
2012-06-15 19:09:32 =SUPERVISOR REPORT====
  Supervisor: {local,riak_kv_pb_socket_sup}
  Context: child_terminated
  Reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
  Offender: [{pid,<0.20970.188>},{name,undefined},{mfargs,{riak_kv_pb_socket,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
2012-06-15 19:09:32 =ERROR REPORT====
** Generic server <0.20974.188> terminating
** Last message in was {tcp,#Port<0.6076015>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
** When Server state == {state,#Port<0.6076015>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
** Reason for termination ==
** {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:33 =CRASH REPORT====
  crasher:
    initial call: gen:init_it/6
    pid: <0.20974.188>
    registered_name: []
    exception exit: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
      in function gen_server2:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
    messages: []
    links: [#Port<0.6076029>,<0.284.0>,#Port<0.6076015>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 910
  neighbours:

console.log
--------------------
2012-06-15 17:50:48.811 [info] <0.7.0> Application lager started on node 'riak@10.90.15.198'
2012-06-15 17:50:48.970 [info] <0.7.0> Application public_key started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.003 [info] <0.7.0> Application ssl started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.037 [info] <0.7.0> Application riak_core started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.060 [info] <0.7.0> Application riak_control started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.061 [info] <0.7.0> Application basho_metrics started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.063 [info] <0.7.0> Application cluster_info started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.072 [info] <0.7.0> Application merge_index started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.083 [info] <0.180.0>@riak_core:wait_for_service:416 Waiting for service riak_pipe to start (0 seconds)
2012-06-15 17:50:49.110 [info] <0.249.0>@riak_core:wait_for_application:396 Waiting for application riak_pipe to start (0 seconds).
2012-06-15 17:50:49.111 [info] <0.7.0> Application riak_pipe started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.145 [info] <0.7.0> Application inets started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.151 [info] <0.7.0> Application mochiweb started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.169 [info] <0.7.0> Application erlang_js started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.176 [info] <0.7.0> Application luke started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.197 [info] <0.283.0>@riak_core:wait_for_service:416 Waiting for service riak_kv to start (0 seconds)
2012-06-15 17:50:49.212 [info] <0.249.0>@riak_core:wait_for_application:390 Wait complete for application riak_pipe (0 seconds)
2012-06-15 17:50:49.285 [info] <0.180.0>@riak_core:wait_for_service:410 Wait complete for service riak_pipe (0 seconds)
2012-06-15 17:50:49.291 [info] <0.367.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.367.0>)
2012-06-15 17:50:49.296 [info] <0.368.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.368.0>)
2012-06-15 17:50:49.302 [info] <0.369.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.369.0>)
2012-06-15 17:50:49.307 [info] <0.370.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.370.0>)
2012-06-15 17:50:49.311 [info] <0.371.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.371.0>)
2012-06-15 17:50:49.316 [info] <0.372.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.372.0>)
2012-06-15 17:50:49.320 [info] <0.373.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.373.0>)
2012-06-15 17:50:49.324 [info] <0.374.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting (<0.374.0>)
2012-06-15 17:50:49.333 [info] <0.376.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.376.0>)
2012-06-15 17:50:49.341 [info] <0.377.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.377.0>)
2012-06-15 17:50:49.348 [info] <0.378.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.378.0>)
2012-06-15 17:50:49.354 [info] <0.379.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.379.0>)
2012-06-15 17:50:49.360 [info] <0.380.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.380.0>)
2012-06-15 17:50:49.366 [info] <0.381.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host starting (<0.381.0>)
2012-06-15 17:50:49.371 [info] <0.383.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting (<0.383.0>)
2012-06-15 17:50:49.375 [info] <0.384.0>@riak_kv_js_vm:init:76 Spidermonkey VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting (<0.384.0>)
2012-06-15 17:50:49.395 [info] <0.7.0> Application bitcask started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.567 [info] <0.463.0>@riak_core:wait_for_application:396 Waiting for application riak_kv to start (0 seconds).
2012-06-15 17:50:49.571 [info] <0.7.0> Application riak_kv started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.573 [info] <0.7.0> Application riak_search started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.573 [info] <0.7.0> Application basho_stats started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.584 [info] <0.7.0> Application runtime_tools started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.669 [info] <0.463.0>@riak_core:wait_for_application:390 Wait complete for application riak_kv (0 seconds)
2012-06-15 17:50:54.871 [info] <0.283.0>@riak_core:wait_for_service:410 Wait complete for service riak_kv (4 seconds)
2012-06-15 18:26:48.764 [info] <0.42.0> alarm_handler: {set,{system_memory_high_watermark,[]}}
2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process <0.20970.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup had child undefined started with {riak_kv_pb_socket,start_link,undefined} at <0.20970.188> exit with reason {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]} in context child_terminated
2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188> terminated with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process <0.20974.188> with 0 neighbours crashed with reason: {mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}

Thanks in advance,
Amol Rajoba
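
P.S. A rough sanity check on the eleveldb settings quoted in this message, as a minimal sketch rather than a diagnosis. It assumes the default ring size of 64 (the ring size is not stated anywhere in this post), that partitions are spread evenly over the 2 nodes, and that each partition can use up to one full write buffer plus its own cache. The function name leveldb_budget_bytes is hypothetical, and real LevelDB memory use depends on more than these two settings.

```python
MB = 1024 * 1024

def leveldb_budget_bytes(ring_size, nodes, write_buffer, cache_size):
    """Worst-case LevelDB memory per node under the simplified model above.

    Assumption: every partition (vnode) on the node fills one write buffer
    and one block cache. This is an upper-bound style estimate only.
    """
    vnodes_per_node = ring_size // nodes
    return vnodes_per_node * (write_buffer + cache_size)

# Values taken from the app.config excerpt in this message;
# ring_size=64 is an ASSUMED default, not something the post states.
budget = leveldb_budget_bytes(
    ring_size=64,
    nodes=2,
    write_buffer=524288000,  # write_buffer_size, 500MB
    cache_size=104857600,    # cache_size, 100MB
)
ram = 4 * 1024 * MB          # 4GB of RAM per node, from the specs above

print("estimated LevelDB budget per node: %d MB" % (budget // MB))
print("physical RAM per node:             %d MB" % (ram // MB))
print("budget exceeds RAM:", budget > ram)
```

If those assumptions roughly hold, the configured buffers alone could ask for several times the available RAM, and a 32bit OS limits a single process's address space further. That would be consistent with the system_memory_high_watermark alarm and the mem_error in zlib shown in the logs, but I have not confirmed it.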