Hi guys,
I am evaluating Riak as a key-value store. My requirement is to store a
large data set (larger than RAM), so Riak was set up with LevelDB as the backend.

Clients connect through the Protocol Buffers API, with
{pb_backlog, 100000} set in app.config.

The benchmark uses 25 agents doing put/store against a single node for 100M
records.
It runs well until 3M records, but then the whole cluster crashes and all
nodes go down.

The system and Riak configurations follow, along with the error and crash
logs.

Please help me find what I am missing; I need to test Riak and use it in
production as soon as possible.

Nodes: 2  (I know a cluster of 5 is best, but this is just a test setup)
OS: Ubuntu 12.04 32bit
CPU: Core i3
RAM: 4GB
HDD: 500GB

app.config [changes only]

%% eLevelDB Config
 {eleveldb, [
             {data_root, "/data/riak/leveldb"},
             {block_size, 262144}, %% 256k
             {cache_size, 104857600}, %% 100MB - default cache size 8MB per-partition
             {write_buffer_size, 524288000}, %% 500MB in bytes
             {write_buffer_size_min, 524288000}, %% 500MB in bytes
             {write_buffer_size_max, 524288000}, %% 500MB in bytes
             {max_open_files, 100} %% Maximum number of files open at once per partition - Default: 20 - Minimum: 20
            ]},
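
As a rough sanity check on the eleveldb settings above, the sketch below estimates how much memory the write buffers and caches could request on one node. The byte values are copied from the config; the ring size of 64 (Riak's default ring_creation_size) and the even split of partitions across the 2 nodes are assumptions, since neither is stated here, and the function name is only illustrative. It treats the settings as applying once per partition, as the comments in the config suggest.

```python
# Values copied from the eleveldb section of app.config (bytes).
WRITE_BUFFER_SIZE = 524288000   # 500 MiB
CACHE_SIZE = 104857600          # 100 MiB

# Assumptions, not stated in this post.
RING_SIZE = 64                  # assumed ring_creation_size (Riak's default)
NODES = 2                       # from the setup description
RAM_BYTES = 4 * 1024**3         # 4GB of RAM per node


def estimate_leveldb_memory(write_buffer, cache, ring_size, nodes):
    """Rough estimate: one write buffer and one cache per partition on a node."""
    partitions_per_node = ring_size // nodes
    return partitions_per_node * (write_buffer + cache)


estimate = estimate_leveldb_memory(WRITE_BUFFER_SIZE, CACHE_SIZE, RING_SIZE, NODES)
print(f"estimated LevelDB memory per node: {estimate / 1024**3:.2f} GiB")
print(f"physical RAM per node:             {RAM_BYTES / 1024**3:.2f} GiB")
```

Under these assumptions the estimate comes to 18.75 GiB per node against 4GB of RAM (and a 32-bit address space). That would be consistent with the memory alarm and mem_error entries in the logs, though it does not by itself prove they share a cause.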


vm.args [changes only]
## Enable kernel poll and a few async threads
+K true
+A 128


Bucket "riaktest" properties:

{"props":{"allow_mult":false,"basic_quorum":false,"big_vclock":50,"chash_keyfun":{"mod":"riak_core_util","fun":"chash_std_keyfun"},"dw":"quorum","last_write_wins":true,"linkfun":{"mod":"riak_kv_wm_link_walker","fun":"mapreduce_linkfun"},"n_val":3,"name":"riaktest","notfound_ok":true,"old_vclock":86400,"postcommit":[],"pr":0,"precommit":[],"pw":0,"r":"one","rw":"one","small_vclock":50,"w":"one","young_vclock":20}}

relatime is set in /etc/fstab on all drives.

The OS open files limit (sysctl fs.file-max) is set to 800000.
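
Note that fs.file-max is a system-wide ceiling, while each process is also bound by its own limit (the one `ulimit -n` reports). The sketch below is one way to compare that per-process limit with the number of files LevelDB might open. It assumes 32 partitions on this node (ring size 64 over 2 nodes, which is an assumption and not stated here) and takes max_open_files of 100 from the eleveldb config; the helper name is illustrative.

```python
import resource

MAX_OPEN_FILES = 100       # per partition, from the eleveldb config
PARTITIONS_PER_NODE = 32   # assumption: ring size 64 split over 2 nodes


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = nofile_limits()
# LevelDB data files only; client sockets and other descriptors are not counted.
needed = MAX_OPEN_FILES * PARTITIONS_PER_NODE
print(f"soft limit: {soft}, hard limit: {hard}, LevelDB may open up to: {needed}")
```

The result only reflects the environment it runs in, so it would need to be run as the user, and from the shell or init script, that starts Riak to be meaningful.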


The error.log, crash.log and console.log files follow.

error.log
---------------
2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188>
terminated with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process
<0.20970.188> with 0 neighbours crashed with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup
had child undefined started with {riak_kv_pb_socket,start_link,undefined}
at <0.20970.188> exit with reason
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
in context child_terminated
2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188>
terminated with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process
<0.20974.188> with 0 neighbours crashed with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}

crash.log
--------------
2012-06-15 19:09:31 =ERROR REPORT====
** Generic server <0.20970.188> terminating
** Last message in was
{tcp,#Port<0.6076011>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,55,50,48,51,51,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
** When Server state == {state,#Port<0.6076011>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
** Reason for termination ==
**
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32 =CRASH REPORT====
  crasher:
    initial call: gen:init_it/6
    pid: <0.20970.188>
    registered_name: []
    exception exit:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server2:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
    messages: []
    links: [#Port<0.6076023>,<0.284.0>,#Port<0.6076011>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 974
  neighbours:
2012-06-15 19:09:32 =SUPERVISOR REPORT====
     Supervisor: {local,riak_kv_pb_socket_sup}
     Context:    child_terminated
     Reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
     Offender:
[{pid,<0.20970.188>},{name,undefined},{mfargs,{riak_kv_pb_socket,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]

2012-06-15 19:09:32 =ERROR REPORT====
** Generic server <0.20974.188> terminating
** Last message in was
{tcp,#Port<0.6076015>,[11|<<10,6,117,114,108,99,97,116,18,39,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,34,122,10,109,34,50,53,48,50,97,98,102,49,55,97,100,102,100,48,98,55,102,48,57,48,52,99,48,99,98,101,52,48,100,100,100,55,49,49,50,48,51,57,57,58,58,32,99,97,116,101,103,111,114,121,32,49,44,32,107,101,121,119,111,114,100,32,49,44,32,99,97,116,101,103,111,114,121,32,50,44,32,99,97,116,101,103,111,114,121,32,51,44,32,107,101,121,119,111,114,100,50,44,32,107,101,121,119,111,114,100,51,34,18,9,116,101,120,116,47,106,115,111,110,40,2,48,2,56,1>>]}
** When Server state == {state,#Port<0.6076015>,{riak_client,'riak@10.90.15.198',undefined},undefined,undefined,<<0,0,0,0>>}
** Reason for termination ==
**
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:33 =CRASH REPORT====
  crasher:
    initial call: gen:init_it/6
    pid: <0.20974.188>
    registered_name: []
    exception exit:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
      in function  gen_server2:terminate/6
      in call from proc_lib:init_p_do_apply/3
    ancestors: [riak_kv_pb_socket_sup,riak_kv_sup,<0.279.0>]
    messages: []
    links: [#Port<0.6076029>,<0.284.0>,#Port<0.6076015>]
    dictionary: []
    trap_exit: false
    status: running
    heap_size: 987
    stack_size: 24
    reductions: 910
  neighbours:



console.log
--------------------
2012-06-15 17:50:48.811 [info] <0.7.0> Application lager started on node 'riak@10.90.15.198'
2012-06-15 17:50:48.970 [info] <0.7.0> Application public_key started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.003 [info] <0.7.0> Application ssl started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.037 [info] <0.7.0> Application riak_core started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.060 [info] <0.7.0> Application riak_control started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.061 [info] <0.7.0> Application basho_metrics started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.063 [info] <0.7.0> Application cluster_info started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.072 [info] <0.7.0> Application merge_index started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.083 [info] <0.180.0>@riak_core:wait_for_service:416
Waiting for service riak_pipe to start (0 seconds)
2012-06-15 17:50:49.110 [info] <0.249.0>@riak_core:wait_for_application:396
Waiting for application riak_pipe to start (0 seconds).
2012-06-15 17:50:49.111 [info] <0.7.0> Application riak_pipe started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.145 [info] <0.7.0> Application inets started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.151 [info] <0.7.0> Application mochiweb started on node
'riak@10.90.15.198'
2012-06-15 17:50:49.169 [info] <0.7.0> Application erlang_js started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.176 [info] <0.7.0> Application luke started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.197 [info] <0.283.0>@riak_core:wait_for_service:416
Waiting for service riak_kv to start (0 seconds)
2012-06-15 17:50:49.212 [info] <0.249.0>@riak_core:wait_for_application:390
Wait complete for application riak_pipe (0 seconds)
2012-06-15 17:50:49.285 [info] <0.180.0>@riak_core:wait_for_service:410
Wait complete for service riak_pipe (0 seconds)
2012-06-15 17:50:49.291 [info] <0.367.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.367.0>)
2012-06-15 17:50:49.296 [info] <0.368.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.368.0>)
2012-06-15 17:50:49.302 [info] <0.369.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.369.0>)
2012-06-15 17:50:49.307 [info] <0.370.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.370.0>)
2012-06-15 17:50:49.311 [info] <0.371.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.371.0>)
2012-06-15 17:50:49.316 [info] <0.372.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.372.0>)
2012-06-15 17:50:49.320 [info] <0.373.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.373.0>)
2012-06-15 17:50:49.324 [info] <0.374.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_map) host starting
(<0.374.0>)
2012-06-15 17:50:49.333 [info] <0.376.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host
starting (<0.376.0>)
2012-06-15 17:50:49.341 [info] <0.377.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host
starting (<0.377.0>)
2012-06-15 17:50:49.348 [info] <0.378.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host
starting (<0.378.0>)
2012-06-15 17:50:49.354 [info] <0.379.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host
starting (<0.379.0>)
2012-06-15 17:50:49.360 [info] <0.380.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host
starting (<0.380.0>)
2012-06-15 17:50:49.366 [info] <0.381.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_reduce) host
starting (<0.381.0>)
2012-06-15 17:50:49.371 [info] <0.383.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting
(<0.383.0>)
2012-06-15 17:50:49.375 [info] <0.384.0>@riak_kv_js_vm:init:76 Spidermonkey
VM (thread stack: 16MB, max heap: 8MB, pool: riak_kv_js_hook) host starting
(<0.384.0>)
2012-06-15 17:50:49.395 [info] <0.7.0> Application bitcask started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.567 [info] <0.463.0>@riak_core:wait_for_application:396
Waiting for application riak_kv to start (0 seconds).
2012-06-15 17:50:49.571 [info] <0.7.0> Application riak_kv started on node 'riak@10.90.15.198'
2012-06-15 17:50:49.573 [info] <0.7.0> Application riak_search started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.573 [info] <0.7.0> Application basho_stats started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.584 [info] <0.7.0> Application runtime_tools started on
node 'riak@10.90.15.198'
2012-06-15 17:50:49.669 [info] <0.463.0>@riak_core:wait_for_application:390
Wait complete for application riak_kv (0 seconds)
2012-06-15 17:50:54.871 [info] <0.283.0>@riak_core:wait_for_service:410
Wait complete for service riak_kv (4 seconds)
2012-06-15 18:26:48.764 [info] <0.42.0> alarm_handler:
{set,{system_memory_high_watermark,[]}}
2012-06-15 19:09:31.777 [error] <0.20970.188> gen_server <0.20970.188>
terminated with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.219 [error] <0.20970.188> CRASH REPORT Process
<0.20970.188> with 0 neighbours crashed with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.320 [error] <0.284.0> Supervisor riak_kv_pb_socket_sup
had child undefined started with {riak_kv_pb_socket,start_link,undefined}
at <0.20970.188> exit with reason
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
in context child_terminated
2012-06-15 19:09:32.824 [error] <0.20974.188> gen_server <0.20974.188>
terminated with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}
2012-06-15 19:09:32.972 [error] <0.20974.188> CRASH REPORT Process
<0.20974.188> with 0 neighbours crashed with reason:
{mem_error,[{zlib,call,3},{zlib,zip,1},{riak_kv_pb_socket,process_message,2},{riak_kv_pb_socket,handle_info,2},{gen_server2,handle_msg,7},{proc_lib,init_p_do_apply,3}]}


Thanks in advance,
Amol Rajoba