Hi,

We have been using Riak to gather our test data and analyze results after a 
test completes.
Recently we have observed Riak crashes in the Riak console logs.
This causes our tests to fail to record data to Riak and bail out :-(

The crash logs are as follows:
2016-02-19 16:25:26.255 [error] <0.2160.0> gen_fsm <0.2160.0> in state active 
terminated with reason: no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195

2016-02-19 16:25:26.260 [error] <0.2160.0> CRASH REPORT Process <0.2160.0> with 
2 neighbours exited with reason: no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195 in gen_fsm:terminate/7 line 622

2016-02-19 16:25:26.260 [error] <0.172.0> Supervisor riak_core_vnode_sup had 
child undefined started with {riak_core_vnode,start_link,undefined} at 
<0.2160.0> exit with reason no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195 in context child_terminated

2016-02-19 16:25:26.261 [error] <0.4319.0> gen_fsm <0.4319.0> in state ready 
terminated with reason: no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195

2016-02-19 16:25:26.275 [error] <0.4319.0> CRASH REPORT Process <0.4319.0> with 
10 neighbours exited with reason: no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195 in gen_fsm:terminate/7 line 622

2016-02-19 16:25:26.278 [error] <0.4320.0> Supervisor {<0.4320.0>,poolboy_sup} 
had child riak_core_vnode_worker started with 
riak_core_vnode_worker:start_link([{worker_module,riak_core_vnode_worker},{worker_args,[268322566228720457638957762256505085639956365312,...]},...])
 at undefined exit with reason no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195 in context shutdown_error

2016-02-19 16:25:26.278 [error] <0.4320.0> gen_server <0.4320.0> terminated 
with reason: no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195

2016-02-19 16:25:26.278 [error] <0.4320.0> CRASH REPORT Process <0.4320.0> with 
0 neighbours exited with reason: no function clause matching 
riak_kv_vnode:handle_info({#Ref<0.0.482.161540>,{ok,<0.11042.842>}}, 
{state,268322566228720457638957762256505085639956365312,riak_kv_eleveldb_backend,true,{state,<<>>,...},...})
 line 1195 in gen_server:terminate/6 line 744

2016-02-19 16:25:26.806 [error] <0.2157.0> gen_fsm <0.2157.0> in state active 
terminated with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}

2016-02-19 16:25:26.808 [error] <0.2157.0> CRASH REPORT Process <0.2157.0> with 
2 neighbours exited with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}} 
in gen_fsm:terminate/7 line 600

2016-02-19 16:25:26.809 [error] <0.5450.0> gen_fsm <0.5450.0> in state ready 
terminated with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}

2016-02-19 16:25:26.809 [error] <0.172.0> Supervisor riak_core_vnode_sup had 
child undefined started with {riak_core_vnode,start_link,undefined} at 
<0.2157.0> exit with reason {timeout,{gen_server,call,[<0.5141.0>,stop]}} in 
context child_terminated

2016-02-19 16:25:26.809 [error] <0.5450.0> CRASH REPORT Process <0.5450.0> with 
10 neighbours exited with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}} 
in gen_fsm:terminate/7 line 622

2016-02-19 16:25:26.809 [error] <0.5451.0> Supervisor {<0.5451.0>,poolboy_sup} 
had child riak_core_vnode_worker started with 
riak_core_vnode_worker:start_link([{worker_module,riak_core_vnode_worker},{worker_args,[211232658520482062396626323478525280184646500352,...]},...])
 at undefined exit with reason {timeout,{gen_server,call,[<0.5141.0>,stop]}} in 
context shutdown_error

2016-02-19 16:25:26.809 [error] <0.5451.0> gen_server <0.5451.0> terminated 
with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}}

2016-02-19 16:25:26.809 [error] <0.5451.0> CRASH REPORT Process <0.5451.0> with 
0 neighbours exited with reason: {timeout,{gen_server,call,[<0.5141.0>,stop]}} 
in gen_server:terminate/6 line 744
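
To make entries like the ones above easier to triage, here is a minimal Python sketch that counts console log entries per Erlang process ID. It is not part of our setup: the helper name `count_entries_by_pid` and the regular expression are assumptions for illustration, based only on the timestamp/level/pid prefix visible in the lines above.

```python
import re
from collections import Counter

# Matches the start of a console log entry, e.g.
# "2016-02-19 16:25:26.255 [error] <0.2160.0> gen_fsm ..."
# Wrapped continuation lines do not start with a timestamp and are skipped.
ENTRY_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) \[(\w+)\] (<\d+\.\d+\.\d+>)"
)


def count_entries_by_pid(log_text):
    """Count log entries per reporting process ID (hypothetical helper)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = ENTRY_RE.match(line)
        if match:
            counts[match.group(3)] += 1
    return counts


# A few lines copied (truncated) from the log excerpt above.
sample = """2016-02-19 16:25:26.255 [error] <0.2160.0> gen_fsm <0.2160.0> in state active
terminated with reason: no function clause matching
2016-02-19 16:25:26.260 [error] <0.2160.0> CRASH REPORT Process <0.2160.0> with
2016-02-19 16:25:26.260 [error] <0.172.0> Supervisor riak_core_vnode_sup had
"""

if __name__ == "__main__":
    for pid, count in count_entries_by_pid(sample).most_common():
        print(pid, count)
```

Running it over a full console log would show which processes reported the most errors, which may help separate the first failure from the follow-on shutdown errors.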

Our setup is as follows:
We have a Riak cluster with 10 nodes; the configuration of each node is as follows:
RAM: 48GB
Disk:
         80GB (/)
         504GB (separate riak partition)
Riak Version: 2.1.3-1 (2.1.3)
Data in Riak: After observing the crash, total data in the Riak partition was ~50GB

Riak config is as follows:
riak.conf
[Attached with this email]

advanced.config:

[

 {riak_kv, [{add_paths, ["/usr/local/lib/scale_riak/ebin"]}]},

 {webmachine, [{backlog, 511}, {nodelay, true}]},

 {yokozuna, [{solr_request_timeout, 120000}]}

].

We have observed this a few times now. After each crash, latency increases 
and our application starts timing out.
We would really like to understand what might be causing this crash, and if it 
is due to missing config on our nodes, we would like to fix it.

Thanks for your help in advance :-)

Regards,
Raviraj

Attachment: riak.conf
