Jon,

Thanks for the reply. I tried increasing 'map_js_vm_count' and 'reduce_js_vm_count' on the cluster to the values you suggested (24/18) and even larger ones (32/24 and 128/128), but with no success - I still get the same error in the node's console log.
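For anyone else chasing this: a quick way to check whether the JS VM pool is actually the bottleneck is to count the "All VMs are busy" notices in each node's console.log. A minimal sketch, assuming the log path and message text quoted later in this thread (adjust the path for your install):

```python
import re

def count_busy_vm_notices(log_path):
    """Count 'JS call failed: All VMs are busy' notices in a Riak console.log."""
    pattern = re.compile(r"JS call failed: All VMs are busy")
    with open(log_path) as log:
        return sum(1 for line in log if pattern.search(line))

# Example: count_busy_vm_notices("/mnt/dataraid/log/console.log")
```

If the count keeps growing after raising the VM counts, the pool is still being exhausted faster than it drains, which points at the number of concurrent MapReduce jobs rather than the pool size itself.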
What are we missing?!

-Matthew

On Fri, Mar 30, 2012 at 16:19, Jon Meredith <jmered...@basho.com> wrote:
> Hi list,
>
> I spent some time looking at this issue today. I suspect it is due to all
> the Javascript VMs being busy inside Riak due to many parallel MapReduce
> jobs.
>
> If you're seeing this issue, please check for this message in the
> console.log file:
>
>   16:40:50.037 [notice] JS call failed: All VMs are busy.
>
> I've filed https://github.com/basho/riak_pipe/issues/44 to track
> improvements to error reporting.
>
> If you are seeing the log message, the number of VMs allocated for map and
> reduce tasks can be configured in app.config - the defaults are 8 for map
> and 6 for reduce. Increasing them will use more memory but should reduce
> the incidence of the error message:
>
>   {riak_kv, [
>       ...
>       {map_js_vm_count, 24},
>       {reduce_js_vm_count, 18},
>       ...
>   ]}
>
> If you were affected by it and changing this does not resolve your issue,
> I'll keep digging.
>
> Cheers, Jon.
>
> On Thu, Mar 29, 2012 at 10:29 AM, Matthew Tovbin <matt...@tovbin.com> wrote:
>
>> Guys,
>>
>> Any updates on the issue?!
>>
>> -Matthew
>>
>>
>> On Tue, Mar 13, 2012 at 18:29, Matthew Tovbin <matt...@tovbin.com> wrote:
>>
>>> Here is a log from one of the servers:
>>>
>>> ==> /mnt/dataraid/log/crash.log <==
>>> 2012-03-13 18:24:44 =CRASH REPORT====
>>>   crasher:
>>>     initial call: riak_pipe_builder:init/1
>>>     pid: <0.20949.24>
>>>     registered_name: []
>>>     exception exit: {sink_died,shutdown}
>>>       in function gen_fsm:terminate/7
>>>       in call from proc_lib:init_p_do_apply/3
>>>     ancestors: [riak_pipe_builder_sup,riak_pipe_sup,<0.172.0>]
>>>     messages: [{'DOWN',#Ref<0.0.5.38339>,process,<0.20950.24>,shutdown}]
>>>     links: [<0.175.0>]
>>>     dictionary: [{eunit,[{module,riak_pipe_builder},{ref,#Ref<0.0.5.38318>},{spec,[{fitting_spec,listkeys,riak_kv_pipe_listkeys,undefined,{chash,key_of},1,64}]},{options,[{sink,{fitting,<0.20947.24>,#Ref<0.0.5.38318>,{riak_kv_pipe_get,bkey_chash},{riak_kv_pipe_get,bkey_nval}}},{trace,{set,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[error],[],[],[],[],[],[],[],[],[],[],[],[],[]}}}},{log,{sink,{fitting,<0.20060.24>,#Ref<0.0.5.38318>,sink,undefined}}}]},{fittings,[{{fitting,<0.20950.24>,#Ref<0.0.5.38318>,{chash,key_of},1},#Ref<0.0.5.38339>}]}]}]
>>>     trap_exit: false
>>>     status: running
>>>     heap_size: 987
>>>     stack_size: 24
>>>     reductions: 251
>>>   neighbours:
>>> 2012-03-13 18:24:44 =SUPERVISOR REPORT====
>>>     Supervisor: {local,riak_pipe_builder_sup}
>>>     Context:    child_terminated
>>>     Reason:     {sink_died,shutdown}
>>>     Offender:   [{pid,<0.20949.24>},{name,undefined},{mfargs,{riak_pipe_builder,start_link,undefined}},{restart_type,temporary},{shutdown,brutal_kill},{child_type,worker}]
>>>
>>>
>>> ==> /mnt/dataraid/log/console.log <==
>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
>>> 2012-03-13 18:24:56.420 [info] <0.71.0>@riak_core_sysmon_handler:handle_event:85 monitor long_gc <0.32603.24> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>>
>>>
>>> ==> /mnt/dataraid/log/erlang.log.1 <==
>>> 18:24:44.634 [error] gen_fsm <0.20949.24> in state wait_pipeline_shutdown terminated with reason: {sink_died,shutdown}
>>> (riak@1)1> 18:24:44.641 [error] Pipe worker startup failed: fitting was gone before startup
>>> (riak@1)1> 18:24:44.645 [error] CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>> (riak@1)1> 18:24:44.652 [error] Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
>>> (riak@1)1> 18:24:56.420 [info] monitor long_gc <0.32603.24> [{initial_call,{mochiweb_acceptor,init,3}},{almost_current_function,{gen,do_call,4}},{message_queue_len,0}] [{timeout,120},{old_heap_block_size,0},{heap_block_size,4181},{mbuf_size,0},{stack_size,40},{old_heap_size,0},{heap_size,1955}]
>>> (riak@1)1>
>>>
>>>
>>> ==> /mnt/dataraid/log/error.log <==
>>> 2012-03-13 18:24:44.645 [error] <0.20949.24> CRASH REPORT Process <0.20949.24> with 0 neighbours crashed with reason: {sink_died,shutdown}
>>> 2012-03-13 18:24:44.652 [error] <0.175.0> Supervisor riak_pipe_builder_sup had child undefined started with {riak_pipe_builder,start_link,undefined} at <0.20949.24> exit with reason {sink_died,shutdown} in context child_terminated
>>>
>>>
>>> -Matthew
>>>
>>>
>>> On Tue, Mar 13, 2012 at 18:17,
>>> Matthew Tovbin <matt...@tovbin.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I got the same problem today on 1.1.0.
>>>>
>>>> As suggested, I updated all the nodes to 1.1.1. The error remains the same:
>>>>
>>>>   { stack: [Getter/Setter],
>>>>     arguments: undefined,
>>>>     type: undefined,
>>>>     message: 'HTTP error 500: {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object ....."}}}
>>>>
>>>>
>>>> -Matthew
>>>>
>>>>
>>>> On Tue, Mar 13, 2012 at 06:48, Jon Meredith <jmered...@basho.com> wrote:
>>>>
>>>>> Hi Arne,
>>>>>
>>>>> Could you try with 1.1.1, please? We've fixed a number of bugs in how
>>>>> MapReduce handles errors that could explain some of what you are seeing.
>>>>>
>>>>> Jon
>>>>>
>>>>>
>>>>> On Tue, Mar 13, 2012 at 12:55 AM, Arne Claassen <arne+r...@claassen.net> wrote:
>>>>>
>>>>>> I'm running 1.1.0 built from source on CentOS 6. I set up two nodes
>>>>>> on localhost following the instructions on the Basho site for testing,
>>>>>> and everything was going fine until I needed to fetch all items in a bucket.
>>>>>>
>>>>>> For this I was using node.js with riak-js and its db.getAll call,
>>>>>> which from the debug logs looks like it uses /mapred.
>>>>>>
>>>>>> The bucket only has about 1200 items in it, each a JSON payload of
>>>>>> about 3KB. Initially, I kept getting inconsistent results: sometimes the
>>>>>> data would come back, sometimes data would be undefined. So I kept running
>>>>>> it, trying to see if there was a pattern, adding more debugging code, etc.
>>>>>> But still it would be about a request every couple of seconds. Then it
>>>>>> just stopped working altogether, and now I always get this in the console:
>>>>>>
>>>>>> { [Error: HTTP error 500: {"phase":0,"error":"[preflist_exhausted]","input":"{ok,{r_object,<<\"tweets_15368536\">>,<<\"178528155714330620\">>,[{r_content,{dict,6,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]},{{[],[],[[<<\"Links\">>]],[],[],[],[],[],[],[],[[<<\"content-type\">>,97,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<\"X-Riak-VTag\">>,49,76,115,54,70,75,74,71,119,65,111,70,88,73,117,84,98,88,79,69,52,116]],[[<<\"index\">>]],[],[[<<\"X-Riak-Last-Modified\">>|{1331,487523,218968}]],[],[[<<\"X-Riak-Meta\">>]]}}},<<\"{\"coordinates\":null,\"create...\">>}],...},...}","type":"forward_preflist","stack":"[]"}]
>>>>>>   statusCode: 500 }
>>>>>>
>>>>>> I haven't tried restarting Riak, just because I'd rather find out
>>>>>> what's happening than have it go away and come back later when I don't
>>>>>> expect it.
>>>>>>
>>>>>> Any suggestions on how to troubleshoot this? I can still access items
>>>>>> individually, and I can even list the keys via ?keys=true; it's just the
>>>>>> MapReduce that seems to be busted.
>>>>>>
>>>>>> cheers,
>>>>>> arne
>>>>>> _______________________________________________
>>>>>> riak-users mailing list
>>>>>> riak-users@lists.basho.com
>>>>>> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
>>>>>
>>>>>
>>>>> --
>>>>> Jon Meredith
>>>>> Platform Engineering Manager
>>>>> Basho Technologies, Inc.
>>>>> jmered...@basho.com
>>>>
>>>
>>
>
>
> --
> Jon Meredith
> Platform Engineering Manager
> Basho Technologies, Inc.
> jmered...@basho.com
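Arne's report above mentions that riak-js's db.getAll goes through the HTTP /mapred endpoint, which is why a full-bucket fetch hits the JS VM pool while single-key GETs and ?keys=true keep working: every input runs through a Javascript map phase. A hedged sketch of an equivalent hand-built request (the bucket name is taken from the error above; the host/port and the use of Riak's built-in Riak.mapValuesJson function are assumptions, not necessarily what riak-js does internally):

```python
import json

def full_bucket_mapred_job(bucket):
    """Build the JSON body for a full-bucket MapReduce job - roughly the kind
    of request posted to /mapred. Every key in the bucket becomes an input to
    the Javascript map phase, so each request occupies map-pool VMs."""
    return json.dumps({
        "inputs": bucket,  # a bare bucket name means "list all keys first"
        "query": [
            {"map": {"language": "javascript",
                     "name": "Riak.mapValuesJson",  # built-in JS map function
                     "keep": True}},
        ],
    })

# To actually send it (assumes a node listening on 127.0.0.1:8098):
#   curl -H 'Content-Type: application/json' \
#        -d "$(python -c 'print(full_bucket_mapred_job("tweets_15368536"))')" \
#        http://127.0.0.1:8098/mapred
```

Running this by hand, outside riak-js, can help separate client-library issues from the server-side VM-pool exhaustion discussed earlier in the thread.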
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com