problem with rolling upgrade 0.14 -> 1.0
Hi,

I'm trying to upgrade my Riak 0.14 nodes and ran into some problems. After installing the riak-1.0 RPM and trying to start the node, I get the following error:

    riak start
    Attempting to restart script through sudo -u riak
    pthread/ethr_event.c:98: Fatal error in wait__(): Function not implemented (38)
    pthread/ethr_event.c:98: Fatal error in wait__(): Function not implemented (38)
    Error reading /etc/riak/app.config

When installing the riak-1.0 RPM I get the following output:

    sudo rpm -Uvh riak-1.0.0-1.el5.x86_64.rpm
    Preparing...### [100%]
       1:riak warning: /etc/riak/app.config created as /etc/riak/app.config.rpmnew
    warning: /etc/riak/vm.args created as /etc/riak/vm.args.rpmnew
    ### [100%]
    chcon: couldn't compute security context from unlabeled
    (this chcon line is repeated 16 times in the output)

In addition, I noticed that app.config was not updated as expected during the installation process. Can you please advise what could be the cause of this error?

Thanks,
Tom.
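A note on the .rpmnew warnings above: rpm does not overwrite a config file that has been edited locally; it keeps the old file in place and writes the packaged 1.0 version alongside it, which is why app.config did not change during the upgrade. Below is a minimal sketch of how the two versions could be compared and merged by hand, assuming the stock /etc/riak paths shown in the output above (this does not address the pthread/ethr_event.c error, which looks like a separate issue):

    # stop the node while editing configs
    riak stop

    # see what differs between the kept 0.14 configs and the packaged 1.0 defaults
    diff -u /etc/riak/app.config /etc/riak/app.config.rpmnew
    diff -u /etc/riak/vm.args /etc/riak/vm.args.rpmnew

    # carry the settings you need into app.config and vm.args by hand, then retry
    riak start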
Re: Timeout when storing
Thanks David - I'll try that on my single-node instance, but I'm working another Riak issue on another thread.

Jim

- Original Message -
From: "David Smith"
To: "jim adler"
Sent: Friday, October 7, 2011 7:02:01 AM
Subject: Re: Timeout when storing

Hi Jim,

Sorry for the slow response -- email is like a running battle at times. :)

How many partitions are you running? Also, take down the node and then remove any *.lock files.

Thanks,

D.

On Mon, Oct 3, 2011 at 11:23 AM, wrote:
> About 90 out of 3000 are zero-bytes.
>
> Jim
>
> -Original Message-
> From: riak-users-boun...@lists.basho.com
> [mailto:riak-users-boun...@lists.basho.com] On Behalf Of David Smith
> Sent: Monday, October 03, 2011 4:46 AM
> To: Jim Adler
> Cc: riak-users@lists.basho.com
> Subject: Re: Timeout when storing
>
> Jim,
>
> If you look at your bitcask directories, do you have a large number of
> zero-byte files, perchance?
>
> D.
>
> On Sat, Oct 1, 2011 at 1:58 PM, Jim Adler wrote:
>> After upgrading my single-node instance to 1.0, I'm still seeing the
>> "timeout when storing" issue. Here are the changes I made based on
>> everyone's suggestions (much appreciated!):
>>
>> - Ubuntu 11.04 (natty) 32-bit
>> - Python client 1.3.0
>> - /etc/riak/vm.args: -env ERL_MAX_PORTS 32768
>> - /etc/default/riak: ulimit -n 32768
>>
>> Here's the /var/log/crash.log report:
>>
>> 2011-10-01 12:31:03 =ERROR REPORT
>> ** State machine <0.3452.0> terminating
>> ** Last event in was
>> {'riak_vnode_req_v1',1136089163393944065322395631681798128560666312704
>> ,{fsm,undefined,<0.3451.0>},{'riak_kv_put_req_v1',{<<"nodes">>,<<"user
>> _id-17527747-info">>},{r_object,<<"nodes">>,<<"user_id-17527747-info">
>> >,[{r_content,{dict,4,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],
>> [],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[[<<"content-type">>,9
>> 7,112,112,108,105,99,97,116,105,111,110,47,106,115,111,110],[<<"X-Riak
>> -VTag">>,49,88,88,75,75,51,90,88,68,117,90,122,85,53,57,85,53,101,107,
>> 89,115,110]],[[<<"index">>]],[],[[<<"X-Riak-Last-Modified">>|{1317,497
>> 463,847242}]],[],[]}}},<<"{DATA
>> DELETED}">>}],[],{dict,1,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],
>> [],[],[],[],[],[]},{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[[clean
>> |true]],[]}}},undefined},51456853,63484716663,[coord]}}
>>
>> ** When State == active
>> ** Data ==
>> {state,1136089163393944065322395631681798128560666312704,riak_kv_vnode
>> ,{state,1136089163393944065322395631681798128560666312704,false,riak_k
>> v_bitcask_backend,{state,#Ref<0.0.0.10359>,"11360891633939440653223956
>> 31681798128560666312704",[{async_folds,true},[{vnode_vclocks,true},{in
>> cluded_applications,[]},{add_paths,[]},{allow_strfun,false},{storage_b
>> ackend,riak_kv_bitcask_backend},{legacy_keylisting,false},{reduce_js_v
>> m_count,6},{js_thread_stack,16},{pb_ip,"0.0.0.0"},{riak_kv_stat,true},
>> {map_js_vm_count,8},{mapred_system,pipe},{js_max_vm_mem,8},{pb_port,80
>> 87},{legacy_stats,true},{mapred_name,"mapred"},{stats_urlpath,"stats"}
>> ,{http_url_encoding,on},{hook_js_vm_count,2}],{read_write,true}],11360
>> 89163393944065322395631681798128560666312704,"/var/lib/riak/bitcask"},
>> {dict,0,16,16,8,80,48,{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]
>> },{{[],[],[],[],[],[],[],[],[],[],[],[],[],[],[],[]}}},<<35,9,254,249,
>> 78,135,82,106>>,3000,1000,100,100,true,false},undefined,undefined,none
>> ,undefined,<0.3454.0>,6}
>> ** Reason for termination =
>> ** {bad_return_value,{error,{write_locked,emfile}}}
>> 2011-10-01 12:31:03 =CRASH REPORT
>> crasher:
>> initial call: riak_core_vnode:init/1
>> pid: <0.3452.0>
>> registered_name: []
>> exception exit: {bad_return_value,{error,{write_locked,emfile}}}
>> in function gen_fsm:terminate/7
>> in call from proc_lib:init_p_do_apply/3
>> ancestors: [riak_core_vnode_sup,riak_core_sup,<0.92.0>]
>> messages: [{'EXIT',<0.3454.0>,shutdown}]
>> links: [<0.96.0>]
>> dictionary: []
>> trap_exit: true
>> status: running
>> heap_size: 6765
>> stack_size: 24
>> reductions: 160650
>> neighbours:
>> 2011-10-01 12:31:03 =SUPERVISOR REPORT
>> Supervisor: {local,riak_core_vnode_sup}
>> Context: child_terminated
>> Reason: {bad_return_value,{error,{write_locked,emfile}}}
>> Offender:
>> [{pid,<0.3452.0>},{name,undefined},{mfargs,{riak_core_vnode,start_link
>> ,undefined}},{restart_type,temporary},{shutdown,30},{child_type,wo
>> rker}]
>>
>> 2011-10-01 12:45:28 =ERROR REPORT
>> Failed to merge
>>
> "/var/lib/riak/bitcask/605153021707326989568713251046585937826284568576/var/
> lib/riak/bitcask/605153021707326989568713251046585937826284568576/1315770213
> .bitcask.data/var/lib/riak/bitcask/60515302170732698956871325104658593782628
> 4568576/1316329673.bitcask.data/var/lib/riak/bitcask/60515302170732698956871
> 3251046585937826284568576/1316330222.bitcask.data/var/lib/riak/bitcask/60515
> 30217073269895687132
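As an aside for anyone following this thread: the {write_locked,emfile} above is the POSIX "too many open file descriptors" error hitting bitcask while it tries to take a write lock, so it is worth confirming the limit the Erlang VM actually inherited, along with the zero-byte and *.lock files discussed above. A rough sketch of those checks, assuming a Linux install with the default /var/lib/riak/bitcask data directory and the VM process named beam.smp:

    # limit actually applied to the running VM (may differ from your shell's ulimit)
    cat /proc/$(pgrep -f beam.smp | head -n 1)/limits | grep -i 'open files'

    # zero-byte data files and leftover lock files under bitcask
    find /var/lib/riak/bitcask -type f -size 0 | wc -l
    find /var/lib/riak/bitcask -name '*.lock'

    # per David's suggestion, remove stale lock files only with the node stopped
    riak stop
    find /var/lib/riak/bitcask -name '*.lock' -delete
    riak start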
Re: Riak 1.0 pre2 legacy_keylisting crash
I'm seeing the same behavior and logs on a bucket with about 8M keys. Fyodor, any luck with any of Bryan's suggestions?

Jim

- Original Message -
From: "Bryan Fink"
To: "Fyodor Yarochkin"
Cc: riak-users@lists.basho.com
Sent: Friday, October 7, 2011 6:06:15 AM
Subject: Re: Riak 1.0 pre2 legacy_keylisting crash

On Fri, Oct 7, 2011 at 1:50 AM, Fyodor Yarochkin wrote:
> Here's one of the queries that consistently generates series of
> 'fitting_died' log messages:
>
> {
>  "inputs":{
>   "bucket":"test",
>   "index":"integer_int", …
>  },
>  "query":[
>   {"map":{"language":"javascript", …
>   },
>   {"reduce":{"language":"javascript", …
>   {"reduce":{"language":"javascript", …
>  ],"timeout": 9000
> }
>
> produces over a hundred "Supervisor riak_pipe_vnode_worker_sup had
> child at module undefined at <0.28835.0> exit with reason fitting_died
> in context child_terminated" entries in the log file and returns 'timeout'

My interpretation of your report is that 9 seconds is not long enough to finish your MapReduce query. I'll explain how I arrived at this interpretation.

The log message you're seeing says that many processes that riak_pipe_vnode_worker_sup was monitoring exited abnormally. That supervisor only monitors Riak Pipe worker processes, the processes that do the work for Riak 1.0's MapReduce phases.

The reason those workers gave for exiting abnormally was 'fitting_died'. This means that the pipeline they were working for closed before they were finished with their work.

The result you received was 'timeout'. The way timeouts work in Riak-Pipe-based MapReduce is that a timer triggers a message at the given time, causing a monitoring process to cease waiting for results, tear down the pipe, and return a timeout message to your client.

The "tear down the pipe" step in the timeout process is what causes all of those 'fitting_died' messages you see. They're normal, and are intended to aid in analysis like the above.

With that behind us, though, the question remains: why isn't 9 seconds long enough to finish this query? To figure that out, I'd start from the beginning:

1. Is 9 seconds long enough to just finish the index query (using the index API outside of MapReduce)? If not, then the next people to jump in with help here will want to know more about the types, sizes, and counts of data you have indexed.

2. Assuming the bare index query finishes fast enough, is 9 seconds long enough to get through just the index and map phase (no reduce phases)? If not, it's likely that either it takes longer than 9 seconds to pull every object matching your index query out of KV, or that contention for Javascript VMs prohibits the throughput needed.

2a. Try switching to an Erlang map phase. {"language":"erlang","module":"riak_kv_mapreduce","function":"map_object_value","arg":"filter_notfound"} should do exactly what your Javascript function does, without contending for a JS VM.

2b. Try increasing the number of JS VMs available for map phases. In your app.config, find the 'map_js_vm_count' setting, and increase it.

3. Assuming just the map phase also makes it through, is 9 seconds long enough to get through just the index, map, and first reduce phase (leave off the second)? Your first reduce phase looks like it doesn't do anything … is it needed? Try removing it.

4. If you get all the way to the final phase before hitting the 9 second timeout, then it may be that the re-reduce behavior of Riak KV's MapReduce causes your function to be too expensive. This will be especially true if you expect that phase to receive thousands of inputs. A sort function such as yours probably doesn't benefit from re-reduce, so I would recommend disabling it by adding "arg":{"reduce_phase_only_1":true} to that reduce phase's specification. With that in place, your function should be evaluated only once, with all the inputs it will receive. This may still fail because of the time it can take to encode/decode a large set of inputs/outputs to/from JSON, but doing it only once may be enough to get you finished.

Hope that helps,
Bryan
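To make points 2a, 3, and 4 above concrete, here is a rough sketch of what the reworked query might look like submitted over HTTP: Erlang map phase, the no-op first reduce removed, and re-reduce disabled on the remaining reduce. The bucket and index names are copied from the query above; the "key" value, the Javascript sort body, the longer timeout, and the default HTTP port 8098 are placeholders/assumptions, not taken from the original thread:

    curl -XPOST http://127.0.0.1:8098/mapred \
      -H 'Content-Type: application/json' \
      -d '{
        "inputs": {"bucket":"test", "index":"integer_int", "key":"..."},
        "query": [
          {"map": {"language":"erlang",
                   "module":"riak_kv_mapreduce",
                   "function":"map_object_value",
                   "arg":"filter_notfound"}},
          {"reduce": {"language":"javascript",
                      "source":"function(values) { return values.sort(); }",
                      "arg": {"reduce_phase_only_1": true}}}
        ],
        "timeout": 60000
      }'

The map phase spec is exactly the one Bryan quotes in 2a, and reduce_phase_only_1 is the arg he describes in point 4.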
riak-admin test | problems
I'm able to start Riak (riak start) and execute riak-admin status, which dumps lots of info about the local node... BUT when running riak-admin test, I always get the following:

    Attempting to restart script through sudo -u riak
    Failed to read test value: {error,{insufficient_vnodes,0,need,1}}

Is something wrong with the installation? I haven't been able to find much info on the net.

Cheers
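For what it's worth, {error,{insufficient_vnodes,0,need,1}} generally means riak-admin test could not find even one vnode to service the test write, which tends to point at riak_kv not being fully up on the node rather than a broken package install. A few quick checks that may help narrow it down, assuming a default single-node setup and the standard log location:

    riak ping                          # is the node up and reachable at all?
    riak-admin ringready               # does the node agree on its own ring?
    riak-admin transfers               # are partitions still being handed off?
    grep -i error /var/log/riak/*.log  # anything from riak_kv during startup?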