Re: Differences between riak_client and riak_kv_mrc_pipe MapReduce when one node is down.

gunin Thu, 31 Jan 2013 03:09:40 -0800

Sorry, John, but you have misunderstood my question.
1. One node (I mean a physical (Erlang) node) in the cluster is down.
2. It was down when I started the job, while the job ran, and after it 
finished. We powered this node off; it is under repair. But we have not 
removed it from the cluster.
3. All the data that must be processed is available, on primary and fallback 
vnodes started on the other physical (Erlang) nodes. We don't need read repair 
for this data. (As you can see, old-style MapReduce works fine; it uses the 
luke module.)
4. If something fails I want to receive an error, but what actually happens is 
that the last reduce phase is called with an empty list and then returns an 
empty list. For example, 4 times out of 5 I receive a good result that 
contains all the data, but 1 time out of 5 I receive an empty result.
5. I don't see any vnode failures during the MapReduce task.
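
To make points 3 and 4 concrete, here is a toy model of the situation traced in the original message quoted below: with nval 1 and a primaries-only preflist, a hash whose single owner is powered off gets an empty preflist, while a lookup that allows fallbacks still finds a vnode. This is only a sketch under stated assumptions. The module preflist_sketch, its functions, and the 3-node ring are all hypothetical, and this is not the riak_core_apl implementation.

```erlang
-module(preflist_sketch).
-export([primary_apl/3, apl_with_fallbacks/3, demo/0]).

%% Toy model only (hypothetical names throughout): a ring is a list of
%% {PartitionIndex, Node} owners, and a preflist is the first N owners
%% starting at offset Start. Not the riak_core_apl implementation.

%% Primaries only: owners that are down are dropped and not replaced.
primary_apl(Ring, {Start, N}, UpNodes) ->
    Owners = lists:sublist(rotate(Ring, Start), N),
    [{Idx, Node} || {Idx, Node} <- Owners, lists:member(Node, UpNodes)].

%% Primaries plus fallbacks: each down owner is replaced by the next
%% reachable node further around the ring.
apl_with_fallbacks(Ring, {Start, N}, UpNodes) ->
    {Owners, Rest} = lists:split(N, rotate(Ring, Start)),
    Spare = [Node || {_, Node} <- Rest, lists:member(Node, UpNodes)],
    fill(Owners, Spare, UpNodes).

fill([], _Spare, _Up) ->
    [];
fill([{Idx, Node} | T], Spare, Up) ->
    case lists:member(Node, Up) of
        true ->
            [{Idx, Node} | fill(T, Spare, Up)];
        false ->
            case Spare of
                [F | More] -> [{Idx, F} | fill(T, More, Up)];
                []         -> fill(T, [], Up)
            end
    end.

%% Move the first Start entries to the end of the ring.
rotate(Ring, Start) ->
    {Before, After} = lists:split(Start, Ring),
    After ++ Before.

demo() ->
    Ring = [{0, a}, {1, b}, {2, c}, {3, a}, {4, b}, {5, c}],
    Up = [a, c],                                     % node b is powered off
    [] = primary_apl(Ring, {1, 1}, Up),              % nval 1, owner down
    [{1, c}] = apl_with_fallbacks(Ring, {1, 1}, Up), % fallback takes over
    ok.
```

In this model only the hashes owned by the powered-off node lose their preflist, so inputs routed there would be silently dropped instead of raising an error.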


Thank you.

PS. I will prepare a simple test script later.

----- Original Message -----
From: "John Daily" <jda...@basho.com>
To: gu...@mail.mipt.ru
Cc: riak-users@lists.basho.com
Sent: Thursday, 31 January 2013 1:58:38
Subject: Re: Differences between riak_client and riak_kv_mrc_pipe MapReduce when 
one node is down.

Riak's MapReduce functionality cannot survive a node failure. If a vnode 
involved with a query fails while actively processing the request, the entire 
query will have to be re-run. The failed query should be automatically 
terminated, but you'll have to re-run the query yourself.
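
A minimal sketch of re-running a failed query by hand, assuming the caller wraps the query in a fun that returns {ok, Results} or {error, Reason}. The module mr_retry and the function run/2 are hypothetical names, not part of Riak, and the {error, Reason} shape is an assumption about how a terminated query is reported.

```erlang
-module(mr_retry).
-export([run/2]).

%% RunQuery is a caller-supplied fun() -> {ok, Results} | {error, Reason}.
%% The query is re-run on {error, _} until it succeeds or Attempts runs out;
%% the last error is returned if every attempt fails.
run(RunQuery, Attempts) when is_function(RunQuery, 0), Attempts > 0 ->
    case RunQuery() of
        {ok, Results} ->
            {ok, Results};
        {error, _Reason} when Attempts > 1 ->
            run(RunQuery, Attempts - 1);
        {error, Reason} ->
            {error, Reason}
    end.
```

It could be used as mr_retry:run(fun() -> riak_kv_mrc_pipe:mapred(Inputs, Query, Timeout) end, 3), where Inputs, Query and Timeout are whatever the job already passes. Note that this only retries on an explicit error; it does not detect a query that returns {ok, []} when data was expected.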

If you create queries using Riak Pipe (the technology layer beneath MapReduce), 
it is possible to create queries that can survive a vnode failure, but that is 
not a trivial exercise.

Regarding the empty result set you're seeing: one possibility is that a vnode 
has failed recently and has come back online without data. MapReduce will not 
currently trigger a read repair, but that problem should be resolved with the 
forthcoming Riak 1.3 release.

-John Daily
Technical Evangelist
Basho

On Jan 30, 2013, at 8:05 AM, gu...@mail.mipt.ru wrote:

> We have a 6-node Riak cluster. I have a simple custom Erlang application for 
> a custom MapReduce job.
> 
> We start the MapReduce job using the riak_kv_mrc_pipe module, for example:
> 
> Query = [{map, {modfun,Mod,MapFun}, [do_prereduce,{from,1}], false},
>          {reduce, {modfun,Mod,ReduceFun}, [{reduce_phase_batch_size, 1000}], true}],
> riak_kv_mrc_pipe:mapred({index,Bucket,Field,From,To}, Query, Timeout).
> 
> But if one of the nodes is down for a long time, the response is 
> unpredictable: sometimes it returns {ok,GoodResultList}, but sometimes 
> {ok,[]} (an empty list).
> We traced riak_kv and riak_pipe and found two problems:
> 1. In riak_kv_pipe_index or in riak_kv_pipe_listkeys the fitting_spec is 
> created with nval always set to 1.
> 2. The actual error occurs in riak_pipe_vnode:remaining_preflist, which 
> returns an empty PrefList for some Hash (#fitting_spec.nval is 1). It uses 
> the riak_core_apl:get_primary_apl function.
> 
> But if we use old-style MapReduce, for example:
>       {ok,C} = riak:local_client(),
>       Me = self(),
>       Query = [{map, {modfun,Mod,MapFun}, [do_prereduce,{from,1}], false},
>                {reduce, {modfun,Mod,ReduceFun}, [{reduce_phase_batch_size, 1000}], true}],
>       {ok, {ReqId,FlowPid}} = C:mapred_stream(Query,Me,Timeout),
>       {ok,_} = riak_kv_index_fsm_sup:start_index_fsm(
>                    zont_riak_connection:riak_node(),
>                    [{raw, ReqId,FlowPid},
>                     [Bucket, none,{range,Field,From,To},Timeout,mapred]]),
>       luke_flow:collect_output(ReqId, Timeout).
> 
> The query executes well. But the problem is that do_prereduce and 
> {reduce_phase_batch_size, 1000} are ignored, which is why execution is slow.
> 
> 
> Can you make a recommendation? Maybe in riak_pipe_vnode:remaining_preflist 
> we need to use riak_core_apl:get_apl_ann, or set #fitting_spec.nval to the 
> nval from our bucket props?
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
