I'm interested in this. I'll fork the repo and see what I can get added
there.

On Aug 10, 2012, at 7:52 AM, Bryan Fink wrote:

> On Thu, Aug 9, 2012 at 5:11 AM, Kresten Krab Thorup <k...@trifork.com> wrote:
>> The only issue with this approach is AFAIK that M/R effectively runs with 
>> R=1, i.e. it doesn't ensure that a value is consistent across replicas.
>> 
>> IMHO riak_kv_mapreduce should have a map_get_object_value, which does a 
>> proper RiakClient:get, i.e. something like this: [will be slower, but will 
>> honour the bucket's default R value].
> 
> I recently realized that this would be a fairly small and easy thing
> to do since MR has been ported to Riak Pipe. I'm frying other fish at
> the moment, but if any of you are interested, read on.
> 
> In Riak Pipe, an MR "map" phase is broken into two steps: "get" and
> "transform". The "get" phase is what reads the value from Riak. It is
> currently implemented in riak_kv_pipe_get, in the riak_kv application.
> 
> If you read riak_kv_pipe_get.erl, you'll see that all of the fetching
> logic is in the process/3 function. Modifying this code to do a
> regular riak_client:get instead of talking directly to a single vnode
> should be easy.
> 
> We would like to keep the existing implementation as the default, at
> least for now. So, my suggestion would be to add the new behavior as
> an option, with flags to control it. This could be accomplished either
> by modifying riak_kv_pipe_get to look for a flag in its argument, or
> by modifying riak_kv_mrc_pipe to use a new fitting instead of
> riak_kv_pipe_get.
> 
> With either modification, you'll want to also change riak_kv_mrc_pipe
> to pass the map arguments through to the "get" fitting. These
> arguments are the only place available to external clients to specify
> any of the R-value tuning parameters. Yes, that means a map function
> implementation will have to ignore them, but hopefully that's not
> insurmountable. See the reduce_batch_size and reduce_phase_only_1
> optional "reduce" phase arguments for examples on how to do this.
> 
> There are probably other ways to fit this kind of fetching behavior in
> as well. While Kresten's map-function implementation is good, I think
> this behavior is useful in more cases than resolving a
> notfound. Hopefully what I've written above is enough to get one or
> more of you started down a path.
> 
> Cheers,
> Bryan
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


