I'm interested in this. I'll fork the repo and see what I can get added in there.
On Aug 10, 2012, at 7:52 AM, Bryan Fink wrote:

> On Thu, Aug 9, 2012 at 5:11 AM, Kresten Krab Thorup <k...@trifork.com> wrote:
>> The only issue with this approach is AFAIK that M/R effectively runs with
>> R=1, i.e. it doesn't ensure that a value is consistent across replicas.
>>
>> IMHO riak_kv_mapreduce should have a map_get_object_value, which does a
>> proper RiakClient:get, i.e. something like this: [will be slower, but will
>> honour the bucket's default R value].
>
> I recently realized that this would be a fairly small and easy thing
> to do since MR has been ported to Riak Pipe. I'm frying other fish at
> the moment, but if any of you are interested, read on.
>
> In Riak Pipe, an MR "map" phase is broken into two steps: "get" and
> "transform". The "get" phase is what reads the value from Riak. It is
> currently implemented in riak_kv_pipe_get, in the riak_kv application.
>
> If you read riak_kv_pipe_get.erl, you'll see that all of the fetching
> logic is in the process/3 function. Modifying this code to do a
> regular riak_client:get instead of talking directly to a single vnode
> should be easy.
>
> We would like to keep the existing implementation as the default, at
> least for now. So, my suggestion would be to add the new behavior as
> an option, with flags to control it. This could be accomplished either
> by modifying riak_kv_pipe_get to look for a flag in its argument, or
> by modifying riak_kv_mrc_pipe to use a new fitting instead of
> riak_kv_pipe_get.
>
> With either modification, you'll want to also change riak_kv_mrc_pipe
> to pass the map arguments through to the "get" fitting. These
> arguments are the only place available to external clients to specify
> any of the R-value tuning parameters. Yes, that means a map function
> implementation will have to ignore them, but hopefully that's not
> insurmountable. See the reduce_batch_size and reduce_phase_only_1
> optional "reduce" phase arguments for examples on how to do this.
>
> There are probably other ways to fit this kind of fetching behavior in
> as well. While Kresten's map-function implementation is good, I think
> this behavior is useful in more cases than resolving a
> notfound. Hopefully what I've written above is enough to get one or
> more of you started down a path.
>
> Cheers,
> Bryan
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
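
To make the idea in the quoted message concrete, here is a rough Python model of a "get" step that keeps the single-replica read as the default and switches to a read across several replicas only when an option asks for it. This is not Riak code. The names `Versioned`, `get_single_replica`, `get_with_quorum`, `fetch` and the `"r"` option key are all made up for illustration, the integer version stands in for a vector clock, and a real quorum read would take the first R replies to arrive rather than the first R replicas in a list.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Versioned:
    value: str
    version: int  # hypothetical stand-in for a vector clock; higher is newer


Replica = Dict[str, Versioned]


def get_single_replica(replicas: List[Replica], key: str) -> Optional[Versioned]:
    """Read from one replica only, which is what an effective R=1 read does."""
    return replicas[0].get(key)


def get_with_quorum(replicas: List[Replica], key: str, r: int) -> Optional[Versioned]:
    """Ask r replicas and return the newest reply, or None if none has the key."""
    replies = [replica.get(key) for replica in replicas[:r]]
    found = [reply for reply in replies if reply is not None]
    if not found:
        return None
    return max(found, key=lambda reply: reply.version)


def fetch(replicas: List[Replica], key: str,
          options: Optional[dict] = None) -> Optional[Versioned]:
    """Hypothetical 'get' step: old behavior by default, quorum read on request."""
    r = (options or {}).get("r")
    if r is None:
        return get_single_replica(replicas, key)
    return get_with_quorum(replicas, key, r)


# The first replica is behind the other two.
replicas = [
    {"k": Versioned("old", 1)},
    {"k": Versioned("new", 2)},
    {"k": Versioned("new", 2)},
]
stale = fetch(replicas, "k")            # default path sees only the first replica
fresh = fetch(replicas, "k", {"r": 2})  # opt-in path compares two replies

# The first replica has not received the key at all (the notfound case).
lagging = [{}, {"k": Versioned("new", 2)}, {"k": Versioned("new", 2)}]
missing = fetch(lagging, "k")
recovered = fetch(lagging, "k", {"r": 2})
```

With no option set the caller gets whatever the first replica holds, including a stale value or a notfound. Passing `{"r": 2}` resolves both cases at the cost of a second read, which mirrors the "slower, but honours R" trade-off discussed above.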