Hi,

For Riak Mobile, I occasionally need to use an M/R job to scan all keys/values in a bucket and compute a content hash for each object. This works fine, but ...
I'd like to be able to do a "consistent map/reduce" job, i.e., one with "R=2 semantics" for an "N=3 bucket". Maybe other people have the same need, but I can't see whether this is possible ... perhaps with the new riak_pipe infrastructure?

This is my idea: the map function yields {Key, [{VectorClock,1,Hash}]} for each replica, but it needs to run on *all* replicas of the objects in a given Bucket. Hash is the real value I'm interested in, i.e., the content hash of the object, but it could be some other "map" function output.

Then the reduce phase needs to "merge" a list of {VectorClock,N,Hash} tuples, using the VectorClocks to determine whether results are in "conflict" or whether one comes before/after the other. N is reduced to the sum over all elements with an equal Hash value.

The output of the reduce phase then gives me, for each key, a list of {VC,N,Hash}. If one of those N values is >= quorum, then I have a consistent output value (Hash).

Questions:

- How can I have an M/R job run on *all* vnodes, not just for objects that are owned by a primary?
- The M/R "input" is essentially listkeys(Bucket) ... can this be done using "async keylisting", so that the operation does not hold up the vnode while listing?

If someone can sketch a solution, I'd be happy to go hacking on it ...

Kresten

Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S | Margrethepladsen 4 | DK- 8000 Aarhus C | Phone : +45 8732 8787 | www.trifork.com

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
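
To make the reduce step described above a bit more concrete, here is a rough sketch in Python rather than Erlang, purely as an illustration. It does not use any Riak API. Vector clocks are modelled as plain dicts from actor id to counter, and the names descends, merge_clocks, reduce_entries and consistent_hash are hypothetical. Two things are my own assumptions, not part of the idea as stated: entries with an equal Hash get the pointwise maximum of their clocks, and an entry is dropped when another entry's clock strictly descends from it.

```python
from typing import Dict, List, Optional, Tuple

VClock = Dict[str, int]           # actor id -> counter (simplified model, not Riak's format)
Entry = Tuple[VClock, int, str]   # (vector clock, replica count N, content hash)


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def merge_clocks(a: VClock, b: VClock) -> VClock:
    """Pointwise maximum of two clocks (assumption: used for equal hashes)."""
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in a.keys() | b.keys()}


def reduce_entries(entries: List[Entry]) -> List[Entry]:
    """Sum N per hash, then drop entries that another entry strictly supersedes."""
    by_hash: Dict[str, Tuple[VClock, int]] = {}
    for clock, n, digest in entries:
        if digest in by_hash:
            old_clock, old_n = by_hash[digest]
            by_hash[digest] = (merge_clocks(old_clock, clock), old_n + n)
        else:
            by_hash[digest] = (dict(clock), n)
    merged = [(clock, n, digest) for digest, (clock, n) in by_hash.items()]
    # Keep an entry unless some other entry's clock strictly descends from it.
    # Entries that survive together are concurrent, i.e. in "conflict".
    return [
        e for e in merged
        if not any(descends(o[0], e[0]) and not descends(e[0], o[0]) for o in merged)
    ]


def consistent_hash(entries: List[Entry], quorum: int = 2) -> Optional[str]:
    """Return a hash whose summed N reached quorum, or None if no hash did."""
    for _clock, n, digest in reduce_entries(entries):
        if n >= quorum:
            return digest
    return None
```

With two replicas at clock {"a": 2} and hash "h2" plus one stale replica at clock {"a": 1} and hash "h1", consistent_hash returns "h2". With three replicas that all carry different hashes, it returns None. A real reduce function would also have to give the same answer when re-reduced over partial results, which this sketch does not try to guarantee.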