Hi,

For Riak Mobile, I occasionally need to use an M/R job to scan all key/values 
in a bucket and compute a content-hash for each object.  This works fine, 
but ...

I'd like to be able to do a "consistent map/reduce" job i.e., with "R=2 
semantics" for an "N=3 bucket".  Maybe other people have the same need, but I 
can't see if this is possible ... perhaps with the new riak_pipe infrastructure?

This is my idea:

The map function yields {Key, [{VectorClock,1,Hash}]} for each replica, but 
needs to run on *all* replicas of objects in a given Bucket.   Hash is the real 
value I'm interested in i.e., the content-hash for the object; but it could be 
some other "map" function output.
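
To make the shape of the map output concrete, here is a minimal sketch in 
Python rather than Riak's own Erlang.  The function name map_replica, the 
dict-based vector clock and the choice of SHA-1 are assumptions made for 
illustration; none of them is a Riak API.

```python
import hashlib
from typing import Dict, List, Tuple

# Simplified stand-ins (assumptions, not Riak types):
VClock = Dict[str, int]            # actor id -> counter
Entry = Tuple[VClock, int, str]    # (vector clock, replica count, content hash)


def map_replica(key: str, vclock: VClock, value: bytes) -> Tuple[str, List[Entry]]:
    """Hypothetical map step, run once per replica of an object.

    Emits the key with a single (VectorClock, 1, Hash) entry; the 1 means
    "one replica reported this hash".
    """
    content_hash = hashlib.sha1(value).hexdigest()
    return key, [(dict(vclock), 1, content_hash)]
```

Any other per-object "map" output could take the place of the hash here.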

Then, the reduce phase needs to "merge" a list of {VectorClock,N,Hash} tuples, 
by considering the VectorClocks to determine if results are in "conflict", or 
if one is before/after the other.  N is reduced to the sum of the N values of 
all elements with equal Hash, i.e. the number of replicas that agree on that 
Hash.

The output of the reduce phase is then, for each key, a list of {VC,N,Hash}.  
If one of those N values is >= quorum, then I have a consistent output value 
(Hash).
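
The merge and quorum check described above can be sketched as follows, again 
in Python and not as Riak code.  Vector clocks are modelled as plain dicts.  
Two choices are my assumptions and not something stated above: entries whose 
clock is strictly older than another entry's clock are dropped, and entries 
with the same Hash get their clocks merged by pointwise maximum.  All function 
names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

VClock = Dict[str, int]            # actor id -> counter (simplified model)
Entry = Tuple[VClock, int, str]    # (vector clock, replica count, content hash)


def descends(a: VClock, b: VClock) -> bool:
    """True if clock a has seen every event that clock b has seen."""
    return all(a.get(actor, 0) >= count for actor, count in b.items())


def strictly_after(a: VClock, b: VClock) -> bool:
    """True if a descends b and the two clocks are not equal."""
    return descends(a, b) and not descends(b, a)


def merge_clocks(a: VClock, b: VClock) -> VClock:
    """Pointwise maximum of two clocks."""
    actors = a.keys() | b.keys()
    return {actor: max(a.get(actor, 0), b.get(actor, 0)) for actor in actors}


def reduce_entries(entries: List[Entry]) -> List[Entry]:
    """Merge the (VC, N, Hash) entries collected for one key."""
    # Assumption: an entry that some other entry strictly supersedes is stale.
    live = [e for e in entries
            if not any(strictly_after(other[0], e[0]) for other in entries)]
    merged: Dict[str, Tuple[VClock, int]] = {}
    for clock, n, content_hash in live:
        if content_hash in merged:
            prev_clock, prev_n = merged[content_hash]
            # Equal Hash: sum the counts, keep a clock covering both entries.
            merged[content_hash] = (merge_clocks(prev_clock, clock), prev_n + n)
        else:
            merged[content_hash] = (dict(clock), n)
    return [(clock, n, h) for h, (clock, n) in merged.items()]


def consistent_hash(entries: List[Entry], quorum: int) -> Optional[str]:
    """Return a Hash reported by at least `quorum` replicas, else None."""
    for _clock, n, content_hash in reduce_entries(entries):
        if n >= quorum:
            return content_hash
    return None
```

With three replicas and quorum 2, two entries carrying the same Hash are 
enough for consistent_hash to return it.  If all three replicas disagree with 
concurrent clocks, the result is None and the key is in conflict.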

Questions:

- How can I have an M/R job run on *all* vnodes, not just for objects that are 
owned by a primary?

- The M/R "input" is essentially  listkeys(Bucket)  ... can this be done using 
"async keylisting", so that the operation does not hold up the vnode while 
listing?

If someone can sketch a solution, I'd be happy to go hacking on it ...   

Kresten



Mobile: + 45 2343 4626 | Skype: krestenkrabthorup | Twitter: @drkrab
Trifork A/S  |  Margrethepladsen 4  | DK- 8000 Aarhus C |  Phone : +45 8732 
8787  |  www.trifork.com
 

