Wow, this question slipped by while I wasn't looking. Sorry about that.

On Mon, Jul 16, 2012 at 4:47 PM, Mark Boyd (Software Architect)
<boy...@ldschurch.org> wrote:
> Can anyone familiar with the innards of riak describe how distribution of a
> map/reduce is handled when there are multiple reduce phases included as in
> this solution copied from below. I’m assuming that the first map phase would
> spread to nodes containing data for incoming bucket/key combinations and
> their output pulled back to the coordinating node for the first reduce
> phase. Then the second map phase would spread to (potentially different)
> nodes containing data for that phase’s incoming bucket/key combinations and
> their output pulled back to the coordinating node for the final reduce
> phase.

Exactly correct, Mark. Map is always spread to vnodes holding the
objects to be read/transformed. Reduce is always brought back to a
single node for aggregation. So you would have a
scatter-gather-scatter-gather pattern, just as you described.
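
To make the phase layout concrete, here is a rough sketch of a
four-phase query as it might be submitted through the Erlang client.
It assumes riakc_pb_socket:mapred/3 from riak-erlang-client. The
module my_mr, its four functions, and the bucket and keys are
hypothetical placeholders, not your actual query.

```erlang
-module(mr_query_sketch).
-export([query/0, run/1]).

%% Phase list only. my_mr and its functions are hypothetical and would
%% have to be compiled and on the code path of every Riak node.
query() ->
    [%% Phase 0 (scatter): runs on the vnodes holding the input objects.
     {map, {modfun, my_mr, extract_links}, none, false},
     %% Phase 1 (gather): map outputs are brought back to a single node.
     {reduce, {modfun, my_mr, dedupe_keys}, none, false},
     %% Phase 2 (scatter): the reduce output must be bucket/key pairs,
     %% which may live on different vnodes than the original inputs.
     {map, {modfun, my_mr, fetch_value}, none, false},
     %% Phase 3 (gather): final aggregation, again on a single node.
     %% Keep = true, so only this phase's results are returned.
     {reduce, {modfun, my_mr, summarize}, none, true}].

%% Pid is a connection opened with riakc_pb_socket:start_link/2.
%% The bucket and keys below are made-up example inputs.
run(Pid) ->
    Inputs = [{<<"orders">>, <<"o1">>}, {<<"orders">>, <<"o2">>}],
    riakc_pb_socket:mapred(Pid, Inputs, query()).
```

Each phase tuple is {Type, FunTerm, Arg, Keep}. Only phases with Keep
set to true send their results back to the client.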

JavaScript map phases are quite limited in how they handle errors, as
you found. Erlang map phases, however, get an opportunity to handle a
notfound themselves. An example of what this looks like can be found
in the riak_kv_mapreduce:map_object_value/3 function:

https://github.com/basho/riak_kv/blob/master/src/riak_kv_mapreduce.erl#L81-99

So, if you find the intermediate aggregation of that filtering reduce
to be a problem, you could consider migrating to Erlang for your map
phase.
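
As a minimal sketch of what that could look like, the map function
below drops missing objects in the map phase itself, so they never
reach a reduce. The module and function names are hypothetical. It
assumes that a map function is called as Fun(Object, KeyData, Arg) and
that a missing object arrives as {error, notfound}, which is the shape
map_object_value/3 matches on.

```erlang
-module(notfound_map_sketch).
-export([map_value_skip_notfound/3]).

%% Missing object: emit nothing, so no filtering reduce is needed.
map_value_skip_notfound({error, notfound}, _KeyData, _Arg) ->
    [];
%% Found object: emit its value. Map functions must return a list.
map_value_skip_notfound(RiakObject, _KeyData, _Arg) ->
    [riak_object:get_value(RiakObject)].
```

The compiled module has to be on the code path of every node in the
cluster, since the map phase runs wherever the objects live.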

-Bryan

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
