Wow, this question slipped by while I wasn't looking. Sorry about that. On Mon, Jul 16, 2012 at 4:47 PM, Mark Boyd ソフトウェア 建築家 <boy...@ldschurch.org> wrote: > Can anyone familiar with the innards of riak describe how distribution of a > map/reduce is handled when there are multiple reduce phases included as in > this solution copied from below. I’m assuming that the first map phase would > spread to nodes containing data for incoming bucket/key combinations and > their output pulled back to the coordinating node for the first reduce > phase. Then the second map phase would spread to (potentially different) > nodes containing data for that phase’s incoming bucket/key combinations and > their output pulled back to the coordinating node for the final reduce > phase.
Exactly correct, Mark. Map is always spread to vnodes holding the objects to be read/transformed. Reduce is always brought back to a single node for aggregation. So you would have a scatter-gather-scatter-gather pattern, just as you described. Javascript map is quite limited in its handling of errors, as you found. Erlang map phases, however, get an opportunity to handle the notfound themselves. An example of what this looks like can be found in the riak_kv_mapreduce:map_object_value/3 function: https://github.com/basho/riak_kv/blob/master/src/riak_kv_mapreduce.erl#L81-99 So, if you find the intermediate aggregation of that filtering reduce to be a problem, you could consider migrating to Erlang for your map phase. -Bryan _______________________________________________ riak-users mailing list riak-users@lists.basho.com http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com