Can anyone familiar with the innards of riak describe how distribution of a
map/reduce is handled when multiple reduce phases are included, as in this
solution copied from below? I’m assuming that the first map phase would spread
to the nodes containing data for the incoming bucket/key combinations, and that
its output would be pulled back to the coordinating node for the first reduce
phase. Then the second map phase would spread to (potentially different) nodes
containing data for that phase’s incoming bucket/key combinations, and its
output would be pulled back to the coordinating node for the final reduce phase.

Is that correct?

db.add(pairs)
    // evaluate the authz expression for each incoming bucket/key pair
    .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
    // drop not_found entries for missing authz objects
    .reduce('Riak.filterNotFound')
    // converts the buckets and keys array into an array of JSON objects
    .map('Riak.mapValuesJson')
    // drop not_found entries for missing protected objects
    .reduce('Riak.filterNotFound')
    .run(function(err, listOfViews) { // process on client the list of returned array objects
        if (err) {
            console.log("ERROR: Unable to obtain tvs for id '" + id + "'. Detail: " + JSON.stringify(err));
            send500ToClient(response);
            return;
        }
        callback(listOfViews);
    });

Mark

From: riak-users-boun...@lists.basho.com 
[mailto:riak-users-boun...@lists.basho.com] On Behalf Of Mark Boyd, Software Architect
Sent: Monday, July 16, 2012 2:35 PM
To: riak-users@lists.basho.com
Subject: RE: mapreduce with non-existent keys

For those wishing to know how I solved this, and to shed light on debugging
map/reduce, here is what I did.

Background:

I’m dealing with a set of keys that are mirrored in two buckets, an
authorization expression bucket and a protected objects bucket. My goal is to
use map/reduce to evaluate the authz expressions for the passed-in keys and
return only those protected objects for which a user is authorized. But the
authz expressions and protected objects themselves may not exist, since they
could have been deleted before the references to them were cleaned up.

For my input, I have this bucket/key pairs array from other processing. I have
authz expressions for v1 to v4 but not v5 to v8, and protected objects for v1
to v3 but not v4 to v8.

[["authz", "v1"], ["authz", "v2"], ["authz", "v3"], ["authz", "v4"], … ["authz", "v8"]]

I’m using riak-js in node.js. My map/reduce query looked like this to begin with:

db.add(pairs)
    // evaluate the authz expression for each incoming bucket/key pair
    .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
    // converts the buckets and keys array into an array of JSON objects
    .map('Riak.mapValuesJson')
    .run(function(err, listOfViews) {
        if (err) {
            console.log("ERROR: Unable to obtain tvs for id '" + id + "'. Detail: " + JSON.stringify(err));
            send500ToClient(response);
            return;
        }
        callback(listOfViews);
    });

This resulted in the err object being the unhelpful {"statusCode":500}.
Fortunately, I have an http proxy that I wrote, “google wamulator”, which I
configured so that all riak-js http traffic to riak passes through it, exposing
what goes across the wire. Here is what I saw:

{
    "phase":0,
    "error":"function_clause",
    "input":"{{error,notfound},{<<"v2.tv.authz">>,<<"v5">>},{struct,[{<<"type">>,<<"FALSE">>}]}}",
    "type":"error",
    "stack":"[{riak_kv_pipe_get,bkey,[{not_found,{<<"v2.tv.authz">>,<<"v5">>},{struct,[{<<"type">>,<<"FALSE">>}]}}]},{riak_kv_pipe_get,bkey_chash,1},{riak_pipe_vnode,queue_work,4},{riak_kv_mrc_map,send_results,2},{riak_pipe_vnode_worker,process_input,3},{riak_pipe_vnode_worker,wait_for_input,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]"
}
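The wamulator itself isn’t shown in this thread. As a rough stand-in, here is a
minimal logging proxy built only on Node’s standard http module. It is not the
wamulator; createLoggingProxy, its parameters, and the port numbers in the
usage note are hypothetical. It buffers each request and response body, logs
both, and relays the response unchanged, which is enough to see Riak’s full
error JSON when riak-js only reports the status code.

```javascript
// Minimal logging HTTP proxy (hypothetical sketch, not the wamulator).
// Buffers request and response bodies so both can be logged in full.
const http = require('http');

function createLoggingProxy(targetHost, targetPort, log) {
  return http.createServer((clientReq, clientRes) => {
    const reqChunks = [];
    clientReq.on('data', (chunk) => reqChunks.push(chunk));
    clientReq.on('end', () => {
      const reqBody = Buffer.concat(reqChunks);
      log('>>> ' + clientReq.method + ' ' + clientReq.url + '\n' + reqBody.toString());

      // Forward the request unchanged to the real Riak HTTP endpoint.
      const upstreamReq = http.request({
        host: targetHost,
        port: targetPort,
        method: clientReq.method,
        path: clientReq.url,
        headers: clientReq.headers,
      }, (upstreamRes) => {
        const resChunks = [];
        upstreamRes.on('data', (chunk) => resChunks.push(chunk));
        upstreamRes.on('end', () => {
          const resBody = Buffer.concat(resChunks);
          // An error body like the JSON above becomes visible here.
          log('<<< ' + upstreamRes.statusCode + '\n' + resBody.toString());
          clientRes.writeHead(upstreamRes.statusCode, upstreamRes.headers);
          clientRes.end(resBody);
        });
      });

      upstreamReq.on('error', (err) => {
        log('!!! upstream error: ' + err.message);
        clientRes.writeHead(502);
        clientRes.end();
      });
      upstreamReq.end(reqBody);
    });
  });
}
```

For example, `createLoggingProxy('127.0.0.1', 8098, console.log).listen(8099)`
would relay to a Riak node on its default HTTP port 8098; the riak-js client
would then need to be pointed at port 8099 (how to do that depends on the
riak-js version in use). Buffering whole bodies keeps the sketch short but holds
back streamed responses until they finish, which is acceptable for debugging.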

This is where it got interesting. It appeared that riak wasn’t finding the authz
object for the v5 key, so I assumed it was failing before even hitting my first
map function. On the contrary, _it wasn’t_. On a whim, I commented out the
second map and the reduce portions and ran the query again. The following
array was returned:

[
  [
    "v2.tv",
    "v1"
  ],
  {
    "not_found": {
      "bucket": "v2.tv.authz",
      "key": "v5",
      "keydata": "undefined"
    }
  },
… more not_found objects, one for each missing key,
  [
    "v2.tv",
    "v3"
  ]
]

This gave me some great information:

1) If I don’t have a reduce phase, the objects returned from my map phase make
it back to the client as-is. We can use that for debugging!

2) I was getting these weird not_found objects alongside the two objects (of
three) for which the user was authorized.
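To make the debugging trick in item 1 a bit more systematic, a small client-side
helper could split the raw map output into bucket/key pairs and not_found
markers, which also yields the dangling references that still need cleaning up.
This is a hypothetical helper (partitionMapOutput is not part of riak-js), and
the sample is a trimmed copy of the array above, keeping only the entries that
are actually shown.

```javascript
// Hypothetical helper, not part of riak-js: split the raw output of a query
// that ends in a map phase into real results and not_found markers.
function partitionMapOutput(results) {
  const found = [];
  const missing = [];
  for (const item of results) {
    const isMarker = item !== null && typeof item === 'object' &&
      !Array.isArray(item) && item.not_found !== undefined;
    if (isMarker) {
      // Marker shape as seen in the output above:
      // { not_found: { bucket: ..., key: ..., keydata: ... } }
      missing.push(item.not_found.bucket + '/' + item.not_found.key);
    } else {
      found.push(item);
    }
  }
  return { found, missing };
}

// Trimmed copy of the array returned by the debugging query.
const debugOutput = [
  ['v2.tv', 'v1'],
  { not_found: { bucket: 'v2.tv.authz', key: 'v5', keydata: 'undefined' } },
  ['v2.tv', 'v3'],
];

const { found, missing } = partitionMapOutput(debugOutput);
// found holds [['v2.tv', 'v1'], ['v2.tv', 'v3']]
// missing holds ['v2.tv.authz/v5'], a reference that can be cleaned up
```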

Now, where did those not_found objects come from? After _much_ trial and error,
I came to the conclusion that each phase is passed an array. A map phase
interprets that array as a list of bucket/key pair arrays. For each pair, the
map phase looks up the corresponding item. If the item is not found, the map
phase puts one of these not_found objects in its output array. If the item _is_
found, the map phase passes it to the map function and sticks whatever the
function returns into the output array.

Note that I said “returned object”, not “bucket/key pairs”. As noted in item 1
above, the map phase appears to build the next input array without interpreting
it; interpretation seems to belong to the next phase. If there is no next
phase, that array propagates back to the client as-is, including a not_found
object for every bucket/key pair in the map’s input array that was missing. In
contrast, a reduce phase appears to take the incoming array as-is, without
treating its entries as bucket/key pairs.

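Now back to my original error. The not_found error for the v5 key was coming
from the second map phase, the mapValuesJson part. As noted, that phase tries
to interpret each entry of its incoming array as a bucket/key pair, hits those
not_found items, and throws the error. The stack trace seems consistent with
this: the function_clause comes from riak_kv_pipe_get:bkey, called while the
first map phase’s results were being queued for the next phase
(riak_kv_mrc_map:send_results), which may be why the error is reported as
phase 0.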
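The model described in the last few paragraphs can be written down as a toy,
in-memory pipeline. It only encodes the conclusions above for illustration and
is not how Riak actually executes phases; runPhases, the store layout, and the
sample authorize, valuesJson and countAll phases are all hypothetical. A map
phase looks up each [bucket, key] entry and emits a not_found marker for
missing keys, a reduce phase receives the array untouched, and a map phase that
meets a non-pair entry fails, much as mapValuesJson did.

```javascript
// Toy model of the phase behaviour described above (illustration only, not
// Riak's implementation). `store` maps "bucket/key" strings to values.
function isBucketKeyPair(entry) {
  return Array.isArray(entry) &&
    typeof entry[0] === 'string' && typeof entry[1] === 'string';
}

function runPhases(store, inputs, phases) {
  let data = inputs;
  for (const phase of phases) {
    if (phase.type === 'reduce') {
      // A reduce phase receives the whole array as-is.
      data = phase.fn(data);
      continue;
    }
    // A map phase treats every entry as a [bucket, key] pair.
    const output = [];
    for (const entry of data) {
      if (!isBucketKeyPair(entry)) {
        // Stand-in for the function_clause error seen over the wire.
        throw new Error('map phase cannot read entry as a bucket/key pair: ' +
          JSON.stringify(entry));
      }
      const [bucket, key] = entry;
      const id = bucket + '/' + key;
      if (store.has(id)) {
        // Whatever the map function returns goes into the output unchanged.
        output.push(...phase.fn(store.get(id), entry));
      } else {
        output.push({ not_found: { bucket, key } });
      }
    }
    data = output;
  }
  return data;
}

// Hypothetical sample data and phases.
const store = new Map([
  ['v2.tv.authz/v1', { allow: true }],
  ['v2.tv.authz/v2', { allow: false }],
  ['v2.tv/v1', { title: 'first tv' }],
]);
const inputs = [['v2.tv.authz', 'v1'], ['v2.tv.authz', 'v2'], ['v2.tv.authz', 'v5']];

// Emits the protected object's bucket/key pair when access is allowed.
const authorize = { type: 'map', fn: (value, [, key]) => (value.allow ? [['v2.tv', key]] : []) };
// Plays the role of mapValuesJson: returns the stored value itself.
const valuesJson = { type: 'map', fn: (value) => [value] };
// A reduce that just counts its input entries.
const countAll = { type: 'reduce', fn: (values) => [values.length] };
```

Running authorize followed by valuesJson throws on the not_found marker for v5,
while authorize followed by countAll returns [2], because the reduce simply
counts the marker as an ordinary entry.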

So how did I solve this problem?

Riak has some predefined javascript functions that can be used in map/reduce,
defined at
https://github.com/basho/riak_kv/blob/master/priv/mapred_builtins.js. I noticed
that one of these, filterNotFound, takes a single argument with a plural name,
values. That led me to believe that it was meant solely for use in a reduce
phase. So here is what I did. Notice that after each map phase whose input keys
might not exist, I have a reduce phase that uses the filterNotFound function to
pull those not_found objects out of the array. The last one is there so that I
don’t get not_found objects in the array returned from riak.

db.add(pairs)
    // evaluate the authz expression for each incoming bucket/key pair
    .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
    // drop not_found entries for missing authz objects
    .reduce('Riak.filterNotFound')
    // converts the buckets and keys array into an array of JSON objects
    .map('Riak.mapValuesJson')
    // drop not_found entries for missing protected objects
    .reduce('Riak.filterNotFound')
    .run(function(err, listOfViews) { // process on client the list of returned array objects
        if (err) {
            console.log("ERROR: Unable to obtain tvs for id '" + id + "'. Detail: " + JSON.stringify(err));
            send500ToClient(response);
            return;
        }
        callback(listOfViews);
    });
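For reading purposes, the idea behind Riak.filterNotFound can be sketched as an
ordinary JavaScript function. This is a re-implementation under the assumption,
based on the behaviour described above, that it drops entries carrying a
not_found field; mapred_builtins.js at the URL above is the authoritative
source, and the local function and sample data here are illustrative only.

```javascript
// Sketch of the idea behind Riak.filterNotFound (see mapred_builtins.js for
// the real source). Used as a reduce, it receives the whole array from the
// previous phase and keeps every entry that is not a not_found marker.
function filterNotFound(values) {
  return values.filter((value) =>
    value === null || typeof value !== 'object' || value.not_found === undefined);
}

// Shaped like the first map phase's output shown earlier in the thread.
const firstMapOutput = [
  ['v2.tv', 'v1'],
  { not_found: { bucket: 'v2.tv.authz', key: 'v5', keydata: 'undefined' } },
  ['v2.tv', 'v3'],
];

// Only the real bucket/key pairs remain, so the next map phase
// (mapValuesJson in the query above) has nothing it cannot look up.
const cleaned = filterNotFound(firstMapOutput);
```

Riak may call a reduce function more than once, feeding earlier reduce output
back in together with new inputs, so reduce functions have to tolerate
re-reduction. A filter like this does: applying it to already-filtered data
returns the same array.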

Yes, you can have multiple reduce phases, and that solves the “not found” issue.
Hope this helps.

Mark

From: Mark Boyd, Software Architect
Sent: Sunday, July 15, 2012 10:18 PM
To: riak-users@lists.basho.com
Subject: RE: mapreduce with non-existent keys

Never mind. I found the archive search page and this same question posted 
earlier here:

http://riak-users.197444.n3.nabble.com/Map-Reduce-behavior-when-key-not-found-td3641739.html

Mark

From: riak-users-boun...@lists.basho.com
[mailto:riak-users-boun...@lists.basho.com] On Behalf Of Mark Boyd, Software Architect
Sent: Sunday, July 15, 2012 7:55 AM
To: riak-users@lists.basho.com
Subject: mapreduce with non-existent keys

I’ve got a set of bucket/key pairs that may contain items that no longer exist
in riak. Is it possible to pass that to map/reduce and explicitly tell riak to
ignore any pairs which aren’t current, i.e., which aren’t found? For example, if
I have compiled a list of pairs but one or more of those items was removed from
the database before I passed the list, then my map/reduce appears to fail,
since it doesn’t find the referenced item. Can riak be told to ignore such
missing items when it encounters them?

Thanks.

Mark


_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com