For those wishing to know how I solved this, and to shed some light on debugging map/reduce, here is what I did.
Background: I’m dealing with a set of keys that are mirrored in two buckets, an authorization expression bucket and a protected objects bucket. My goal is to use map/reduce to evaluate the authz expressions for the passed-in keys and return only those protected objects for which a user is authorized. But the authz expressions and protected objects themselves may not exist, since they could have been deleted while references to them have not yet been cleaned up.

For my input I have this array of bucket/key pairs from other processing. I have authz expressions for v1 to v4 but not v5 to v8, and protected objects for v1 to v3 but not v4 to v8.

    [["authz", "v1"], ["authz", "v2"], ["authz", "v3"], ["authz", "v4"], … ["authz", "v8"]]

I’m using riak-js in node.js. My map/reduce looked like this to begin with:

    db.add(pairs)
      .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
      .map('Riak.mapValuesJson') // converts the buckets and keys array into array of json objects
      .run(function(err, listOfViews) {
        if (err) {
          console.log("ERROR: Unable to obtain tvs for id '" + id + "'. Detail: " + JSON.stringify(err));
          send500ToClient(response);
          return;
        }
        callback(listOfViews);
      });

This results in the err object being the unhelpful {"statusCode":500}. Fortunately, I have an HTTP proxy that I wrote, “google wamulator”, which I configured so that all riak-js HTTP traffic to riak passes through it, exposing what goes across the wire.
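Short of a proxy, it helps to know roughly what such a job looks like on the wire. The sketch below assembles a JSON body for Riak's HTTP `/mapred` endpoint: `inputs` plus a `query` list of `map`/`reduce` phase specs carrying `language`, `name` (for a built-in) or `source` (for an anonymous function), `arg` and `keep`. It assumes that job format; riak-js may not serialize the job exactly this way, and `phaseSpec`, `buildJob` and the stub map function are hypothetical.

```javascript
// Hypothetical helpers (not part of riak-js): build the JSON body that a
// client would POST to Riak's /mapred endpoint for a job like the one above.
function phaseSpec(fn, arg) {
  const spec = { language: "javascript" };
  if (typeof fn === "string") {
    spec.name = fn; // a named built-in such as "Riak.mapValuesJson"
  } else {
    spec.source = fn.toString(); // an anonymous function sent as source text
  }
  if (arg !== undefined) {
    spec.arg = arg; // passed to the phase function as its third argument
  }
  return spec;
}

function buildJob(pairs, phases) {
  return {
    inputs: pairs,
    query: phases.map(function (phase, i) {
      const spec = phaseSpec(phase.fn, phase.arg);
      // Keep only the last phase's results, so only they reach the client.
      spec.keep = i === phases.length - 1;
      return { [phase.type]: spec };
    }),
  };
}

const pairs = [["v2.tv.authz", "v1"], ["v2.tv.authz", "v5"]];
const job = buildJob(pairs, [
  {
    type: "map",
    // Stub standing in for evaluation.toMapReduceForm (hypothetical).
    fn: function (value, keyData, arg) { return []; },
    arg: { "obj-bucket": "v2.tv" },
  },
  { type: "map", fn: "Riak.mapValuesJson" },
]);
console.log(JSON.stringify(job, null, 2));
```

Logging a body like this before running a job shows what should be going out; the interesting part, the error body that comes back, is what riak-js collapsed into {"statusCode":500}.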
And here is what I saw:

    {
      "phase":0,
      "error":"function_clause",
      "input":"{{error,notfound},{<<"v2.tv.authz">>,<<"v5">>},{struct,[{<<"type">>,<<"FALSE">>}]}}",
      "type":"error",
      "stack":"[{riak_kv_pipe_get,bkey,[{not_found,{<<"v2.tv.authz">>,<<"v5">>},{struct,[{<<"type">>,<<"FALSE">>}]}}]},{riak_kv_pipe_get,bkey_chash,1},{riak_pipe_vnode,queue_work,4},{riak_kv_mrc_map,send_results,2},{riak_pipe_vnode_worker,process_input,3},{riak_pipe_vnode_worker,wait_for_input,2},{gen_fsm,handle_msg,7},{proc_lib,init_p_do_apply,3}]"
    }

This is where it got interesting. It appeared that it wasn’t finding the authz object for the v5 key, so I assumed it was failing before even reaching my first map function. On the contrary, _it wasn’t_. On a whim I commented out the second map and the reduce portions and ran the query again. The following array was returned:

    [
      [ "v2.tv", "v1" ],
      { "not_found": { "bucket": "v2.tv.authz", "key": "v5", "keydata": "undefined" } },
      … more not_found objects, one for each missing key,
      [ "v2.tv", "v3" ]
    ]

This gave me some great information:

1) If I don’t have a reduce phase, the objects returned from a map phase make it back to the client as-is. We can use that for debugging!
2) I was getting these weird not_found objects included with the two objects (of three) for which the user was authorized.

Now where did those not_found objects come from? After _much_ trial and error I came to the conclusion that each phase is passed an array. A map phase interprets that array as a list of bucket/key pair arrays. For each pair, the map phase looks up the corresponding item. If it is not found, the map phase puts one of these not_found objects in its output array. If an item _is_ found, the map phase passes it to the map function and sticks any returned object into the output array. Note that I said “returned object”, not “bucket/key pairs”: as noted in item 1 above, the map phase appears to build the next input array without interpreting what the map function returned.
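The map-phase behavior described above can be modeled locally. This is a toy model of that reading, not Riak's implementation: `store` is a hypothetical in-memory stand-in for the buckets, `authzMap` is a hypothetical stand-in for `evaluation.toMapReduceForm`, and the error thrown for a non-pair input only imitates the `function_clause` error seen on the wire.

```javascript
// Toy model of one map phase as described above; NOT Riak's actual code.
// `store` stands in for Riak: a Map keyed by "bucket/key".
function runMapPhase(inputs, store, mapFn, arg) {
  const output = [];
  for (const input of inputs) {
    // Every input is interpreted as [bucket, key, keyData]; anything else
    // (e.g. a not_found marker) fails, like the function_clause error.
    if (!Array.isArray(input)) {
      throw new Error("function_clause: not a bucket/key pair: " + JSON.stringify(input));
    }
    const [bucket, key, keyData] = input;
    const id = bucket + "/" + key;
    if (!store.has(id)) {
      // Missing object: a not_found marker lands in the output array.
      output.push({ not_found: { bucket: bucket, key: key, keydata: String(keyData) } });
      continue;
    }
    // Found: whatever the map function returns is appended as-is;
    // only the next phase will try to interpret it.
    const value = { bucket: bucket, key: key, values: [{ data: store.get(id) }] };
    output.push(...mapFn(value, keyData, arg));
  }
  return output;
}

// Hypothetical data: authz expressions exist for v1..v4 (v2 denies access).
const store = new Map([
  ["v2.tv.authz/v1", '{"type":"TRUE"}'],
  ["v2.tv.authz/v2", '{"type":"FALSE"}'],
  ["v2.tv.authz/v3", '{"type":"TRUE"}'],
  ["v2.tv.authz/v4", '{"type":"TRUE"}'],
]);

// Hypothetical authz map: emit a pair in the protected-object bucket
// when the stored expression evaluates to TRUE.
function authzMap(value, keyData, arg) {
  const expr = JSON.parse(value.values[0].data);
  return expr.type === "TRUE" ? [[arg["obj-bucket"], value.key]] : [];
}

const inputs = ["v1", "v2", "v3", "v4", "v5", "v6", "v7", "v8"].map(
  (k) => ["v2.tv.authz", k]
);
const phase1 = runMapPhase(inputs, store, authzMap, { "obj-bucket": "v2.tv" });
console.log(JSON.stringify(phase1));
```

Feeding `phase1` into a second `runMapPhase` call throws as soon as it reaches the first not_found marker, which mirrors how the second map phase failed in the original job.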
It appears that interpretation belongs to the next phase. If there is no next phase, that array propagates back to the client as-is, including a not_found for each bucket/key pair in the map's input array whose object is missing. In contrast, a reduce phase appears to take the incoming array as-is, without treating its entries as bucket/key pairs.

Now back to my original error. The not_found error for the v5 key was coming from the second map phase, the mapValuesJson part. As noted, it tries to interpret the incoming array as bucket/key pair arrays, sees those not_found items, and throws the error.

So how did I solve this problem? Riak has some predefined JavaScript functions that can be used in map/reduce, defined at https://github.com/basho/riak_kv/blob/master/priv/mapred_builtins.js. I noted that one of these, filterNotFound, takes a single argument with a plural name, values. That led me to believe that it was meant solely for use in a reduce phase.

So here is what I did. Notice that after each map phase that may be passed keys which might not exist, I have a reduce phase that uses the filterNotFound function to pull those not_found objects out of the array. The last one is there so that I don’t get those not_found objects in the array returned from riak.

    db.add(pairs)
      .map(evaluation.toMapReduceForm, { 'obj-bucket' : 'v2.tv', 'user-atts' : userAtts })
      .reduce('Riak.filterNotFound')
      .map('Riak.mapValuesJson') // converts the buckets and keys array into array of json objects
      .reduce('Riak.filterNotFound')
      .run(function(err, listOfViews) {
        // process on client the list of returned array objects
        if (err) {
          console.log("ERROR: Unable to obtain tvs for id '" + id + "'. Detail: " + JSON.stringify(err));
          send500ToClient(response);
          return;
        }
        callback(listOfViews);
      });

Yes, you can have multiple reduce steps, and that solves the “not found” issue. Hope this helps.
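What the extra reduce steps do can be sketched in a few lines. This is my own approximation of a `Riak.filterNotFound`-style reduce, assuming not_found markers are objects with a `not_found` property as in the output shown earlier; it is not the code from mapred_builtins.js, and `isNotFound` and `mapOutput` are hypothetical names.

```javascript
// Approximation of a filterNotFound-style reduce: keep every entry except
// objects carrying a not_found property. Bucket/key arrays pass through.
function isNotFound(v) {
  return v !== null && typeof v === "object" && "not_found" in v;
}

function filterNotFound(values) {
  return values.filter((v) => !isNotFound(v));
}

// The array the first map phase returned with the later phases commented
// out (abridged to one not_found entry).
const mapOutput = [
  ["v2.tv", "v1"],
  { not_found: { bucket: "v2.tv.authz", key: "v5", keydata: "undefined" } },
  ["v2.tv", "v3"],
];
console.log(JSON.stringify(filterNotFound(mapOutput)));
// [["v2.tv","v1"],["v2.tv","v3"]]
```

Riak may call a reduce function more than once, feeding it its own earlier output, so a reduce function should be safe to re-apply; a filter like this one is, which is part of why it fits the reduce slot.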
Mark

From: Mark Boyd, Software Architect
Sent: Sunday, July 15, 2012 10:18 PM
To: riak-users@lists.basho.com
Subject: RE: mapreduce with non-existent keys

Never mind. I found the archive search page and this same question posted earlier here: http://riak-users.197444.n3.nabble.com/Map-Reduce-behavior-when-key-not-found-td3641739.html

Mark

From: riak-users-boun...@lists.basho.com [mailto:riak-users-boun...@lists.basho.com] On Behalf Of Mark Boyd, Software Architect
Sent: Sunday, July 15, 2012 7:55 AM
To: riak-users@lists.basho.com
Subject: mapreduce with non-existent keys

I’ve got a set of bucket/key pairs that may contain items that no longer exist in riak. Is it possible to pass that to map/reduce and explicitly tell riak to ignore any pairs which aren’t current, i.e., which aren’t found? For example, if I have compiled a list of pairs but, before I pass the list, one or more of those items is removed from the database, then my map/reduce appears to fail since it doesn’t find the referenced item. Can riak be told to ignore such missing items if it encounters them?

Thanks.
Mark

NOTICE: This email message is for the sole use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message.
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com