I was writing a new MapReduce query to look at users over time, and ran it over a single user in production. After that, other MapReduce jobs over users intermittently returned results from my new map phase instead of their own. After five minutes of this, I had to restart every node in the cluster to get it to stop.

Every node has {map_cache_size, 0} in riak_kv.

The map phase that screwed things up was:

function(v) {
  o = JSON.parse(v.values[0].data);

  // Age of account in days
  age = Math.round(
    (Date.now() - Date.iso8601(o.created_at)) /
    (1000 * 60 * 60 * 24)
  );

  return [['t_user_scores', v.key, age]];
}

It looks like one node started running that phase in place of the requested phase for subsequent jobs. Those jobs *should* have run this one, but didn't:

function(v) {
  o = JSON.parse(v.values[0].data);
  return [{
    key: v.key,
    name: o.name,
    thumbnail: o.thumbnail
  }];
}
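
For anyone trying to compare the two phases, here is a minimal sketch of running both outside Riak (for example under Node.js). This is not how Riak invokes them. It only checks what each function returns for the same input. Several things are assumptions: Date.iso8601 is not standard JavaScript, so a stand-in built on Date.parse is defined; fakeRiakObject, ageMap, profileMap, and the sample user are hypothetical names and data; and the locals are declared with var here, unlike the functions above.

```javascript
// Hypothetical stand-in: Date.iso8601 is not part of standard JavaScript, so
// it is assumed here to return epoch milliseconds, like Date.parse does.
Date.iso8601 = function (s) { return Date.parse(s); };

// Build an object shaped like the map input used above: a key plus
// values[0].data holding the JSON-encoded user. (Hypothetical helper.)
function fakeRiakObject(key, user) {
  return { key: key, values: [{ data: JSON.stringify(user) }] };
}

// The new map phase (account age in days), with locals declared via var.
function ageMap(v) {
  var o = JSON.parse(v.values[0].data);
  var age = Math.round(
    (Date.now() - Date.iso8601(o.created_at)) / (1000 * 60 * 60 * 24)
  );
  return [['t_user_scores', v.key, age]];
}

// The phase the other jobs requested.
function profileMap(v) {
  var o = JSON.parse(v.values[0].data);
  return [{ key: v.key, name: o.name, thumbnail: o.thumbnail }];
}

// Made-up sample user whose account was created ten days ago.
var tenDaysAgo = new Date(Date.now() - 10 * 24 * 60 * 60 * 1000).toISOString();
var input = fakeRiakObject('kyle', {
  name: 'Kyle',
  thumbnail: 'k.png',
  created_at: tenDaysAgo
});

console.log(JSON.stringify(ageMap(input)));
console.log(JSON.stringify(profileMap(input)));
```

The two outputs have different shapes (an array of triples versus an array of objects), so a job that gets the wrong phase's results is easy to spot from the shape alone.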

Now I'm scared to run MR jobs. Could it be an issue with returning keydata? Anybody else seen this before?

--Kyle

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
