map function for link-walking

2010-07-10 Thread Nicolas Fouché
In the "one-to-very-many link associations" thread, Sean Cribbs talks
about a map function which does link-walking from links stored in
object contents. http://bit.ly/cKguqQ

"Another way to cope with large numbers of links is to
encapsulate them in the object itself, rather than in the headers.  This removes
the header-length/count limitation, but would require you to have a map function
that understands the internals of the object.  Also, you would need to deal with
the larger size of the object, which could potentially slow down your request."

Could anyone share the code of a map function that does this kind of
custom link-walking?

The only example I found is in the "Practical Map-Reduce: Forwarding
and Collecting" blog article
http://blog.basho.com/2010/04/14/practical-map-reduce:-forwarding-and-collecting/
It gathers links from objects and calls the "map" function on them.
So, as Sean says, a Link object has to be built to be able to call the
"map" function on it.

By the way, is the result of such a "map" function cached like that of any
standard map phase?

This is the last thing we need to sort out before we go with Riak in our project :)

Thanks!

___
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com


Re: map function for link-walking

2010-07-10 Thread Bryan Fink
On Sat, Jul 10, 2010 at 4:45 PM, Nicolas Fouché  wrote:
> In the "one-to-very-many link associations" thread , Sean Cribbs talks
> about a map function which does link-walking from links stored in
> object contents. http://bit.ly/cKguqQ
>
> "Another way to cope with large numbers of links is to
> encapsulate them in the object itself, rather than in the headers.  This removes
> the header-length/count limitation, but would require you to have a map function
> that understands the internals of the object.  Also, you would need to deal with
> the larger size of the object, which could potentially slow down your request."
>
> Is there any chance someone shares the code of a map function doing
> this (custom-)link-walking ?

Hi, Nicolas.  Any map function that returns a list of bucket-key
pairs, in the same format as the "inputs" list for the map/reduce
query, will work.  For example, if you stored your object's links in a
"mylinks" field in its value, like so:

$ curl -X PUT -H "content-type:application/json" \
    http://localhost:8098/riak/example/foo --data @-
{"mylinks":[["example","bar"],["example","baz"]],"myval":1}
^D
$ curl -X PUT -H "content-type:application/json" \
    http://localhost:8098/riak/example/bar --data @-
{"mylinks":[["example","baz"]],"myval":2}
^D
$ curl -X PUT -H "content-type:application/json" \
    http://localhost:8098/riak/example/baz --data @-
{"mylinks":[["example","foo"]],"myval":3}
^D

Then you could use a very simple map function like:
   function(v) {
     return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
   }
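To try the map function outside Riak, here is a minimal sketch that calls it on hand-built mock objects under Node.js. The mock shape is an assumption, trimmed to the fields the function reads: a "values" array whose first entry holds the stored body as a JSON string in "data", or a "not_found" field for a missing key. Riak also passes further arguments to a map function (key data and the phase argument), which this function ignores. The name extractLinks and the mock objects are hypothetical.

```javascript
// The map function from above, given a (hypothetical) name so it can be
// called directly. Plain ES5 syntax, since it is meant to run inside Riak.
function extractLinks(v) {
  // A missing key yields no links; otherwise parse the stored JSON body.
  return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
}

// Hypothetical mocks of what Riak might pass in (shape assumed, trimmed
// to the fields extractLinks reads).
var foo = {
  bucket: "example",
  key: "foo",
  values: [{ data: '{"mylinks":[["example","bar"],["example","baz"]],"myval":1}' }]
};
var missing = { not_found: { bucket: "example", key: "nope" } };

console.log(JSON.stringify(extractLinks(foo)));     // [["example","bar"],["example","baz"]]
console.log(JSON.stringify(extractLinks(missing))); // []
```

The returned list has exactly the shape of a map/reduce "inputs" list, which is what lets a following map phase consume it.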

And then the link-walking is simple:

$ curl -X POST -H "content-type:application/json" \
    http://localhost:8098/mapred --data @-
{"inputs":[["example","foo"]],
 "query":[{"map":{"language":"javascript",
                  "source":"function(v) { return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks; }"}},
          {"map":{"language":"javascript",
                  "source":"function(v) { return [JSON.parse(v.values[0].data).myval]; }"}}]}
^D
[2,3]

That query uses two map phases: the first starts at the example/foo object I
created above and follows its links to the example/bar and example/baz
objects, and the second extracts the "myval" field from the values of those
objects.
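The chaining can be modeled locally to see why the query returns [2,3]. This is a simplified, sequential sketch under assumptions, not Riak's implementation: a hypothetical in-memory "store" replaces the cluster, a hypothetical fetchObject builds the value a map function would see, and a hypothetical runMapPhases feeds each phase's concatenated output to the next phase as its inputs. In a real cluster, phases run in parallel across nodes, so result order may differ.

```javascript
// Hypothetical in-memory stand-in for the three objects stored above.
const store = new Map([
  ["example/foo", '{"mylinks":[["example","bar"],["example","baz"]],"myval":1}'],
  ["example/bar", '{"mylinks":[["example","baz"]],"myval":2}'],
  ["example/baz", '{"mylinks":[["example","foo"]],"myval":3}']
]);

// Build the (assumed, simplified) value a map function would receive.
function fetchObject(pair) {
  const bucket = pair[0], key = pair[1];
  const data = store.get(bucket + "/" + key);
  return data === undefined
    ? { not_found: { bucket: bucket, key: key } }
    : { bucket: bucket, key: key, values: [{ data: data }] };
}

// The two map phases from the query above.
function phase1(v) {
  return v.not_found ? [] : JSON.parse(v.values[0].data).mylinks;
}
function phase2(v) {
  return [JSON.parse(v.values[0].data).myval];
}

// Simplified model of phase chaining: every input pair is fetched and
// mapped, and the concatenated results become the next phase's inputs.
function runMapPhases(inputs, phases) {
  let current = inputs;
  for (const phase of phases) {
    current = current.flatMap(function (pair) { return phase(fetchObject(pair)); });
  }
  return current;
}

console.log(JSON.stringify(runMapPhases([["example", "foo"]], [phase1, phase2]))); // [2,3]
```

Starting from example/bar instead would follow its single link and yield only [3].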

I'd recommend adding a little defensive programming to make sure
that "mylinks" is defined, and that it's a list of the proper shape.
It would also be a good idea to define these functions in a file that
Riak would preload, instead of specifying them dynamically in the
query (for performance).  But, you could also take it in another
direction: if you knew that all of your links were going to point to
objects in a certain bucket, you could store just the keys in the
object, and produce bucket-key pairs with a quick map function (e.g.
mykeys.map(function(k) { return ["otherbucket", k]; })).
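Both suggestions can be sketched as map functions. This is one possible reading, not a prescribed implementation: extractLinksSafe returns only well-formed [bucket, key] string pairs and never throws on missing or malformed data, and extractKeyLinks assumes the object stores its keys in a hypothetical "mykeys" field and that every link points into the bucket "otherbucket". The function names are hypothetical, and the value shape is the same assumed one as in the earlier examples; the functions stick to ES5-style syntax so they could be pasted into a query or a preloaded file.

```javascript
// Defensive version of the link-extracting map function: returns only
// well-formed [bucket, key] string pairs and never throws on odd data.
function extractLinksSafe(v) {
  if (v.not_found || !v.values || v.values.length === 0) return [];
  var body;
  try {
    body = JSON.parse(v.values[0].data);
  } catch (e) {
    return []; // stored value is not valid JSON
  }
  if (!body || !(body.mylinks instanceof Array)) return [];
  // Keep only two-element arrays of strings.
  return body.mylinks.filter(function (link) {
    return link instanceof Array && link.length === 2 &&
      typeof link[0] === "string" && typeof link[1] === "string";
  });
}

// Keys-only variant: every link points into one bucket ("otherbucket"),
// so the object stores just the keys, here in a hypothetical "mykeys" field.
function extractKeyLinks(v) {
  if (v.not_found || !v.values || v.values.length === 0) return [];
  var body;
  try {
    body = JSON.parse(v.values[0].data);
  } catch (e) {
    return []; // stored value is not valid JSON
  }
  var mykeys = (body && body.mykeys instanceof Array) ? body.mykeys : [];
  return mykeys.map(function (k) { return ["otherbucket", k]; });
}

console.log(JSON.stringify(extractKeyLinks({ values: [{ data: '{"mykeys":["a","b"]}' }] })));
// [["otherbucket","a"],["otherbucket","b"]]
```

Storing only keys keeps objects smaller, at the cost of fixing the target bucket in the map function.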

Hope that helps.

-Bryan
