map performance notes and question

Brendan Tue, 18 Jan 2011 10:39:31 -0800

hi. i'm a new riak user. i scanned the list archives and didn't see
either of these topics discussed; apologies in advance if they already
have been.


i noticed something that may be useful to others. when a map phase uses
an object's data to generate a list of [bucket,key] pairs for a second
map phase, i've seen the suggestion to store that data as a json array
and use Riak.mapValues to generate the list. i've been experimenting
with this (and with some json objects more complex than a simple array)
and have concluded that storing the data as a comma-separated list
instead of a json array is far more efficient on large datasets.
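a minimal sketch of the csv approach as a riak javascript map function,
assuming a hypothetical layout where the stored object's data is a
comma-separated list of keys and the target bucket name is passed in as
the phase's `arg`. the function name, bucket and key names are made up for
illustration; when submitted to riak the same code would go in the map
phase's "source" field as an anonymous function. note that split(',')
only works if no key itself contains a comma.

```javascript
// Hypothetical data layout: the stored object's data is a comma-separated
// list of keys ("k1,k2,k3"), and the target bucket name arrives as `arg`.
// Keys must not contain commas, or split(",") will break them apart.
function mapCsvToBucketKeys(value, keyData, arg) {
  // Riak hands the map function the stored object; the raw body is a string.
  var data = value.values[0].data;
  if (data === "") {
    return [];
  }
  // split(",") turns the body into an array without calling JSON.parse
  return data.split(",").map(function (key) {
    return [arg, key];
  });
}

// Usage with a stand-in for the object Riak would pass in:
var fakeValue = { bucket: "lists", key: "biglist", values: [{ data: "a,b,c" }] };
var pairs = mapCsvToBucketKeys(fakeValue, null, "items");
// pairs is [["items","a"],["items","b"],["items","c"]]
```

each [bucket,key] pair in the returned list then becomes an input to the
next map phase.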

JSON.parse (and the eval() it calls) is relatively slow, but a
split(',') operation is very quick. on a 100,000-element dataset on my
test machine, split averages about 25ms to convert the csv data into a
javascript array, while JSON.parse averages around 750ms to convert the
equivalent json data, roughly 30 times slower. for smaller datasets
there isn't much of a difference, but for large ones the gap can become
quite significant. hopefully this observation is of use to someone
besides me. :)
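a rough micro-benchmark sketch for anyone who wants to repeat the
comparison. it builds the same synthetic 100,000-element list (keys
"key0", "key1", ... are made up) in both encodings and averages the parse
time over a few runs. it's meant to run under a standalone engine such as
node rather than riak's embedded javascript engine, and modern engines
ship a native JSON.parse, so the gap there will likely be much smaller
than the figures above.

```javascript
// Build the same list of keys encoded as csv and as a json array.
function buildInputs(n) {
  var keys = [];
  for (var i = 0; i < n; i++) {
    keys.push("key" + i);
  }
  return { csv: keys.join(","), json: JSON.stringify(keys) };
}

// Run fn `runs` times; return the average milliseconds and the last result.
function averageMs(fn, runs) {
  var start = Date.now();
  var result;
  for (var i = 0; i < runs; i++) {
    result = fn();
  }
  return { ms: (Date.now() - start) / runs, result: result };
}

var inputs = buildInputs(100000);
var viaSplit = averageMs(function () { return inputs.csv.split(","); }, 10);
var viaJson = averageMs(function () { return JSON.parse(inputs.json); }, 10);
console.log("split:", viaSplit.ms, "ms; JSON.parse:", viaJson.ms, "ms");
```

both paths produce an identical array of strings, so only the decoding
cost differs.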

i also have a question about map performance with large objects. if i
create an object containing one of these 100,000-element datasets (csv
or json array, it doesn't matter), then stop and start riak, i can
retrieve the object (a GET via the rest api) in about 0.2 seconds.
subsequent requests take about 0.06 seconds, presumably because riak had
to load the data from disk the first time and had it cached the second.
either way, riak is very quick.

if, however, i then run a trivial map phase, "function(v){return[1]}",
which does no work at all, on just that single object, the time to
process the request balloons to 2.1 seconds. given that riak can return
the data via GET in 0.06 seconds, 2.1 seconds simply to load the object
into a map phase seems excessive. is this the cost of passing the data
into the javascript engine, and thus something i could avoid by writing
the map phase natively in erlang? (my erlang skills are pretty weak, but
if this works around the issue it'll be a good incentive to improve
them.) or is it simply an inherent cost of the map phase, with no way
around it?
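a sketch of how the two cases could be compared side by side, building
the json bodies for riak's /mapred endpoint: one with the no-op
javascript phase above, one with an erlang phase. "lists" and "biglist"
are hypothetical bucket/key names. the erlang phase assumes the built-in
riak_kv_mapreduce:map_object_value function, which returns the object's
value rather than [1], so it does a bit more work (and sends the data back
to the client); treat the comparison as approximate.

```javascript
// Build a MapReduce request body with one bucket/key input and a single
// map phase described by `mapSpec`.
function buildSingleMapJob(bucket, key, mapSpec) {
  return {
    inputs: [[bucket, key]],
    query: [{ map: mapSpec }]
  };
}

// The no-op JavaScript phase from the test above.
var jsNoop = {
  language: "javascript",
  source: "function(v){return [1];}"
};

// Assumed built-in Erlang map function; it returns the object's value,
// so it is not an exact no-op equivalent of the JavaScript phase.
var erlangValue = {
  language: "erlang",
  module: "riak_kv_mapreduce",
  function: "map_object_value"
};

var jsBody = JSON.stringify(buildSingleMapJob("lists", "biglist", jsNoop));
var erlangBody = JSON.stringify(buildSingleMapJob("lists", "biglist", erlangValue));
```

each body would be POSTed to /mapred with a Content-Type of
application/json, and the two request times compared.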

(everything above is based on tests with riak-0.14-0)

thanks
-brendan

_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
