hi. i'm a new riak user. i scanned the list archives and didn't see either of these topics discussed; apologies in advance if they already have been.
i noticed something that may be useful to others. when the data of an object is used in a map phase to generate a list of [bucket,key] pairs for a second map phase, i've seen it suggested to store a json array as the data and use Riak.mapValues to generate the list. i've been experimenting with this (and with json objects more complex than a simple array), and i've concluded that on large datasets, storing the data as a comma separated list instead of a json array is far more efficient. JSON.parse (and the eval() it calls) is relatively slow, but split(',') is very quick. on my test machine, with a 100,000 element dataset, split averages about 25ms to turn the csv data into a javascript array, while JSON.parse averages around 750ms to do the same with the json data, roughly 30 times slower. for small datasets there isn't much difference, but for large ones the gap becomes significant. hopefully this is of use to someone besides me. :)
i also have a question about map performance with large objects. if i create an object containing one of these 100,000 element datasets (csv or json array, it doesn't matter), then stop and start riak, i can retrieve the object (GET via the rest api) in about 0.2 seconds. subsequent requests take about 0.06 seconds, presumably because riak had to load the data from disk the first time and had it cached the second. either way, riak is very quick. if, however, i then run a trivial map phase, "function(v){return[1]}", which does no work at all, on just that single object, the request time balloons to 2.1 seconds. given that riak can return the same data via GET in 0.06 seconds, 2.1 seconds just to load the object into a map phase seems excessive.
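a minimal sketch of the two first-phase map functions being compared, written in plain ES5 so it should also run in riak's javascript vm. it assumes (hypothetically; i haven't described my actual data layout) that the stored value is just a list of keys, that every key lives in one bucket passed in through the phase's `arg`, and that no key contains a comma, which is the main thing the csv format gives up compared with json. `mapCsvToInputs`, `mapJsonToInputs` and `buildJob` are illustrative names, not riak functions; `buildJob` shows the csv phase feeding a second phase in a POST /mapred request body.

```javascript
// First-phase map function (hypothetical): the object's value is a
// comma-separated list of keys, all in the bucket given as `arg`.
// Returns [bucket, key] pairs for the next map phase.
function mapCsvToInputs(v, keyData, arg) {
  var data = v.values[0].data;
  if (data === "") {
    return []; // "".split(",") would yield [""], i.e. one bogus empty key
  }
  var keys = data.split(",");
  var inputs = new Array(keys.length);
  for (var i = 0; i < keys.length; i++) {
    inputs[i] = [arg, keys[i]];
  }
  return inputs;
}

// The json-array equivalent: the value looks like '["k1","k2"]'.
// JSON.parse is the step that was ~30x slower in my 100,000 element test.
function mapJsonToInputs(v, keyData, arg) {
  var keys = JSON.parse(v.values[0].data);
  return keys.map(function (k) {
    return [arg, k];
  });
}

// Body for POST /mapred: phase 1 expands the list object into inputs,
// phase 2 (riak's built-in Riak.mapValues) returns the referenced values.
function buildJob(listBucket, listKey, itemBucket) {
  return {
    inputs: [[listBucket, listKey]],
    query: [
      { map: { language: "javascript",
               source: mapCsvToInputs.toString(),
               arg: itemBucket } },
      { map: { language: "javascript",
               name: "Riak.mapValues",
               keep: true } }
    ]
  };
}
```

if a given riak version rejects a named function's source text in `source`, an anonymous `function(v, keyData, arg) { ... }` with the same body can be pasted there instead. since JSON.parse speed varies a lot between javascript engines, it's worth re-timing both functions inside your own riak version before switching formats.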
is this a penalty for passing the data into the javascript engine, and therefore something i could avoid by writing the map phase natively in erlang? (my erlang skills are pretty weak, but if this works around the issue, it'll be a good reason to improve them.) or is it simply an inherent cost of the map phase, with no way around it?
(everything above is based on tests with riak-0.14-0.)
thanks
-brendan