Hello,

I have a question regarding the throughput of MapReduce queries -- in other words, 
how many per second I can reasonably expect my Riak cluster to handle.  I have a 
decent-sized data set: about 2 million keys, totaling roughly 240 GB of disk 
usage on a 6-node Riak cluster (version 1.0.1).  On top of that data sits a Java 
application talking to Riak via protocol buffers, and it would be nice to be 
able to throw a large volume of MapReduce queries at those keys.  The basic map 
function looks like this, and it is called against a single key and "sub key" 
per query:

  function(object, subKey) {
    return [ Riak.mapValuesJson(object)[0][subKey] ];
  }
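For concreteness, here is a sketch of the kind of job that would drive that function; the bucket, key, and sub-key names are invented for illustration. One way to match the two-argument signature above is to put the sub key in the KeyData slot of a [bucket, key, keyData] input triple, since Riak passes KeyData as the map function's second argument:

```javascript
// Hypothetical MapReduce job (names invented). The [bucket, key, keyData]
// input triple delivers the per-query "sub key" as the map function's
// second parameter.
var job = {
  inputs: [
    ["users", "user:1001", "email"]   // bucket, key, keyData (the sub key)
  ],
  query: [
    {
      map: {
        language: "javascript",
        source: "function(object, subKey) {" +
                "  return [ Riak.mapValuesJson(object)[0][subKey] ];" +
                "}",
        keep: true                    // return this phase's results to the client
      }
    }
  ]
};

// Over HTTP this structure is POSTed as JSON to /mapred; over protocol
// buffers the same JSON goes in the request with content_type
// "application/json".
console.log(JSON.stringify(job, null, 2));
```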

Is it reasonable to ask a 6-node Riak cluster of 4-core virtual servers with 
8 GB RAM to do 1,000 of those per second, with a sub-100 ms 99th-percentile 
latency?

In testing with 25 JavaScript VMs it looks good at 160 requests per second, but 
under more realistic load I'm seeing it melt the Riak cluster, and I'm wondering 
whether this is something I can tune my way out of, or whether I'm asking Riak 
to do the impossible.

By way of comparison, the same volume of simple gets against those keys works 
smoothly.

Thanks in advance,
Will
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
