Item number one: if you are using stock Riak then you are also using
the stock n_val (number of replicas) of 3. This means that your 1,000
k/v write is actually 3,000 items written to disk.
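As a quick sanity check on that arithmetic, here is a minimal Ruby
sketch. `replica_writes` is a made-up helper for illustration, not part
of any Riak client. Keep in mind that on a single node all three
replicas land on the same disk.

```ruby
# Hypothetical helper: number of objects physically written for a batch
# of puts, given the bucket's n_val (Riak's stock value is 3).
def replica_writes(items, n_val = 3)
  items * n_val
end

puts replica_writes(1000)     # 3000 objects written for 1,000 puts
puts replica_writes(1000, 1)  # 1000 if n_val were lowered to 1
```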
Next, Riak traverses the entire key space, not just the keys of the
bucket you name, when doing an m/r over a bucket. You must either
explicitly provide bucket/key pairs to an m/r or explore the new key
filtering provided in the 0.14 release.
Listkeys is a very costly operation, and its cost only increases as
your number of keys grows.
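To make the two alternatives concrete, here is a rough sketch of the
JSON job bodies as I understand the HTTP map/reduce format. The helper
names, the "item-" keys, and the `starts_with` filter value are
assumptions made up for illustration. With the Ruby client, I believe
`mr.add(bucket, key)` is the way to name a single input.

```ruby
require 'json'

# Hypothetical helper: a job whose inputs are explicit bucket/key pairs,
# so Riak does not have to list keys to find them.
def job_for_keys(bucket, keys, query)
  { "inputs" => keys.map { |key| [bucket, key] }, "query" => query }
end

# Hypothetical helper: a job whose inputs are a bucket plus key filters
# (the feature introduced in 0.14).
def job_for_key_filter(bucket, filters, query)
  { "inputs" => { "bucket" => bucket, "key_filters" => filters },
    "query"  => query }
end

# One map phase using a built-in JavaScript function.
query = [{ "map" => { "language" => "javascript",
                      "name"     => "Riak.mapValuesJson" } }]

explicit = job_for_keys("test", ["item-1", "item-2"], query)
filtered = job_for_key_filter("test", [["starts_with", "item-"]], query)

puts JSON.generate(explicit)
puts JSON.generate(filtered)
```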
There are more caveats but I'll end with three. For any
performance-critical system you must use the protocol buffers interface
and you must juggle connections. Additionally, anonymous JavaScript
functions carry a penalty. Lastly, you should also move from
JavaScript m/r functions to Erlang. There is overhead in pushing JSON
from the native Erlang interface into the JavaScript VM.
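For reference, this is roughly how the three kinds of phase look in a
JSON job body, as far as I know the format. `Riak.mapValuesJson` and
the `riak_kv_mapreduce` functions are built-ins that ship with Riak;
the anonymous source and the "item-1" key are placeholders I made up.

```ruby
require 'json'

# Anonymous JavaScript: the function source travels with the request.
anonymous_js = { "map" => { "language" => "javascript",
                            "source"   => "function(v) { return [1]; }" } }

# Named JavaScript: refers to a built-in function by name.
named_js = { "map" => { "language" => "javascript",
                        "name"     => "Riak.mapValuesJson" } }

# Erlang: a module/function pair, so no JSON is pushed into a JS VM.
erlang_map = { "map" => { "language" => "erlang",
                          "module"   => "riak_kv_mapreduce",
                          "function" => "map_object_value" } }
erlang_reduce = { "reduce" => { "language" => "erlang",
                                "module"   => "riak_kv_mapreduce",
                                "function" => "reduce_sum" } }

# A job that stays entirely in Erlang, over one explicitly named key.
job = { "inputs" => [["test", "item-1"]],
        "query"  => [erlang_map, erlang_reduce] }
puts JSON.pretty_generate(job)
```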
Riak has many benefits, but bleeding-edge single-node performance is
not one of them. A predictable, scalable unit of performance per node
throughout a cluster is.
Best,
Alexander
@siculars on twitter
http://siculars.posterous.com
On Jan 12, 2011, at 20:33, Alexander Staubo <li...@purefiction.net>
wrote:
I'm experimenting with a test dataset to gauge whether Riak is
suitable for a particular app. My real dataset has millions of
records, but I'm testing with just a thousand items, and
unfortunately, I am getting horrible performance -- so horrible it
can't possibly be right. What am I doing wrong?
My environment:
* Riak 0.14 with default config
* Sean Cribbs's Ruby client
* Mac OS X Snow Leopard
* Ruby 1.9.2
* Erlang R14B01 from MacPorts
I am testing with a single node on my MacBook, which should be enough
for just a thousand key/value-pairs. These tests are run on an
initially empty database, from a single Ruby app. Each test has been
run at least 10 times consecutively to eliminate outliers and ensure
optimal cache fill.
Here are some numbers:
* 9.6 seconds to store 1,000 items. They are loaded from a text file
as JSON data. Parsing/processing overhead is about 0.8 s, the rest is
Riak. In JSON format, the items total 570 KB. The resultant Bitcask
data directory is 3.9 MB.
* 0.3 seconds to list all keys in the bucket [1].
* 1.8 seconds to list all keys and then fetch each object [2].
* 1.5 seconds to run a very simple map/reduce query [3].
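Put in per-item terms, a rough calculation looks like the following.
`per_item_ms` is a throwaway helper, and subtracting the 0.3 s key
listing from the list + fetch time is my own assumption about how the
two overlap.

```ruby
# Hypothetical helper: average milliseconds per item once a fixed
# overhead (in seconds) is subtracted from the total time (in seconds).
def per_item_ms(total_s, overhead_s, count)
  ((total_s - overhead_s) * 1000.0 / count).round(1)
end

puts per_item_ms(9.6, 0.8, 1000)  # 8.8 ms per stored item
puts per_item_ms(1.8, 0.3, 1000)  # 1.5 ms per fetched item
```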
Here's something else that is weird. I repeated the steps above on a
new, empty bucket, again using just 1,000 items, but after loading 1.5
million items into a separate, empty bucket. The numbers now are very
odd:
* 4.5 seconds to list all keys.
* 6.5 seconds to list + fetch.
* 5.1 seconds to run map/reduce query.
Why are operations on the small bucket suddenly worse in the presence
of a separate, large bucket? Surely the key spaces are completely
separate? Even listing keys or querying on an *empty* bucket is taking
several seconds in this scenario.
So are these timings appropriate for such a tiny dataset, and if not,
what could I be doing wrong? I'm new to Riak and I'm not sure if the
map/reduce-query is optimally expressed, so maybe that could be fixed.
Even so, storage and key-querying performance seems off by perhaps an
order of magnitude.
I have confirmed the performance issue on an Amazon EC2 instance
running Ubuntu Maverick, where performance was in fact considerably
worse.
[1] Just looping over bucket.keys.
[2] Basically: bucket.keys { |keys| keys.each { |key| bucket.get(key) } }
[3] Here's the query code. Each stored item is a JSON hash from which
a key ("path") is mapped, then reduced to aggregate the counts of each
path.
mr = Riak::MapReduce.new(client)
mr.add("test")
mr.map <<-end, :keep => false
  function(v) {
    var paths = [];
    var entry = Riak.mapValuesJson(v)[0];
    var out = {};
    out[entry.path] = 1;
    paths.push(out);
    return paths;
  }
end
mr.reduce <<-end.strip, :keep => true
  function(values) {
    var result = {};
    for (var i = 0; i < values.length; i++) {
      var table = values[i];
      for (var k in table) {
        var count = table[k];
        if (result[k]) {
          result[k] += count;
        } else {
          result[k] = count;
        }
      }
    }
    return [result];
  }
end
results = mr.run
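
The intent of the two phases can be checked without Riak using a
plain-Ruby model like the one below. The method names and sample
entries are invented for illustration. As I understand it, Riak may
feed the output of a reduce function back into the same function, so
the sketch also checks that reducing partial results gives the same
totals as a single pass.

```ruby
# Plain-Ruby model of the map phase: one {path => 1} table per entry.
def map_phase(entry)
  [{ entry["path"] => 1 }]
end

# Plain-Ruby model of the reduce phase: merge tables by summing counts.
def reduce_phase(values)
  result = Hash.new(0)
  values.each do |table|
    table.each { |path, count| result[path] += count }
  end
  [result]
end

entries = [{ "path" => "/a" }, { "path" => "/b" }, { "path" => "/a" }]
mapped  = entries.flat_map { |entry| map_phase(entry) }

# Reduce everything at once.
one_pass = reduce_phase(mapped)
# Reduce two partial batches, then reduce the partial results again.
two_pass = reduce_phase(reduce_phase(mapped[0, 2]) + reduce_phase(mapped[2..-1]))

puts one_pass.inspect
puts one_pass == two_pass  # true
```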
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com