It'd be interesting to see what numbers you get if your script sets R and W to 1.
See:
http://wiki.basho.com/display/RIAK/REST+API#RESTAPI-Storeaneworexistingobjectwithakey
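Roughly something like this if you go straight at the REST interface (bucket, key and record body below are just placeholders; w is passed as a query parameter on the PUT):

require 'net/http'
require 'json'

# Placeholder bucket/key and record; w=1 asks Riak to reply as soon as
# one replica has accepted the write instead of waiting for a quorum.
uri = URI.parse('http://127.0.0.1:8098/riak/records/some-key?w=1')

req = Net::HTTP::Put.new(uri.request_uri)
req['Content-Type'] = 'application/json'
req.body = { 'id' => 'some-key', 'payload' => 'x' * 500 }.to_json

res = Net::HTTP.start(uri.host, uri.port) { |http| http.request(req) }
puts "#{res.code} #{res.message}"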
/Mårten
On 10 May 2010, at 17.36, Karsten Thygesen <kar...@netic.dk> wrote:
Hi
I'm doing a small proof of concept, and the goal is to store about 250.000.000
records in a Riak cluster. Today we have the data in MySQL, but we
strive for better performance, and we might even expect up to 5 times
as much data over the next couple of years. The data is
denormalized and "document"-like, so it is an easy match for the NoSQL
paradigm.
For the small POC, I have built a 4-node cluster on 4 dedicated
virtual servers running OpenSolaris on top of VMware, but with quite
fast storage below. In front of the cluster I have a load balancer
which distributes requests evenly among the nodes.
Each node is running riak-0.10 with an almost default configuration. I
have added "-smp enabled" to vm.args, and each node is otherwise
using the default configuration (except for the node name, of course). This also
implies N=2 and dets as the storage backend.
I have written a small Ruby script which uses riak-client from
Ripple (latest version) as well as curb for HTTP connections, and it
quite simply takes each record from the database and stores it in
Riak. Each record is around 500-1000 bytes in size and entirely
structured text/data. I store them as JSON objects.
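The core of the write loop looks roughly like this (simplified: the MySQL
reading is replaced with stand-in data, and the host and bucket names are changed):

require 'riak'   # riak-client from Ripple
require 'json'

client = Riak::Client.new(:host => 'loadbalancer.example.com', :port => 8098)
bucket = client.bucket('records')

# Stand-in for the rows already read from MySQL, each one a hash of
# 500-1000 bytes of structured data.
records = (1..3).map { |i| { 'id' => i, 'payload' => 'x' * 500 } }

records.each do |record|
  obj = bucket.new(record['id'].to_s)   # key = primary key from MySQL
  obj.content_type = 'application/json'
  obj.data = record                     # serialized to JSON on store
  obj.store
end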
The script can easily read more than 15.000 records/second, process
them and print them to the screen, so I doubt the script is the
bottleneck.
When I try to write them to the Riak cluster via the load balancer, I
can only write around 50-60 records/second, and while writing, the
beam process is only using around 10% CPU and there is no major I/O
activity going on.
I have tried to move the data directory to /tmp (memory filesystem),
and with this setup I can get around 90 writes/sec (yes, only for
testing; I cannot live with a memory filesystem in production with
this dataset).
I have also noticed that the performance I get is almost the same
no matter whether I write through the load balancer or just pick a single
node and send all my writes to that one.
I have also tried a "multithreaded" approach where I simply run two
of my data-mover scripts in parallel, and that way I can get around
110 writes/second.
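In a single process, the same parallel idea would look roughly like this (thread
count and names made up; one client per thread, since I am not sure the
underlying HTTP handle is safe to share):

require 'riak'
require 'thread'
require 'json'

THREADS = 4
queue   = Queue.new
records = (1..1000).map { |i| { 'id' => i, 'payload' => 'x' * 500 } }  # stand-in data
records.each { |r| queue << r }

workers = (1..THREADS).map do
  Thread.new do
    # One client per thread to avoid sharing the underlying connection.
    client = Riak::Client.new(:host => 'loadbalancer.example.com', :port => 8098)
    bucket = client.bucket('records')
    loop do
      record = queue.pop(true) rescue break   # non-blocking pop; exit when empty
      obj = bucket.new(record['id'].to_s)
      obj.content_type = 'application/json'
      obj.data = record
      obj.store
    end
  end
end
workers.each(&:join)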
With the current performance, it will take me more than a month to
move my data from MySQL to Riak, so I need many times better
performance.
Do you have any suggestions for how to get better performance? I was
hoping for something towards 1000 writes/second, so feel free to speculate;
perhaps I should just add quite a few more servers?
Best regards,
Karsten
_______________________________________________
riak-users mailing list
riak-users@lists.basho.com
http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com