In my experience, there is little point in testing with fewer than N physical 
machines (where N is the replication factor) plus a load balancer.  Riak is 
designed to run on a cluster like that, and performance will be miserable if 
you try to run on a single machine.  At first we tried running a number of 
virtual machines, but since disk I/O is usually the limiting factor, and Riak 
is fairly memory-hungry in the default setup (and virtual machines generally 
cope badly with memory-hungry apps), that turned out to be a terrible test 
setup.  Now we have a stack of Mac minis in the dev team that we use for 
running performance tests.  While they're not nearly as fast as the real 
servers, they are a much better predictor of performance characteristics.

To load data fast, we run many threads (about 10 per target machine in our 
case) in the loader app, and make sure to either use a load balancer or do the 
load balancing in the client app.
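
As a rough illustration of the threaded, client-side-balanced loader described 
above, here is a minimal Python sketch. It is not our actual loader: the node 
addresses, the bucket name, and the `bulk_load` and `http_put` helpers are 
hypothetical, and it assumes Riak's HTTP interface on the default port 8098 
with objects stored by a PUT to /riak/<bucket>/<key>. Documents are assigned 
to nodes round-robin, and the pool holds about 10 threads per node.

```python
import itertools
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical node addresses; 8098 is assumed to be the HTTP port.
NODES = ["http://10.0.0.1:8098", "http://10.0.0.2:8098", "http://10.0.0.3:8098"]
THREADS_PER_NODE = 10  # roughly the figure mentioned in the message


def http_put(node, bucket, key, body):
    """Store one document (bytes) with an HTTP PUT to /riak/<bucket>/<key>."""
    url = "{}/riak/{}/{}".format(
        node, urllib.parse.quote(bucket, safe=""), urllib.parse.quote(key, safe="")
    )
    req = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    # urlopen raises urllib.error.HTTPError on 4xx/5xx responses.
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def bulk_load(docs, nodes, bucket, put=http_put, threads_per_node=THREADS_PER_NODE):
    """Load (key, body) pairs, spreading requests round-robin over nodes."""
    # Pair each document with a node up front, so workers share no state.
    jobs = zip(itertools.cycle(nodes), docs)

    def store(job):
        node, (key, body) = job
        return put(node, bucket, key, body)

    with ThreadPoolExecutor(max_workers=threads_per_node * len(nodes)) as pool:
        # Results come back in the same order as the input documents.
        return list(pool.map(store, jobs))
```

Calling `bulk_load(docs, NODES, "mybucket")` with an iterable of (key, bytes) 
pairs would send the writes. Note that `pool.map` submits every job up front, 
so for a very large data set you would want to feed it in batches rather than 
all at once. The `put` parameter exists so that a different client (or a stub 
for testing) can be swapped in.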

Kresten

On Aug 1, 2011, at 11:13 PM, gtuhl wrote:

> I am currently load testing Riak using riak_0.14.2-1_amd64.deb with
> fs.file-max set to 503840 for all users.
> 
> I have a reasonably large set of data (hundreds of millions of documents,
> many terabytes in size) that is currently stored in a combination of
> PostgreSQL+Redis and Disco/DDFS.  The first for key/value and the second for
> map/reduce to satisfy the full set of user requirements.
> 
> I am trying to consolidate these data sources, so I am trying out a variety
> of different data stores with the potential of satisfying both usage types.
> 
> With Riak, my main challenge is getting this data loaded.  Using the PHP
> library I am able to push 100-200 documents/sec.  Is there a recommended
> approach to bulk loading data?  At that pace it would take a couple months
> to load everything.  That is not necessarily a deal breaker, but wanted to
> sniff around for better options.
> 
> Related to this, I did attempt to break up my records and load them with a
> bunch of concurrently running loaders.  This actually seems to work fairly
> well with not much of a penalty in terms of documents/sec on any single
> loader process.  But, once I reach 4-5 loaders running concurrently I
> consistently get the "Could not contact Riak Server" error and all of my
> loader processes die simultaneously.  If I wait a few seconds the Riak
> server does begin to respond again.
> 
> Any idea for approaching this differently?  Is attempting to run many
> loaders concurrently a bad idea with Riak?
> 
> I am running a single server right now while I test with bucket nval set to
> 1.
> 
> --
> View this message in context: 
> http://riak-users.197444.n3.nabble.com/Bulk-loading-data-and-Could-not-contact-Riak-Server-error-tp3217091p3217091.html
> Sent from the Riak Users mailing list archive at Nabble.com.
> 
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com

