In my experience, there is little point in testing with fewer than N physical machines (when using replication factor N) plus a load balancer. Riak is designed to run on such a cluster, and performance will be miserable if you try to run it on a single machine. At first we tried running a number of virtual machines, but since disk I/O is usually the limiting factor, and Riak is fairly memory-hungry in the default setup (and virtual machines are generally bad with memory-hungry apps), that turned out to be a terrible test setup. Now we have a stack of Mac minis in the dev team that we can use for running performance tests. While they're not nearly as fast as the real servers, they are a much better predictor of performance characteristics.
To get loads fast we run many threads [~10 per target machine in our case] in the loader app, and make sure to either use a load balancer or do the load balancing in the client app.

Kresten

On Aug 1, 2011, at 11:13 PM, gtuhl wrote:

> I am currently load testing Riak using riak_0.14.2-1_amd64.deb with
> fs.file-max set to 503840 for all users.
>
> I have a reasonably large set of data (hundreds of millions of documents,
> many terabytes in size) that is currently stored in a combination of
> PostgreSQL+Redis and Disco/DDFS. The first for key/value and the second for
> map/reduce to satisfy the full set of user requirements.
>
> I am trying to consolidate these data sources so trying out a variety of
> different data stores with the potential of satisfying both usage types.
>
> With Riak, my main challenge is getting this data loaded. Using the PHP
> library I am able to push 100-200 documents/sec. Is there a recommended
> approach to bulk loading data? At that pace it would take a couple months
> to load everything. That is not necessarily a deal breaker, but wanted to
> sniff around for better options.
>
> Related to this, I did attempt to break up my records and load them with a
> bunch of concurrently running loaders. This actually seems to work fairly
> well with not much of a penalty in terms of documents/sec on any single
> loader process. But, once I reach 4-5 loaders running concurrently I
> consistently get the "Could not contact Riak Server" error and all of my
> loader processes die simultaneously. If I wait a few seconds the Riak
> server does begin to respond again.
>
> Any idea for approaching this differently? Is attempting to run many
> loaders concurrently a bad idea with Riak?
>
> I am running a single server right now while I test with bucket nval set to
> 1.
>
> --
> View this message in context:
> http://riak-users.197444.n3.nabble.com/Bulk-loading-data-and-Could-not-contact-Riak-Server-error-tp3217091p3217091.html
> Sent from the Riak Users mailing list archive at Nabble.com.
>
> _______________________________________________
> riak-users mailing list
> riak-users@lists.basho.com
> http://lists.basho.com/mailman/listinfo/riak-users_lists.basho.com
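
For illustration, here is a rough sketch of the kind of loader described in the reply above: many threads (about 10 per target machine) with the load balancing done in the client. It is written in Python rather than PHP, and it is not the loader app mentioned in the reply. The names bulk_load, http_put and THREADS_PER_NODE are hypothetical, the node addresses in the usage note are placeholders, and the URL layout assumes the /riak/<bucket>/<key> HTTP interface of Riak 0.14 on its default port 8098.

```python
import urllib.parse
import urllib.request
from concurrent.futures import ThreadPoolExecutor

THREADS_PER_NODE = 10  # assumption: the "~10 per target machine" figure; tune as needed


def http_put(node, bucket, key, body):
    """Store one document with an HTTP PUT to /riak/<bucket>/<key> on the given node."""
    url = "http://%s/riak/%s/%s" % (
        node,
        urllib.parse.quote(bucket, safe=""),
        urllib.parse.quote(key, safe=""),
    )
    req = urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.status


def bulk_load(nodes, bucket, docs, put=http_put):
    """Load (key, body) pairs, spreading requests round-robin over the nodes.

    Returns a list of (key, exception) pairs for documents that failed,
    so one bad request does not take the whole loader down.
    """
    def store(indexed):
        i, (key, body) = indexed
        node = nodes[i % len(nodes)]  # client-side load balancing
        try:
            put(node, bucket, key, body)
            return None
        except Exception as exc:
            return (key, exc)

    workers = THREADS_PER_NODE * len(nodes)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(store, enumerate(docs))
        return [r for r in results if r is not None]
```

A call might look like bulk_load(["node1:8098", "node2:8098", "node3:8098"], "docs", pairs), where pairs yields (key, bytes) tuples. Note that pool.map queues the whole input up front, so for hundreds of millions of documents you would feed it in batches and retry the returned failures after a short pause.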