Hi. We are currently load testing Riak 1.4.2 in the Amazon cloud (8 nodes: c1.xlarge + 1 EBS volume) and plan to go into production early next year. So far things have gone pretty well, and we have been increasing the load and the total duration of the load tests almost weekly.
The current target is a 24-hour endurance test. Unfortunately, our tests fail after a few hours. After looking at the Ganglia graphs (the Riak metrics), we suspect the failure is triggered by Riak. What we see is that all nodes "spike" in response time at almost the same time, and the number of coordinated requests drops. We think the Bitcask files get merged at almost the same time on all nodes, causing the load tests to fail. Does this make sense? And how can we prevent this from happening? Our IOPS are pretty limited, which we are also looking to improve somehow (Amazon offers SSD drives for only a very few instance types... sigh). Thanks a bunch. Michael