Overnight we do some data collection that is stored in Riak, and just last night the load on one of our servers spiked very high and then dropped back down to more acceptable levels. You can see a graph of it here <https://img.skitch.com/20110818-t8bctnyjx5e3bhrwyqfg9cag2k.png>. This node happens to be one that we had to restore from a backup a couple of days ago, so our initial thought was that it was just doing a lot of read repair and merging. Looking at the erlang.*.logs, though, I see log entries for merging during some of the points of high load, but not all of them. The other nodes did exhibit some spiky load average last night, but the one I linked to was certainly the most egregious offender.
Another data point to consider is that our data collection job is very cyclical. It will be doing 2,500 requests/minute (~2,000 GET, 500 PUT) to Riak, and then one minute later that will suddenly jump to 7,000 requests/minute (~5,500 GET, 1,600 PUT). This cycle repeats for somewhere between 2 and 4 hours overnight. Anecdotally, I've seen the load on our Riak nodes spike when the requests/minute on them goes from more-or-less flat to a high request rate, so I thought that perhaps the fluctuation in request rate to the Riak nodes was causing some sort of problem. After the initial 2-4 hours of the 2,500/7,000 request rate, we have a similarly shaped but lower-throughput cyclical request pattern (500/2,000 requests/minute), during which the load is much lower (< 4).

So my main question for the list is: is this normal or abnormal behavior? Should we be concerned? These nodes are hosted on EC2 with ephemeral disks, so the high load average is probably just due to I/O wait. I checked, and the CPU usage of Riak itself during the high load averages was very small (< 10% of total), so as far as I'm concerned the source of the high load has to be I/O wait, but I wasn't sure whether I should be alarmed about the high load average in general.

We're in the process of adding I/O wait to the monitoring system for our Riak nodes, so I'll likely have more data tomorrow on I/O wait during overnight data collection.
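
For anyone who wants to check the I/O wait theory on a node in the meantime, here is a minimal Python sketch of one way to measure it. It assumes a Linux host where /proc/stat is available, and it is not part of Riak or of our monitoring system; the function names (parse_cpu_line, iowait_percent, sample_iowait) are made up for illustration. It takes two snapshots of the aggregate "cpu" counters and reports what share of the elapsed CPU time was spent in iowait. On Linux the load average also counts tasks blocked in uninterruptible (typically disk) sleep, which is why a high load with low CPU usage points toward I/O.

```python
import time


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Keep user, nice, system, idle, iowait, irq, softirq, steal.
            # guest and guest_nice are skipped because the kernel already
            # includes them in user and nice.
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Percentage of CPU time spent in iowait between two snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [a - b for a, b in zip(after, before)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Hypothetical helper: snapshot /proc/stat twice, `interval` seconds apart."""
    with open(path) as stat_file:
        first = stat_file.read()
    time.sleep(interval)
    with open(path) as stat_file:
        second = stat_file.read()
    return iowait_percent(first, second)
```

Calling sample_iowait() in a loop during the overnight job and logging the result next to the load average should show whether the two rise together.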