Peter,

Thanks for your response. I'm looking into some of the ideas in your other recent mail, but I had another follow-up question on this one...

Is there any way to control the CPU load when using the "stress" benchmark? I have some control over that with our home-grown benchmark, but I thought it made sense to use the official benchmark tool, as people might more readily believe those results and/or be able to reproduce them. But offhand, I don't see any way to throttle back the load created by the stress test.

On Mon, Dec 19, 2011 at 09:47:32PM -0800, Peter Schuller wrote:
> > I'm trying to understand if this is expected or not, and if there is
>
> Without careful tuning, outliers around a couple of hundred ms are
> definitely expected in general (not *necessarily*, depending on
> workload) as a result of garbage collection pauses. The impact will be
> worsened a bit if you are running under high CPU load (or even maxing
> it out with stress) because post-pause, if you are close to max CPU
> usage you will take considerably longer to "catch up".
>
> Personally, I would just log each response time and feed it to gnuplot
> or something. It should be pretty obvious whether or not the latencies
> are due to periodic pauses.
>
> If you are concerned with eliminating or reducing outliers, I would:
>
> (1) Make sure that when you're benchmarking, that you're putting
> Cassandra under a reasonable amount of load. Latency benchmarks are
> usually useless if you're benchmarking against a saturated system. At
> least, start by achieving your latency goals at 25% or less CPU usage,
> and then go from there if you want to up it.
>
> (2) One can affect GC pauses, but it's non-trivial to eliminate the
> problem completely. For example, the length of frequent young-gen
> pauses can typically be decreased by decreasing the size of the young
> generation, leading to more frequent shorter GC pauses. But that
> instead causes more promotion into the old generation, which will
> result in more frequent very long pauses (relative to normal; they
> would still be infrequent relative to young gen pauses) - IF your
> workload is such that you are suffering from fragmentation and
> eventually seeing Cassandra fall back to full compacting GC:s
> (stop-the-world) for the old generation.
>
> I would start by adjusting young gen so that your frequent pauses are
> at an acceptable level, and then see whether or not you can sustain
> that in terms of old-gen.
>
> Start with this in any case: Run Cassandra with -XX:+PrintGC
> -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps
>
> --
> / Peter Schuller (@scode, http://worldmodscode.wordpress.com)

-- 
Peter Fales
Alcatel-Lucent
Member of Technical Staff
1960 Lucent Lane
Room: 9H-505
Naperville, IL 60566-7033
Email: [email protected]
Phone: 630 979 8031
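
For illustration, here is a minimal sketch of the kind of throttled client discussed above: it issues requests at a fixed target rate rather than as fast as possible, and logs each response time in a two-column form that gnuplot can read. This is not part of the stress tool and says nothing about its options. The names `run_paced`, `request_fn`, `rate_per_sec`, `total_requests`, and `out_path` are all hypothetical, and `request_fn` stands in for whatever single read or write the home-grown benchmark performs.

```python
import time


def run_paced(request_fn, rate_per_sec, total_requests, out_path):
    """Call request_fn at a fixed target rate and log each latency.

    request_fn is a hypothetical placeholder for one client operation.
    Each output line is "<seconds since start> <latency in ms>".
    Returns the list of latencies in milliseconds.
    """
    interval = 1.0 / rate_per_sec
    start = time.perf_counter()
    latencies_ms = []
    with open(out_path, "w") as out:
        for i in range(total_requests):
            # Each request has a scheduled slot; sleeping until that slot
            # is what keeps the offered load below "as fast as possible".
            due = start + i * interval
            delay = due - time.perf_counter()
            if delay > 0:
                time.sleep(delay)
            t0 = time.perf_counter()
            request_fn()
            t1 = time.perf_counter()
            latency_ms = (t1 - t0) * 1000.0
            latencies_ms.append(latency_ms)
            out.write(f"{t0 - start:.6f} {latency_ms:.3f}\n")
    return latencies_ms
```

If the output is written to a file such as latency.dat, a gnuplot command along the lines of `plot "latency.dat" using 1:2 with points` should make periodic pauses visible as regularly spaced spikes. Note that if a request takes longer than the interval, this sketch simply falls behind schedule and issues the following requests back to back until it catches up.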
