On Sat, 17 Jun 2006, Danial Thom wrote:
> At some point you're going to have to figure out that there's a reason
> that every time anyone other than you tests FreeBSD it completely pigs
> out. Squeezing out some extra bytes in netperf isn't "performance".
> Performance is everything that a system can do. If you're eating 10% more
> cpu to get a few more bytes in netperf, you haven't increased the
> performance of the system.
This test wasn't netperf, it was a 32-process web server and a 32-process
client, doing sendfile on UFS-backed data files. It was definitely a potted
benchmark, in that it omits some of the behaviors of web servers (dynamic
content, significantly variable data set, etc), but is intended to be more
than a simple micro-benchmark involving two sockets and packet blasting.
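For concreteness, the core of each request in such a server is a single
sendfile(2) call handing a file to a connected socket. Here's a minimal
sketch of that pattern (descriptor names hypothetical, error handling
trimmed; this is not the actual benchmark code):

    #include <sys/types.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /*
     * Sketch of the per-request sendfile(2) pattern a static-content
     * server exercises: a zero-copy transfer of a UFS-backed file to
     * a connected socket.
     */
    static int
    send_whole_file(int file_fd, int sock_fd, off_t file_size)
    {
            off_t sent = 0;

            /* Would need EAGAIN/partial-write handling if nonblocking. */
            if (sendfile(file_fd, sock_fd, 0, file_size, NULL, &sent, 0) != 0)
                    return (-1);
            return (sent == file_size ? 0 : -1);
    }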
Specifically, it was intended to validate whether or not there were
immediately observable changes in TCP behavior based on adjusting HZ under
load. The answer was a qualified yes: there was a small but noticeable
negative effect on high-load web serving in the test environment when reducing
HZ, likely due to reduced timer accuracy. Specifically: simply frobbing HZ
isn't a strategy that necessarily results in a performance improvement.
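For anyone who wants to reproduce this sort of comparison: HZ can be set
when the kernel is built, or via a loader tunable so that no rebuild is
needed between runs. The values below are sample settings for
illustration, not recommendations:

    # In the kernel configuration file:
    options         HZ=1000

    # Or in /boot/loader.conf, picked up at the next boot:
    kern.hz="250"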
> You need to do things like run 2 benchmarks at once. What happens to the
> "performance" of one benchmark when you increase the "performance" of the
> other? Run a database benchmark while you're running a network benchmark,
> or while you're passing a controlled stream of traffic through the box.
The point of this exercise was to demonstrate the complexity of the issue of
adjusting HZ, and to suggest that simply changing the value, absent further
evidence, could have negative effects, and that we might want to
investigate a more mature middle ground, such as a modified timer
architecture. I'm sorry if that conclusion wasn't clear from my e-mail.
> I'd also love to see the results of the exact same test with only 1 cpu
> enabled, to see how well you scale generally. I'm astounded that no-one
> ever seems to post 1 vs 2 cpu performance, which is the entire point of
> SMP.
Single CPU results were included in my e-mail. There are actually a few other
variations worth measuring in more general benchmarking exercises:
- Kernel compiled without any SMP support. Specifically, without lock
prefixes on atomic instructions.
- Kernel compiled with SMP support, but with use of additional CPUs disabled.
- Kernel compiled with SMP support, and with varying numbers of CPUs enabled.
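As a rough sketch of how to obtain those configurations on FreeBSD (exact
knob names vary by version and architecture, so treat these as illustrative
examples rather than a recipe):

    # 1. No SMP support: build a kernel with "options SMP" removed
    #    from the kernel configuration file.

    # 2. SMP kernel, but additional CPUs left unused: set the boot
    #    tunable in /boot/loader.conf:
    kern.smp.disabled="1"

    # 3. Varying CPU counts: on i386/amd64 kernels of this era, halt
    #    individual CPUs in the idle loop via a bitmask sysctl, e.g.
    #    to halt CPU 1:
    sysctl machdep.hlt_cpus=2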
The first two cases are important, because they help identify the difference
between the general overhead of compiling in locked instructions (and related
issues), and the overheads associated with contention, caches, inter-CPU IPI
traffic, scheduling, etc. By failing to compare the top two cases, it might be
easy to conclude that a performance loss is due to the additional cost of
atomic instructions, whereas in reality it may be the result of a poor
scheduling decision, or of data unnecessarily missing the cache on both CPUs
rather than one because processing of the data is split poorly over the
available CPUs.
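To make the first comparison concrete: on i386/amd64 the only difference in
an atomic operation between the UP and SMP builds is the lock prefix,
roughly along these lines (an illustrative sketch in the spirit of FreeBSD's
machine/atomic.h, not the actual source):

    /*
     * With SMP, the lock prefix serializes the read-modify-write
     * against other CPUs; on a uniprocessor kernel it can be omitted,
     * avoiding its cost on every atomic operation.
     */
    #ifdef SMP
    #define MPLOCKED        "lock ; "
    #else
    #define MPLOCKED
    #endif

    static __inline void
    atomic_add_int(volatile unsigned int *p, unsigned int v)
    {
            __asm __volatile(MPLOCKED "addl %1,%0"
                : "+m" (*p)
                : "ir" (v));
    }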
Robert N M Watson
Computer Laboratory
University of Cambridge