Hello!
I've done a simple (yet, I hope, realistic) performance benchmark of the
different STABLE branches (4 vs 5 vs 6) using the following hardware:
CPU: Pentium II/Pentium II Xeon/Celeron (334.09-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x665 Stepping = 5
Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory = 134152192 (127 MB)
...
rl0: <RealTek 8139 10/100BaseTX> port 0xe800-0xe8ff mem 0xdc101000-0xdc1010ff
irq 5 at device 20.0 on pci0
...
fxp0: <Intel 82559 Pro/100 Ethernet> port 0xe400-0xe43f mem
0xdc100000-0xdc100fff,0xdc000000-0xdc0fffff irq 7 at device 19.0 on pci0
...
ad0: 76351MB <SAMSUNG SP0802N TK100-24> at ata0-master UDMA33
For each test I simply restored a precompiled 4/5/6-STABLE system to the
same HDD partition. I used the following kernel config for 4-STABLE:
ident TEST
machine i386
maxusers 32
makeoptions CONF_CFLAGS=-fno-builtin
makeoptions DEBUG=-g
options INCLUDE_CONFIG_FILE
cpu I686_CPU
options COMPAT_43
options USER_LDT
options SYSVSHM
options SYSVSEM
options SYSVMSG
options INVARIANTS
options INVARIANT_SUPPORT
options USERCONFIG
options INET
options FAST_IPSEC
options IPSEC_FILTERGIF
pseudo-device ether
pseudo-device vlan 1
pseudo-device loop
pseudo-device bpf
pseudo-device ppp 8
options PPP_BSDCOMP
options PPP_DEFLATE
options PPP_FILTER
options IPFIREWALL
options IPFW2
options IPFIREWALL_VERBOSE
options IPFIREWALL_VERBOSE_LIMIT=100
options IPFIREWALL_FORWARD
options IPDIVERT
options IPSTEALTH
options ICMP_BANDLIM
options DUMMYNET
options FFS
options FFS_ROOT
options SOFTUPDATES
options QUOTA
options P1003_1B
options _KPOSIX_PRIORITY_SCHEDULING
options _KPOSIX_VERSION=199309L
pseudo-device pty
pseudo-device crypto
device isa
device atkbdc0 at isa? port IO_KBD
device atkbd0 at atkbdc? irq 1
device psm0 at atkbdc? irq 12
device vga0 at isa?
pseudo-device splash
device sc0 at isa?
options SC_HISTORY_SIZE=1000
options SC_TWOBUTTON_MOUSE
device npx0 at nexus? port IO_NPX flags 0x0 irq 13
device ata
device atadisk
options ATA_STATIC_ID
device fdc0 at isa? port IO_FD1 irq 6 drq 2
device fd0 at fdc0 drive 0
device fd1 at fdc0 drive 1
device sio0 at isa? port IO_COM1 irq 4
device sio1 at isa? port IO_COM2 irq 3
device pci
For 5/6-STABLE I modified it slightly. Here is the diff ("<" = 4-only
option, ">" = 5/6-only option):
> options SCHED_4BSD
< options USER_LDT
< options USERCONFIG
< pseudo-device ether
< pseudo-device vlan 1
< pseudo-device loop
< pseudo-device bpf
< pseudo-device ppp 8
> device ether
> device loop
> device bpf
< options IPFW2
> options IPFIREWALL_FORWARD_EXTENDED
< options ICMP_BANDLIM
< options FFS_ROOT
< options P1003_1B
< options _KPOSIX_VERSION=199309L
< pseudo-device pty
< pseudo-device crypto
> device pty
> device crypto
< device atkbdc0 at isa? port IO_KBD
< device atkbd0 at atkbdc? irq 1
< device psm0 at atkbdc? irq 12
< device vga0 at isa?
< pseudo-device splash
< device sc0 at isa?
---
> device atkbdc
> device atkbd
> device psm
> options KBD_INSTALL_CDEV
> device vga
> device splash
> device sc
< device npx0 at nexus? port IO_NPX flags 0x0 irq 13
> device npx
< device fdc0 at isa? port IO_FD1 irq 6 drq 2
< device fd0 at fdc0 drive 0
< device fd1 at fdc0 drive 1
< device sio0 at isa? port IO_COM1 irq 4
< device sio1 at isa? port IO_COM2 irq 3
I also set kern.hz="100" in /boot/loader.conf on every system, and I
effectively took ipfw out of the picture by using a single
'add 1 pass all from any to any' rule. I hope this makes it an
apples-to-apples comparison.
For each x-STABLE, I downloaded a large ISO image via FTP in binary mode
twice: once through the rl NIC and once through the fxp one, both in 10baseT
mode (approx. 1 Mbyte/s transfer rate). I noted the CPU utilization reported
by "systat -vm 1" once the numbers had stabilized. Here are the results
(averages; %User and %Nice were close to zero):
                   %Sys  %Intr  %Idl
RELENG_4 + rl0       14     14    72
RELENG_4 + fxp0      14     10    76
RELENG_5 + rl0       40     30    30
RELENG_5 + fxp0      35     25    40
RELENG_6 + rl0       45     40    15
RELENG_6 + fxp0      45     35    20
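To make the comparison easier to read, the arithmetic behind the table can be sketched in a few lines of Python. This is only an illustration: the numbers are copied from the table, while the names RESULTS, busy_percent and busy_ratio are hypothetical helpers, and treating %Sys + %Intr as the cost of the transfer is an assumption that relies on %User and %Nice being close to zero.

```python
# CPU utilization figures copied from the table (percent).
# Keys are (branch, NIC); values are (%Sys, %Intr, %Idl).
RESULTS = {
    ("RELENG_4", "rl0"): (14, 14, 72),
    ("RELENG_4", "fxp0"): (14, 10, 76),
    ("RELENG_5", "rl0"): (40, 30, 30),
    ("RELENG_5", "fxp0"): (35, 25, 40),
    ("RELENG_6", "rl0"): (45, 40, 15),
    ("RELENG_6", "fxp0"): (45, 35, 20),
}


def busy_percent(sys_pct, intr_pct):
    """CPU spent on the transfer, assumed to be %Sys + %Intr."""
    return sys_pct + intr_pct


def busy_ratio(branch, nic, baseline="RELENG_4"):
    """How many times more CPU a branch uses than the baseline on the same NIC."""
    base_sys, base_intr, _ = RESULTS[(baseline, nic)]
    cur_sys, cur_intr, _ = RESULTS[(branch, nic)]
    return busy_percent(cur_sys, cur_intr) / busy_percent(base_sys, base_intr)


if __name__ == "__main__":
    for (branch, nic), (sys_pct, intr_pct, _idle) in RESULTS.items():
        print(f"{branch} + {nic}: busy {busy_percent(sys_pct, intr_pct)}%, "
              f"{busy_ratio(branch, nic):.2f}x RELENG_4")
```

With the figures above this gives 2.5x the RELENG_4 CPU cost for RELENG_5 on both NICs, and roughly 3.0x (rl0) to 3.3x (fxp0) for RELENG_6.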
I tried to verify these numbers by running 'md5 -t' in parallel with the
download and measuring its wall-clock time with "time md5 -t". Indeed, under
RELENG_4 the benchmark took 43 sec, vs 2:01 under RELENG_5 and 2:05 under
RELENG_6 (I don't understand why the difference between 5 and 6 is so small
here).
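The same kind of comparison can be made for the md5 timings. The sketch below only converts the three reported wall-clock times to seconds and divides them by the RELENG_4 value; WALL, to_seconds and slowdown are hypothetical names introduced for illustration, not part of any tool used in the test.

```python
# Wall-clock times of 'md5 -t' during the download, as reported in this post.
WALL = {"RELENG_4": "43", "RELENG_5": "2:01", "RELENG_6": "2:05"}


def to_seconds(stamp):
    """Convert 'SS' or 'M:SS' wall-clock text into whole seconds."""
    seconds = 0
    for part in stamp.split(":"):
        seconds = seconds * 60 + int(part)
    return seconds


def slowdown(branch, baseline="RELENG_4"):
    """Wall-clock time of a branch relative to the baseline branch."""
    return to_seconds(WALL[branch]) / to_seconds(WALL[baseline])


if __name__ == "__main__":
    for branch in WALL:
        print(f"{branch}: {to_seconds(WALL[branch])} s, "
              f"{slowdown(branch):.2f}x RELENG_4")
```

That is 121 s and 125 s against 43 s, i.e. about 2.81x and 2.91x the RELENG_4 time.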
I would call these numbers discouraging. Such high CPU usage for the
relatively simple task of writing _only_ 10 Mbit/s of traffic to the HDD
will surely prevent deployment of 6-STABLE on many not-very-powerful
production servers. Am I missing something simple regarding compile-time
or runtime optimization?
Sincerely, Dmitry
--
Atlantis ISP, System Administrator
e-mail: [EMAIL PROTECTED]
nic-hdl: LYNX-RIPE
_______________________________________________
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable