Hi,
Recently I moved some FreeBSD systems to the VPS provider TransIP,
which uses Linux KVM-based virtualization.
Initial performance was surprisingly bad, and the CPU graphs were very
spiky, with as much system time being spent as user time.
Via PostgreSQL I ended up trying pg_test_timing, which reported the
following for the default timecounter, HPET (for reference, the
available choices are kern.timecounter.choice: i8254(0) ACPI-fast(900)
HPET(950) TSC-low(-100) dummy(-1000000)):
Testing timing overhead for 3 seconds.
Per loop time including overhead: 6481.08 ns
Histogram of timing durations:
  < us   % of total      count
     1      0.00000          0
     2      0.00000          0
     4      0.00000          0
     8     88.79165     411005
    16      9.53451      44134
    32      1.03848       4807
    64      0.49796       2305
   128      0.10370        480
   256      0.02981        138
   512      0.00259         12
  1024      0.00086          4
  2048      0.00022          1
  4096      0.00000          0
  8192      0.00000          0
 16384      0.00022          1
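(In case anyone wants to reproduce this: the counter can be switched
at runtime and the test rerun, roughly like this:)

  # list the available counters with their quality ratings
  sysctl kern.timecounter.choice
  # switch the active counter on the fly, no reboot needed
  sysctl kern.timecounter.hardware=HPET
  # rerun the PostgreSQL timing test for 3 seconds
  pg_test_timing -d 3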
With the other timecounter choices of i8254 and ACPI-fast the results
look much like the above: no measurements under 4 us.
Only with TSC-low does it look like:
Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.22 ns
Histogram of timing durations:
  < us   % of total      count
     1     95.97088   69846421
     2      4.02214    2927264
     4      0.00136        988
     8      0.00288       2096
    16      0.00132        958
    32      0.00074        542
    64      0.00047        345
   128      0.00016        114
   256      0.00004         29
   512      0.00000          3
  1024      0.00000          2
  2048      0.00000          3
and indeed the CPU graphs cleaned up completely after switching to
TSC-low, with much lower CPU averages and no excessive system CPU time.
Webserver and database response times dropped as well (at least
according to their own reporting). To rule out this being just a symptom
of timekeeping: the provider's own CPU graphs (i.e. measured from
outside the VPS as a whole) also show this VPS consuming roughly half
the CPU with TSC-low that it does with the other options, and you can
tell the difference right away when changing the
kern.timecounter.hardware sysctl.
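(For completeness: persisting the choice across reboots is just an
/etc/sysctl.conf entry:)

  # /etc/sysctl.conf
  kern.timecounter.hardware=TSC-low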
The main problem, however, is that the system clock now keeps time
atrociously. Chrony with the most aggressive settings barely manages
to keep the clock in check, and the CPU graphs now show regular gaps
where the system time jumped because of a correction. It looks very
sloppy to users when the recorded times of their actions/files are
not correct.
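(To give an idea of what "aggressive" means here, the chrony.conf
directives were along these lines; the exact values are illustrative
rather than a recommendation:)

  # poll the servers as often as chrony reasonably allows
  server 0.freebsd.pool.ntp.org iburst minpoll 4 maxpoll 6
  # step the clock rather than slew whenever it is off by more
  # than 0.1 s, no matter how often that happens (-1 = no limit)
  makestep 0.1 -1
  # allow corrections at chrony's maximum slew rate (in ppm)
  maxslewrate 83333.333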
This is all on a 6-core system with lots of threads and churn and
short-lived apps coming and going. A 4-core database system with a
stable number of threads and processes, running in the same
virtualization environment, doesn't really have either of these
problems: CPU usage wasn't as spiky or system CPU usage as high even
with HPET, and the clock doesn't drift as much with TSC-low either.
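(For what it's worth, the guest-side view of the TSC can be inspected
like this; the negative quality rating above is FreeBSD's way of
saying it doesn't trust the counter:)

  # TSC frequency as calibrated by the guest kernel
  sysctl machdep.tsc_freq
  # FreeBSD's quality rating for this counter (-100 here)
  sysctl kern.timecounter.tc.TSC-low.quality
  # whether the TSCs were considered synchronized across CPUs
  sysctl kern.timecounter.smp_tsc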
I figure this is a virtualization question, as these kinds of symptoms
are probably generic. What is the host doing?
Additional information from within the guest:
hw.machine: amd64
hw.model: Westmere E56xx/L56xx/X56xx (Nehalem-C)
hw.ncpu: 6
hw.hv_vendor: KVMKVMKVM
hw.clockrate: 2593
(has 24GB memory)
(They do perform live migrations, so I don't know what the real
underlying hardware is; probably something similar, though Westmere is
pretty stale at this point.)
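(One more guest-side check that may be relevant: whether the host
exposes the invariant-TSC CPUID bit, which is bit 8 of EDX in leaf
0x80000007; cpucontrol(8) can dump it:)

  # cpuctl(4) must be loaded for /dev/cpuctl* to exist
  kldload cpuctl
  # dump CPUID leaf 0x80000007; InvariantTSC is EDX bit 8
  cpucontrol -i 0x80000007 /dev/cpuctl0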
I wonder if anyone could talk a bit about what might be going on.
Thank you,
Dennis