Hi,

Recently I moved some FreeBSD systems to the VPS provider TransIP, which uses Linux KVM-based virtualization.

Initial performance was surprisingly bad, and the CPU graphs were very spiky, with as much system time as user time.

Via PostgreSQL I ended up trying out pg_test_timing, which reported the following for the default timecounter (HPET):

(note choices are: kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000))
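
Each result below is just pg_test_timing, re-run after pointing kern.timecounter.hardware at the counter in question, roughly like this:

  # show the current counter and the available choices
  sysctl kern.timecounter.hardware kern.timecounter.choice
  # select a counter for the next run (takes effect immediately)
  sysctl kern.timecounter.hardware=HPET
  # measure timing overhead for 3 seconds (the default)
  pg_test_timing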

Testing timing overhead for 3 seconds.
Per loop time including overhead: 6481.08 ns
Histogram of timing durations:
  < us   % of total      count
     1      0.00000          0
     2      0.00000          0
     4      0.00000          0
     8     88.79165     411005
    16      9.53451      44134
    32      1.03848       4807
    64      0.49796       2305
   128      0.10370        480
   256      0.02981        138
   512      0.00259         12
  1024      0.00086          4
  2048      0.00022          1
  4096      0.00000          0
  8192      0.00000          0
 16384      0.00022          1

With the other timecounter choices, i8254 and ACPI-fast, the results look much like the above: nothing under 4 us.

Only with TSC-low does it look like this:

Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.22 ns
Histogram of timing durations:
  < us   % of total      count
     1     95.97088   69846421
     2      4.02214    2927264
     4      0.00136        988
     8      0.00288       2096
    16      0.00132        958
    32      0.00074        542
    64      0.00047        345
   128      0.00016        114
   256      0.00004         29
   512      0.00000          3
  1024      0.00000          2
  2048      0.00000          3

And indeed, after switching to TSC-low the CPU graphs cleaned up completely, with much lower CPU averages and no more excessive system CPU time. Webserver and database response times dropped as well (at least according to their own reporting). To rule out this being just a symptom of bad timekeeping: the provider's own CPU graphs (i.e. measured from outside the VPS as a whole) also show the VPS consuming roughly half the CPU with TSC-low compared to the other timecounters, and you can tell the difference right away when changing the kern.timecounter.hardware sysctl.
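
(For completeness: the setting can be made persistent across reboots by putting it in /etc/sysctl.conf, e.g.

  # /etc/sysctl.conf
  kern.timecounter.hardware=TSC-low
)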

The main problem, however, is that the system clock now keeps time atrociously. Chrony with the most aggressive settings barely manages to hold the time, and the CPU graphs now show regular gaps where the system time jumped because of a correction. It looks very sloppy to users when the recorded times of their actions/files are not correct.
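
By "most aggressive" I mean roughly this kind of thing in chrony.conf (an illustrative sketch, exact values vary):

  # poll frequently
  server 0.freebsd.pool.ntp.org iburst minpoll 4 maxpoll 6
  # step on any offset above 0.5 s, no matter how long chronyd has been running
  makestep 0.5 -1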

This is all on a 6-core system with lots of threads, churn, and short-lived apps coming and going. A 4-core database system with a stable number of threads and processes, running in the same virtualization environment, doesn't really have either of these problems: CPU usage wasn't as spiky or system CPU usage as high even with HPET, and the clock doesn't drift as much with TSC-low either.

I figured this is a virtualization question, as these kinds of symptoms are probably generic. What is the host doing?

Additional information from within the guest:

hw.machine: amd64
hw.model: Westmere E56xx/L56xx/X56xx (Nehalem-C)
hw.ncpu: 6
hw.hv_vendor: KVMKVMKVM
hw.clockrate: 2593

(has 24 GB of memory)

(They do perform live migrations, so I don't know what the real underlying hardware is, but probably something similar; it's pretty stale at this point.)

I wonder if anyone could talk a bit about what might be going on.

Thank you,
Dennis
