Hi,

Recently I moved some FreeBSD systems to the VPS provider TransIP, which uses Linux KVM-based virtualization.

Initial performance was surprisingly bad, and the CPU graphs were very spiky, with as much system time as user time.

Via PostgreSQL I ended up trying out pg_test_timing, which reported the following for the default timecounter (HPET):

(note choices are: kern.timecounter.choice: i8254(0) ACPI-fast(900) HPET(950) TSC-low(-100) dummy(-1000000))
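
Each result below is just pg_test_timing, re-run after pointing kern.timecounter.hardware at the counter in question, roughly like this:

  # show the current counter and the available choices
  sysctl kern.timecounter.hardware kern.timecounter.choice
  # select a counter for the next run (takes effect immediately)
  sysctl kern.timecounter.hardware=HPET
  # measure timing overhead for 3 seconds (the default)
  pg_test_timing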

Testing timing overhead for 3 seconds.
Per loop time including overhead: 6481.08 ns
Histogram of timing durations:
  < us   % of total      count
     1      0.00000          0
     2      0.00000          0
     4      0.00000          0
     8     88.79165     411005
    16      9.53451      44134
    32      1.03848       4807
    64      0.49796       2305
   128      0.10370        480
   256      0.02981        138
   512      0.00259         12
  1024      0.00086          4
  2048      0.00022          1
  4096      0.00000          0
  8192      0.00000          0
 16384      0.00022          1

With the other timecounter choices, i8254 and ACPI-fast, the results look much like the above: nothing under 4 us.

Only with TSC-low does it look like this:

Testing timing overhead for 3 seconds.
Per loop time including overhead: 41.22 ns
Histogram of timing durations:
  < us   % of total      count
     1     95.97088   69846421
     2      4.02214    2927264
     4      0.00136        988
     8      0.00288       2096
    16      0.00132        958
    32      0.00074        542
    64      0.00047        345
   128      0.00016        114
   256      0.00004         29
   512      0.00000          3
  1024      0.00000          2
  2048      0.00000          3

And indeed, after switching to TSC-low the CPU graphs cleaned up completely, with much lower CPU averages and no more excessive system CPU time. Webserver and database response times dropped as well (at least according to their own reporting). To rule out this being just a symptom of bad timekeeping: the provider's own CPU graphs (i.e. measured from outside the VPS as a whole) also show the VPS consuming roughly half the CPU with TSC-low compared to the other timecounters, and you can tell the difference right away when changing the kern.timecounter.hardware sysctl.
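
(For completeness: the setting can be made persistent across reboots by putting it in /etc/sysctl.conf, e.g.

  # /etc/sysctl.conf
  kern.timecounter.hardware=TSC-low
)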

The main problem, however, is that the system clock now keeps time atrociously. Chrony with the most aggressive settings barely manages to hold the time, and the CPU graphs now show regular gaps where the system time jumped because of a correction. It looks very sloppy to users when the recorded times of their actions/files are not correct.
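
By "most aggressive" I mean roughly this kind of thing in chrony.conf (an illustrative sketch, exact values vary):

  # poll frequently
  server 0.freebsd.pool.ntp.org iburst minpoll 4 maxpoll 6
  # step on any offset above 0.5 s, no matter how long chronyd has been running
  makestep 0.5 -1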

This is all on a 6-core system with lots of threads, churn, and short-lived apps coming and going. A 4-core database system with a stable number of threads and processes, running in the same virtualization environment, doesn't really have either of these problems: CPU usage wasn't as spiky or system CPU usage as high even with HPET, and the clock doesn't drift as much with TSC-low either.

I figured this is a virtualization question, as these kinds of symptoms are probably generic. What is the host doing?

Additional information from within the guest:

hw.machine: amd64
hw.model: Westmere E56xx/L56xx/X56xx (Nehalem-C)
hw.ncpu: 6
hw.hv_vendor: KVMKVMKVM
hw.clockrate: 2593

(has 24 GB of memory)

(They do perform live migrations, so I don't know what the real underlying hardware is, but probably something similar; it's pretty stale at this point.)

I wonder if anyone could talk a bit about what might be going on.

Thank you,
Dennis
