Folks, I've been running a Debian Xen server for a while now. It was upgraded to Stretch when Stretch was released and has (I think) run that way ever since. It has been updated regularly but never rebooted until this weekend, when the reboot took it from the 4.9.0-3 kernel to 4.9.0-6 and, judging by the apt history, also picked up a few hypervisor upgrades.
Long story short, after running without issue for around 330 days, the server came back from the reboot with the time on both Dom0 and the DomUs drifting. When I investigated, I found that the NTP daemon was rejecting all of its upstream servers because they were failing its sanity checks. Restarting the daemon and resyncing didn't help: after about five minutes, every server would fail the sanity checks again and be rejected.

Digging further, I noticed that the system time was drifting away from the hardware clock at a fairly alarming rate, something like 15 minutes a day.

I checked /sys/devices/system/clocksource/clocksource0/current_clocksource, which listed tsc. I understand the TSC can be a little flaky with vCPUs (although it had worked for the previous 330 days), so I changed the clocksource to xen. This made no difference at all.

Further investigation led me to run xl dmesg | grep time, which revealed:

(XEN) Platform timer is 14.318MHz HPET

but again, that is something which had worked perfectly well for 330 days.

In the end I used adjtimex -t 10065 -f 3058784 to speed the clock up a bit, and that has allowed NTP to keep it under control. So the slow clock appears to have been compensated for, but I'd be interested to know what actually caused it. I've drawn a blank on that one and would be grateful if anyone could offer any thoughts.

Mike.
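P.S. For anyone wanting to check what those adjtimex values amount to, here is a rough sketch of the arithmetic. It assumes, per the adjtimex(8) man page, a kernel with USER_HZ=100 (so the nominal --tick value is 10000 microseconds per tick) and that --frequency is scaled so that 65536 corresponds to 1 ppm. The function names are just for illustration.

```python
# Rough arithmetic for the adjtimex correction (adjtimex -t 10065 -f 3058784).
# Assumptions taken from the adjtimex(8) man page, not measured on this box:
#   - USER_HZ = 100, so the nominal --tick value is 10000 microseconds per tick;
#   - --frequency is scaled so that 65536 corresponds to 1 ppm.

NOMINAL_TICK_US = 10000  # microseconds added per tick at USER_HZ = 100
FREQ_SCALE = 65536       # --frequency units per ppm
SECONDS_PER_DAY = 86400


def tick_ppm(tick: int) -> float:
    """Speed-up contributed by --tick, in parts per million."""
    return (tick - NOMINAL_TICK_US) * 1_000_000 / NOMINAL_TICK_US


def freq_ppm(freq: int) -> float:
    """Speed-up contributed by --frequency, in parts per million."""
    return freq / FREQ_SCALE


def correction_seconds_per_day(tick: int, freq: int) -> float:
    """Total seconds gained per day from both adjustments combined."""
    total_ppm = tick_ppm(tick) + freq_ppm(freq)
    return total_ppm * SECONDS_PER_DAY / 1_000_000


if __name__ == "__main__":
    t, f = 10065, 3058784  # values passed to adjtimex -t / -f
    print(f"tick:      {tick_ppm(t):.1f} ppm")
    print(f"frequency: {freq_ppm(f):.2f} ppm")
    secs = correction_seconds_per_day(t, f)
    print(f"total:     {secs:.1f} s/day (~{secs / 60:.1f} min/day)")
```

Under those assumptions the tick change accounts for almost all of the correction (6500 ppm), with the frequency offset adding only about 47 ppm on top, for a total of roughly 566 seconds (about 9.4 minutes) gained per day.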