Component: Xen Hypervisor (x86 / time.c)
Versions affected: Potentially 4.17-4.21 and unstable (tested on 4.18
with high vCPU density)
Description:
In high-load scenarios (24+ cores, heavy Dom0 load, and frequent VM
pauses via DRAKVUF/VMI), Windows guests experience Desktop Window
Manager (DWM.exe) crashes with error 0x8898009b.
The root cause is an integer overflow in the time scaling logic, which
occurs when time calibration coincides with a snapshot reversion or
with RDTSC(P) instruction emulation.
Technical Analysis:
The get_s_time_fixed function (xen/arch/x86/time.c) accepts at_tsc as
an argument. If at_tsc is less than local_tsc, the delta is negative,
and scale_delta handles it incorrectly: the result is an extremely large
number instead of a backward offset. The same can happen when at_tsc is
zero: the ticks are read via rdtsc_ordered, and if time calibration runs
right after that read, local_tsc may become larger than the tick value
just obtained. The problem is guaranteed to be reproducible in
hvm_load_cpu_ctxt (xen/arch/x86/hvm/hvm.c), because sync_tsc is less
than local_tsc after time calibration. It can also potentially occur
when RDTSC(P) emulation runs concurrently with
time_calibration_rendezvous_tail (xen/arch/x86/time.c).
Windows DWM, sensitive to QueryPerformanceCounter jumps, fails
catastrophically when it receives an essentially infinite timestamp delta.
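The effect described above can be illustrated with a minimal, standalone
sketch. This is not the Xen source: struct time_scale_model,
scale_delta_model, stime_naive and stime_sign_aware are hypothetical names
used only for illustration, and the scaling is a simplified model of a
shift followed by a 32-bit fixed-point multiply. It assumes the delta is
computed as an unsigned 64-bit subtraction, so that at_tsc < local_tsc
wraps around instead of going negative.

```c
#include <stdint.h>

/* Hypothetical, simplified stand-in for a TSC-to-nanoseconds scale. */
struct time_scale_model {
    int shift;           /* pre-shift applied to the tick delta */
    uint32_t mul_frac;   /* 32-bit fixed-point multiplier (value / 2^32) */
};

/* Unsigned scaling: ((delta << shift) * mul_frac) >> 32, done in two
 * 32-bit halves so the intermediate product never exceeds 64 bits. */
static uint64_t scale_delta_model(uint64_t delta,
                                  const struct time_scale_model *s)
{
    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;

    return (delta >> 32) * s->mul_frac +
           (((delta & 0xffffffffu) * s->mul_frac) >> 32);
}

/* Naive path: the subtraction wraps when at_tsc < local_tsc, and the
 * wrapped value is scaled as if it were a huge forward delta. */
static uint64_t stime_naive(uint64_t stamp_ns, uint64_t local_tsc,
                            uint64_t at_tsc,
                            const struct time_scale_model *s)
{
    uint64_t delta = at_tsc - local_tsc;   /* wraps if at_tsc < local_tsc */
    return stamp_ns + scale_delta_model(delta, s);
}

/* Sign-aware path (one possible approach, an assumption of this sketch):
 * scale the magnitude and apply it as a backward offset when needed. */
static uint64_t stime_sign_aware(uint64_t stamp_ns, uint64_t local_tsc,
                                 uint64_t at_tsc,
                                 const struct time_scale_model *s)
{
    if (at_tsc >= local_tsc)
        return stamp_ns + scale_delta_model(at_tsc - local_tsc, s);
    return stamp_ns - scale_delta_model(local_tsc - at_tsc, s);
}
```

With an assumed scale of 0.5 ns per tick (shift = 0, mul_frac =
0x80000000), a stamp of 1000000000 ns, local_tsc = 5000 and at_tsc = 4000,
the sign-aware path returns a time 500 ns before the stamp, while the
naive path returns a value roughly 2^63 ns ahead of it.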
Steps to Reproduce:
1. Set up a host with a high core count (e.g., 24+ cores).
2. Run a high density of Windows 10 DomUs (20 domains with 4 vCPUs
   each).
3. Apply heavy load on Dom0 (e.g., DRAKVUF monitoring).
4. Frequently pause/resume or revert snapshots of the DomUs.
5. Observe dwm.exe crashes in the guests with
   MILERR_QPC_TIME_WENT_BACKWARD (0x8898009b).
Currently, the lack of sign-awareness in the delta scaling path allows a
nanosecond-scale race condition to turn into a multi-millennium time jump.
Environment:
CPU: 24 cores (Intel Xeon with Invariant TSC)
Dom0: High vCPU count (24)
Feature: tsc_mode="always_emulate",
timer_mode="no_delay_for_missed_ticks"
Guest: Windows 10/11 with tsc as time source