Hi everyone, thank you for your attention to this bug report.

Michael,

1. No, lscpu in the L1 guest does not show the flags "tsc_reliable" and
"constant_tsc".

$ lscpu | grep tsc_reliable
$ lscpu | grep constant_tsc
$ cat /sys/devices/system/clocksource/clocksource0/current_clocksource
hyperv_clocksource_tsc_page

2. Windows 10 Version 22H2 (OS Build 19045.6466)

3. Hyper-V: privilege flags low 0x2e7f, high 0x3b8030, ext 0x2, hints 0x24e24, misc 0xbed7b2

4. Yes, the laptop hibernates and then resumes. When the problem
occurred, the laptop had gone through multiple hibernate and resume
cycles. I haven't seen it happen after a full reboot before a
hibernate/resume cycle.

Thomas

On Tue, Apr 7, 2026 at 11:37 AM Michael Kelley <[email protected]> wrote:
>
> From: Sean Christopherson <[email protected]> Sent: Tuesday, April 7, 2026 9:43 AM
> >
> > +Michael
> >
> > On Tue, Apr 07, 2026, Vitaly Kuznetsov wrote:
> > > Thomas Lefebvre <[email protected]> writes:
> > > > Under Hyper-V, raw RDTSC values are not consistent across vCPUs.
> > > > The hypervisor corrects them only through the TSC page scale/offset.
> > > > If pvclock_update_vm_gtod_copy() runs on CPU 0 and __get_kvmclock()
> > > > later runs on CPU 1 where the raw TSC is lower, the unsigned
> > > > subtraction wraps.
> > > >
> > >
> > > According to the TLFS, reference TSC page is partition wide:
> > >
> > > "The hypervisor provides a partition-wide virtual reference TSC page
> > > which is overlaid on the partition’s GPA space. A partition’s reference
> > > time stamp counter page is accessed through the Reference TSC MSR."
> > >
> > > so if as you say RAW rdtsc value is inconsistent across vCPUs, I can
> > > hardly see how we can use this time source at all, even without
> > > KVM. scale/offset are the same for all vCPUs.
> > >
> > > I think the fix here is to avoid setting up Hyper-V TSC page clocksource
> > > in L1. Unfortunately, with unsynchronized TSCs this will leave us the
> > > only choice for a sane clocksource: raw HV_X64_MSR_TIME_REF_COUNT MSR
> > > reads.
> >
> > This feels like either a Hyper-V bug or a Linux-as-a-guest bug. For
> > "Reference Counter"[1]:
> >
> >   The hypervisor maintains a per-partition reference time counter. It has
> >   the characteristic that successive accesses to it return strictly
> >   monotonically increasing (time) values as seen by any and all virtual
> >   processors of a partition. Furthermore, the reference counter is rate
> >   constant and unaffected by processor or bus speed transitions or deep
> >   processor power savings states. A partition’s reference time counter is
> >   initialized to zero when the partition is created. The reference counter
> >   for all partitions count at the same rate, but at any time, their
> >   absolute values will typically differ because partitions will have
> >   different creation times.
> >
> >   The reference counter continues to count up as long as at least one
> >   virtual processor is not explicitly suspended.
> >
> >
> > And then "Partition Reference Time Enlightenment"[2]:
> >
> >   The partition reference time enlightenment presents a reference time
> >   source to a partition which does not require an intercept into the
> >   hypervisor. This enlightenment is available only when the underlying
> >   platform provides support of an invariant processor Time Stamp Counter
> >   (TSC), or iTSC. In such platforms, the processor TSC frequency remains
> >   constant irrespective of changes in the processor’s clock frequency due
> >   to the use of power management states such as ACPI processor performance
> >   states, processor idle sleep states (ACPI C-states), etc.
> >
> >   The partition reference time enlightenment uses a virtual TSC value, an
> >   offset and a multiplier to enable a guest partition to compute the
> >   normalized reference time since partition creation, in 100nS units. The
> >   mechanism also allows a guest partition to atomically compute the
> >   reference time when the guest partition is migrated to a platform with a
> >   different TSC rate, and provides a fallback mechanism to support
> >   migration to platforms without the constant rate TSC feature.
> >
> > My read of "Partition Reference Time Enlightenment" is that it should only
> > be advertised if the TSC is synchronized and constant. I can't figure out
> > where that feature is actually advertised though, because IIUC it's not the
> > same as HV_ACCESS_TSC_INVARIANT, which says that the virtual TSC is
> > guaranteed to be invariant even across live migration. And it's not
> > HV_MSR_REFERENCE_TSC_AVAILABLE, because I'm pretty sure that just says
> > HV_MSR_REFERENCE_TSC is available.
> >
> > Michael, help?
> >
> > [1] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#reference-counter
> > [2] https://learn.microsoft.com/en-us/virtualization/hyper-v-on-windows/tlfs/timers#partition-reference-time-enlightenment
>
> Yes, TSC page enlightenment is per VM, so it does not compensate for
> discrepancies in raw TSC values across physical CPUs. RDTSC in a
> Hyper-V VM is executed directly by the hardware (i.e., does not trap to
> the hypervisor), so there's no opportunity for the hypervisor to compensate
> for discrepancies. The hypervisor is expected to present a VM with TSCs
> that are already synchronized. I'll need to double-check, but I don't think
> Linux guests on Hyper-V run their own TSC synchronization.
>
> The relevant Hyper-V flags are:
> * HV_MSR_TIME_REF_COUNT_AVAILABLE: The synthetic MSR for reading
>   the partition reference time is available.
> * HV_MSR_REFERENCE_TSC_AVAILABLE: The partition reference time
>   enlightenment (i.e., "the TSC page") is available as a faster way to read
>   the reference counter.
> * HV_ACCESS_TSC_INVARIANT: As Sean said, this says the hardware and
>   Hyper-V support TSC scaling, so live migration can be done across hosts
>   without the guest seeing a change in TSC frequency.
>
> Yes, this does feel like an issue where Hyper-V is not presenting the guest
> with TSCs that are already synchronized. But I'm not aware of having seen
> such a problem before. I'll try to imagine a scenario where a problem like
> this could happen via some other path.
>
> @Thomas Lefebvre: Let me double-check a few things via these follow-up
> questions/actions:
>
> 1. You said the clocksource is hyperv_clocksource_tsc_page. Just to
> confirm, that's for the L1 guest, right? Does the output of the "lscpu"
> command in the L1 guest show the flags "tsc_reliable" and "constant_tsc"?
> I assume "no", since if these flags were set, the clocksource (i.e.,
> /sys/devices/system/clocksource/clocksource0/current_clocksource)
> should be the standard "tsc". I've got a laptop with an i7-13700H processor,
> and my L1 VMs show "tsc" as the clocksource, but I haven't been running
> KVM with L2 nested VMs.
>
> 2. What is the version of Windows/Hyper-V you are running? Get the
> output of the "winver.exe" command. It should be something like this:
>
> Windows 11 [as the top banner]
> Version 25H2 (OS Build 26200.8037)
>
> 3. In the dmesg output of your L1 VM, find the line like this one and reply
> with what you have:
>
> Hyper-V: privilege flags low 0xae7f, high 0x3b8030, hints 0x9a4e24, misc 0xe0bed7b2
>
> From there, I can decode the Hyper-V settings and see if anything jumps out
> as anomalous.
>
> 4. Does the laptop where you are seeing this problem ever hibernate and
> then resume? If so, do you recall if the problem occurs after a full reboot
> but before it ever does a hibernate/resume cycle?
>
> Michael
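
For reference, here is a rough sketch of how the three flags listed in the quoted message could be checked in the "privilege flags low" word from dmesg. The bit positions (1, 9 and 15) are an assumption taken from the Hyper-V feature definitions in the Linux headers as I understand them and should be verified against the tree in use. The helper name decode_low_flags is made up for illustration. The two input values are the ones quoted in this thread.

```python
# Assumed bit positions within the "privilege flags low" word.
# These come from my reading of the Linux Hyper-V header definitions
# and should be double-checked against the kernel tree in use.
FEATURE_BITS = {
    "HV_MSR_TIME_REF_COUNT_AVAILABLE": 1,
    "HV_MSR_REFERENCE_TSC_AVAILABLE": 9,
    "HV_ACCESS_TSC_INVARIANT": 15,
}


def decode_low_flags(low):
    """Return {flag name: True/False} for the assumed bit positions."""
    return {name: bool((low >> bit) & 1) for name, bit in FEATURE_BITS.items()}


# 0x2e7f is the value reported for the L1 guest in this thread;
# 0xae7f is the value from the example dmesg line in the quoted message.
reported = decode_low_flags(0x2E7F)
example = decode_low_flags(0xAE7F)

for name in FEATURE_BITS:
    print(f"{name}: reported={reported[name]} example={example[name]}")
```

Under these assumed bit positions, 0x2e7f has both reference-time bits set but not bit 15, while the example value 0xae7f has all three set.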

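The wraparound described in the original report quoted above can be illustrated with a small sketch. It only models 64-bit unsigned subtraction and does not reproduce the actual kernel arithmetic. The TSC values, the 1000-cycle lag between CPUs, the 2.4 GHz frequency and the helper names are all made up for illustration.

```python
U64_MASK = (1 << 64) - 1


def u64_sub(a, b):
    """Subtract the way a 64-bit unsigned integer would (modulo 2**64)."""
    return (a - b) & U64_MASK


def tsc_delta_ns(tsc_now, tsc_snapshot, tsc_hz):
    """Elapsed nanoseconds derived from an unsigned TSC delta."""
    return u64_sub(tsc_now, tsc_snapshot) * 1_000_000_000 // tsc_hz


# Hypothetical values: the snapshot is taken on CPU 0, and the later
# read happens on CPU 1, whose raw TSC lags CPU 0 by 1000 cycles.
tsc_hz = 2_400_000_000
snapshot_cpu0 = 5_000_000_000
read_cpu1 = snapshot_cpu0 - 1_000

wrapped = u64_sub(read_cpu1, snapshot_cpu0)
print(hex(wrapped))
print(tsc_delta_ns(read_cpu1, snapshot_cpu0, tsc_hz))
```

A lag of only 1000 cycles turns into a delta just below 2**64, so any time value derived from it jumps far into the future instead of going slightly negative.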
