On 5/26/26 15:57, David Woodhouse wrote:
In 2012, as part of implementing the "master clock" mode for kvmclock,
Marcelo added kvm_get_time_and_clockread() in commit d828199e8444
("KVM: x86: implement PVCLOCK_TSC_STABLE_BIT pvclock flag").

In 2016, Christopher Hall added the generic ktime_get_snapshot() in
commit 9da0f49c8767 ("time: Add timekeeping snapshot code capturing
system time and counter"), which provides the same paired read of
{ time, counter } through the core timekeeping code.

Then in 2018, Vitaly Kuznetsov added Hyper-V TSC page support in
commit b0c39dc68e3b ("x86/kvm: Pass stable clocksource to guests when
running nested on Hyper-V"), which extended vgettsc() to handle the
HVCLOCK case.

I'd quite like to kill it all with fire and make KVM use
ktime_get_snapshot() instead.

However, to correlate with the TSC provided to guests, KVM needs the
underlying host TSC counter value, *not* the cycles count from the
hyperv_clocksource_tsc_page clocksource which is scaled to 10MHz.

If we wanted to support master clock mode while nesting under KVM and
bizarrely using the kvmclock for system timing, we'd have the same
problem with the kvmclock clocksource, which similarly scales to 1GHz.

One option is to say "Don't Do That Then™": if you want to provide a
masterclock kvmclock to guests then *don't* use the silly pvclocks for
your own kernel's timekeeping, use the damn TSC. Because if the TSC
*isn't* reliable then you can't do masterclock mode for your guests
anyway.

Perhaps that should have been the response when commit b0c39dc68e3b was
submitted, but I guess we're stuck supporting that mode now. But I
really do want to kill the KVM hacks and use ktime_get_snapshot().

Reverse-engineering the original TSC reading from the clocksource
counter value doesn't look sane: it can't be done without a loss of
precision and/or a 128-bit division.
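
To illustrate the round-trip problem, here is a minimal userspace
sketch. It assumes the Hyper-V TSC page style of conversion, a multiply
by a 64-bit scale followed by a 64-bit right shift, and ignores the
offset. The function names and the 2.5GHz TSC frequency used in the
note below are made up for the example, and it relies on the GCC/Clang
unsigned __int128 extension.

```c
#include <assert.h>
#include <stdint.h>

/* Scale factor mapping a TSC running at tsc_hz to 10MHz (100ns) units.
 * Hypothetical helper for this sketch; assumes tsc_hz > 10MHz so the
 * result fits in 64 bits. */
static uint64_t scale_for_tsc_hz(uint64_t tsc_hz)
{
    assert(tsc_hz > 10000000ULL);
    return (uint64_t)(((unsigned __int128)10000000ULL << 64) / tsc_hz);
}

/* Forward direction: what the scaled clocksource read does. */
static uint64_t tsc_to_ref(uint64_t tsc, uint64_t scale)
{
    return (uint64_t)(((unsigned __int128)tsc * scale) >> 64);
}

/* Naive inverse: needs a 128-bit division, and can only recover the
 * TSC to within roughly one reference tick, because the forward
 * conversion already discarded the low-order bits. */
static uint64_t ref_to_tsc(uint64_t ref, uint64_t scale)
{
    assert(scale != 0);
    return (uint64_t)(((unsigned __int128)ref << 64) / scale);
}
```

With a 2.5GHz TSC, a reading of 1000000123 converts to 4000000
reference ticks, and the inverse gives back 1000000000: the low 123
cycles are gone, and any TSC value in that 250-cycle window maps to the
same tick.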

One simple option that occurs to me would be to add a 'cycles_raw'
value to the system_time_snapshot, for PV clocksources like hyperv and
kvmclock to populate with the original TSC reading.
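
Roughly what that could look like, as a userspace sketch rather than a
patch. The struct below is a cut-down stand-in for the kernel's
system_time_snapshot with only the counter fields. 'cycles_raw' is the
proposed field and does not exist today, and the read helper and its
signature are hypothetical.

```c
#include <stdint.h>

/* Stand-in for struct system_time_snapshot, counter fields only. */
struct snapshot_sketch {
    uint64_t cycles;     /* clocksource counter, possibly scaled */
    uint64_t cycles_raw; /* PROPOSED: TSC reading it was derived from */
};

/* Hypothetical PV clocksource read: returns the scaled counter and
 * also hands back the raw TSC value it was computed from. */
static uint64_t pv_read_sketch(uint64_t tsc, uint64_t scale, uint64_t *raw)
{
    *raw = tsc;
    return (uint64_t)(((unsigned __int128)tsc * scale) >> 64);
}

/* Fill both fields from one TSC read, so they describe the same
 * instant. The tsc argument stands in for an actual rdtsc. */
static struct snapshot_sketch take_snapshot_sketch(uint64_t tsc,
                                                   uint64_t scale)
{
    struct snapshot_sketch s;

    s.cycles = pv_read_sketch(tsc, scale, &s.cycles_raw);
    return s;
}
```

A consumer like KVM would then use cycles_raw directly and never need
to invert the scaling. For a clocksource that is already the plain TSC,
the two fields would presumably just hold the same value.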

That might actually let us clean up some of the PTP code that currently
has to deal with TSC vs. kvmclock in counter snapshots too. I think I
could kill the use of get_cycles() in vmclock for the kvmclock case,
which might make Thomas happy...

Yeah, when reading I was thinking of PTP as well.  Seems worthwhile.

Paolo

