On Wed, May 20, 2026, David Woodhouse wrote:
> On Wed, 2026-05-20 at 13:44 -0700, Sean Christopherson wrote:
> >
> > + /*
> > + * If the TSC counts at a constant frequency across P/T states,
> > counts
> > + * in deep C-states, and the TSC hasn't been marked unstable, treat
> > the
> > + * TSC reliable, as guaranteed by KVM. Note, the TSC unstable check
> > + * exists purely to honor the TSC being marked unstable via command
> > + * line, any runtime detection of an unstable will happen after
> > this.
> > + */
> > + if (boot_cpu_has(X86_FEATURE_CONSTANT_TSC) &&
> > + boot_cpu_has(X86_FEATURE_NONSTOP_TSC) &&
> > + !check_tsc_unstable())
> {
> > + tsc_properties = TSC_FREQ_KNOWN_AND_RELIABLE;
>
> kvmclock = 0; /* Why use it if the TSC works? The kvmclock exists
> *purely* to work around a TSC which *doesn't*
> have those properties checked above. */
kvmclock still provides SYSTEM_TIME and WALL_CLOCK :-/
> }
>
> I was going to say PVCLOCK_TSC_STABLE_BIT, and maybe we should check
> that *too* for paranoia?
No? PVCLOCK_TSC_STABLE is property of kvmclock more than it's a property of the
TSC itself. And for the no-kvmclock case, we most definitely don't want to
setup
kvmclock just to query that flag.
> But hopefully the checks you have above are equivalent?
They aren't as paranoid, but if the host enumerates CONSTANT+NONSTOP TSC despite
KVM-the-host not being able to advertise PVCLOCK_TSC_STABLE_BIT, then the VMM
done
messed up.
> > +
> > + kvm_tsc_khz_cpuid = kvm_para_tsc_khz();
> > +
> > + /*
> > + * If provided, use the TSC (and APIC bus) frequency provided in
> > KVM's
> > + * PV CPUID leaf even if kvmclock itself is disabled via command
> > line.
> > + * The PV CPUID information isn't dependent on kvmclock in any way,
> > and
> > + * in fact using the precise information is *more* important when
> > the
> > + * user has explicitly disabled kvmclock to force the kernel to use
> > the
> > + * TSC as its clocksource.
> > + */
> > + if (!kvmclock) {
> > + if (kvm_tsc_khz_cpuid)
> > + tsc_register_calibration_routines(kvm_get_tsc_khz,
> > + kvm_get_cpu_khz,
> > + tsc_properties);
> > + return;
> > + }
> > +
>
>
> Regardless of the above, why not just register these here
> unconditionally, and remove the later call that does the same?
Because if kvmclock=n, it's only safe to call kvm_get_tsc_khz() if
kvm_tsc_khz_cpuid
is non-zero, other wise the "else" path will hit a NULL pointer deref when
trying
to get the frequency from the PV clock struct:
return kvm_tsc_khz_cpuid ? : pvclock_tsc_khz(this_cpu_pvti());