On Mon, 2026-05-18 at 00:52 -0700, Dongli Zhang wrote: > On 5/9/26 3:46 PM, David Woodhouse wrote:
Huh, I didn't write that then; it isn't September yet. Did you mean 2026-05-09? We aren't all in the US... Strictly speaking, you just misattributed a quote of mine, which is very poor form :) What mailer are you using? Can it be fixed? > > From: Jack Allister <[email protected]> > > > > Where kvm->arch.use_master_clock is false (because the host TSC is > > unreliable, or the guest TSCs are configured strangely), the KVM clock > > is *not* defined as a function of the guest TSC so KVM_GET_CLOCK_GUEST > > returns an error. In this case, as documented, userspace shall use the > > legacy KVM_GET_CLOCK ioctl. The loss of precision is acceptable in this > > The description here confused me a little. It sounds like userspace should > call > KVM_SET_CLOCK if KVM_SET_CLOCK_GUEST fails. However, I assume it actually > means > that userspace should do nothing extra if KVM_SET_CLOCK_GUEST fails, and > simply > rely on the prior KVM_SET_CLOCK and KVM_VCPU_TSC_OFFSET workflow described in > patch 07. Is that correct? Yes. If KVM_SET_CLOCK_GUEST doesn't work (which might be because KVM_GET_CLOCK_GUEST didn't work so userspace doesn't have the data in the first place, or because the actual ioctl returns failure), then userspace should rely on the old method using KVM_SET_CLOCK imprecisely instead. That includes on a migration from an older kernel that *lacks* KVM_GET_CLOCK_GUEST, of course. I don't think it strictly matters whether userspace does KVM_SET_CLOCK first, then *tries* KVM_SET_CLOCK_GUEST, or whether it tries KVM_SET_CLOCK_GUEST and then only calls KVM_SET_CLOCK on failure? I'd probably be inclined not to use KVM_SET_CLOCK at all unless it is known to be needed? > > +4.145 KVM_GET_CLOCK_GUEST > > +---------------------------- > > + > > +:Capability: none > > +:Architectures: x86_64 > > +:Type: vcpu ioctl > > +:Parameters: struct pvclock_vcpu_time_info (out) > > +:Returns: 0 on success, <0 on error > > + > > +Retrieves the current time information structure used for KVM/PV clocks, > > +in precisely the form advertised to the guest vCPU, which gives parameters > > +for a direct conversion from a guest TSC value to nanoseconds. > > + > > +When the KVM clock is not in "master clock" mode, for example because the > > +host TSC is unreliable or the guest TSCs are oddly configured, the KVM > > clock > > +is actually defined by the host CLOCK_MONOTONIC_RAW instead of the guest > > TSC. > > +In this case, the KVM_GET_CLOCK_GUEST ioctl returns -EINVAL. > > + > > +4.146 KVM_SET_CLOCK_GUEST > > +---------------------------- > > + > > +:Capability: none > > Do we need a KVM_CHECK_EXTENSION capability for this? If userspace wants to > support the new API, should it detect availability via KVM_CHECK_EXTENSION, or > simply try the ioctl and handle failure? That might be conventional, I suppose. But I suspect Jack's thinking was that userspace is going to have to *try* it anyway, and still might have to fall back to what KVM_SET_CLOCK can manage, so userspace probably wouldn't even bother to check that capability; it doesn't matter. Since then, we've added some more attributes in this series though, and it probably is worth adding a cap which advertises them *all*? Something like KVM_CAP_CLOCK_PRECISION_API? > > +#ifdef CONFIG_X86_64 > > +static int kvm_vcpu_ioctl_get_clock_guest(struct kvm_vcpu *v, void __user > > *argp) > > +{ > > + struct pvclock_vcpu_time_info hv_clock = {}; > > + struct kvm_vcpu_arch *vcpu = &v->arch; > > + struct kvm_arch *ka = &v->kvm->arch; > > + unsigned int seq; > > + > > + /* > > + * If KVM_REQ_CLOCK_UPDATE is already pending, or if the pvclock > > + * has never been generated at all, call kvm_guest_time_update(). > > + */ > > + if (kvm_check_request(KVM_REQ_CLOCK_UPDATE, v) || !vcpu->hw_tsc_hz) { > > This was flagged by AI, and I am still checking whether it is a real issue. > > What happens if KVM_REQ_MASTERCLOCK_UPDATE and KVM_REQ_CLOCK_UPDATE are both > pending? > > From my perspective, I am also curious how we should reason about this in > other > scenarios in the future. Specifically, when do we need to process > KVM_REQ_MASTERCLOCK_UPDATE before KVM_REQ_CLOCK_UPDATE, and when is it > acceptable not to? I noticed that kvm_cpuid() already processes only > KVM_REQ_CLOCK_UPDATE. The way I've been thinking about it — and I'm only two cups of coffee into Monday so take those words literally and don't think of them as British understatement of something I believe is absolute truth — is that MASTERCLOCK_UPDATE is updating the actual clock for the whole VM, while CLOCK_UPDATE is about *putting* that information into the per- vCPU pvclock structures. So after a MASTERCLOCK_UPDATE, we need to do a CLOCK_UPDATE on all vCPUs to disseminate the result. Which means that if CLOCK_UPDATE is already pending before a MASTERCLOCK_UPDATE, it's probably redundant and might as well be cleared because it's only going to get set *again* in kvm_end_pvclock_update()? > > + /* > > + * Calculate the guest TSC at the new reference point, and the > > + * corresponding KVM clock value according to user_hv_clock. > > + * Adjust kvmclock_offset so both definitions agree. > > + */ > > + guest_tsc = kvm_read_l1_tsc(v, ka->master_cycle_now); > > + user_clk_ns = __pvclock_read_cycles(&user_hv_clock, guest_tsc); > > + ka->kvmclock_offset = user_clk_ns - ka->master_kernel_ns; > > I used to explore adjusting ka->kvmclock_offset in KVM_SET_CLOCK based on the > old hv_clock and the new hv_clock long time ago. At that time, my concern was > what would happen if userspace provided bogus values. Theoretically, this is > possible with any ioctl. My concern may be unnecessary. > > Would it be helpful to validate that the delta is within a reasonable range, > e.g. that the drift can never be more than five minutes (forward or backward)? Setting confidential guests aside, which have their own way of trusting the TSC and should never even *consider* using kvmclock, surely this is supposed to be *entirely* under the control of the VMM? The kernel has no business deciding what is 'bogus'? If a guest has been running for months on a previous host and is migrated to a new host, don't we expect that the KVM clock of the new VM on the new host is tweaked from its default near-zero after creation, to some large amount?
smime.p7s
Description: S/MIME cryptographic signature

