On Wed, Jan 21, 2026 at 5:21 PM Yosry Ahmed <[email protected]> wrote: > > On Thu, Jan 15, 2026 at 03:21:40PM -0800, Jim Mattson wrote: > > When the vCPU is in guest mode with nested NPT enabled, guest accesses to > > IA32_PAT are redirected to the gPAT register, which is stored in > > vmcb02->save.g_pat. > > > > Non-guest accesses (e.g. from userspace) to IA32_PAT are always redirected > > to hPAT, which is stored in vcpu->arch.pat. > > > > This is architected behavior. It also makes it possible to restore a new > > checkpoint on an old kernel with reasonable semantics. After the restore, > > gPAT will be lost, and L2 will run on L1's PAT. Note that the old kernel > > would have always run L2 on L1's PAT. > > This creates a difference in MSR_IA32_CR_PAT handling with nested SVM > and nested VMX, right?
Correct. > AFAICT, reading MSR_IA32_CR_PAT while an L2 VMX guest is running will > read L2's PAT. With this change, the same scenario on SVM will read L1's > PAT. We can claim that it was always L1's PAT though, because we've > always been running L2 with L1's PAT. > > I am just raising this in case it's a problem to have different behavior > for SVM and VMX. I understand that we need to do this to be able to > save/restore L1's PAT with SVM in guest mode and maintain backward > compatibility. > > IIUC VMX does not have the same issue because host and guest PAT are > both in vmcs12 and are both saved/restored appropriately. Correct. > > > > Signed-off-by: Jim Mattson <[email protected]> > > --- > > arch/x86/kvm/svm/svm.c | 31 ++++++++++++++++++++++++------- > > 1 file changed, 24 insertions(+), 7 deletions(-) > > > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c > > index 7041498a8091..3f8581adf0c1 100644 > > --- a/arch/x86/kvm/svm/svm.c > > +++ b/arch/x86/kvm/svm/svm.c > > @@ -2846,6 +2846,13 @@ static int svm_get_msr(struct kvm_vcpu *vcpu, struct > > msr_data *msr_info) > > case MSR_AMD64_DE_CFG: > > msr_info->data = svm->msr_decfg; > > break; > > + case MSR_IA32_CR_PAT: > > + if (!msr_info->host_initiated && is_guest_mode(vcpu) && > > + nested_npt_enabled(svm)) > > + msr_info->data = svm->vmcb->save.g_pat; /* gPAT */ > > + else > > + msr_info->data = vcpu->arch.pat; /* hPAT */ > > + break; > > default: > > return kvm_get_msr_common(vcpu, msr_info); > > } > > @@ -2929,14 +2936,24 @@ static int svm_set_msr(struct kvm_vcpu *vcpu, > > struct msr_data *msr) > > > > break; > > case MSR_IA32_CR_PAT: > > - ret = kvm_set_msr_common(vcpu, msr); > > - if (ret) > > - break; > > + if (!kvm_pat_valid(data)) > > + return 1; > > > > - svm->vmcb01.ptr->save.g_pat = data; > > This is a bug fix, L2 is now able to alter L1's PAT, right? Yes, L1 and L2 share a PAT today. The whole series is fixing that mistake. > > - if (is_guest_mode(vcpu)) > > - nested_vmcb02_compute_g_pat(svm); > > - vmcb_mark_dirty(svm->vmcb, VMCB_NPT); > > This looks like another bug fix, it seems like we'll update vmcb01 but > clear the clean bit in vmcb02 if we're in guest mode. I will split this out into a separate patch, since this is tangential to the rest of the series. > Probably worth calling these out (and CC:stable, Fixes:.., etc)? > > We probably need a comment here explaining the gPAT vs hPAT case, I > don't think it's super obvious why we only redirect L2's own accesses to > its PAT but not userspace's. Point taken. > > + if (!msr->host_initiated && is_guest_mode(vcpu) && > > + nested_npt_enabled(svm)) { > > + svm->vmcb->save.g_pat = data; /* gPAT */ > > + vmcb_mark_dirty(svm->vmcb, VMCB_NPT); > > + } else { > > + vcpu->arch.pat = data; /* hPAT */ > > Should we call kvm_set_msr_common() here instead of setting > vcpu->arch.pat? The kvm_pat_valid() call would be redundant but that > should be fine. My main concern is if kvm_set_msr_common() gains more > logic for MSR_IA32_CR_PAT that isn't reflected here. Probably unlikely > tho.. There are already three open-coded assignments to vcpu->arch.pat on the VMX side. Perhaps this should be avoided, but it's tangential to this series. > > + if (npt_enabled) { > > + svm->vmcb01.ptr->save.g_pat = data; > > + vmcb_mark_dirty(svm->vmcb01.ptr, VMCB_NPT); > > + if (is_guest_mode(vcpu)) { > > IIUC (with the fix you mentioned) this is because L1 and L2 share the > PAT if nested NPT is disabled, and if we're already in guest mode then > we also need to update vmcb02 (as it was already created based on vmcb01 > with the old PAT). Probably worth a comment. Noted. > > + svm->vmcb->save.g_pat = data; > > + vmcb_mark_dirty(svm->vmcb, VMCB_NPT); > > + } > > + } > > + } > > break; > > case MSR_IA32_SPEC_CTRL: > > if (!msr->host_initiated && > > -- > > 2.52.0.457.g6b5491de43-goog > >

