On 22/01/2026 1:48 pm, Julian Vetter wrote:
> On 1/19/26 20:01, Andrew Cooper wrote:
>> On 19/01/2026 10:34 am, Julian Vetter wrote:
>>> On 1/15/26 4:50 PM, Andrew Cooper wrote:
>>>> On 15/01/2026 3:17 pm, Julian Vetter wrote:
>>>>> +{
>>>>> +    uint64_t misc_enable;
>>>>> +    uint32_t eax, ebx, ecx, edx;
>>>>> +
>>>>> +    if ( !boot_cpu_has(X86_FEATURE_NX) )
>>>>> +    {
>>>>> +        /* Intel: try to unhide NX by clearing XD_DISABLE */
>>>>> +        cpuid(0, &eax, &ebx, &ecx, &edx);
>>>>> +        if ( ebx == X86_VENDOR_INTEL_EBX &&
>>>>> +             ecx == X86_VENDOR_INTEL_ECX &&
>>>>> +             edx == X86_VENDOR_INTEL_EDX )
>>>>> +        {
>>>>> +            rdmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
>>>>> +            if ( misc_enable & MSR_IA32_MISC_ENABLE_XD_DISABLE )
>>>>> +            {
>>>>> +                misc_enable &= ~MSR_IA32_MISC_ENABLE_XD_DISABLE;
>>>>> +                wrmsrl(MSR_IA32_MISC_ENABLE, misc_enable);
>>>>> +
>>>>> +                /* Re-read CPUID after having cleared XD_DISABLE */
>>>>> +                boot_cpu_data.x86_capability[FEATURESET_e1d] = cpuid_edx(0x80000001U);
>>>>> +
>>>>> +                /* Adjust misc_enable_off for secondary startup and wakeup code */
>>>>> +                bootsym(trampoline_misc_enable_off) |= MSR_IA32_MISC_ENABLE_XD_DISABLE;
>>>>> +                printk(KERN_INFO "re-enabled NX (Execute Disable) protection\n");
>>>>> +            }
>>>>> +        }
>>>>> +        /* AMD: nothing we can do - NX must be enabled in BIOS */
>>>> The BIOS is only hiding the CPUID bit.  It's not blocking the use of NX.
>>> Yes, you're right.
>>>> You want to do a wrmsr_safe() trying to set EFER.NXE, and if it
>>>> succeeds, set the NX bit in MSR_K8_EXT_FEATURE_MASK to "unhide" it in
>>>> regular CPUID.  This is a little more tricky to arrange because it needs
>>>> doing on each CPU, not just the BSP.
>>> Ok, yes, I have modified the AMD side to use MSR_K8_EXT_FEATURE_MASK to
>>> "unhide" it.
>> Great.  And contrary to the other thread, this really must modify the
>> mask MSRs rather than use setup_force_cpu_cap(), because we still need
>> it to be visible to PV guest kernels which can't see Xen's choice of
>> setup_force_cpu_cap().
>>
>>>>> +    }
>>>>> +
>>>>> +    /* Enable EFER.NXE only if NX is available */
>>>>> +    if ( boot_cpu_has(X86_FEATURE_NX) )
>>>>> +    {
>>>>> +        if ( !(read_efer() & EFER_NXE) )
>>>>> +            write_efer(read_efer() | EFER_NXE);
>>>>> +
>>>>> +        /* Adjust trampoline_efer for secondary startup and wakeup code */
>>>>> +        bootsym(trampoline_efer) |= EFER_NXE;
>>>>> +    }
>>>>> +
>>>>> +    if ( IS_ENABLED(CONFIG_REQUIRE_NX) && !boot_cpu_has(X86_FEATURE_NX) )
>>>>> +        panic("This build of Xen requires NX support\n");
>>>>> +}
>>>>> +
>>>>>    /* How much of the directmap is prebuilt at compile time. */
>>>>>    #define PREBUILT_MAP_LIMIT (1 << L2_PAGETABLE_SHIFT)
>>>>>    
>>>>> @@ -1159,6 +1203,8 @@ void asmlinkage __init noreturn __start_xen(void)
>>>>>        rdmsrl(MSR_EFER, this_cpu(efer));
>>>>>        asm volatile ( "mov %%cr4,%0" : "=r" (info->cr4) );
>>>>>    
>>>>> +    nx_init();
>>>>> +
>>>>>        /* Enable NMIs.  Our loader (e.g. Tboot) may have left them disabled. */
>>>>>        enable_nmis();
>>>>>    
>>>> This is too early, as can be seen by the need to make a cpuid() call
>>>> rather than using boot_cpu_data.
>>>>
>>>> The cleanup I wanted to do was to create/rework early_cpu_init() to get
>>>> things in a better order, so the panic() could go at the end here.  The
>>>> current split we've got of early/regular CPU init was inherited from
>>>> Linux and can be collapsed substantially.
>>> I have tried to add the logic into the early_init_{intel,amd}()
>>> functions. But it seems this is already too late in the boot chain. This
>>> is why I put it into an extra function which is called earlier. Because it
>>> seems there are already pages with PAGE_NX being used on the way to
>>> early_init_{intel,amd}(). Because when I put my code into
>>> early_init_intel I get a fault and a reboot. What do you suggest?
>> Have you got the backtrace available?
> Yes. Here it is. Although I saw before when enabling 
> 'CONFIG_MICROCODE_LOADING' it faults even earlier, somewhere in 
> 'find_cpio_data()', but with the same EC = 0x0009 (Protection violation, 
> Reserved bit violation).

That's to be expected.  bootstrap_map_bm() uses PAGE_HYPERVISOR which
has NX set in it.
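
For anyone following along, a minimal sketch of why that yields ec=0009 rather than a plain protection fault.  With EFER.NXE clear, bit 63 of a pagetable entry is architecturally reserved, so a present mapping with NX set faults with P (bit 0) and RSVD (bit 3) in the error code.  The macro and function names below are made up for illustration and are not Xen's own identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86 #PF error code bits (names are illustrative only). */
#define EX_PFEC_PRESENT (1u << 0)   /* fault hit a present translation */
#define EX_PFEC_RSVD    (1u << 3)   /* reserved bit set in a paging entry */

/* Architectural bit positions (names are illustrative only). */
#define EX_EFER_NXE     (1ull << 11) /* EFER.NXE: enables the NX bit */
#define EX_PTE_NX       (1ull << 63) /* NX/XD bit in a pagetable entry */

/* True if the error code reports a reserved-bit fault on a present entry. */
static bool ex_pfec_is_rsvd_fault(uint32_t ec)
{
    const uint32_t mask = EX_PFEC_PRESENT | EX_PFEC_RSVD;

    return (ec & mask) == mask;
}

/* True if this entry's NX bit counts as a reserved bit under this EFER. */
static bool ex_pte_nx_is_reserved(uint64_t pte, uint64_t efer)
{
    return (pte & EX_PTE_NX) && !(efer & EX_EFER_NXE);
}
```

ex_pfec_is_rsvd_fault(0x0009) is true, matching the trace below, and ex_pte_nx_is_reserved() stops being true for the same entry once EFER.NXE is set.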

>
> Xen 4.22-unstable
> (XEN) Xen version 4.22-unstable (julian@work) (gcc (Debian 15.2.0-12) 15.2.0) debug=y Thu Jan 22 14:28:58 CET 2026
> (XEN) Latest ChangeSet: Tue Jan 13 16:50:12 2026 +0100 git:ce886ef641
> (XEN) build-id: 2e72a4b08fca3ae0f0ed9af0dd3a5de947a966d0
> (XEN) CPU Vendor: Intel, Family 6 (0x6), Model 55 (0x37), Stepping 8 (raw 00030678)
> (XEN) BSP microcode revision: 0x00000836
> (XEN) Bootloader: GRUB 2.12
> (XEN) Command line: dom0_mem=1232M,max:1232M watchdog ucode=scan dom0_max_vcpus=1-1 com1=115200,8n1 console=com1
> (XEN) Xen image load base address: 0xb5800000
> (XEN) Video information:
> (XEN)  VGA is graphics mode 800x600, 32 bpp
> (XEN) Disc information:
> (XEN)  Found 0 MBR signatures
> (XEN)  Found 1 EDD information structures
> (XEN) EFI RAM map:
> (XEN)  [0000000000000000, 000000000003efff] (usable)
> (XEN)  [000000000003f000, 000000000003ffff] (ACPI NVS)
> (XEN)  [0000000000040000, 000000000009ffff] (usable)
> (XEN)  [0000000000100000, 000000001effffff] (usable)
> (XEN)  [000000001f000000, 000000001f0fffff] (reserved)
> (XEN)  [000000001f100000, 000000001fffffff] (usable)
> (XEN)  [0000000020000000, 00000000200fffff] (reserved)
> (XEN)  [0000000020100000, 00000000b9377fff] (usable)
> (XEN)  [00000000b9378000, 00000000b93a7fff] (reserved)
> (XEN)  [00000000b93a8000, 00000000b94bdfff] (usable)
> (XEN)  [00000000b94be000, 00000000b98d6fff] (ACPI NVS)
> (XEN)  [00000000b98d7000, 00000000b9bb0fff] (reserved)
> (XEN)  [00000000b9bb1000, 00000000b9bb1fff] (usable)
> (XEN)  [00000000b9bb2000, 00000000b9bf3fff] (reserved)
> (XEN)  [00000000b9bf4000, 00000000b9d6dfff] (usable)
> (XEN)  [00000000b9d6e000, 00000000b9ff9fff] (reserved)
> (XEN)  [00000000b9ffa000, 00000000b9ffffff] (usable)
> (XEN)  [00000000e00f8000, 00000000e00f8fff] (reserved)
> (XEN)  [00000000fed01000, 00000000fed01fff] (reserved)
> (XEN)  [00000000fed08000, 00000000fed08fff] (reserved)
> (XEN)  [00000000ffb00000, 00000000ffffffff] (reserved)
> (XEN)  [0000000100000000, 000000013fffffff] (usable)
> (XEN) Early fatal page fault at e008:ffff82d0403b38e0 (cr2=0000000001100202, ec=0009)
> (XEN) ----[ Xen-4.22-unstable  x86_64  debug=y  Not tainted ]----
> (XEN) CPU:    0
> (XEN) RIP:    e008:[<ffff82d0403b38e0>] memcmp+0x20/0x46
> (XEN) RFLAGS: 0000000000010002   CONTEXT: hypervisor
> (XEN) rax: 0000000000000000   rbx: 0000000001100000   rcx: 0000000000000000
> (XEN) rdx: 0000000000000004   rsi: ffff82d0404a0d23   rdi: 0000000001100202
> (XEN) rbp: ffff82d040497d88   rsp: ffff82d040497d78   r8:  0000000000000016
> (XEN) r9:  ffff82d04061a180   r10: ffff82d04061a188   r11: 0000000000000010
> (XEN) r12: 0000000001100000   r13: 0000000000000001   r14: ffff82d0404d2b80
> (XEN) r15: ffff82d040462750   cr0: 0000000080050033   cr4: 00000000000000a0
> (XEN) cr3: 00000000b5d0e000   cr2: 0000000001100202
> (XEN) fsb: 0000000000000000   gsb: 0000000000000000   gss: 0000000000000000
> (XEN) ds: 0000   es: 0000   fs: 0000   gs: 0000   ss: 0000   cs: e008
> (XEN) Xen code around <ffff82d0403b38e0> (memcmp+0x20/0x46):
> (XEN)  0f 1f 84 00 00 00 00 00 <0f> b6 04 0f 44 0f b6 04 0e 44 29 c0 75 13 48 83
> (XEN) Xen stack trace from rsp=ffff82d040497d78:
> (XEN)    ffff82d040483f79 0000000000696630 ffff82d040497db0 ffff82d040483fd2
> (XEN)    0000000000696630 ffff82d040200000 0000000000000001 ffff82d040497ef8
> (XEN)    ffff82d04047c4ac 0000000000000000 0000000000000000 0000000000000000
> (XEN)    ffff82d04062c6d8 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000140000 0000000000000000 0000000000000001
> (XEN)    0000000000000000 0000000000000000 ffff82d040497f08 ffff82d0404d2b80
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000800000000 000000010000006e 0000000000000003
> (XEN)    00000000000002f8 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000099f30ba0 0000000099feeda7 0000000000000000 ffff82d040497fff
> (XEN)    00000000b9cf3920 ffff82d0402043e8 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000000000000000 0000000000000000 0000000000000000
> (XEN)    0000000000000000 0000e01000000000 0000000000000000 0000000000000000
> (XEN)    00000000000000a0 0000000000000000 0000000000000000 0000000000000000
> (XEN) Xen call trace:
> (XEN)    [<ffff82d0403b38e0>] R memcmp+0x20/0x46
> (XEN)    [<ffff82d040483f79>] S arch/x86/bzimage.c#bzimage_check+0x2e/0x73
> (XEN)    [<ffff82d040483fd2>] F bzimage_headroom+0x14/0xa5
> (XEN)    [<ffff82d04047c4ac>] F __start_xen+0x908/0x2452
> (XEN)    [<ffff82d0402043e8>] F __high_start+0xb8/0xc0
> (XEN)
> (XEN) Pagetable walk from 0000000001100202:
> (XEN)  L4[0x000] = 00000000b5c9d063 ffffffffffffffff
> (XEN)
> (XEN) ****************************************
> (XEN) Panic on CPU 0:
> (XEN) FATAL TRAP: vec 14, #PF[0009] IN INTERRUPT CONTEXT
> (XEN) ****************************************

Huh, that means we have a bug in the pagewalk rendering.  It shouldn't
give up like that.

>> It's probably easiest if I prototype the split I'd like to see, and you
>> integrate with that.

I've had a go at this.  It's a 6-patch series and growing.  The early
logic is horribly tangled, but there's a lot to delete in it.

~Andrew
