On 8/2/24 20:07, Tomoaki AOKI wrote:
On Fri, 2 Aug 2024 17:24:30 +0100
Pontus Bramberg <pon...@bramberg.net> wrote:

From: Pontus Bramberg <pon...@bramberg.net>
To: stable@freebsd.org
Subject: Nvidia Xorg page fault in kernel mode 14-STABLE/amd64
Date: Fri, 2 Aug 2024 17:24:30 +0100

Hello,

I'm not entirely sure if this is the right place to ask about this. I
apologise if not.

I recently updated the UEFI BIOS on my laptop. Immediately afterwards,
Xorg would not start when using the Nvidia driver. I attempted to
downgrade the BIOS, but the laptop would not flash an earlier version.

I therefore rebuilt kernel and world from the latest stable/14 git
commit (b37a6d41a046dbb46ee1d6bf00c710c03c944a24) and uninstalled and
reinstalled x11/nvidia-driver from the latest ports collection (version
550.54.14). This did not help, so I rebuilt the same kernel and world
after 'make clean' and 'make cleanworld' and reinstalled the same
version of x11/nvidia-driver.

I at first thought this might be related to the similar issue discussed
on this mailing list from July 2 to 5, but the workaround from then
(rebuilding kernel, world, and driver) does not work for me, and the
BIOS update makes me think this is a different issue. Xorg works
perfectly well if I switch to the integrated Intel graphics (using the
i915kms module), so I think the problem is related to the discrete GPU.

I do not normally use nvidia-drm-kmod, but I have tried both
graphics/nvidia-drm-kmod and graphics/nvidia-drm-61-kmod with the same
result, except that the system crashes on boot rather than when starting
Xorg (I use startx, if that matters).

The laptop is a Lenovo ThinkPad P16 with an Nvidia RTX 3500 Ada
Generation Laptop GPU, if that is helpful. If there are any logs or
anything else that would be useful, please let me know. I would be very
grateful if anybody knows how to resolve this or has any pointers for
further troubleshooting.

The output before the system crashes:

ACPI Warning: \_SB.PC00.PEG1.PEGP._DSM: Argument #4 type mismatch - Found [Buffer], ACPI requires [Package] (20221020/nsarguments-212)
NVRM: GPU at PCI:0000:01:00: GPU-58d85fdb-6f45-87c1-fe0f-9a26e92647c9
NVRM: Xid (PCI:0000:01:00): 62, pid='', name=, 2022a7a6 2028a6fc 2027a696 2027a1b2 20250cf2 2025084c 00000000 00000000


Fatal trap 12: page fault while in kernel mode
cpuid = 18; apic id = 44
fault virtual address = 0x0
fault code = supervisor read data, page not present
instruction pointer = 0x20:0xffffffff861c1354
stack pointer = 0x28:0xfffffe02ebc316b0
frame pointer = 0x28:0xfffffe02eff26e10
code segment = base 0x0, limit 0xfffff, type 0x1b
              = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags = interrupt enabled, resume, IOPL = 0
current process = 2380 (Xorg)
rdi: 0000000000000000 rsi: 0000000000000040 rdx: 0000000000000007
rcx: 0000000000000007 r8: 00000000000000c0 r9: 0000000000000066
rax: 0000000000000000 rbx: fffffe02f051c000 rbp: fffffe02eff26e10
r10: 00000000100d96f8 r11: 0000000066ace8f2 r12: fffffe02eff2d000
r13: 0000000000000000 r14: 0000000000000055 r15: 0000000000000000
trap number     = 12
panic: page fault
cpuid = 18
time = 1722607858
KDB: stack backtrace:
#0 0xffffffff80b86d7d at kdb_backtrace+0x5d
#1 0xffffffff80b399a1 at vpanic+0x131
#2 0xffffffff80b39863 at panic+0x43
#3 0xffffffff8101a93b at trap_fatal+0x40b
#4 0xffffffff8101a986 at trap_pfault+0x46
#5 0xffffffff80ff0c98 at calltrap+0x8
Uptime: 39s
Automatic reboot in 15 seconds - press a key on the console to abort
--> Press a key on the console to reboot,
--> or switch off the system now.

Best wishes,
Pontus Bramberg <pon...@bramberg.net>

How do you load the nvidia-related modules?
If you're loading them via /boot/loader.conf[.local], please don't.
You can load them via the kld_list variable in /etc/rc.conf[.local].

Loading them from loader.conf often causes problems, and a trap 12 on
boot makes me suspect that the modules were truncated while loading.
(Such truncation can cause many types of crashes, though.)
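
As a minimal sketch of the rc.conf approach described above, assuming
nvidia-modeset is the module wanted (substitute nvidia-drm when using
one of the DRM kmod ports). rc.conf is read as sh(1) syntax, so the
entry is an ordinary variable assignment:

```sh
# /etc/rc.conf (or /etc/rc.conf.local)
# Load the NVIDIA module from rc(8) after boot, not from the boot loader.
# Assumption for illustration: nvidia-modeset is the only module needed here.
kld_list="nvidia-modeset"

# /boot/loader.conf should then NOT contain a loader-time entry such as:
#   nvidia-modeset_load="YES"
```

Running `sysrc kld_list+="nvidia-modeset"` as root should append the
module to any existing list without editing the file by hand.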

See, for example, Bug 277827 [1], Bug 277364 [2] and Bug 277028 [3].

One more thing to mention: assuming you're building x11/nvidia-driver
and graphics/nvidia-drm-61-kmod from ports for the latest stable/14, it
would be strange if you could build graphics/nvidia-drm-61-kmod (or
graphics/nvidia-drm-515-kmod) on a vanilla ports tree. The patch I
proposed on Bug 279539 [4] should be needed for a successful build.

Also, according to the firmware release notes, did any default UEFI
firmware options (ones you haven't modified) change? Lenovo usually
provides relatively precise per-revision information in them, at least
for the ThinkPad P and T series.

If anything changed between your previous and current firmware,
restoring the changed defaults to their previous values could help.

[1] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277827

[2] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277364

[3] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277028

[4] https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=279539


Thank you very much for this.
I load the relevant module (nvidia-modeset or nvidia-drm) by adding it to kld_list in /etc/rc.conf. When using nvidia-drm, I add only hw.nvidiadrm.modeset=1 to /boot/loader.conf. When using nvidia-modeset, there are no graphics-related items in /boot/loader.conf.

I install all my software using the ports collection.
I was using the patch you proposed on Bug 279539 when building graphics/nvidia-drm-61-kmod. I apologise for not mentioning that. I just tested this again to make sure I remembered correctly, and indeed graphics/nvidia-drm-61-kmod does not build with the vanilla ports tree but does build with the patch. With nvidia-drm (built using the patch) in kld_list, the system crashes on boot rather than when attempting to start X, and I have to boot into single-user mode to remove it from /etc/rc.conf.
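
For anyone following along, the single-user recovery step mentioned above might look roughly like the following. This is a sketch that assumes a UFS root filesystem and that nvidia-drm is the kld_list entry to remove; a ZFS root or a separate /usr would need different mount steps.

```sh
# Run from the single-user shell; FreeBSD-specific commands.
mount -u /                      # remount root read-write (UFS root assumed)
mount -a -t ufs                 # mount the remaining UFS filesystems, e.g. /usr for sysrc
sysrc kld_list-="nvidia-drm"    # drop only this module from kld_list in /etc/rc.conf
sysrc -n kld_list               # print what is left, to confirm the change
reboot
```

Using `sysrc` with `-=` leaves any other modules in kld_list untouched, which is less error-prone than editing the line by hand on the console.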

After updating the BIOS, I restored factory defaults. The changes I have made are disabling the trackpad and making F1-F12 (instead of media functions) the default mode for the top row of keys. I have tried with the graphics card set to discrete only (my preference) and to hybrid graphics (which I am currently using, as it allows me to run X on the integrated GPU without a problem). In both cases, attempting to start Xorg on the Nvidia card either crashes the system with the same error (the instruction pointer and stack backtrace also appear to be the same) or fails to find the graphics card (in hybrid mode I have to set BusID for Xorg to find the Nvidia card, but that does not work either). I have tried adding nvidia-drm and nvidia-modeset to kld_list (separately, with reboots in between) in this configuration, and the results are the same, except that nvidia-modeset does not crash until I try to start X, while nvidia-drm crashes on boot.
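
On the BusID setting mentioned above: xorg.conf expects the form PCI:bus:device:function in decimal, whereas the NVRM line in the earlier log prints hexadecimal domain:bus:device. A small sketch of the conversion follows. The helper name nvrm_to_busid is hypothetical, and the function number is assumed to be 0, since the NVRM line does not show it.

```sh
# Convert an NVRM-style id such as "PCI:0000:01:00" (hex domain:bus:device)
# into an Xorg BusID string "PCI:bus:device:function" (decimal).
# Hypothetical helper; the PCI function is ASSUMED to be 0.
nvrm_to_busid() {
    rest=${1#PCI:}          # strip the leading "PCI:"
    old_ifs=$IFS
    IFS=:
    set -- $rest            # $1 = domain, $2 = bus, $3 = device
    IFS=$old_ifs
    printf 'PCI:%d:%d:%d\n' "0x$2" "0x$3" 0
}

nvrm_to_busid "PCI:0000:01:00"
```

The resulting string would go in the Device section of xorg.conf as the BusID value. Comparing it against the vgapci entries from `pciconf -l` is a reasonable sanity check before relying on it.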
