Control: tags -1 + upstream

Hi,
On Wed, Apr 02, 2025 at 12:22:14PM -0400, Calum McConnell wrote:
> Package: src:linux
> Version: 6.12.20-1
> Severity: normal
> X-Debbugs-Cc: debian-am...@lists.debian.org
> User: debian-am...@lists.debian.org
> Usertags: amd64
>
> I had a kernel crash and dump occur when running Freecad (version in
> Trixie repos) with DRI_PRIME=1 and an override needed to let it run
> (COIN_GL_NO_CURRENT_CONTEXT_CHECK=1). It's likely that there are bugs
> in Freecad as well, which I will probably try to report, but it was
> successfully running, and I was manipulating constraints when the
> whole system went down with a kdump. The error is a BUG: Null Pointer
> Dereference. I had previously (and successfully) been running other
> games and programs on the amdgpu during this boot, as confirmed by
> framerates and radeontop. I have not yet tried to reproduce the error
> on a clean boot, without a boatload of other programs running.
>
> Kdump-tools collected the dump. I have attached the DMESG output (xz
> compressed). A complete dump is available, but weighs in at 1.4GB
> after xz compression; a clean boot reproduction would likely be
> smaller, and available on request. I uploaded the dump to:
> https://drive.google.com/file/d/1gro_KMPDqG1kp4BN-VXyZCAM_S8Pg64L/view?usp=sharing
>
> The OOPS/BUG/kernel standard crash log is below. The 'kernel log' in
> the main body of this message refers to the log of a normal-so-far
> boot. I also want to draw attention to the line "i915 0000:00...".
> Unlike the other lines that occur before the oops, this line is NOT
> typically printed while my machine is operating.
>
> [611643.107695] [ T388600] pcieport 0000:00:1d.0: Intel SPT PCH root port ACS workaround enabled
> [611643.288862] [ T386248] i915 0000:00:02.0: [drm] *ERROR* Atomic update failure on pipe A (start=1 end=2) time 201 us, min 1073, max 1079, scanline start 1070, end 1083
> [611643.344171] [ T388600] [drm] PCIE GART of 256M enabled (table at 0x000000F400000000).
> [611643.607739] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.0 test failed (-110)
> [611643.810417] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.2.0 test failed (-110)
> [611644.014733] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.3.0 test failed (-110)
> [611644.218676] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.0.1 test failed (-110)
> [611644.423194] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.1.1 test failed (-110)
> [611644.627687] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.2.1 test failed (-110)
> [611644.832908] [ T388600] amdgpu 0000:05:00.0: [drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring comp_1.3.1 test failed (-110)
> [611644.969406] [ T388600] [drm] UVD and UVD ENC initialized successfully.
> [611645.069388] [ T388600] [drm] VCE initialized successfully.
> [611645.075260] [ T388600] amdgpu 0000:05:00.0: [drm] Cannot find any crtc or sizes
> [611645.172785] [ T388615] [drm] scheduler comp_1.1.0 is not ready, skipping
> [611645.172788] [ T388615] [drm] scheduler comp_1.2.0 is not ready, skipping
> [611645.172791] [ T388615] [drm] scheduler comp_1.3.0 is not ready, skipping
> [611645.172793] [ T388615] [drm] scheduler comp_1.0.1 is not ready, skipping
> [611645.172794] [ T388615] [drm] scheduler comp_1.1.1 is not ready, skipping
> [611645.172795] [ T388615] [drm] scheduler comp_1.2.1 is not ready, skipping
> [611645.172796] [ T388615] [drm] scheduler comp_1.3.1 is not ready, skipping
> [611645.172798] [ T388615] BUG: kernel NULL pointer dereference, address: 0000000000000008
> [611645.172801] [ T388615] #PF: supervisor read access in kernel mode
> [611645.172802] [ T388615] #PF: error_code(0x0000) - not-present page
> [611645.172804] [ T388615] PGD 0 P4D 0
> [611645.172807] [ T388615] Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
> [611645.172811] [ T388615] CPU: 0 UID: 1000 PID: 388615 Comm: freecad:cs0 Kdump: loaded Tainted: G U 6.12.19-amd64 #1 Debian 6.12.19-1
> [611645.172815] [ T388615] Tainted: [U]=USER
> [611645.172817] [ T388615] Hardware name: Dell Inc. Latitude 7424 Rugged Extreme/0TJ1W1, BIOS 1.35.0 11/07/2024
> [611645.172818] [ T388615] RIP: 0010:drm_sched_job_arm+0x23/0x60 [gpu_sched]
> [611645.172827] [ T388615] Code: 90 90 90 90 90 90 90 f3 0f 1e fa 0f 1f 44 00 00 55 53 48 8b 6f 60 48 85 ed 74 3f 48 89 fb 48 89 ef e8 a1 38 00 00 48 8b 45 10 <48> 8b 50 08 48 89 53 18 8b 45 24 89 43 5c b8 01 00 00 00 f0 48 0f
> [611645.172830] [ T388615] RSP: 0018:ffffaa5246bab808 EFLAGS: 00010206
> [611645.172832] [ T388615] RAX: 0000000000000000 RBX: ffff9c6eda0c9400 RCX: ffff9c6f18e190d0
> [611645.172834] [ T388615] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff9c6f18e1a838
> [611645.172836] [ T388615] RBP: ffff9c6f18e1a810 R08: ffff9c6f0185b468 R09: ffffaa5246bab648
> [611645.172838] [ T388615] R10: ffffffffac4741e8 R11: 0000000000000003 R12: 0000000000000000
> [611645.172839] [ T388615] R13: ffffaa5246bab888 R14: 0000000000000000 R15: 0000000000000000
> [611645.172841] [ T388615] FS: 00007f80c4bff6c0(0000) GS:ffff9c722fa00000(0000) knlGS:0000000000000000
> [611645.172843] [ T388615] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [611645.172845] [ T388615] CR2: 0000000000000008 CR3: 000000012df78006 CR4: 00000000003726f0
> [611645.172847] [ T388615] Call Trace:
> [611645.172849] [ T388615] <TASK>
> [611645.172852] [ T388615] ? __die_body.cold+0x19/0x27
> [611645.172857] [ T388615] ? page_fault_oops+0x15c/0x2e0
> [611645.172862] [ T388615] ? exc_page_fault+0x7e/0x180
> [611645.172865] [ T388615] ? asm_exc_page_fault+0x26/0x30
> [611645.172870] [ T388615] ? drm_sched_job_arm+0x23/0x60 [gpu_sched]
> [611645.172875] [ T388615] ? drm_sched_job_arm+0x1f/0x60 [gpu_sched]
> [611645.172879] [ T388615] amdgpu_cs_ioctl+0x14f2/0x1a20 [amdgpu]
> [611645.173308] [ T388615] ? psi_group_change+0x138/0x300
> [611645.173315] [ T388615] ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> [611645.173638] [ T388615] drm_ioctl_kernel+0xad/0x100 [drm]
> [611645.173694] [ T388615] drm_ioctl+0x277/0x4f0 [drm]
> [611645.173737] [ T388615] ? __pfx_amdgpu_cs_ioctl+0x10/0x10 [amdgpu]
> [611645.174084] [ T388615] amdgpu_drm_ioctl+0x4b/0x80 [amdgpu]
> [611645.174393] [ T388615] __x64_sys_ioctl+0x91/0xd0
> [611645.174396] [ T388615] do_syscall_64+0x82/0x190
> [611645.174399] [ T388615] ? gup_fast_pte_range+0xd0/0x380
> [611645.174403] [ T388615] ? futex_wake+0x8f/0x1b0
> [611645.174407] [ T388615] ? do_futex+0x125/0x190
> [611645.174409] [ T388615] ? __x64_sys_futex+0x127/0x1e0
> [611645.174411] [ T388615] ? sched_clock+0x10/0x30
> [611645.174413] [ T388615] ? sched_clock_cpu+0xf/0x1d0
> [611645.174416] [ T388615] ? syscall_exit_to_user_mode+0x4d/0x210
> [611645.174419] [ T388615] ? do_syscall_64+0x8e/0x190
> [611645.174421] [ T388615] ? wake_up_q+0x4e/0x90
> [611645.174424] [ T388615] ? futex_wake+0x187/0x1b0
> [611645.174427] [ T388615] ? do_futex+0x125/0x190
> [611645.174429] [ T388615] ? __x64_sys_futex+0x127/0x1e0
> [611645.174431] [ T388615] ? syscall_exit_to_user_mode+0x4d/0x210
> [611645.174433] [ T388615] ? do_syscall_64+0x8e/0x190
> [611645.174436] [ T388615] ? syscall_exit_to_user_mode+0x4d/0x210
> [611645.174438] [ T388615] ? do_syscall_64+0x8e/0x190
> [611645.174440] [ T388615] ? do_syscall_64+0x8e/0x190
> [611645.174442] [ T388615] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [611645.174445] [ T388615] RIP: 0033:0x7f80eb3168db
> [611645.174469] [ T388615] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <89> c2 3d 00 f0 ff ff 77 1c 48 8b 44 24 18 64 48 2b 04 25 28 00 00
> [611645.174472] [ T388615] RSP: 002b:00007f80c4bfe750 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
> [611645.174475] [ T388615] RAX: ffffffffffffffda RBX: 00000000c0186444 RCX: 00007f80eb3168db
> [611645.174477] [ T388615] RDX: 00007f80c4bfe7e0 RSI: 00000000c0186444 RDI: 000000000000001b
> [611645.174478] [ T388615] RBP: 00007f80c4bfe820 R08: 00007f80c4bfe8a0 R09: 00007f80c4bfe7b0
> [611645.174480] [ T388615] R10: 0000000000000000 R11: 0000000000000246 R12: 00007f80c4bfe7e0
> [611645.174481] [ T388615] R13: 000000000000001b R14: 00007f80c4bfe9e0 R15: 00007f80c4bfe860
> [611645.174484] [ T388615] </TASK>
> [611645.174485] [ T388615] Modules linked in: snd_usb_audio snd_usbmidi_lib snd_rawmidi uinput sd_mod ccm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device rfcomm cmac algif_hash algif_skcipher af_alg bnep cpuid snd_hda_codec_hdmi dell_pc platform_profile intel_uncore_frequency intel_uncore_frequency_common snd_sof_pci_intel_skl x86_pkg_temp_thermal snd_sof_intel_hda_generic binfmt_misc soundwire_intel soundwire_generic_allocation soundwire_cadence snd_sof_intel_hda_common nls_ascii nls_cp437 snd_soc_hdac_hda vfat snd_sof_intel_hda_mlink fat snd_sof_intel_hda dell_rbtn intel_powerclamp snd_sof_pci snd_sof_xtensa_dsp coretemp snd_sof kvm_intel snd_sof_utils snd_soc_acpi_intel_match snd_soc_acpi soundwire_bus kvm snd_soc_avs snd_soc_hda_codec snd_hda_ext_core iwlmvm snd_ctl_led snd_soc_core snd_hda_codec_realtek snd_hda_codec_generic snd_compress crct10dif_pclmul snd_hda_scodec_component snd_pcm_dmaengine ghash_clmulni_intel intel_rapl_msr mac80211 sha512_ssse3 snd_hda_intel sha256_ssse3 dell_laptop snd_intel_dspcfg sha1_ssse3 mei_wdt
> [611645.174530] [ T388615] mei_hdcp mei_pxp snd_intel_sdw_acpi aesni_intel snd_hda_codec gf128mul uvcvideo btusb crypto_simd cryptd btrtl btintel rapl videobuf2_vmalloc snd_hda_core uvc processor_thermal_device_pci_legacy processor_thermal_device videobuf2_memops btbcm intel_cstate videobuf2_v4l2 btmtk processor_thermal_wt_hint snd_hwdep dell_smm_hwmon dell_wmi libarc4 processor_thermal_rfim intel_uncore videodev snd_pcm processor_thermal_rapl bluetooth intel_rapl_common iwlwifi snd_timer dell_smbios iTCO_wdt processor_thermal_wt_req ucsi_acpi dell_wmi_sysman firmware_attributes_class intel_pmc_bxt pcspkr processor_thermal_power_floor snd typec_ucsi dcdbas videobuf2_common mei_me typec processor_thermal_mbox iTCO_vendor_support wmi_bmof intel_xhci_usb_role_switch dell_wmi_descriptor mei mc watchdog ee1004 soundcore intel_pch_thermal intel_soc_dts_iosf roles cfg80211 soc_button_array sg joydev dell_smo8800 intel_pmc_core intel_hid intel_vsec int3400_thermal int3403_thermal sparse_keymap acpi_pad pmt_telemetry acpi_thermal_rel
> [611645.174578] [ T388615] int340x_thermal_zone pmt_class ac rfkill serio_raw evdev msr parport_pc ppdev lp parport nvme_fabrics dm_mod loop nvme_keyring efi_pstore configfs nfnetlink ip_tables x_tables autofs4 btrfs blake2b_generic efivarfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c crc32c_generic raid1 raid0 md_mod zstd sr_mod cdrom r8153_ecm cdc_ether usbnet hid_multitouch hid_generic uas usb_storage scsi_mod r8152 mii libphy scsi_common usbhid amdgpu i915 amdxcp drm_exec gpu_sched drm_buddy i2c_algo_bit drm_suballoc_helper drm_display_helper cec rc_core drm_ttm_helper xhci_pci xhci_hcd ttm nvme i2c_hid_acpi i2c_hid usbcore drm_kms_helper hid nvme_core i2c_i801 intel_lpss_pci crc32_pclmul video intel_lpss crc32c_intel e1000e i2c_smbus nvme_auth crc16 idma64 usb_common drm battery wmi button
> [611645.174634] [ T388615] CR2: 0000000000000008

FWIW, the crash looks similar to what was discussed in
https://lore.kernel.org/all/20250107140240.325899-1-philipp.reis...@linbit.com/ .

Regards,
Salvatore