https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=267028
--- Comment #307 from Mark Millard <marklmi26-f...@yahoo.com> ---
(In reply to Mark Millard from comment #306)

Going backwards through part of the list node allocations (before each node is filled in, but showing the container and modname addresses that are to be assigned in each case) . . .

(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos]
$7 = {modAddr = 0xfffff8000471eac0, containerAddr = 0xfffff800038caa80, modnameAddr = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-1]
$8 = {modAddr = 0xfffff8000471e900, containerAddr = 0xfffff800038cac00, modnameAddr = 0xffffffff82e62026 "amdgpu_raven_mec2_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-2]
$9 = {modAddr = 0xfffff800046581c0, containerAddr = 0xfffff8000464a600, modnameAddr = 0xffffffff82e1e010 "amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-3]
$10 = {modAddr = 0xfffff80004574040, containerAddr = 0xfffff800038c9000, modnameAddr = 0xffffffff82e12009 "amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-4]
$11 = {modAddr = 0xfffff80004574100, containerAddr = 0xfffff800038c9300, modnameAddr = 0xffffffff829f6010 "amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-5]
$12 = {modAddr = 0xfffff800036f00c0, containerAddr = 0xfffff80004ad6c00, modnameAddr = 0xffffffff829ef000 "amdgpu_raven_me_bin_fw", version = 1}
(kgdb) print modlist_newmod_hist[modlist_newmod_hist_pos-6]
$13 = {modAddr = 0xfffff8000471e980, containerAddr = 0xfffff800038c9480, modnameAddr = 0xffffffff829e7025 "amdgpu_raven_pfp_bin_fw", version = 1}

Going backwards through that part of the list later, after the failure:

(kgdb) print *(modlist_t)0xfffff8000471eac0
$24 = {link = {tqe_next = 0x0, tqe_prev = 0xfffff8000471e900}, container = 0xfffff800038caa80, name = 0xffffffff82ea6025 "amdgpu_raven_vcn_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff8000471e900
$25 = {link = {tqe_next = 0xfffff8000471eac0, tqe_prev = 0xfffff800046581c0}, container = 0xfffff800038cac00, name = 0xffffffff82e62026 "amdgpu_raven_mec2_bin_fw", version = 1}
. . .
(kgdb) print *(modlist_t)0xfffff800046581c0
$27 = {link = {tqe_next = 0xfffff8000471e900, tqe_prev = 0xfffff80004574040}, container = 0xfffff8000464a600, name = 0xffffffff82e1e010 "amdgpu_raven_mec_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff80004574040
$28 = {link = {tqe_next = 0xfffff800046581c0, tqe_prev = 0xfffff80004574100}, container = 0xfffff800038c9000, name = 0xffffffff82e12009 "amdgpu_raven_rlc_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff80004574100
$29 = {link = {tqe_next = 0xfffff80004574040, tqe_prev = 0xfffff800036f00c0}, container = 0xfffff800038c9300, name = 0xffffffff829f6010 "amdgpu_raven_ce_bin_fw", version = 1}
(kgdb) print *(modlist_t)0xfffff800036f00c0
$30 = {link = {tqe_next = 0xfffff80000000007, tqe_prev = 0xfffff8000471e980}, container = 0xfffff80004ad6c00, name = 0xffffffff829ef000 "amdgpu_raven_me_bin_fw", version = 1}

NOTE THE BAD tqe_next == 0xfffff80000000007 ABOVE.

(kgdb) print *(modlist_t)0xfffff8000471e980
$31 = {link = {tqe_next = 0xfffff800036f00c0, tqe_prev = 0xfffff800036f0100}, container = 0xfffff800038c9480, name = 0xffffffff829e7025 "amdgpu_raven_pfp_bin_fw", version = 1}

So: all the nodes are there, but just one ends up with the odd tqe_next == 0xfffff80000000007 corruption. No allocation ever returned 0xfffff80000000007 (none was recorded, and I had set things up to panic just after any allocation returning such a value).
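For reference, the allocation-history instrumentation whose state is dumped above can be sketched roughly as follows. This is a hypothetical reconstruction: only the array and field names (modlist_newmod_hist, modlist_newmod_hist_pos, modAddr, containerAddr, modnameAddr, version) come from the kgdb output; HIST_SLOTS and record_newmod() are assumptions for illustration.

```c
#include <stddef.h>

/*
 * Hypothetical sketch of the debug instrumentation: a fixed-size
 * history array recording, for each modlist node allocation, the node
 * address plus the container and modname addresses that are about to
 * be assigned into it.
 */
#define HIST_SLOTS 256	/* assumed size, not from the actual patch */

struct newmod_rec {
	void       *modAddr;	   /* node address the allocator returned */
	void       *containerAddr; /* container about to be assigned */
	const char *modnameAddr;   /* module name about to be assigned */
	int         version;
};

static struct newmod_rec modlist_newmod_hist[HIST_SLOTS];
static size_t modlist_newmod_hist_pos;

static void
record_newmod(void *mod, void *container, const char *name, int ver)
{
	/* Advance first, so _pos always indexes the most recent record. */
	modlist_newmod_hist_pos = (modlist_newmod_hist_pos + 1) % HIST_SLOTS;
	modlist_newmod_hist[modlist_newmod_hist_pos] =
	    (struct newmod_rec){ mod, container, name, ver };
}
```

Walking backwards from modlist_newmod_hist_pos, as in the kgdb session above, then yields the records most-recent-first.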
Something replaced the intended:

*(modlist_t)0xfffff800036f00c0.link.tqe_next == 0xfffff80004574100

with:

*(modlist_t)0xfffff800036f00c0.link.tqe_next == 0xfffff80000000007

The scans of the list were okay as of setting up each of (listed in execution order, not backwards list order):

"amdgpu_raven_ce_bin_fw"
"amdgpu_raven_rlc_bin_fw"
"amdgpu_raven_mec_bin_fw"
"amdgpu_raven_mec2_bin_fw"
"amdgpu_raven_vcn_bin_fw"

But as of (the first after "amdgpu_raven_vcn_bin_fw"):

"acpi_wmi"

the list had the corrupted link.tqe_next associated with "amdgpu_raven_me_bin_fw". This suggests the corruption happened at/after the generation of:

drmn0: successfully loaded firmware image 'amdgpu/raven_vcn.bin'

during the generation of the sequence:

<6>[drm] Found VCN firmware Version ENC: 1.13 DEC: 2 VEP: 0 Revision: 4
drmn0: Will use PSP to load VCN firmware
<6>[drm] reserve 0x400000 from 0xf47fc00000 for PSP TMR
drmn0: RAS: optional ras ta ucode is not available
drmn0: RAP: optional rap ta ucode is not available
<6>[drm] kiq ring mec 2 pipe 1 q 0
<6>[drm] DM_PPLIB: values for F clock
<6>[drm] DM_PPLIB: 400000 in kHz, 3649 in mV
<6>[drm] DM_PPLIB: 933000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB: 1200000 in kHz, 4399 in mV
<6>[drm] DM_PPLIB: 1333000 in kHz, 4399 in mV
<6>[drm] DM_PPLIB: values for DCF clock
<6>[drm] DM_PPLIB: 300000 in kHz, 3649 in mV
<6>[drm] DM_PPLIB: 600000 in kHz, 4074 in mV
<6>[drm] DM_PPLIB: 626000 in kHz, 4250 in mV
<6>[drm] DM_PPLIB: 654000 in kHz, 4399 in mV
<6>[drm] Display Core initialized with v3.2.104!
lkpi_iic0: <LinuxKPI I2C> on drmn0
iicbus0: <Philips I2C bus> on lkpi_iic0
iic0: <I2C generic I/O> on iicbus0
lkpi_iic1: <LinuxKPI I2C> on drmn0
iicbus1: <Philips I2C bus> on lkpi_iic1
iic1: <I2C generic I/O> on iicbus1
<6>[drm] VCN decode and encode initialized successfully(under SPG Mode).
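The kind of per-insertion scan described above can be sketched with the standard <sys/queue.h> TAILQ macros, which is what the kernel's modlist uses. This is only an illustrative consistency check under a simplified struct layout, not the actual instrumentation; note that a wild tqe_next (such as 0xfffff80000000007) cannot be followed safely by any forward walk, which is why the kgdb session above printed the nodes via their recorded allocation addresses instead.

```c
#include <stddef.h>
#include <sys/queue.h>

/* Simplified stand-in for the kernel's modlist node. */
struct modnode {
	TAILQ_ENTRY(modnode) link;
	const char *name;
};
TAILQ_HEAD(modhead, modnode);

/*
 * Verify the TAILQ back-link invariant: each node's tqe_prev must
 * point at the previous node's tqe_next field (or at the head's
 * tqh_first for the first node).  This catches a smashed tqe_prev
 * directly; it checks the expected back-pointer before stepping, so
 * it never dereferences a node it has not yet validated.
 */
static int
modlist_scan_ok(struct modhead *head)
{
	struct modnode **prevp = &head->tqh_first;
	struct modnode *m;

	for (m = head->tqh_first; m != NULL; m = m->link.tqe_next) {
		if (m->link.tqe_prev != prevp)
			return (0);	/* linkage corrupted here */
		prevp = &m->link.tqe_next;
	}
	return (1);
}
```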
drmn0: SE 1, SH per SE 1, CU per SH 11, active_cu_number 8
<6>[drm] fb mappable at 0x60BCA000
<6>[drm] vram apper at 0x60000000
<6>[drm] size 8294400
<6>[drm] fb depth is 24
<6>[drm] pitch is 7680
VT: Replacing driver "vga" with new "fb".
start FB_INFO:
type=11 height=1080 width=1920 depth=32
pbase=0x60bca000 vbase=0xfffff80060bca000
name=drmn0 flags=0x0 stride=7680 bpp=32
end FB_INFO
drmn0: ring gfx uses VM inv eng 0 on hub 0
drmn0: ring comp_1.0.0 uses VM inv eng 1 on hub 0
drmn0: ring comp_1.1.0 uses VM inv eng 4 on hub 0
drmn0: ring comp_1.2.0 uses VM inv eng 5 on hub 0
drmn0: ring comp_1.3.0 uses VM inv eng 6 on hub 0
drmn0: ring comp_1.0.1 uses VM inv eng 7 on hub 0
drmn0: ring comp_1.1.1 uses VM inv eng 8 on hub 0
drmn0: ring comp_1.2.1 uses VM inv eng 9 on hub 0
drmn0: ring comp_1.3.1 uses VM inv eng 10 on hub 0
drmn0: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
drmn0: ring sdma0 uses VM inv eng 0 on hub 1
drmn0: ring vcn_dec uses VM inv eng 1 on hub 1
drmn0: ring vcn_enc0 uses VM inv eng 4 on hub 1
drmn0: ring vcn_enc1 uses VM inv eng 5 on hub 1
drmn0: ring jpeg_dec uses VM inv eng 6 on hub 1
vgapci0: child drmn0 requested pci_get_powerstate
sysctl_warn_reuse: can't re-use a leaf (hw.dri.debug)!
<6>[drm] Initialized amdgpu 3.40.0 20150101 for drmn0 on minor 0

Or during the very early stages of setting up:

acpi_wmi.ko

The mismatch was detected during the first modlist_lookup of the found_modules list for the setup of acpi_wmi.ko. The "during" text above seems to happen during activity from the likes of:

/wrkdirs/usr/ports/graphics/drm-510-kmod/work/drm-kmod-drm_v5.10.163_7/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c

(given that the raven firmware is in use as well?).

--
You are receiving this mail because:
You are the assignee for the bug.
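For context, modlist_lookup() in sys/kern/kern_linker.c is (roughly paraphrased, not quoted verbatim) a linear TAILQ_FOREACH walk of the found_modules list, so a wild tqe_next like the one above gets dereferenced as soon as the walk steps past the corrupted node. A simplified, self-contained sketch:

```c
#include <stddef.h>
#include <string.h>
#include <sys/queue.h>

/* Simplified modlist node, matching the fields in the kgdb dumps. */
typedef struct modlist {
	TAILQ_ENTRY(modlist) link;
	struct linker_file  *container;
	const char          *name;
	int                  version;
} *modlist_t;

TAILQ_HEAD(modlisthead, modlist);
static struct modlisthead found_modules =
    TAILQ_HEAD_INITIALIZER(found_modules);

/*
 * Rough paraphrase of the kernel's lookup: walk found_modules front
 * to back comparing names (and version, if a nonzero one is asked
 * for).  With tqe_next smashed to 0xfffff80000000007, the walk loads
 * that value as the next node and faults on the strcmp() dereference.
 */
static modlist_t
modlist_lookup(const char *name, int ver)
{
	modlist_t mod;

	TAILQ_FOREACH(mod, &found_modules, link) {
		if (strcmp(mod->name, name) == 0 &&
		    (ver == 0 || mod->version == ver))
			return (mod);
	}
	return (NULL);
}
```

That would place the detected fault exactly where reported: the first modlist_lookup for acpi_wmi.ko, walking past the "amdgpu_raven_me_bin_fw" node.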