On Monday 03/09 at 21:24 -0700, Calvin Owens wrote:
> Commit e1b385726f7f ("drm/amd/display: Add additional checks for PSP
> footer size") introduced a use of an uninitialized stack variable
> in dm_dmub_sw_init() (region_params.bss_data_size).
>
> Interestingly, this seems to cause no issue on normal kernels. But when
> full LTO is enabled, it causes the compiler to "optimize" out huge
> swaths of amdgpu initialization code, and the driver is unusable:
>
> amdgpu 0000:03:00.0: [drm] Loading DMUB firmware via PSP: version=0x07002F00
> amdgpu 0000:03:00.0: sw_init of IP block <dm> failed 5
> amdgpu 0000:03:00.0: amdgpu_device_ip_init failed
> amdgpu 0000:03:00.0: Fatal error during GPU init

In case anybody wants to poke around, I uploaded the binaries here:

https://github.com/jcalvinowens/lkml-debug/releases/tag/000001

You can see in the diff of the disassembly that the "missing" piece of
dm_sw_init() reappeared after reverting e1b38572:

https://github.com/jcalvinowens/lkml-debug/blob/main/amdgpu-lto/not-working-to-working.diff
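
For anyone wondering how a single uninitialized read can make that much
code vanish: compilers are free to treat an indeterminate stack value as
undefined, and with enough cross-function inlining (as with full LTO) they
can fold away branches that depend on it. Below is a minimal userspace
sketch of the general pattern, with the struct zeroed before use. All the
names and the footer constant are made up for illustration; this is not
the amdgpu code, and not necessarily what the eventual fix looks like.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a firmware region parameter struct. */
struct region_params_sketch {
	uint32_t inst_const_size;
	uint32_t bss_data_size;
};

/* Illustrative value only, not the real PSP footer size. */
#define PSP_FOOTER_BYTES_SKETCH 0x100u

/* Returns 0 if the sizes look consistent, -1 otherwise. */
static int check_sizes_sketch(const struct region_params_sketch *p)
{
	if (p->inst_const_size < PSP_FOOTER_BYTES_SKETCH)
		return -1;
	if (p->bss_data_size > p->inst_const_size)
		return -1;
	return 0;
}

int sw_init_sketch(uint32_t fw_inst_const_size, uint32_t fw_bss_data_size)
{
	struct region_params_sketch params;

	/*
	 * Zero the whole struct first, so no field is ever read while
	 * indeterminate, whatever order the assignments below end up in.
	 */
	memset(&params, 0, sizeof(params));

	params.inst_const_size = fw_inst_const_size;
	params.bss_data_size = fw_bss_data_size;

	return check_sizes_sketch(&params);
}
```

Without the memset(), any check that ran before the bss_data_size
assignment would read garbage, and the optimizer would be entitled to
assume whatever it likes about that branch.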

This is my bisect log:

bad: [1f318b96cc84d7c2ab792fcc0bfd42a7ca890681] Linux 7.0-rc3
good: [05f7e89ab9731565d8a62e3b5d1ec206485eeb0b] Linux 6.19
bad: [1c2b4a4c2bcb950f182eeeb33d94b565607608cf] Merge tag 'pci-v7.0-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
good: [6589b3d76db2d6adbf8f2084c303fb24252a0dc6] Merge tag 'soc-dt-7.0' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
bad: [a60f627cf4ab474aebf15f62c55eadabab9780da] Merge tag 'amd-drm-next-6.20-2026-01-30' of https://gitlab.freedesktop.org/agd5f/linux into drm-next
good: [83675851547e835c15252c601f41acf269c351d9] drm/xe: Cleanup unused header includes
bad: [71573db5ad74b2087a4688cd1dda73ff082620f6] drm/amd/display: switch to drm_dbg_ macros instead of DRM_DEBUG_ variants
bad: [3235a5b72317be613b69e22c3b2c9f2bec546253] drm/amdgpu: Update MES VM_CNTX_CNTL for XNACK off for GFX 12.1
bad: [e1b73b64271d706079370b58b81292dafd373163] amdkfd: remove DIQ support
good: [2634ef1b8c00207dde5101e926241957aa5652b8] drm/amdkfd: Fix PTE clearing during SVM unmap on GFX 12.1
bad: [af441be8b75deb93ded51c54b9a2ba1e048b1c91] drm/amdgpu: add support for sdma v7_1
good: [69249b477b95f91e56bb19ec53707253899458c4] drm/amd/display: Move dml2_validate to the non-FPU dml2_wrapper
bad: [ec62b7ded978957ec74add4c1feccc986e2baeef] drm/amdkfd: Uninitialized and Unused variables
good: [c7062be3380cb20c8b1c4a935a13f1848ead0719] drm/amd/display: Correct DSC padding accounting
bad: [d28e92093ceffb424b9b0e36bbd391c83b1cfe78] drm/amd/display: [FW Promotion] Release 0.1.37.0
bad: [e1b385726f7f7fc75b6cd3c2216430de8a625a2d] drm/amd/display: Add additional checks for PSP footer size
first bad commit: [e1b385726f7f7fc75b6cd3c2216430de8a625a2d] drm/amd/display: Add additional checks for PSP footer size

Thanks,
Calvin