On Sun, Jun 01, 2025 at 08:44:29PM -0500, Mario Limonciello wrote:
> From: Mario Limonciello <mario.limoncie...@amd.com>
>
> Chris Bainbridge reported some list corruption occurring around the
> suspend sequence when an aborted suspend occurs.
>
> I couldn't reproduce this specific problem, but when I tried I found
> some other issues where the cached DM state isn't properly destroyed.
>
> This is because there isn't a complete() callback to match the prepare()
> callback used by amdgpu. Normally the PM core will call complete() after
> every suspend attempt (successful or not).
>
> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/4280
>
> Mario Limonciello (3):
>   drm/amd: Add support for a complete pmops action
>   drm/amd/display: Stop storing failures into adev->dm.cached_state
>   drm/amd/display: Destroy cached state in complete() callback
>
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |  22 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       |   2 +-
>  .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 125 +++++++++++-------
>  drivers/gpu/drm/amd/include/amd_shared.h      |   1 +
>  5 files changed, 103 insertions(+), 48 deletions(-)
>
> --
> 2.43.0
>

I tested with 30 suspend cycles and the dm_prepare_suspend /
amdgpu_device_prepare error did not appear. The list corruption error
remains, but that bisects to:

aa7a9275ab81 ("PM: sleep: Suspend async parents after suspending children")

I applied your patch series to the parent of that commit, tested, and
there were no errors. So this issue looks fixed, but the other issue
remains. I have sent a separate email about it:

https://lore.kernel.org/all/aD2U3VIhf8vDkl09@debian.local/