On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <supe...@kernel.org> wrote:

From: Mario Limonciello <mario.limoncie...@amd.com>

commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
callback support") introduced a VRAM eviction earlier in the PM
sequences when swap was still available for evicting to. This helped
to fix a number of memory pressure related bugs but also exposed a
new one.

If a userspace process is actively using the GPU when suspend starts
then a deadlock could occur.

Instead of going off the prepare notifier, use the PM notifiers that
occur after processes have been frozen to do evictions.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7f354cd532dc1..cad311b9fd834 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
         int r;

         switch (mode) {
-       case PM_HIBERNATION_PREPARE:
+       case PM_HIBERNATION_POST_FREEZE:
                 adev->in_s4 = true;
                 fallthrough;
-       case PM_SUSPEND_PREPARE:
+       case PM_SUSPEND_POST_FREEZE:
                 r = amdgpu_device_evict_resources(adev);
                 /*
                  * This is considered non-fatal at this time because
--

Why do you need a notifier for this?

It looks like this could be done from amdgpu_device_prepare(), but if
there is a reason why it cannot be done from there, it should be
mentioned in the changelog.

It's actually already done in amdgpu_device_prepare() as well, but it's done earlier too because swap still needs to be available for the eviction, especially with heavy memory fragmentation.

I'll add more detail about this to the commit message for the next spin if you're relatively happy with the new notifier from the first patch.
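
To make the ordering argument concrete, here is a rough userspace sketch of the two notifier points relative to the task freeze. It is only a model, not driver code: every `sketch_*` name is hypothetical, and `SKETCH_SUSPEND_POST_FREEZE` merely stands in for the event proposed in the first patch of this series.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the PM notifier events discussed above. */
enum sketch_pm_event {
	SKETCH_SUSPEND_PREPARE,      /* sent before tasks are frozen */
	SKETCH_SUSPEND_POST_FREEZE,  /* sent after tasks are frozen (proposed) */
};

struct sketch_state {
	bool tasks_frozen;                 /* userspace can no longer use the GPU */
	bool evicted_early;                /* eviction ran from a notifier */
	bool evicted_while_userspace_ran;  /* the case that could deadlock */
};

/* Model of the early VRAM eviction; records whether userspace was still live. */
static void sketch_evict(struct sketch_state *s)
{
	s->evicted_early = true;
	if (!s->tasks_frozen)
		s->evicted_while_userspace_ran = true;
}

/* Model of the driver's notifier: evict only on the configured event. */
static void sketch_notifier(struct sketch_state *s, enum sketch_pm_event ev,
			    enum sketch_pm_event evict_on)
{
	if (ev == evict_on)
		sketch_evict(s);
}

/* Model of the start of a suspend: notifier, freeze, notifier. */
static struct sketch_state sketch_suspend(enum sketch_pm_event evict_on)
{
	struct sketch_state s = { false, false, false };

	sketch_notifier(&s, SKETCH_SUSPEND_PREPARE, evict_on);
	s.tasks_frozen = true;  /* tasks are frozen at this point */
	sketch_notifier(&s, SKETCH_SUSPEND_POST_FREEZE, evict_on);
	/* the device prepare() callback would run only after this point */
	return s;
}
```

Calling `sketch_suspend(SKETCH_SUSPEND_PREPARE)` sets `evicted_while_userspace_ran`, while `sketch_suspend(SKETCH_SUSPEND_POST_FREEZE)` still evicts before the prepare callback but leaves that flag clear.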
