On 5/7/2025 2:39 PM, Rafael J. Wysocki wrote:
On Wed, May 7, 2025 at 9:17 PM Mario Limonciello <supe...@kernel.org> wrote:
On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <supe...@kernel.org> wrote:
From: Mario Limonciello <mario.limoncie...@amd.com>
Commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
callback support") introduced a VRAM eviction earlier in the PM
sequence, while swap was still available to evict to. This helped
fix a number of memory-pressure-related bugs but also exposed a
new one.
If a userspace process is actively using the GPU when suspend starts,
a deadlock can occur.
Instead of evicting from the prepare notifiers, evict from the PM
notifiers that run after processes have been frozen.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 7f354cd532dc1..cad311b9fd834 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
 	int r;
 	switch (mode) {
-	case PM_HIBERNATION_PREPARE:
+	case PM_HIBERNATION_POST_FREEZE:
 		adev->in_s4 = true;
 		fallthrough;
-	case PM_SUSPEND_PREPARE:
+	case PM_SUSPEND_POST_FREEZE:
 		r = amdgpu_device_evict_resources(adev);
 		/*
 		 * This is considered non-fatal at this time because
--
Why do you need a notifier for this?
It looks like this could be done from amdgpu_device_prepare(), but if
there is a reason why it cannot be done from there, it should be
mentioned in the changelog.
It's actually already done in amdgpu_device_prepare() as well, but it's
also done earlier because swap still needs to be available, especially
with heavy memory fragmentation.
Swap should still be available when amdgpu_device_prepare() runs.
No, it's not. The basic call trace (for suspend) looks like this:
enter_state(state) {
    suspend_prepare(state);
    ...
    pm_restrict_gfp_mask(); // disable swap
    suspend_devices_and_enter(state) → dpm_suspend_start() {
        dpm_prepare() {
            amdgpu_pmops_prepare();
        }
        dpm_suspend() {
            amdgpu_pmops_suspend();
        }
    }
}
If the intention was for it to be available, it would be better to move
the pm_restrict_gfp_mask() call "into" suspend_devices_and_enter()
between dpm_prepare() and dpm_suspend() calls.
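To make the ordering argument concrete, here is a minimal userspace sketch in C. It is not kernel code and does not call any kernel API: struct suspend_model, enum gfp_restrict_point and model_enter_state() are hypothetical names invented for illustration. It only replays the sequence from the call trace above and records whether swap is still usable at each step, assuming (as the trace's comment does) that pm_restrict_gfp_mask() is the point where swap stops being usable, and that the PM notifiers run from suspend_prepare().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only. All names here are hypothetical. */
enum gfp_restrict_point {
	RESTRICT_BEFORE_DPM_PREPARE,	/* ordering shown in the call trace */
	RESTRICT_AFTER_DPM_PREPARE,	/* ordering floated in the reply */
};

struct suspend_model {
	bool swap_available;	/* running state while the model executes */
	bool swap_at_notifier;	/* seen by a PM notifier in suspend_prepare() */
	bool swap_at_prepare;	/* seen by amdgpu_pmops_prepare() */
	bool swap_at_suspend;	/* seen by amdgpu_pmops_suspend() */
};

static void model_enter_state(struct suspend_model *m,
			      enum gfp_restrict_point where)
{
	assert(m != NULL);
	m->swap_available = true;

	/* suspend_prepare(): assumed to be where PM notifiers are called */
	m->swap_at_notifier = m->swap_available;

	/* pm_restrict_gfp_mask() in its position from the call trace */
	if (where == RESTRICT_BEFORE_DPM_PREPARE)
		m->swap_available = false;

	/* dpm_prepare() -> amdgpu_pmops_prepare() */
	m->swap_at_prepare = m->swap_available;

	/* pm_restrict_gfp_mask() moved between dpm_prepare() and dpm_suspend() */
	if (where == RESTRICT_AFTER_DPM_PREPARE)
		m->swap_available = false;

	/* dpm_suspend() -> amdgpu_pmops_suspend() */
	m->swap_at_suspend = m->swap_available;
}
```

Calling model_enter_state() with RESTRICT_BEFORE_DPM_PREPARE leaves swap_at_notifier true and swap_at_prepare false, which is the situation that motivated evicting from a notifier. With RESTRICT_AFTER_DPM_PREPARE, swap_at_prepare is true as well, which is what moving the call would buy. The model says nothing about whether that move is safe for other drivers.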
I'll add more detail about this to the commit for the next spin if
you're relatively happy with the new notifier from the first patch.
I need to have a look at it, but adding it for just one user seems a
bit over the top. I'd prefer to avoid doing this.