On Wed, May 7, 2025 at 9:17 PM Mario Limonciello <supe...@kernel.org> wrote:
>
> On 5/7/2025 2:14 PM, Rafael J. Wysocki wrote:
> > On Thu, May 1, 2025 at 11:17 PM Mario Limonciello <supe...@kernel.org> wrote:
> >>
> >> From: Mario Limonciello <mario.limoncie...@amd.com>
> >>
> >> commit 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification
> >> callback support") introduced a VRAM eviction earlier in the PM
> >> sequences when swap was still available for evicting to. This helped
> >> to fix a number of memory pressure related bugs but also exposed a
> >> new one.
> >>
> >> If a userspace process is actively using the GPU when suspend starts
> >> then a deadlock could occur.
> >>
> >> Instead of going off the prepare notifier, use the PM notifiers that
> >> occur after processes have been frozen to do evictions.
> >>
> >> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
> >> Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
> >> Signed-off-by: Mario Limonciello <mario.limoncie...@amd.com>
> >> ---
> >>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
> >>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> index 7f354cd532dc1..cad311b9fd834 100644
> >> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >> @@ -4917,10 +4917,10 @@ static int amdgpu_device_pm_notifier(struct notifier_block *nb, unsigned long mo
> >>         int r;
> >>
> >>         switch (mode) {
> >> -       case PM_HIBERNATION_PREPARE:
> >> +       case PM_HIBERNATION_POST_FREEZE:
> >>                 adev->in_s4 = true;
> >>                 fallthrough;
> >> -       case PM_SUSPEND_PREPARE:
> >> +       case PM_SUSPEND_POST_FREEZE:
> >>                 r = amdgpu_device_evict_resources(adev);
> >>                 /*
> >>                  * This is considered non-fatal at this time because
> >> --
> >
> > Why do you need a notifier for this?
> >
> > It looks like this could be done from amdgpu_device_prepare(), but if
> > there is a reason why it cannot be done from there, it should be
> > mentioned in the changelog.
>
> It's actually done in amdgpu_device_prepare() "as well" already, but the
> reason that it's being done earlier is because swap still needs to be
> available, especially with heavy memory fragmentation.

Swap should be still available when amdgpu_device_prepare() runs.

> I'll add more detail about this to the commit for the next spin if
> you're relatively happy with the new notifier from the first patch.

I need to have a look at it, but adding it for just one user seems a
bit over the top. I'd prefer to avoid doing this.