On 2025-02-12 23:33, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
*From:* Yang, Philip <philip.y...@amd.com>
*Sent:* Wednesday, February 12, 2025 10:31 PM
*To:* Deng, Emily <emily.d...@amd.com>; Yang, Philip
<philip.y...@amd.com>; Chen, Xiaogang <xiaogang.c...@amd.com>;
amd-gfx@lists.freedesktop.org
*Subject:* Re: [PATCH] drm/amdkfd: Fix the deadlock in
svm_range_restore_work
On 2025-02-12 03:54, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Ping……
Emily Deng
Best Wishes
*From:* Deng, Emily <emily.d...@amd.com>
*Sent:* Tuesday, February 11, 2025 8:21 PM
*To:* Deng, Emily <emily.d...@amd.com>; Yang, Philip
<philip.y...@amd.com>; Chen, Xiaogang <xiaogang.c...@amd.com>;
amd-gfx@lists.freedesktop.org
*Subject:* RE: [PATCH] drm/amdkfd: Fix the deadlock in
svm_range_restore_work
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Philip,
Upon further consideration,
removing amdgpu_amdkfd_unreserve_mem_limit is challenging because
it is paired
with amdgpu_amdkfd_reserve_mem_limit in svm_migrate_ram_to_vram.
However, this pairing does introduce issues, as it
prevents amdgpu_amdkfd_reserve_mem_limit from accurately detecting
out-of-memory conditions.
Ideally, amdgpu_amdkfd_unreserve_mem_limit should be tied to the
actual freeing of memory. Furthermore,
since ttm_bo_delayed_delete delays the call
to amdgpu_vram_mgr_del, there remains a possibility
that amdgpu_amdkfd_reserve_mem_limit reports sufficient memory,
while a subsequent call to amdgpu_vram_mgr_new fails. For these
reasons, I believe this patch is still necessary.
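To make the timing issue concrete, here is a toy sketch of the pattern I
mean (illustration only, not the real svm_migrate_ram_to_vram; the
*_sketch names are invented stand-ins for the amdgpu_amdkfd_* accounting
helpers):

static u64 vram_used_sketch;	/* stands in for the kfd_mem_limit accounting */

static int reserve_mem_limit_sketch(u64 size, u64 vram_size)
{
	if (vram_used_sketch + size > vram_size)
		return -ENOMEM;		/* the OOM check we want to stay accurate */
	vram_used_sketch += size;
	return 0;
}

static int migrate_ram_to_vram_sketch(u64 size, u64 vram_size)
{
	int r = reserve_mem_limit_sketch(size, vram_size);

	if (r)
		return r;

	/* ... migrate; the destination BO is allocated via amdgpu_vram_mgr_new() ... */

	/* Unreserving here drops the accounting while the migrated BO still
	 * occupies VRAM, and even once BOs are released,
	 * ttm_bo_delayed_delete() postpones amdgpu_vram_mgr_del(), so a later
	 * reserve can succeed while amdgpu_vram_mgr_new() still fails. */
	vram_used_sketch -= size;	/* stands in for amdgpu_amdkfd_unreserve_mem_limit() */
	return 0;
}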
Emily Deng
Best Wishes
*From:* amd-gfx <amd-gfx-boun...@lists.freedesktop.org> *On Behalf Of*
Deng, Emily
*Sent:* Tuesday, February 11, 2025 6:56 PM
*To:* Yang, Philip <philip.y...@amd.com>; Chen, Xiaogang
<xiaogang.c...@amd.com>; amd-gfx@lists.freedesktop.org
*Subject:* RE: [PATCH] drm/amdkfd: Fix the deadlock in
svm_range_restore_work
[AMD Official Use Only - AMD Internal Distribution Only]
*From:* Yang, Philip <philip.y...@amd.com>
*Sent:* Tuesday, February 11, 2025 6:54 AM
*To:* Deng, Emily <emily.d...@amd.com>; Chen, Xiaogang
<xiaogang.c...@amd.com>; amd-gfx@lists.freedesktop.org
*Subject:* Re: [PATCH] drm/amdkfd: Fix the deadlock in
svm_range_restore_work
On 2025-02-10 02:51, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
*From:* Chen, Xiaogang <xiaogang.c...@amd.com>
*Sent:* Monday, February 10, 2025 10:18 AM
*To:* Deng, Emily <emily.d...@amd.com>; amd-gfx@lists.freedesktop.org
*Subject:* Re: [PATCH] drm/amdkfd: Fix the deadlock in
svm_range_restore_work
On 2/7/2025 9:02 PM, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Ping.......
Emily Deng
Best Wishes
-----Original Message-----
From: Emily Deng <emily.d...@amd.com>
Sent: Friday, February 7, 2025 6:28 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deng, Emily <emily.d...@amd.com>
Subject: [PATCH] drm/amdkfd: Fix the deadlock in
svm_range_restore_work
It randomly hits a deadlock in svm_range_restore_work.
Details below:
1. svm_range_restore_work
   ->svm_range_list_lock_and_flush_work
     ->mmap_write_lock
2. svm_range_restore_work
   ->svm_range_validate_and_map
     ->amdgpu_vm_update_range
       ->amdgpu_vm_ptes_update
         ->amdgpu_vm_pt_alloc
           ->svm_range_evict_svm_bo_worker
svm_range_evict_svm_bo_worker is a function run by a kernel task from
the default system_wq. It is not the task that runs
svm_range_restore_work, which is from system_freezable_wq. The second
task may need to wait for the first task to release mmap_write_lock,
but there is no cyclic lock dependency.
Can you explain more about how the deadlock happens? If a deadlock
exists between two tasks, there should be at least two locks used by
both tasks.
Regards
Xiaogang
In Step 2, during amdgpu_vm_pt_alloc, the system runs out of memory and
triggers an eviction. This schedules the svm_range_evict_svm_bo_worker
task and then waits for the eviction_fence to be signaled. However,
svm_range_evict_svm_bo_worker cannot acquire mmap_read_lock(mm), so it
cannot signal the eviction_fence. As a result, amdgpu_vm_pt_alloc never
completes and mmap_write_lock(mm) is never released.
In other words, the svm_range_restore_work task holds
mmap_write_lock(mm) and is stuck waiting for the eviction_fence to be
signaled by svm_range_evict_svm_bo_worker, while
svm_range_evict_svm_bo_worker is itself blocked, unable to acquire
mmap_read_lock(mm). This is the deadlock.
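To summarize the dependency as pseudo-kernel code (simplified; the
*_sketch helpers are invented stand-ins, not the actual call sites):

/* hypothetical stand-ins for the fence wait/signal that happen inside TTM */
static void wait_for_eviction_fence_sketch(void);
static void signal_eviction_fence_sketch(void);

/* Task A: svm_range_restore_work, runs on system_freezable_wq */
static void restore_work_sketch(struct mm_struct *mm)
{
	mmap_write_lock(mm);			/* via svm_range_list_lock_and_flush_work */
	/* svm_range_validate_and_map -> amdgpu_vm_pt_alloc hits OOM, evicts an
	 * SVM BO and blocks until the KFD eviction fence on that BO signals */
	wait_for_eviction_fence_sketch();	/* blocks here */
	mmap_write_unlock(mm);
}

/* Task B: svm_range_evict_svm_bo_worker, runs on system_wq */
static void evict_svm_bo_worker_sketch(struct mm_struct *mm)
{
	mmap_read_lock(mm);			/* blocks: Task A holds the write lock */
	/* only after this would the range be migrated and the eviction fence
	 * be allowed to signal */
	signal_eviction_fence_sketch();
	mmap_read_unlock(mm);
}

/* A waits on the fence that only B's progress signals; B waits on the
 * mmap lock that only A can release: deadlock. */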
The deadlock should not happen: svm_range_restore_work is only used for
the xnack-off case, and there is no VRAM over-commitment with KFD's
amdgpu_amdkfd_reserve_mem_limit. We reserve ESTIMATE_PT_SIZE of VRAM
for page table allocation to prevent this situation.
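For reference, that headroom is enforced along these lines (condensed
sketch; the real check in amdgpu_amdkfd_reserve_mem_limit has more
terms, and the macro ratio below is a placeholder, not the upstream
value):

/* Placeholder for the upstream ESTIMATE_PT_SIZE() macro, which reserves a
 * small fraction of total memory for page tables. */
#define ESTIMATE_PT_SIZE_SKETCH(mem_size)	((mem_size) >> 14)

/* Condensed VRAM check: reservations must leave page-table headroom so
 * page-table allocation during restore does not trigger eviction. */
static int reserve_vram_sketch(u64 vram_needed, u64 vram_used,
			       u64 vram_size, u64 total_mem_size)
{
	u64 reserved_for_pt = ESTIMATE_PT_SIZE_SKETCH(total_mem_size);

	if (vram_needed + vram_used > vram_size - reserved_for_pt)
		return -ENOMEM;		/* would eat into the page-table reserve */
	return 0;
}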
Regards,
Philip
Hi Philip,
You're correct. Upon further investigation, the issue arises from
the additional call
to amdgpu_amdkfd_unreserve_mem_limit in svm_migrate_ram_to_vram,
which prevents amdgpu_amdkfd_reserve_mem_limit from detecting the
out-of-memory condition. I will submit another patch to remove
the amdgpu_amdkfd_unreserve_mem_limit call in svm_migrate_ram_to_vram.
We check that all SVM memory must fit in system memory and don't
account SVM VRAM usage. For xnack off, the application should check the
available VRAM size and avoid VRAM over-commitment.
svm_range_restore_work ensures all SVM ranges are mapped to GPUs before
resuming the queues; this is done by taking the mmap write lock and
flushing the deferred_range_list. Downgrading to the mmap read lock
cannot prevent unmap from the CPU, because the MMU notifier callback
can add ranges to the deferred_range_list again and unmap them from the
GPUs, so this patch cannot work.
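For context, the lock-and-flush step mentioned above follows a retry
pattern roughly like this (paraphrased sketch, may not match the exact
kfd_svm.c source):

/* Paraphrased: take the mmap write lock only once the deferred list has
 * been drained; otherwise drop the lock, flush again and retry. */
static void lock_and_flush_work_sketch(struct svm_range_list *svms,
				       struct mm_struct *mm)
{
retry:
	flush_work(&svms->deferred_list_work);
	mmap_write_lock(mm);

	if (list_empty(&svms->deferred_range_list))
		return;			/* caller proceeds holding the write lock */

	mmap_write_unlock(mm);		/* new deferred work arrived, try again */
	goto retry;
}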
Maybe I misunderstand, but downgrading to a read lock could also
prevent svm_range_deferred_list_work from acquiring the write lock. As
a result, it could potentially block unmapping operations from the GPUs.
No, svm_range_cpu_invalidate_pagetables takes the prange lock to split
the prange, adds it to the deferred_list if needed, then unmaps from
the GPU and returns.
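In other words, the notifier path only needs the prange lock, roughly
(simplified sketch, not the exact svm_range_cpu_invalidate_pagetables
source):

/* Simplified: the CPU invalidate callback works under the prange lock,
 * never the mmap lock, so downgrading the restore worker's mmap lock
 * does not block it. */
static bool cpu_invalidate_sketch(struct svm_range *prange,
				  unsigned long start, unsigned long last)
{
	mutex_lock(&prange->lock);	/* prange lock */
	/* split prange against [start, last], queue the pieces on the
	 * deferred_range_list if needed, unmap the range from GPUs */
	mutex_unlock(&prange->lock);
	return true;			/* returns without touching the mmap lock */
}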
This needs an application fix rather than over-commitment: prefetch SVM
ranges to VRAM if xnack is off.
Regards,
Philip
Emily Deng
Best Wishes
We should not use the mmap write lock to sync with the MMU notifier;
there is a plan to rework the SVM locks to fix this.
Regards,
Philip
Emily Deng
Best Wishes
Emily Deng
Best Wishes
             ->mmap_read_lock (deadlock here, because mmap_write_lock
               is already held)
How to fix?
Downgrade the write lock to a read lock.
Signed-off-by: Emily Deng <emily.d...@amd.com>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index bd3e20d981e0..c907e2de3dde 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -1841,6 +1841,7 @@ static void svm_range_restore_work(struct work_struct *work)
 	mutex_lock(&process_info->lock);
 	svm_range_list_lock_and_flush_work(svms, mm);
 	mutex_lock(&svms->lock);
+	mmap_write_downgrade(mm);
 	evicted_ranges = atomic_read(&svms->evicted_ranges);
@@ -1890,7 +1891,7 @@ static void svm_range_restore_work(struct work_struct *work)
 out_reschedule:
 	mutex_unlock(&svms->lock);
-	mmap_write_unlock(mm);
+	mmap_read_unlock(mm);
 	mutex_unlock(&process_info->lock);
 	/* If validation failed, reschedule another attempt */
--
2.34.1