On 2025-07-08 16:14, Gang Ba wrote:
If the vm belongs to another process, this is an fclose after fork;
waiting may enable signaling on the KFD eviction fence and cause the parent
process's queues to be evicted.
Signed-off-by: Gang Ba
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 7 +++
1 file changed, 7 insertions
On 2025-07-01 03:28, Christian König wrote:
Clear NAK to removing this!
The amdgpu_flush function is vital for correct operation.
There is no fflush call from libdrm/amdgpu, so amdgpu_flush is only called from
fclose -> filp_flush
The intention is to block closing the file handle in child processes a
On 2025-06-27 01:20, YuanShang Mao (River) wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Currently, amdgpu_flush is used to prevent new jobs from being submitted in the
same context when a file descriptor is closed and to wait for existing jobs to
complete. Additionally, if
On 2025-06-23 18:18, Chen, Xiaogang wrote:
On 6/23/2025 11:59 AM, Philip Yang wrote:
If the process is exiting, the mmput inside mmu notifier callback from
compactd or fork or numa balancing could release the last reference
of mm struct to call exit_mmap and free_pgtable, this triggers
]
amdgpu_amdkfd_gpuvm_alloc_memory_of_gpu+0x543/0x7d0 [amdgpu]
kfd_ioctl_alloc_memory_of_gpu+0x24c/0x4e0 [amdgpu]
kfd_ioctl+0x29d/0x500 [amdgpu]
Fixes: fa582c6f3684 ("drm/amdkfd: Use mmget_not_zero in MMU notifier")
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 23 +++--
On 2025-06-16 07:43, Jesse Zhang wrote:
This commit makes two key fixes to SDMA v4.4.2 handling:
1. Disable UTC_L1 in the sdma_cntl register when stopping SDMA engines
by reading the current value before modifying the UTC_L1_ENABLE bit.
2. Ensure UTC_L1_ENABLE is consistently managed by:
- Ad
On 2025-06-05 12:11, Amber Lin wrote:
Starting from MEC v97, GC 9.4.2 supports chain runlists of XNACK+/XNACK-
processes.
Signed-off-by: Amber Lin
Reviewed-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +++
drivers/gpu/drm/amd/amdgpu/gfx_v9_4_2.c | 12
urce after application exit. NULL pointer check is also necessary as
kfd_lookup_process_by_pid() may return NULL pointer if app process/task
is already destroyed.
Regards,
Philip
-Original Message-
From: amd-gfx On Behalf Of Philip Yang
Sent: Tuesday, May 27, 2025 11:35 AM
T
On 2025-05-28 13:19, James Zhu wrote:
to get migration pages. When migrating pages from system to vram,
needn't check bit MIGRATE_PFN_VALID, since the system page could
be allocated, but not be accessed.
I think the corner case is vram_pages becomes negative value when
migrating prange from
On 2025-05-28 13:19, James Zhu wrote:
upages is assigned under cpages = 0, so it isn't really used in this function.
Signed-off-by: James Zhu
Reviewed-by: Philip.Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkf
kfd_lookup_process_by_pid takes a process reference that is never
dropped, so the refcount leaks.
Fixes: 7a566d7f56f4 ("amd/amdkfd: Trigger segfault for early userptr
unmmapping")
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 9 +++--
1 file changed, 7 insert
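The leak described in this patch follows the usual pattern where a lookup returns an object with an extra reference held. The following is a minimal userspace sketch of that pattern, not the KFD code: toy_process, toy_lookup, toy_unref and toy_use_process are hypothetical names invented for illustration. It shows that every successful lookup needs a matching unref, and that a NULL result has to be checked first.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a refcounted process object; not the KFD struct. */
struct toy_process {
    int pid;
    int refcount;
};

/* Hypothetical lookup: returns the object with an extra reference, or NULL. */
static struct toy_process *toy_lookup(struct toy_process *table, size_t n, int pid)
{
    for (size_t i = 0; i < n; i++) {
        if (table[i].pid == pid) {
            table[i].refcount++;
            return &table[i];
        }
    }
    return NULL;
}

/* Drop one reference previously taken by toy_lookup. */
static void toy_unref(struct toy_process *p)
{
    assert(p->refcount > 0);
    p->refcount--;
}

/* Balanced use: check for NULL, use the object, then drop the reference. */
static int toy_use_process(struct toy_process *table, size_t n, int pid)
{
    struct toy_process *p = toy_lookup(table, n, pid);

    if (!p)
        return -1;  /* process is already gone */
    /* ... work with p here ... */
    toy_unref(p);   /* without this, refcount grows on every call */
    return 0;
}
```

Calling toy_use_process repeatedly on the same pid leaves the refcount where it started; removing the toy_unref call makes it grow by one per call, which is the kind of leak the patch describes.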
On 2025-05-21 06:12, Yifan Zhang wrote:
This patch fixes a kfd_process ref leak.
Signed-off-by: Yifan Zhang
Reviewed-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c
b/drivers/gpu
On 2025-05-15 17:31, Chen, Xiaogang wrote:
On 5/15/2025 3:45 PM, Philip Yang wrote:
On 2025-05-15 10:29, Chen, Xiaogang wrote:
Does this patch fix a bug or just make code look more reasonable?
kfd_process_destroy_pdds releases pdd related buffers, not related
to operations on vm. So vm
ently, as fput(pdd->drm_file) to
free vm is right between free vm mapping qpd->cwsr_mem, qpd->ib_mem and
free kernel bo qpd->proc_doorbells, pdd->proc_ctx_bo, to make it clear
for future change.
Regards,
Philip
Regards
Xiaogang
On 5/14/2025 12:10 PM, Philip Yang wrote:
Relea
On 2025-05-15 10:40, Chen, Xiaogang wrote:
On 5/14/2025 12:10 PM, Philip Yang wrote:
Move vm root bo unreserve after vm->va mapping free because we should
hold vm lock to access vm->va.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8
1 file chan
Releasing pdd->drm_file may free the vm if this is the last reference,
so move it to the last step, after memory is unmapped.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 10 +++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/
g.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c
index 3939761be31c..d45ebfb642ca 100644
--- a/drivers/gpu/drm/
This series fixes the dmesg error message "still active bo inside vm" and
2 potential races between process exit and vm cleanup.
Philip Yang (3):
drm/amdgpu: seq64 memory unmap uses uninterruptible lock
drm/amdgpu: amdgpu_vm_fini hold vm lock to access vm->va
drm/amdkfd: destroy_pdd
Move vm root bo unreserve after vm->va mapping free because we should
hold vm lock to access vm->va.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_v
+0x217/0x3c0
do_group_exit+0x3b/0xb0
get_signal+0x14a/0x8d0
arch_do_signal_or_restart+0xde/0x100
exit_to_user_mode_loop+0xc1/0x1a0
exit_to_user_mode_prepare+0xf4/0x100
syscall_exit_to_user_mode+0x17/0x40
do_syscall_64+0x69/0xc0
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd
On 2025-03-21 19:35, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
-Original Message-
From: Lazar, Lijo
Sent: Friday, March 21, 2025 7:06 PM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject
On 2025-03-03 19:44, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Ping..
Emily Deng
Best Wishes
-Original Message-
From: Emily Deng
Sent: Monday, March 3, 2025 5:35 PM
To: amd-gfx@lists.f
idate queue cwsr area and eop
buffer size")
This patch is
Reviewed-by: Philip Yang
Signed-off-by: Andrew Martin
---
drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_queue.c
b/drivers/gpu/drm/
On 2025-02-25 21:41, David Yat Sin wrote:
If queue size is less than minimum, clamp it to minimum to prevent
underflow when writing queue mqd.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 4
include/uapi/linux/kfd_ioctl.h | 2 ++
2 files chang
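The clamping described in this patch can be sketched in plain C. The minimum value and the names TOY_MIN_QUEUE_SIZE, toy_clamp_queue_size and toy_encode_size are assumptions for illustration; the constant actually added to kfd_ioctl.h and the real mqd encoding are not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed minimum, for illustration only; the real value is not shown here. */
#define TOY_MIN_QUEUE_SIZE 1024u

/* Raise undersized requests to the minimum instead of rejecting them. */
static uint32_t toy_clamp_queue_size(uint32_t requested)
{
    if (requested < TOY_MIN_QUEUE_SIZE)
        return TOY_MIN_QUEUE_SIZE;
    return requested;
}

/*
 * Hypothetical encoding that subtracts a fixed amount from the size.
 * With an unsigned type, an unclamped small size would wrap around.
 */
static uint32_t toy_encode_size(uint32_t requested, uint32_t header)
{
    uint32_t size = toy_clamp_queue_size(requested);

    assert(header <= TOY_MIN_QUEUE_SIZE); /* so size - header cannot wrap */
    return size - header;
}
```

Clamping rather than returning an error keeps existing callers that pass a too-small size working, at the cost of silently giving them a larger queue than requested.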
reset_domain->wq to ensure ongoing mode1 reset is done or user queues are
evicted, then free outstanding BOs.
Philip Yang (5):
drm/amdkfd: Remove kfd_process_hw_exception worker
drm/amdkfd: KFD release_work possible circular locking
drm/amdkfd: Fix mode1 reset crash issue
drm/amdkfd:
With GPU reset-domain worker implemented, KFD hw_exception worker is not
needed any more, just call amdgpu_amdkfd_gpu_reset directly from
kfd_hws_hang.
Suggested-by: Felix Kuehling
Signed-off-by: Philip Yang
Reviewed-by: Lijo Lazar
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
pletion)(&p->release_work));
lock((wq_completion)amdgpu-reset-dev);
To fix this, move the release work flush in KFD process creation outside
kfd_process_mutex.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16
1 f
free outstanding BOs.
Signed-off-by: Philip Yang
Reviewed-by: Lijo Lazar
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 17 +
1 file changed, 17 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 2715ca53e9da
debugfs hang_hws is used by the GPU reset test with HWS. For MES this crashes
the kernel with a NULL pointer access because dqm->packet_mgr is not set up
for the MES path.
Skip GPUs with MES for now; the MES hang_hws debugfs interface will be
supported later.
Signed-off-by: Philip Yang
Reviewed-by: Kent Russ
If the GPU is in reset, destroy_queue returns -EIO; pqm_destroy_queue should
still delete the queue from process_queue_list and free the resource.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 2 +-
1 file changed, 1 insertion(+), 1
On 2025-02-20 06:59, Emily Deng wrote:
Call amdgpu_amdkfd_reserve_mem_limit in svm_range_vram_node_new when
creating a new SVM BO. Call amdgpu_amdkfd_unreserve_mem_limit
in svm_range_bo_release when the SVM BO is deleted.
Signed-off-by: Emily Deng
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.
On 2025-02-18 12:24, David Yat Sin wrote:
When userspace applications call AMDKFD_IOC_UPDATE_QUEUE, preserve
bitfields that do not need to be modified, as they contain flags to track
queue states that are used by CP FW.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_mqd_mana
954] RDX: RSI: RDI: 01200011
[ 223.426965] RBP: R08: R09:
[ 223.426975] R10: 7f4675e81a50 R11: 0246 R12: 0001
[ 223.426986] R13: 7fff5c3e5470 R14: 7fff5c3e53e0 R15: 7f
On 2025-02-12 23:33, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
From: Yang, Philip
Sent: Wednesday, February 12, 2025 10:31 PM
To: Deng, Emily; Yang, Philip; Chen, Xiaogang; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdkfd: Fix the de
On 2025-02-12 17:42, Uwe Kleine-König wrote:
#regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
#regzbot monitor: https://bugs.debian.org/1093124
Hello,
On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:
Find user queue
ping...
On 2025-01-29 19:04, Philip Yang wrote:
To work around the queue-full h/w issue on Gfx7/8, when an application creates
AQL queues, the ring buffer bo is allocated with size queue_size/2 and
mapped to the GPU twice, using 2 attachments with the same ring_bo backing
memory.
For
On 2025-02-12 03:54, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Ping...
Emily Deng
Best Wishes
On 2025-02-11 05:34, Christian König wrote:
On 20.01.25 at 16:59, Philip Yang wrote:
On 2025-01-15 06:01, Christian König wrote:
On 14.01.25 at 15:53, Philip Yang wrote
On 2025-02-10 02:51, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
loop
Modified function names based on review comments.
Signed-off-by: Harish Kasiviswanathan
with one nitpick fixed, this patch is
Reviewed-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 25 ++
drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
On 2025-02-07 05:17, Christian König wrote:
On 30.01.25 at 17:19, Philip Yang wrote:
On 2025-01-29 11:40, Christian König wrote:
On 23.01.25 at 21:39, Philip Yang wrote:
SVM migration
On 2025-02-05 22:07, Kasiviswanathan, Harish wrote:
[Public]
From: Yang, Philip
S
On 2025-02-04 18:02, Harish Kasiviswanathan wrote:
SDMA writes have to probe-invalidate RW lines. Set the snoop bit in mmhub for
this to happen.
Signed-off-by: Harish Kasiviswanathan
---
drivers/gpu/drm/amd/amdgpu/mmhub_v1_7.c | 25 ++
drivers/gpu
On 2025-01-30 15:51, Alex Deucher wrote:
If the user has configured a large carveout on a small APU,
only use GTT for VRAM allocations if GTT is larger than
VRAM.
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 --
1 file ch
On 2025-01-29 11:40, Christian König wrote:
On 23.01.25 at 21:39, Philip Yang wrote:
SVM migration unmaps pages from the GPU and then updates the mapping to the GPU to
recover from the page fault. Currently unmap clears the PDE entry for
allocation and mapping size.
Fixes: 68e599db7a54 ("drm/amdkfd: Validate user queue buffers")
Suggested-by: Tomáš Trnka
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 12 +++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/
GPU
performance. Update mapping to huge page will still free the PTB bo.
With this change, the vm->pt_freed list and work is not needed. Add
WARN_ON(unlocked) in amdgpu_vm_pt_add_list to catch if unmap to free the
PTB.
v2: Limit update fragment size, not hack entry_end (Christian)
Signed-off-by:
On 2025-01-15 16:40, Xiaogang.Chen wrote:
From: Xiaogang Chen
Currently svm_migrate_copy_to_vram handles discontinuity in sys pages (src) and
vram pages (dst) in different ways. When src hits a discontinuity, j pages are migrated
and the ith page is not migrated; when dst
On 2025-01-15 06:01, Christian König wrote:
On 14.01.25 at 15:53, Philip Yang wrote:
SVM migration unmaps pages from the GPU and then updates the mapping to the GPU to
recover from the page fault. Currently unmap clears the PDE entry for
the PTB bo.
With this change, the vm->pt_freed list and work is not needed. Add
WARN_ON(unlocked) in amdgpu_vm_pt_free_dfs to catch if unmap to free the
PTB.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c| 4 ---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h| 4 ---
d
On 2025-01-10 11:23, Chen, Xiaogang wrote:
On 1/10/2025 8:37 AM, Philip Yang wrote:
On 2025-01-10 02:49, Emily Deng wrote:
For partial migrate from ram to vram,
the migrate->cpages
On 2025-01-09 12:14, Felix Kuehling wrote:
On 2025-01-08 20:11, Philip Yang wrote:
On 2025-01-07 22:08, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal
On 2025-01-10 09:25, Emily Deng wrote:
For a partial migration from ram to vram, migrate->cpages is not
equal to migrate->npages, so use migrate->npages to check all pages that
need migrating, whether or not they could be copied.
And only need to set those pages could be m
hat fixed, this patch is
Reviewed-by: Philip Yang
Signed-off-by: Emily Deng
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 17 ++---
1 file changed, 10 insertions(+), 7 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu
On 2025-01-07 22:08, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Philip,
It still has the deadlock, maybe the best way is
On 2025-01-08 08:19, Emily Deng wrote:
For a partial migration from ram to vram, migrate->cpages is not
equal to migrate->npages, so use migrate->npages to check all pages that
need migrating, whether or not they could be copied.
And only need to set those pages could be m
On 2025-01-07 19:31, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
From: Yan
On 2025-01-07 10:50, Chen, Xiaogang wrote:
On 1/6/2025 8:02 PM, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
On 2025-01-07 07:30, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
Hi Felix,
You are right, it is easy to hit a deadlock; I don't know why LOCKDEP doesn't catch this. We need to find another solution.
Hi Philip,
Do you have a sol
On 2025-01-06 21:31, Deng, Emily wrote:
[AMD Official Use Only - AMD Internal Distribution Only]
From: Yang, Philip
On 2025-01-02 19:06, Emily Deng wrote:
For a partial migration from ram to vram, migrate->cpages is not
equal to migrate->npages, so use migrate->npages to check all pages that
need migrating, whether or not they could be copied.
And only need to set those pages could be m
*_ih.c except ASICs older than Vega which has only one ih ring.
Signed-off-by: Philip Yang
Reviewed-by: Christian König
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 6 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 1 +
drivers/gpu/drm/amd/amdgpu/navi10_ih.c
then driver process
the first event interrupt, set_event and event slot is auto-reset, then
for the second event interrupt, KFD goes to slow path as event is not
signaled, just drop the second event interrupt because the application
only need wakeup once.
Signed-off-by: Philip Yang
Reviewed-by:
handle the gfx v9 path, cover retry on/off and CAM filter
on/off cases.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 10
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 4 ++
drivers/gpu/drm/amd/amdkfd/kfd_device.c| 67
queue with number of workers equals to
number of partitions, let queue_work select the next CPU round robin
among the local CPUs of same NUMA.
Signed-off-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c| 25 --
drivers/gpu/drm/amd
-by: Philip Yang
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 8 +---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index 4c8308b2878b..56507ae919b0 100644
--- a
To handle 4 to 8 interrupts per second running CPX mode with 4
streams/queues per KFD node, KFD interrupt handler becomes the
performance bottleneck.
Remove the kfifo_out memcpy overhead by accessing ih_fifo data in-place
and updating rptr with kfifo_skip_count.
Signed-off-by: Philip
On 2024-10-21 04:12, Christian König wrote:
On 18.10.24 at 23:59, Philip Yang wrote:
On 2024-10-18 14:28, Felix Kuehling wrote:
On 2024-10-17 04:34, Victor Zhao wrote:
make sure
On 2024-10-21 13:46, Alex Deucher wrote:
This reverts commit a3ab2d45b9887ee609cd3bea39f668236935774c.
The userspace side for this code is not ready yet so revert
for now.
Signed-off-by: Alex Deucher
Cc: Philip Yang
Reviewed-by: Philip Yang
On 2024-10-18 14:28, Felix Kuehling wrote:
On 2024-10-17 04:34, Victor Zhao wrote:
make sure KFD_FENCE_INIT write to
fence_addr before pm_send_query_status
called, to avoid qcm fence timeout caused by incorrect ord
It is safe to access dqm->sched status inside dqm_lock, no
race with gpu reset.
Reviewed-by: Philip Yang
On 2024-10-18 11:10, Shaoyun Liu wrote:
From: shaoyunl
Add back kfd queues in start scheduling that were originally
removed on stop schedul
ordering.
Signed-off-by: Victor Zhao
Reviewed-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 +
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +-
2 files changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd
On 2024-10-17 12:12, Shaoyun Liu wrote:
From: shaoyunl
Add back kfd queues in start scheduling that were originally
removed on stop scheduling.
Signed-off-by: Shaoyun Liu
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 40 +--
1 file change
pe because it is updated outside process mutex
now.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 26 +++
Drop this patch series. As Felix pointed out, the forked process
takes a ref on svm_bo device pages, so svm_bo->pdd could refer to a
process that no longer exists.
Regards,
Philip
On 2024-10-11 11:00, Philip Yang wrote:
KFD process device
c64_t because it is updated outside process mutex
now.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 2 ++
4
KFD process device data pdd will be used for VRAM usage accounting, save
pdd to svm_bo to avoid searching pdd for every accounting, and get KFD
node from pdd->dev.
svm_bo->pdd will always be valid because KFD process release frees all
svm_bo first, then destroys the process pdds.
Signed-off-by:
On 2024-10-09 17:20, Felix Kuehling wrote:
On 2024-10-04 16:28, Philip Yang wrote:
Per process device data pdd->vram_usage
is used by rocm-smi to report
VRAM usage, this is currently missing the svm_bo us
updated outside process mutex now.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 6 +++---
drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 ++--
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 22 ++
4 f
get_wave_state is not defined for sdma queues, so copy_context_work_handler
crashes when it calls it for an sdma queue.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdkfd
On 2024-09-11 02:54, Christian König wrote:
Yeah, I completely agree with Xiaogang.
The PASID is an identifier of an address space. And the idea of
the KFD was that we can just use the same address space and with
it the page ta
On 2024-09-09 14:46, Christian König wrote:
On 09.09.24 at 18:02, Kim, Jonathan wrote:
[Public]
-Original Message-
From: Christian König
Sent: Thursday, September 5, 202
ff-by: Ramesh Errabolu
With 2 below nitpicks fixed, this patch is
Reviewed-by: Philip Yang
change subject to "drm/amdkfd: Add svm_default_granularity module
parameter"
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 +
drivers/gpu/drm/amd/amdgpu/am
leased before ttm_resource_manager is finalized, drain the workqueue in ttm_device.
v2: move drain_workqueue to amdgpu_ttm.c
Fixes: d99fbd9aab62 ("drm/ttm: Always take the bo delayed cleanup path for imported bos")
Suggested-by: Christian König
Signed-off-by: Asher Song
Acked-by: Ph
On 2024-09-02 05:06, Christian König wrote:
On 02.09.24 at 05:03, Lang Yu wrote:
Fixes: 5a1c27951966 ("drm/amdgpu: implement TLB flush fence")
Signed-off-by: Lang Yu
Ah yes, that exp
On 2024-08-29 18:31, Chen, Xiaogang wrote:
On 8/29/2024 5:13 PM, Ramesh Errabolu wrote:
Caution: This message originated from an
External Source. Use proper caution when opening attachments,
clicking links, or respon
On 2024-08-29 17:15, Felix Kuehling wrote:
On 2024-08-23 15:49, Philip Yang wrote:
If a GPU reset kicks in while the KFD restore_process_worker is running, this may
cause different issues, for example the rcu stall warning below
On 2024-08-28 18:01, Felix Kuehling wrote:
On 2024-08-23 15:49, Philip Yang wrote:
If a GPU reset kicks in while the KFD restore_process_worker is running, this may
cause different issues, for example below rcu stall
On 2024-08-26 15:34, Ramesh Errabolu wrote:
Enables users to update the default size of buffer used
in migration either from Sysmem to VRAM or vice versa.
The param GOBM refers to granularity of buffer migration,
and is specified in terms of log(numPages(buff
.
v3:
Simplify event drop count handling (James Zhu)
Philip Yang (4):
drm/amdkfd: Document and define SVM events message macro
drm/amdkfd: Output migrate end event if migrate failed
drm/amdkfd: Increase SMI event fifo size
drm/amdkfd: SMI report dropped event count
drivers/gpu/drm/amd
and reset drop count to zero.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 25 +
include/uapi/linux/kfd_ioctl.h | 6 +
2 files changed, 27 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
b
future.
No functional changes.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 45 +
include/uapi/linux/kfd_ioctl.h | 100 +---
2 files changed, 109 insertions(+), 36 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd
prefix to the macro name.
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c
index 1d94b445a060..9b8169761ec5
If page migration failed, also output a migrate end event to match the
migrate start event, with the failure error_code added to the end of the
migrate message macro. This will not break the uAPI because an application that
uses the old message macro with sscanf drops and ignores the error_code.
Signed-off-by: Philip Yang
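The compatibility argument in this patch relies on sscanf stopping once its format string is exhausted, so extra trailing fields in the input are ignored. Below is a small sketch of that behavior, assuming a made-up message layout: the field order, the format strings and the toy_* names are hypothetical, since the real SMI event macros are not shown in this excerpt.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Parse a hypothetical "start end node" message with an old-style format
 * that knows nothing about a trailing error code. Returns the number of
 * fields sscanf converted.
 */
static int toy_parse_old(const char *msg, unsigned long *start,
                         unsigned long *end, unsigned int *node)
{
    return sscanf(msg, "%lx %lx %x", start, end, node);
}

/* Newer parser that also reads the trailing error code when it is present. */
static int toy_parse_new(const char *msg, unsigned long *start,
                         unsigned long *end, unsigned int *node, int *err)
{
    return sscanf(msg, "%lx %lx %x %d", start, end, node, err);
}

/* Old parser fed a new-style message: the extra field is simply not read. */
static void toy_check_compat(void)
{
    unsigned long start, end;
    unsigned int node;

    assert(toy_parse_old("1000 2000 3 -14", &start, &end, &node) == 3);
    assert(start == 0x1000 && end == 0x2000 && node == 3);
}
```

An application built against the old format keeps working unchanged, while one built against the new format can tell from the return value (3 or 4) whether an error code was present.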
]
Call Trace:
update_process_times+0x94/0xd0
RIP: 0010:amdgpu_vm_handle_moved+0x9a/0x210 [amdgpu]
amdgpu_amdkfd_gpuvm_restore_process_bos+0x3d6/0x7d0 [amdgpu]
restore_process_helper+0x27/0x80 [amdgpu]
Signed-off-by: Philip Yang
---
drivers/gpu/drm/amd/amdkfd/kfd_process.c | 56
On 2024-08-22 10:34, James Zhu wrote:
On 2024-07-30 16:15, Philip Yang wrote:
SMI event fifo size 1KB was enough to report GPU vm fault or reset
[JZ] There is a typo here. It should be NOT enough
On 2024-08-22 10:32, James Zhu wrote:
On 2024-07-30 16:15, Philip Yang wrote:
Document how to use SMI system management interface to enable and
receive SVM events. Document SVM event triggers.
Define SVM events message
page faults at deferred
work. So, the time period that kfd does not handle page faults is reduced
and can be controlled.
Signed-off-by: Xiaogang.Chen
Some nitpicks below.
This patch is Reviewed-by: Philip Yang
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 4 +-
drivers/gpu
On 2024-08-21 19:22, Ramesh Errabolu wrote:
KFD's design of unified memory (UM) does not allow users to
configure the size of buffer used in migrating buffer either
from Sysmem to VRAM or vice versa.
This is not true, app can change range granularit