[PATCH] drm/amdkfd: set uuid for each partition in topology

2025-08-07 Thread Eric Huang
Currently each kfd compute partition/node is sharing the same uuid of AID, which doesn't meet the CUDA spec for visible device, so corresponding XCD id for each partition in smu has been assigned to xcp, and exposed to kfd topology. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/a

[PATCH] drm/amdkfd: change error to warning message for SDMA queues creation

2025-05-02 Thread Eric Huang
SDMA doesn't support oversubscription, it is the user matter to create queues over HW limit, but not supposed to be a KFD error. Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 14 -- .../gpu/drm/amd/amdkfd/kfd_process_queue_manager.c

[PATCH] drm/amdkfd: add pasid debugfs entries

2025-04-24 Thread Eric Huang
the entries will be appearing at /sys/kernel/debug/kfd/proc//pasid_. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debugfs.c | 77 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 5 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 3 + 3 files changed, 85

Re: [PATCH next] drm/amdkfd: Fix kfd_smi_event_process()

2025-04-15 Thread Eric Huang
Thanks for the fix, I had the same patch submitted yesterday. Regards, Eric On 2025-04-15 06:44, Dan Carpenter wrote: The "pdd->drm_priv" NULL check is reversed so it will lead to a NULL dereference on the next line. Fixes: 4172b556fd5b ("drm/amdkfd: add smi events for process start and end")
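
A reversed NULL check of the kind described above can be illustrated with a small standalone sketch. The struct and function names here (demo_priv, demo_pdd, demo_task_id) are hypothetical stand-ins, not the real KFD structures or the code in kfd_smi_events.c; the sketch only shows the correct sense of the test and notes the inverted form in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration only; not the real KFD types. */
struct demo_priv {
    int task_id;
};

struct demo_pdd {
    struct demo_priv *drm_priv;
};

/* Returns the task id, or -1 when there is nothing to report. */
static int demo_task_id(const struct demo_pdd *pdd)
{
    assert(pdd != NULL);

    /*
     * Correct sense: bail out when the pointer is NULL.
     * The reversed form, `if (pdd->drm_priv) return -1;`, would skip
     * every valid pointer and fall through to dereference a NULL one.
     */
    if (!pdd->drm_priv)
        return -1;

    return pdd->drm_priv->task_id;
}
```

With the check in this direction, an entry whose drm_priv is unset is skipped instead of being dereferenced.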

[PATCH 2/2] drm/amdkfd: fix a bug of smi event for superuser

2025-04-14 Thread Eric Huang
ill fix the issue. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c index c27fd7aec1c3..83d9384ac815 100644

[PATCH 1/2] drm/amdkfd: fix NULL check mistake for process smi event

2025-04-14 Thread Eric Huang
The mistake will lead to NULL kernel oops, so fix it. Fixes: 56ed4241e9fe ("drm/amdkfd: add smi events for process start and end") Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/g

Re: [PATCH] drm/amdkfd: add smi events for process start and end

2025-04-11 Thread Eric Huang
Ping ... On 2025-04-07 16:52, Eric Huang wrote: rocm-smi will be able to show the events for KFD process start/end, it is the implementation of this feature. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_process.c| 4 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c

[PATCH] drm/amdkfd: add smi events for process start and end

2025-04-07 Thread Eric Huang
rocm-smi will be able to show the events for KFD process start/end, it is the implementation of this feature. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_process.c| 4 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 21 + drivers/gpu/drm/amd/amdkfd

Re: [PATCH] drm/amdkfd: increase max number of queues per process

2025-03-24 Thread Eric Huang
On 2025-03-24 17:21, Alex Deucher wrote: On Mon, Mar 24, 2025 at 5:07 PM Eric Huang wrote: On 2025-03-24 15:32, Alex Deucher wrote: On Mon, Mar 24, 2025 at 1:26 PM Eric Huang wrote: kfdtest KFDQMTest.OverSubscribeCpQueues with multiple gpu mode fails on gfx v9.4.3+NPS4+CPX which has 64

Re: [PATCH] drm/amdkfd: increase max number of queues per process

2025-03-24 Thread Eric Huang
On 2025-03-24 15:32, Alex Deucher wrote: On Mon, Mar 24, 2025 at 1:26 PM Eric Huang wrote: kfdtest KFDQMTest.OverSubscribeCpQueues with multiple gpu mode fails on gfx v9.4.3+NPS4+CPX which has 64 gpu nodes, the queues created are 65x64=4160, but the number 1024 0f

[PATCH] drm/amdkfd: increase max number of queues per process

2025-03-24 Thread Eric Huang
number will make the test pass. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h index f6aedf69c644..054a78207ffe 100644 --- a

[PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-07 Thread Eric Huang
In some ASICs L2 cache info may miss in kfd topology, because the first bitmap may be empty, that means the first cu may be inactive, so to find the first active cu will solve the issue. v2: Only find the first active cu in the first xcc Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd
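
The approach described above, skipping empty CU bitmaps until the first active CU is found, might look roughly like the sketch below. It is only an illustration under assumptions: the flat array of 32-bit bitmaps and the helper names first_active_bitmap and first_active_cu are hypothetical and are not taken from kfd_topology.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: return the index of the first non-zero CU bitmap
 * in bitmaps[0..count-1], or -1 if every bitmap is empty.
 */
static int first_active_bitmap(const uint32_t *bitmaps, size_t count)
{
    assert(bitmaps != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        if (bitmaps[i] != 0)
            return (int)i;
    }
    return -1;
}

/*
 * Hypothetical helper: return the position of the lowest set bit,
 * i.e. the first active CU within one bitmap, or -1 if none is set.
 */
static int first_active_cu(uint32_t bitmap)
{
    for (int bit = 0; bit < 32; bit++) {
        if (bitmap & (UINT32_C(1) << bit))
            return bit;
    }
    return -1;
}
```

Starting the cache lookup from the CU found this way, rather than from CU 0 of bitmap 0, avoids reporting nothing when the first CU happens to be inactive.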

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-07 Thread Eric Huang
On 2025-02-06 22:41, Lazar, Lijo wrote: On 2/6/2025 10:18 PM, Eric Huang wrote: I understand your concern. KFD currently only reports one L2 instance, but not every L2 instance. If customers want to have more detail in all available L2 info, we probably can change the logic in this function

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Eric Huang
, Lijo ; amd-gfx@lists.freedesktop.org *Subject:* Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology On 2025-02-06 10:14, Lazar, Lijo wrote: > > On 1/29/2025 8:50 PM, Eric Huang wrote: >> In some ASICs L2 cache info may miss in kfd topology, >> because the first b

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Eric Huang
On 2025-02-06 10:14, Lazar, Lijo wrote: On 1/29/2025 8:50 PM, Eric Huang wrote: In some ASICs L2 cache info may miss in kfd topology, because the first bitmap may be empty, that means the first cu may be inactive, so to find the first active cu will solve the issue. Signed-off-by: Eric

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Eric Huang
Ping .. On 2025-01-29 10:20, Eric Huang wrote: In some ASICs L2 cache info may miss in kfd topology, because the first bitmap may be empty, that means the first cu may be inactive, so to find the first active cu will solve the issue. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd

[PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-01-29 Thread Eric Huang
In some ASICs L2 cache info may miss in kfd topology, because the first bitmap may be empty, that means the first cu may be inactive, so to find the first active cu will solve the issue. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 18 -- 1 file

[PATCH] drm/amdgpu: fix NULL pointer in amdgpu_reset_get_desc

2024-06-06 Thread Eric Huang
amdgpu_job_ring may return NULL, which causes kernel NULL pointer error, using another way to print ring name instead of ring->name. Suggested-by: Lijo Lazar Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 6 ++ 1 file changed, 2 insertions(+), 4 deleti

[PATCH] drm/amdgpu: add reset source in various cases

2024-06-04 Thread Eric Huang
To fulfill the reset event description. Suggested-by: Lijo Lazar Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 1 + 3 files changed, 3 insertions(+) diff --git a

Re: [PATCH 2/2] drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-06-04 Thread Eric Huang
Thanks for your review Lijo, I will send a patch with reset source in other places. Regards, Eric On 2024-06-04 03:26, Lazar, Lijo wrote: On 6/3/2024 11:42 PM, Eric Huang wrote: reset cause is requested by customer as additional info for gpu reset smi event. v2: integrate reset sources

[PATCH 2/2] drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-06-03 Thread Eric Huang
reset cause is requested by customer as additional info for gpu reset smi event. v2: integrate reset sources suggested by Lijo Lazar Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 3 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 10 +++--- drivers/gpu/drm

[PATCH 1/2] drm/amdgpu: add reset sources in gpu reset context

2024-06-03 Thread Eric Huang
reset source or reset cause is very useful info for reset context, it will be used by events API. Suggested-by: Lijo Lazar Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 34 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_reset.h | 13 + 2 files

Re: [PATCH] drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-06-03 Thread Eric Huang
- From: amd-gfx On Behalf Of Eric Huang Sent: Friday, May 31, 2024 8:38 PM To: amd-gfx@lists.freedesktop.org Cc: Kasiviswanathan, Harish ; Huang, JinHuiEric Subject: [PATCH] drm/amdkfd: add reset cause in gpu pre-reset smi event reset cause is requested by customer as additional info for gpu

[PATCH] drm/amdkfd: add reset cause in gpu pre-reset smi event

2024-05-31 Thread Eric Huang
reset cause is requested by customer as additional info for gpu reset smi event. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c| 34 + drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h| 17 +++-- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 9 ++- drivers

[PATCH] drm/amdkfd: fix TLB flush after unmap for GFX9.4.2

2024-03-20 Thread Eric Huang
TLB flush after unmap was accidentally removed on gfx9.4.2. It is to add it back. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd

[PATCH] amd/amdkfd: remove unused parameter

2024-02-28 Thread Eric Huang
The adev can be found from bo by amdgpu_ttm_adev(bo->tbo.bdev), and adev is also not used in the function amdgpu_amdkfd_map_gtt_bo_to_gart(). Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 3 +-- driv

Re: [PATCH] drm/amdkfd: only flush mes process context if mes support is there

2023-12-14 Thread Eric Huang
Signed-off-by: Jonathan Kim Reviewed-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manag

[PATCH] drm/amdkfd: fix NULL ptr for debugger mes flush on non-mes asics

2023-12-14 Thread Eric Huang
The field adev->mes.funcs is NULL in function amdgpu_mes_flush_shader_debugger on non-mes asics, add mes enabling check for call this func to resolve the error. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 3 ++- 1 file changed, 2 insertions(+)

Re: [PATCH] drm/amdkfd: fix mes set shader debugger process management

2023-12-12 Thread Eric Huang
MES on process termination. Note that the flush call and the MES debugger calls use the same MES interface but are separated as KFD calls to avoid conflicting with each other. Signed-off-by: Jonathan Kim Tested-by: Alice Wong Reviewed-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c

Re: [PATCH] drm/amdkfd: Copy HW exception data to user event

2023-11-17 Thread Eric Huang
On 2023-11-17 00:20, David Yat Sin wrote: Fixes issue where user events of type KFD_EVENT_TYPE_HW_EXCEPTION do not have valid data Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_e

Re: [PATCH] drm/amdkfd: Fix a race condition of vram buffer unref in svm code

2023-09-27 Thread Eric Huang
On 2023-09-26 23:00, Xiaogang.Chen wrote: From: Xiaogang Chen prange->svm_bo unref can happen in both mmu callback and a callback after migrate to system ram. Both are async call in different tasks. Sync svm_bo unref operation to avoid random "use-after-free". Signed-off-by: Xiaogang.Chen -

Re: [PATCH] drm/amdkfd: fix add queue process context clear without runtime enable

2023-09-14 Thread Eric Huang
that do not support the current exception handling and running KFD tests. The only time ADD_QUEUE.skip_process_ctx_clear is required is for debugger use cases where a debugged process is always runtime enabled when adding a queue. Tested-by: Shikai Guo Signed-off-by: Jonathan Kim Reviewed-by: Eric

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-11 Thread Eric Huang
On 2023-08-11 09:26, Felix Kuehling wrote: Am 2023-08-10 um 18:27 schrieb Eric Huang: There is not UNMAP_QUEUES command sending for queue preemption because the queue is suspended and test is closed to the end. Function unmap_queue_cpsch will do nothing after that. How do you suspend queues

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2. Regards, Felix On 2023-08-10 17:52, Eric Huang wrote: The problem is the queue is suspended before clearing address watch call in KFD, there is not queue preemption and queue resu

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
rt of Jon's debugger upstreaming patch series? Or did this come later? This patch only enables the workaround for v9.4.2. Regards,   Felix On 2023-08-10 17:52, Eric Huang wrote: The problem is the queue is suspended before clearing address watch call in KFD, there is not queue preemption

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
lowing apps. So the solution is to clear the register as gfx v9 in KFD. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/driv

Re: [PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
Kuehling On 2023-08-10 16:47, Eric Huang wrote: KFD currently relies on MEC FW to clear tcp watch control register by sending MAP_PROCESS packet with 0 of field tcp_watch_cntl to HWS, but if the queue is suspended, the packet will not be sent and the previous value will be left on the register

[PATCH] drm/amdkfd: fix address watch clearing bug for gfx v9.4.2

2023-08-10 Thread Eric Huang
solution is to clear the register as gfx v9 in KFD. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +--- 1 file changed, 1 insertion(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu

Re: [PATCH] drm/amdkfd: fix and enable ttmp setup for gfx11

2023-07-25 Thread Eric Huang
in a safe manner. Signed-off-by: Jonathan Kim Reviewed-by: Eric Huang Regards, Eric --- .../drm/amd/amdgpu/amdgpu_amdkfd_gfx_v11.c| 2 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 13 - drivers/gpu/drm/amd/amdkfd/kfd_debug.c| 19 +-- driver

[PATCH] drm/amdgpu: enable trap of each kfd vmid for gfx v9.4.3

2023-07-25 Thread Eric Huang
To setup ttmp on as default for gfx v9.4.3 in IP hw init. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c index 86a84a0970f0

Re: [PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-11 Thread Eric Huang
On 2023-07-11 14:38, Felix Kuehling wrote: On 2023-07-11 10:28, Eric Huang wrote: Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang ---   .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 ---   .../drm/amd/amdkfd

[PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-11 Thread Eric Huang
Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 21 --- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 8 --- 3 files

Re: [PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-10 Thread Eric Huang
OK. Mukul, I will resend this patch based on top of yours. Regards, Eric On 2023-07-10 18:24, Joshi, Mukul wrote: [AMD Official Use Only - General] -Original Message- From: amd-gfx On Behalf Of Eric Huang Sent: Monday, July 10, 2023 3:46 PM To: amd-gfx@lists.freedesktop.org Cc

[PATCH] drm/amdkfd: enable grace period for xcp instance

2023-07-10 Thread Eric Huang
Read/write grace period from/to first xcc instance of xcp in kfd node. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c| 10

Re: [PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-07 Thread Eric Huang
d the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang ---  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   8 +-  .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h  |  27 +++  .../drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c   | 166 +-  ..

[PATCH 3/4] drm/amdkfd: enable watch points globally for gfx943

2023-07-07 Thread Eric Huang
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang Reviewed-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff

[PATCH 4/4] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-07-07 Thread Eric Huang
device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang Reviewed-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers

[PATCH 1/4] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-07 Thread Eric Huang
From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 8 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 27

[PATCH 2/4] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-07-07 Thread Eric Huang
Acked-by: Amber Lin Signed-off-by: Eric Huang Reviewed-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/d

[PATCH 0/4] Upstream debugger feature for GFX v9.4.3

2023-07-07 Thread Eric Huang
Jonathan Kim (4): drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points globally for gfx943 drm/amdkfd: add multi-process debugging support for GC v9.4.3 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebar

Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

2023-07-07 Thread Eric Huang
time and set grace period accordingly. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 -- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32 +++--- - .../drm/amd/amdkfd

Re: [PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

2023-07-07 Thread Eric Huang
instance needs to get iq wait time and set grace period accordingly. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 -- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32 +++ .../drm

[PATCH 4/6] drm/amdkfd: enable grace period for xcc instance

2023-07-06 Thread Eric Huang
each xcc instance needs to get iq wait time and set grace period accordingly. Signed-off-by: Eric Huang --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 9 -- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 2 +- .../gpu/drm/amd/amdkfd/kfd_packet_manager.c | 32

[PATCH 3/6] drm/amdkfd: enable watch points globally for gfx943

2023-07-06 Thread Eric Huang
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd

[PATCH 2/6] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-07-06 Thread Eric Huang
Acked-by: Amber Lin Signed-off-by: Eric Huang Reviewed-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/d

[PATCH 5/6] drm/amdkfd: always keep trap enabled for GC v9.4.3

2023-07-06 Thread Eric Huang
To set TTMP setup on by default. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 3 ++- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 +++--- 3 files changed, 6 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm

[PATCH 6/6] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-07-06 Thread Eric Huang
device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd

[PATCH 1/6] drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-06 Thread Eric Huang
From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec, i.e. xcc instance. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 10 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30

[PATCH 0/6] Upstream debugger feature for GFX v9.4.3

2023-07-06 Thread Eric Huang
Eric Huang (2): drm/amdkfd: enable grace period for xcc instance drm/amdkfd: always keep trap enabled for GC v9.4.3 Jonathan Kim (4): drm/amdkfd: add kfd2kgd debugger callbacks for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points

[PATCH 3/5] drm/amdkfd: add xcc instance for debugger APIs

2023-07-05 Thread Eric Huang
Since GFX9 GPU has multiple xcc instances, this is to implement this change in KFD for debugger APIs. Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c| 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdgpu

[PATCH 0/5] Upstream debugger feature for GFX v9.4.3

2023-07-05 Thread Eric Huang
Eric Huang (1): drm/amdkfd: add xcc instance for debugger APIs Jonathan Kim (4): drm/amdgpu: add kfd2kgd debugger callbacks for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points globally for gfx943 drm/amdkfd: add multi-process

[PATCH 5/5] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-07-05 Thread Eric Huang
device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd

[PATCH 4/5] drm/amdkfd: enable watch points globally for gfx943

2023-07-05 Thread Eric Huang
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 16 ++-- 2

[PATCH 2/5] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-07-05 Thread Eric Huang
Acked-by: Amber Lin Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/

[PATCH 1/5] drm/amdgpu: add kfd2kgd debugger callbacks for GC v9.4.3

2023-07-05 Thread Eric Huang
From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 7 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu

[PATCH 5/5] drm/amdkfd: enable watch points globally for gfx943

2023-06-28 Thread Eric Huang
From: Jonathan Kim Set watch points for all xcc instances on GFX943. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 16 ++-- 2

[PATCH 1/5] drm/amdgpu: add debugger support for GC v9.4.3

2023-06-28 Thread Eric Huang
From: Jonathan Kim Implement the similarities as GC v9.4.2, and the difference for GC v9.4.3 HW spec. Signed-off-by: Jonathan Kim Signed-off-by: Eric Huang --- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 7 +- .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.h | 30 .../drm/amd/amdgpu

[PATCH 3/5] drm/amdkfd: restore debugger additional info for gfx v9_4_3

2023-06-28 Thread Eric Huang
Acked-by: Amber Lin Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 10 -- drivers/gpu/drm/amd/amdkfd/kfd_topology.h | 3 +++ 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c b/drivers/gpu/drm/

[PATCH 2/5] drm/amdkfd: add multi-process debugging support for GC v9.4.3

2023-06-28 Thread Eric Huang
device initialization. Signed-off-by: Jonathan Kim Reviewed-by: Felix Kuehling Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_debug.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_debug.h b/drivers/gpu/drm/amd/amdkfd

[PATCH 4/5] drm/amdkfd: add xcc instance for debugger APIs

2023-06-28 Thread Eric Huang
Since GFX9 GPU has multiple xcc instances, this is to implement this change in KFD for debugger APIs. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gc_9_4_3.c | 6 -- drivers/gpu/drm/amd/amdgpu

[PATCH 0/5] Upstream debugger feature for GFX v9.4.3

2023-06-28 Thread Eric Huang
Eric Huang (1): drm/amdkfd: add xcc instance for debugger APIs Jonathan Kim (4): drm/amdgpu: add debugger support for GC v9.4.3 drm/amdkfd: add multi-process debugging support for GC v9.4.3 drm/amdkfd: restore debugger additional info for gfx v9_4_3 drm/amdkfd: enable watch points

Re: [PATCH] drm/amdkfd: Don't trigger evictions unmapping dmabuf attachments

2023-05-01 Thread Eric Huang
Reviewed-by: Eric Huang Regards, Eric On 2023-05-01 16:52, Felix Kuehling wrote: Don't move DMABuf attachments for PCIe P2P mappings to the SYSTEM domain when unmapping. This avoids triggering eviction fences unnecessarily. Instead do the move to SYSTEM and back to GTT when mapping

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-28 Thread Eric Huang
On 2023-04-28 15:42, Felix Kuehling wrote: On 2023-04-28 14:09, Eric Huang wrote: On 2023-04-28 12:41, Felix Kuehling wrote: On 2023-04-28 10:17, Eric Huang wrote: On 2023-04-27 23:46, Kuehling, Felix wrote: [AMD Official Use Only - General] Re-mapping typically happens after evictions

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-28 Thread Eric Huang
On 2023-04-28 12:41, Felix Kuehling wrote: On 2023-04-28 10:17, Eric Huang wrote: On 2023-04-27 23:46, Kuehling, Felix wrote: [AMD Official Use Only - General] Re-mapping typically happens after evictions, before a new eviction fence gets attached. At that time the old eviction fence

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-28 Thread Eric Huang
ian König wrote: Am 26.04.23 um 18:58 schrieb Felix Kuehling: On 2023-04-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-0

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-27 Thread Eric Huang
-26 9:03, Christian König wrote: Am 25.04.23 um 16:11 schrieb Eric Huang: Hi Christian, What do you think about Felix's explanation? That's unfortunately not something we can do here. Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Chr

Re: [PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-25 Thread Eric Huang
Hi Christian, What do you think about Felix's explanation? Regards, Eric On 2023-04-13 09:28, Felix Kuehling wrote: Am 2023-04-13 um 07:35 schrieb Christian König: Am 13.04.23 um 03:01 schrieb Felix Kuehling: Am 2023-04-12 um 18:25 schrieb Eric Huang: It is to avoid redundant evictio

[PATCH] drm/amdgpu: Ignore KFD eviction fences invalidating preemptible DMABuf imports

2023-04-12 Thread Eric Huang
It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 ++- 1 file changed, 6 insertions(+), 1 del

[PATCH] drm/amdgpu: only wait GTT bo's fence in amdgpu_bo_move

2023-04-12 Thread Eric Huang
It is to avoid redundant eviction for KFD's DMAbuf import bo when dmaunmapping DMAbuf. The DMAbuf import bo has been set as AMDGPU_PL_PREEMPT in KFD when mapping. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 5 - 1 file changed, 4 insertions(+), 1 del

Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-10 Thread Eric Huang
LE in the imported DMABuf BO. On 2023-04-10 14:28, Eric Huang wrote: Hi Felix, Thanks for your review and suggestion, but unfortunately the AMDGPU_GEM_DOMAIN_PREEMPTIBLE is not defined in amdgpu_drm.h. I understand we need the memory eviction on either kfd_mem_dmamap_dmabuf

Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-10 Thread Eric Huang
pecial case in the above if-block for old_mem->mem_type ==    AMDGPU_PL_PREEMPT: use amdgpu_bo_sync_wait with    owner=AMDGPU_FENCE_OWNER_KFD so that it doesn't wait for eviction fences Regards,   Felix Am 2023-04-04 um 10:36 schrieb Eric Huang: Here is the backtrace from Jira: Thu Nov 10 13:10:23 2022]

Re: [PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-04 Thread Eric Huang
odule param. Regards,   Felix Am 2023-04-03 um 13:59 schrieb Eric Huang: dmabuf is allocated/mapped as GTT domain, when dma-unmapping dmabuf changing placement to CPU will trigger memory eviction after calling ttm_bo_validate, and the eviction will cause performance drop. Keeping the correct

[PATCH] drm/amdkfd: Fix dmabuf's redundant eviction when unmapping

2023-04-03 Thread Eric Huang
dmabuf is allocated/mapped as GTT domain, when dma-unmapping dmabuf changing placement to CPU will trigger memory eviction after calling ttm_bo_validate, and the eviction will cause performance drop. Keeping the correct domain will solve the issue. Signed-off-by: Eric Huang --- drivers/gpu/drm

Re: [PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU

2023-01-10 Thread Eric Huang
Ping. On 2023-01-05 14:28, Eric Huang wrote: The point bo->kfd_bo is NULL for queue's write pointer BO when creating queue on mGPU. To avoid using the pointer fixes the error. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu

[PATCH] drm/amdkfd: Add sync after creating vram bo

2023-01-09 Thread Eric Huang
There will be data corruption on vram allocated by svm if initialization is not being done. Adding sync is to resolve this issue. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b

[PATCH] drm/amdkfd: Fix NULL pointer error for GC 11.0.1 on mGPU

2023-01-05 Thread Eric Huang
The pointer bo->kfd_bo is NULL for the queue's write pointer BO when creating a queue on mGPU. Avoiding use of the pointer fixes the error. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 2 +-

[PATCH] amd/amdkfd: Fix a memory limit issue

2022-11-14 Thread Eric Huang
. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index db772942f7a6..fb1bb593312e 100644 --- a

Re: [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory

2022-07-12 Thread Eric Huang
*From:* amd-gfx on behalf of Eric Huang *Sent:* Monday, July 11, 2022 2:41 PM *To:* amd-gfx@lists.freedesktop.org *Cc:* Huang, JinHuiEric ; Kuehling, Felix *Subject:* [PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory To

[PATCH 3/3] libhsakmt: allocate unified memory for ctx save restore area

2022-07-11 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c

[PATCH 2/3] libhsakmt: add new flag for svm

2022-07-11 Thread Eric Huang
It is to add new option for always keeping gpu mapping and bump KFD version for the feature of unified save restore memory. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 6 +- 1 file changed, 5 insertions(+), 1 deletion

[PATCH] drm/amdkfd: bump KFD version for unified ctx save/restore memory

2022-07-11 Thread Eric Huang
Expose the availability of the unified memory for ctx save/restore area feature to libhsakmt. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index

[PATCH 5/5] libhsakmt: allocate unified memory for ctx save restore area

2022-06-30 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c

[PATCH 4/5] libhsakmt: add new flags for svm

2022-06-30 Thread Eric Huang
It is to add new option for always keeping gpu mapping. Signed-off-by: Eric Huang Change-Id: Iebee35e6de4d52fa29f82dd19f6bbf5640249492 --- include/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/linux/kfd_ioctl.h b/include/linux/kfd_ioctl.h index 8a0ed49..5c45f58

[PATCH 1/5] drm/amdkfd: add new flag for svm

2022-06-30 Thread Eric Huang
It is to add new option for always keeping gpu mapping. Signed-off-by: Eric Huang --- include/uapi/linux/kfd_ioctl.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index fd49dde4d5f4..eba04ebfd9a8 100644 --- a/include/uapi

[PATCH 3/5] drm/amdkfd: optimize svm range evict

2022-06-30 Thread Eric Huang
It is to avoid unnecessary queue eviction when range is not mapped to gpu. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c index

[PATCH 2/5] drm/amdkfd: change svm range evict

2022-06-30 Thread Eric Huang
Always evict queues when the flag is set to KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED, as if XNACK were off. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu
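
The eviction policy described by patches 2/5 and 3/5 can be sketched as a small decision function. This is only an illustration of the stated behavior, not the code from the patches: the `sketch_` types, the function name, and the flag value are hypothetical stand-ins, and the real flag KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED is defined in kfd_ioctl.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in value; the real flag lives in kfd_ioctl.h. */
#define SKETCH_SVM_FLAG_GPU_ALWAYS_MAPPED 0x1u

/* Possible outcomes of an eviction request in this sketch. */
enum sketch_evict_action {
	SKETCH_EVICT_NONE,     /* nothing to do */
	SKETCH_EVICT_QUEUES,   /* stop user queues, as with XNACK off */
	SKETCH_UNMAP_FROM_GPU, /* unmap and rely on retry faults */
};

/* Hypothetical, reduced view of an SVM range. */
struct sketch_svm_range {
	uint32_t flags;
	bool mapped_to_gpu;
};

static enum sketch_evict_action
sketch_choose_evict_action(const struct sketch_svm_range *range,
			   bool xnack_enabled)
{
	/* Patch 3/5: a range that is not mapped to the GPU needs no eviction. */
	if (!range->mapped_to_gpu)
		return SKETCH_EVICT_NONE;

	/* Patch 2/5: the always-mapped flag is treated as if XNACK were off. */
	if (!xnack_enabled ||
	    (range->flags & SKETCH_SVM_FLAG_GPU_ALWAYS_MAPPED))
		return SKETCH_EVICT_QUEUES;

	return SKETCH_UNMAP_FROM_GPU;
}
```

In this sketch the unmapped check comes first, so a range carrying the always-mapped flag still skips eviction while it has no GPU mapping; whether the actual patches order the checks this way is an assumption.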

[PATCH 0/5] Unified memory for CWSR save restore area

2022-06-30 Thread Eric Huang
amdkfd changes: Eric Huang (3): drm/amdkfd: add new flag for svm drm/amdkfd: change svm range evict drm/amdkfd: optimize svm range evict drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 13 +++-- include/uapi/linux/kfd_ioctl.h | 2 ++ 2 files changed, 13 insertions(+), 2 deletions

Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-30 Thread Eric Huang
On 2022-06-29 19:29, Felix Kuehling wrote: On 2022-06-29 18:53, Eric Huang wrote: On 2022-06-29 18:20, Felix Kuehling wrote: On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to

Re: [PATCH 2/2] drm/amdkfd: change svm range evict

2022-06-29 Thread Eric Huang
On 2022-06-29 18:20, Felix Kuehling wrote: On 2022-06-28 17:43, Eric Huang wrote: Two changes: 1. reducing unnecessary evict/unmap when range is not mapped to gpu. 2. adding always evict when flags is set to always_mapped. Signed-off-by: Eric Huang ---   drivers/gpu/drm/amd/amdkfd

[PATCH 4/4] libhsakmt: allocate unified memory for ctx save restore area

2022-06-28 Thread Eric Huang
To improve performance on queue preemption, allocate ctx s/r area in VRAM instead of system memory, and migrate it back to system memory when VRAM is full. Signed-off-by: Eric Huang Change-Id: If775782027188dbe84b6868260e429373675434c --- include/hsakmttypes.h | 1 + src/queues.c
