Re: [PATCH v4 2/2] drm/amdkfd: return migration pages from copy function

2025-07-28 Thread James Zhu
Ping ... On 2025-07-22 08:59, James Zhu wrote: dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit should always be set when migration succeeds. cpage includes src MIGRATE_PFN_MIGRATE bit set and MIGRATE_PFN_VALID bit unset pages for both ram and vram when memory is only allocated without

[PATCH v4 2/2] drm/amdkfd: return migration pages from copy function

2025-07-22 Thread James Zhu
migration pages directly from copy function -v4 correct comments and copy function return mpage (suggested-by Felix) Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 72 1 file changed, 36 insertions(+), 36 deletions(-) diff --git a/drivers/gpu/drm/amd

Re: [PATCH v3 2/2] drm/amdkfd: return migration pages from copy function

2025-07-21 Thread James Zhu
Hi Felix Thanks! Best Regards! James Zhu On 2025-07-18 18:09, Felix Kuehling wrote: On 2025-07-14 08:46, James Zhu wrote: dst MIGRATE_PFN_VALID bit and src MIGRATE_PFN_MIGRATE bit should always be set when migration succeeds. cpage includes src MIGRATE_PFN_MIGRATE bit set and

[PATCH v3 2/2] drm/amdkfd: return migration pages from copy function

2025-07-14 Thread James Zhu
count as migrate_unsuccessful_pages. -v2 use dst to check MIGRATE_PFN_VALID bit(suggested-by philip) -v3 add warning when vram pages is less than migration pages return migration pages directly from copy function Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 44

[PATCH v2] drm/amdkfd: improve performance with XNACK enable

2025-06-12 Thread James Zhu
ess will be stuck here. Using down_write_trylock in place of mmap_write_lock avoids blocking the second and subsequent eviction work queue processes. -v2: just return if the write lock cannot be taken, and let the caller decide whether to retry. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_charde

[PATCH v2 2/2] drm/amdkfd: add svm_migrate_successful_pages

2025-05-28 Thread James Zhu
to get migration pages. dst bit MIGRATE_PFN_VALID and src bit MIGRATE_PFN_MIGRATE should always be set when success. -v2 use dst to check MIGRATE_PFN_VALID bit(suggested-by philip) Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 22 ++ 1 file changed
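The success criterion stated in this patch (dst entry has MIGRATE_PFN_VALID set and src entry has MIGRATE_PFN_MIGRATE set) can be sketched as a simple count over the two arrays. This is a minimal user-space illustration, not the patch itself: the flag values below are stand-ins and `count_migrated_pages` is a hypothetical helper name.

```c
#include <stddef.h>

/* Stand-in flag values for illustration; the kernel's own definitions
 * live in include/linux/migrate.h. */
#define MIGRATE_PFN_VALID   (1UL << 0)
#define MIGRATE_PFN_MIGRATE (1UL << 1)

/*
 * Hypothetical helper: a page counts as migrated only when its src entry
 * still carries MIGRATE_PFN_MIGRATE and its dst entry carries
 * MIGRATE_PFN_VALID.
 */
static unsigned long count_migrated_pages(const unsigned long *src,
                                          const unsigned long *dst,
                                          size_t npages)
{
    unsigned long mpages = 0;

    for (size_t i = 0; i < npages; i++) {
        if ((src[i] & MIGRATE_PFN_MIGRATE) && (dst[i] & MIGRATE_PFN_VALID))
            mpages++;
    }
    return mpages;
}
```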

[PATCH 1/2] drm/amdkfd: remove unused code

2025-05-28 Thread James Zhu
upages is assigned under cpages = 0, so it isn't really used in this function. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c

[PATCH 2/2] drm/amdkfd: add svm_migrate_successful_pages

2025-05-28 Thread James Zhu
to get migration pages. When migrating pages from system to vram, needn't check bit MIGRATE_PFN_VALID, since the system page could be allocated, but not be accessed. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 25 1 file change

Re: [PATCH] drm/amdkfd: improve performance with XNACK enable

2025-05-09 Thread James Zhu
On 2025-05-09 02:00, Christian König wrote: On 5/8/25 19:25, James Zhu wrote: On 2025-05-08 11:20, James Zhu wrote: On 2025-05-08 10:50, Christian König wrote: On 5/8/25 16:46, James Zhu wrote: When XNACK on, hang or low performance is observed with some test cases. The restoring page

Re: [PATCH] drm/amdkfd: improve performance with XNACK enable

2025-05-09 Thread James Zhu
On 2025-05-08 17:54, Felix Kuehling wrote: On 2025-05-08 10:50, Christian König wrote: On 5/8/25 16:46, James Zhu wrote: When XNACK on, hang or low performance is observed with some test cases. The restoring page process has unexpected stuck during evicting/restoring if some bo's fla

Re: [PATCH] drm/amdkfd: improve performance with XNACK enable

2025-05-08 Thread James Zhu
On 2025-05-08 11:20, James Zhu wrote: On 2025-05-08 10:50, Christian König wrote: On 5/8/25 16:46, James Zhu wrote: When XNACK on, hang or low performance is observed with some test cases. The restoring page process has unexpected stuck during evicting/restoring if some bo's fla

Re: [PATCH] drm/amdkfd: improve performance with XNACK enable

2025-05-08 Thread James Zhu
On 2025-05-08 10:50, Christian König wrote: On 5/8/25 16:46, James Zhu wrote: When XNACK on, hang or low performance is observed with some test cases. The restoring page process has unexpected stuck during evicting/restoring if some bo's flag has KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED se

[PATCH] drm/amdkfd: improve performance with XNACK enable

2025-05-08 Thread James Zhu
ess will be stuck here. Using down_write_trylock in place of mmap_write_lock avoids blocking the second and subsequent eviction work queue processes. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/driver

[PATCH] drm/amdkfd: remove unnecessary cpu domain validation

2025-03-03 Thread James Zhu
before move to GTT domain. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c index 62ca12e94581..2ac6d4fa0601

Re: [PATCH] drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P

2025-02-03 Thread James Zhu
On 2025-01-09 11:57, Felix Kuehling wrote: From: Christian König Try pinning into VRAM to allow P2P with RDMA NICs without ODP support if all attachments can do P2P. If any attachment can't do P2P just pin into GTT instead. Signed-off-by: Christian König Signed-off-by: Felix Kuehling Revie

Re: [PATCH 1/2] drm/ttm: test private resv obj on release/destroy

2025-01-29 Thread James Zhu
Reviewed-and-Tested-by: James Zhu for the series On 2025-01-29 10:28, Christian König wrote: Test the fences in the private dma_resv object instead of the pointer to a potentially shared dma_resv object. This only matters for imported BOs with an SG table since those don't get their dma

Re: [RFC PATCH] amd/ttm: test fence->ops->signaled before use

2025-01-08 Thread James Zhu
My Q or A is inline. Thanks! James Zhu On 2025-01-08 04:18, Christian König wrote: On 07.01.25 at 21:01, James Zhu wrote: this original test condition is unclear. No that is completely unnecessary. The point is that with fence->ops->signaled provided the fence should make progres

[RFC PATCH] amd/ttm: test fence->ops->signaled before use

2025-01-07 Thread James Zhu
this original test condition is unclear. Signed-off-by: James Zhu --- drivers/gpu/drm/ttm/ttm_bo.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ttm/ttm_bo.c b/drivers/gpu/drm/ttm/ttm_bo.c index 48c5365efca1..d40f07802c4f 100644 --- a/drivers/gpu/drm/ttm

Re: [PATCH 2/2] drm/amdkfd:Add kfd function to config sq perfmon

2024-09-13 Thread James Zhu
Reviewed-by: James Zhu for the series. On 2024-09-13 04:32, Feifei Xu wrote: Expose the interface for kfd to config sq perfmon. Signed-off-by: Feifei Xu Suggested-by: Hawking Zhang Reviewed-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 15 +++ drivers/gpu/drm/amd/amd

Re: [PATCH v3 0/4] Improve SVM migrate event report

2024-08-27 Thread James Zhu
error code if migration failed. 4. Report dropped event count if fifo is full. v3: Simplify event drop count handling (James Zhu) Philip Yang (4): drm/amdkfd: Document and define SVM events message macro drm/amdkfd: Output migrate end event if migrate failed drm/amdkfd: Increase SMI event

Re: [PATCH v2 4/4] drm/amdkfd: SMI report dropped event count

2024-08-22 Thread James Zhu
On 2024-07-30 16:15, Philip Yang wrote: Add new SMI event to report the dropped event count when the event kfifo is full. When the kfifo has space for two events, generate a dropped event record to report how many events were dropped, together with the next event to add to kfifo. After readin

Re: [PATCH v2 3/4] drm/amdkfd: Increase SMI event fifo size

2024-08-22 Thread James Zhu
On 2024-07-30 16:15, Philip Yang wrote: SMI event fifo size 1KB was enough to report GPU vm fault or reset [JZ] There is a typo here. it should be NOT enough. event, increase it to 8KB to store about 100 migrate events, less chance to drop the migrate events if lots of migration happened in t

Re: [PATCH v2 1/4] drm/amdkfd: Document and define SVM events message macro

2024-08-22 Thread James Zhu
On 2024-07-30 16:15, Philip Yang wrote: Document how to use SMI system management interface to enable and receive SVM events. Document SVM event triggers. Define SVM events message string format macro that could be used by user mode for sscanf to parse the event. Add it to uAPI header file to ma

Re: [PATCH] drm/amdkfd: Remove arbitrary timeout for hmm_range_fault

2024-05-02 Thread James Zhu
On 2024-05-01 18:56, Philip Yang wrote: On system with khugepaged enabled and user cases with THP buffer, the hmm_range_fault may take > 15 seconds to return -EBUSY, the arbitrary timeout value is not accurate, causing memory allocation failure. Remove the arbitrary timeout value, return EAGAIN

Re: [PATCH] drm/amd/amdxcp: Use unique name for partition dev

2024-04-30 Thread James Zhu
On 2024-04-30 07:36, Lijo Lazar wrote: amdxcp is a platform driver for creating partition devices. libdrm library identifies a platform device based on 'OF_FULLNAME' or 'MODALIAS'. If two or more devices have the same platform name, drm library only picks the first device. Platform driver core us

Re: [PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942

2024-02-12 Thread James Zhu
Ping . Best Regards! James Zhu On 2024-02-06 10:58, James Zhu wrote: PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling

[PATCH v4 12/24] drm/amdgpu: use trapID 4 for host trap

2024-02-06 Thread James Zhu
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, use TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../dr
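The TTMP1 field layout named in this patch (bit 24: HT, bits 16-23: trapID) can be illustrated with plain bit extraction. This is a sketch under stated assumptions: the macro and function names are hypothetical, the trapID value 4 comes from the patch title, and how the real trap handler combines the two fields is not shown in the snippet, so no combined predicate is given here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Field positions as described in the patch text; names are hypothetical. */
#define TTMP1_HT_SHIFT      24
#define TTMP1_TRAP_ID_SHIFT 16
#define TTMP1_TRAP_ID_MASK  0xffu
#define HOST_TRAP_ID        4u /* from the patch title: trapID 4 */

/* Extract the HT flag (bit 24). */
static inline bool ttmp1_ht(uint32_t ttmp1)
{
    return (ttmp1 >> TTMP1_HT_SHIFT) & 1u;
}

/* Extract the 8-bit trapID field (bits 16-23). */
static inline uint32_t ttmp1_trap_id(uint32_t ttmp1)
{
    return (ttmp1 >> TTMP1_TRAP_ID_SHIFT) & TTMP1_TRAP_ID_MASK;
}
```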

[PATCH v4 21/24] drm/amdkfd: add pc sampling thread to trigger trap

2024-02-06 Thread James Zhu
Add a kthread to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 91 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 2 files changed, 89 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH v4 22/24] drm/amdkfd: add pc sampling release when process release

2024-02-06 Thread James Zhu
Add pc sampling release on process release; it forces all active sessions of this process to stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 25 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 1 + drivers/gpu/drm/amd/amdkfd

[PATCH v4 23/24] drm/amdkfd: Set debug trap bit when enabling PC Sampling

2024-02-06 Thread James Zhu
KFD_RUNTIME_ENABLE_MODE_ENABLE_MASK flag on exit. It is also not valid to have the debugger attached to a process while PC sampling is enabled so adding some checks to prevent this. Signed-off-by: David Yat Sin Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 30

[PATCH v4 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2024-02-06 Thread James Zhu
Bump the minor version to declare pc sampling feature is now available. Signed-off-by: James Zhu --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index ec1b6404b185

[PATCH v4 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for aldebaran. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu

[PATCH v4 18/24] drm/amdkfd: enable pc sampling stop

2024-02-06 Thread James Zhu
Enable pc sampling stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 29 ++-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 +++ 2 files changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b

[PATCH v4 19/24] drm/amdkfd: add queue remapping

2024-02-06 Thread James Zhu
the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++ drivers/gpu/drm/amd/amdkfd

[PATCH v4 17/24] drm/amdkfd: add setting trap pc sampling flag

2024-02-06 Thread James Zhu
Add setting trap pc sampling flag. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 + 2 files changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd

[PATCH v4 09/24] drm/amdkfd: add interface to trigger pc sampling trap

2024-02-06 Thread James Zhu
Add interface to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 7 +++ 1 file changed, 7 insertions(+) diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h index

[PATCH v4 05/24] drm/amdkfd: enable pc sampling create

2024-02-06 Thread James Zhu
From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 59 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 2 files changed, 68 insertions

[PATCH v4 11/24] drm/amdkfd/gfx9: enable host trap

2024-02-06 Thread James Zhu
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b

[PATCH v4 06/24] drm/amdkfd: add trace_id return

2024-02-06 Thread James Zhu
Add trace_id return for new pc sampling creation per device, Use IDR to quickly locate pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++- drivers/gpu/drm/amd

[PATCH v4 16/24] drm/amdkfd: use bit operation set debug trap

2024-02-06 Thread James Zhu
1st level TMA's 2nd byte which is used for trap type setting, to use bit operation to change selected bit only. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/a
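The idea of changing only a selected bit of the trap-type byte, rather than overwriting the whole byte, can be sketched as a read-modify-write. This is an illustration only: the byte offset (taking "2nd byte" to mean offset 1), the bit numbers, and the helper name `trap_type_set_bit` are assumptions, not details from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: "2nd byte" is read here as byte offset 1 of the TMA. */
#define TRAP_TYPE_BYTE 1

/*
 * Hypothetical helper: set or clear one bit of the trap-type byte and
 * leave every other bit of that byte untouched.
 */
static void trap_type_set_bit(uint8_t *tma, unsigned int bit, bool enable)
{
    uint8_t mask = (uint8_t)(1u << bit);

    if (enable)
        tma[TRAP_TYPE_BYTE] |= mask;
    else
        tma[TRAP_TYPE_BYTE] &= (uint8_t)~mask;
}
```

With this shape, two independent features that each own one bit of the byte can be enabled and disabled without clobbering each other.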

[PATCH v4 20/24] drm/amdkfd: enable pc sampling start

2024-02-06 Thread James Zhu
Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 27 +--- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index

[PATCH v4 14/24] drm/amdkfd: trigger pc sampling trap for arcturus

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for arcturus. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu

[PATCH v4 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2024-02-06 Thread James Zhu
From: David Yat Sin Add pc sampling support in kfd_ioctl. The user mode code which uses this new kfd_ioctl is linked to https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface with master branch. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include

[PATCH v4 08/24] drm/amdkfd: enable pc sampling destroy

2024-02-06 Thread James Zhu
Enable pc sampling destroy. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index

[PATCH v4 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9

2024-02-06 Thread James Zhu
Implement trigger pc sampling trap for gfx v9. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 7 2 files changed, 43 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v4 13/24] drm/amdgpu: add sq host trap status check

2024-02-06 Thread James Zhu
Before firing a new host trap, check the host trap status. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++ .../amd/include/asic_reg/gc/gc_9_0_offset.h | 2 ++ .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 5 +++ 3 files changed, 42

[PATCH v4 07/24] drm/amdkfd: check pcs_entry valid

2024-02-06 Thread James Zhu
Check pcs_entry valid for pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++-- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd

[PATCH v4 04/24] drm/amdkfd: add pc sampling mutex

2024-02-06 Thread James Zhu
Add pc sampling mutex per node, and do init/destroy in node init. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +++ 2 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c

[PATCH v4 03/24] drm/amdkfd: enable pc sampling query

2024-02-06 Thread James Zhu
From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 65 +++- 1 file changed, 64 insertions(+), 1 deletion(-) diff --git a

[PATCH v4 02/24] drm/amdkfd: add pc sampling support

2024-02-06 Thread James Zhu
From: David Yat Sin Add pc sampling functions in amdkfd. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 45 +++ drivers/gpu/drm/amd/amdkfd

[PATCH v4 00/24] Support Host Trap Sampling for gfx941/gfx942

2024-02-06 Thread James Zhu
: add pc sampling support drm/amdkfd: enable pc sampling query drm/amdkfd: enable pc sampling create drm/amdkfd: Set debug trap bit when enabling PC Sampling James Zhu (19): drm/amdkfd: add pc sampling mutex drm/amdkfd: add trace_id return drm/amdkfd: check pcs_entry valid drm/amdkfd

Re: [PATCH] drm/amdgpu: make a correction on comment

2024-01-08 Thread James Zhu
On 2024-01-08 03:12, Christian König wrote: On 02.01.24 at 21:56, James Zhu wrote: Current AMDGPU_VM_RESERVED_VRAM is updated to 8M. Signed-off-by: James Zhu Maybe remove the value completely from the comment, just something like "How much memory be reserved for page tables".

[PATCH] drm/amdgpu: make a correction on comment

2024-01-02 Thread James Zhu
Current AMDGPU_VM_RESERVED_VRAM is updated to 8M. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.h index b6cd565562ad

Re: [PATCH v3 23/24] drm/amdkfd: set debug trap bit when enabling PC Sampling

2024-01-02 Thread James Zhu
On 2023-12-15 10:59, James Zhu wrote: From: David Yat Sin We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC Sampling so that the TTMP registers are valid inside the sampling data. runtime_info.ttmp_setup will be cleared when the user application does the

[PATCH v3 23/24] drm/amdkfd: set debug trap bit when enabling PC Sampling

2023-12-15 Thread James Zhu
From: David Yat Sin We need the SPI_GDBG_PER_VMID_CNTL.TRAP_EN bit to be set during PC Sampling so that the TTMP registers are valid inside the sampling data. runtime_info.ttmp_setup will be cleared when the user application does the AMDKFD_IOC_RUNTIME_ENABLE ioctl without KFD_RUNTIME_ENABLE_MODE

[PATCH v3 17/24] drm/amdkfd: add setting trap pc sampling flag

2023-12-15 Thread James Zhu
Add setting trap pc sampling flag. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 + 2 files changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd

[PATCH v3 18/24] drm/amdkfd: enable pc sampling stop

2023-12-15 Thread James Zhu
Enable pc sampling stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 +++ 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b

[PATCH v3 22/24] drm/amdkfd: add pc sampling release when process release

2023-12-15 Thread James Zhu
Add pc sampling release on process release; it forces all active sessions of this process to stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 1 + drivers/gpu/drm/amd/amdkfd

[PATCH v3 11/24] drm/amdkfd/gfx9: enable host trap

2023-12-15 Thread James Zhu
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b

[PATCH v3 13/24] drm/amdgpu: add sq host trap status check

2023-12-15 Thread James Zhu
Before firing a new host trap, check the host trap status. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++ .../amd/include/asic_reg/gc/gc_9_0_offset.h | 2 ++ .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h | 5 +++ 3 files changed, 42

[PATCH v3 24/24] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2023-12-15 Thread James Zhu
Bump the minor version to declare pc sampling feature is now available. Signed-off-by: James Zhu --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 1bd1347effea

[PATCH v3 12/24] drm/amdgpu: use trapID 4 for host trap

2023-12-15 Thread James Zhu
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, use TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../dr

[PATCH v3 04/24] drm/amdkfd: add pc sampling mutex

2023-12-15 Thread James Zhu
Add pc sampling mutex per node, and do init/destroy in node init. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 7 +++ 2 files changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c

[PATCH v3 20/24] drm/amdkfd: enable pc sampling start

2023-12-15 Thread James Zhu
Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index

[PATCH v3 15/24] drm/amdkfd: trigger pc sampling trap for aldebaran

2023-12-15 Thread James Zhu
Implement trigger pc sampling trap for aldebaran. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu

[PATCH v3 19/24] drm/amdkfd: add queue remapping

2023-12-15 Thread James Zhu
the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++ drivers/gpu/drm/amd/amdkfd

[PATCH v3 21/24] drm/amdkfd: add pc sampling thread to trigger trap

2023-12-15 Thread James Zhu
Add a kthread to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH v3 16/24] drm/amdkfd: use bit operation set debug trap

2023-12-15 Thread James Zhu
1st level TMA's 2nd byte which is used for trap type setting, to use bit operation to change selected bit only. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/a

[PATCH v3 09/24] drm/amdkfd: add interface to trigger pc sampling trap

2023-12-15 Thread James Zhu
Add interface to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h index 6d094cf3587d

[PATCH v3 10/24] drm/amdkfd: trigger pc sampling trap for gfx v9

2023-12-15 Thread James Zhu
Implement trigger pc sampling trap for gfx v9. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++ .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h | 7 2 files changed, 43 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v3 07/24] drm/amdkfd: check pcs_entry valid

2023-12-15 Thread James Zhu
Check pcs_entry valid for pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++-- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd

[PATCH v3 14/24] drm/amdkfd: trigger pc sampling trap for arcturus

2023-12-15 Thread James Zhu
Implement trigger pc sampling trap for arcturus. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu

[PATCH v3 08/24] drm/amdkfd: enable pc sampling destroy

2023-12-15 Thread James Zhu
Enable pc sampling destroy. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index

[PATCH v3 00/24] Support Host Trap Sampling for gfx941/gfx942

2023-12-15 Thread James Zhu
: add pc sampling support drm/amdkfd: enable pc sampling query drm/amdkfd: enable pc sampling create drm/amdkfd: set debug trap bit when enabling PC Sampling James Zhu (19): drm/amdkfd: add pc sampling mutex drm/amdkfd: add trace_id return drm/amdkfd: check pcs_entry valid drm/amdkfd

[PATCH v3 06/24] drm/amdkfd: add trace_id return

2023-12-15 Thread James Zhu
Add trace_id return for new pc sampling creation per device, Use IDR to quickly locate pc_sampling_entry for reference. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++- drivers/gpu/drm/amd

[PATCH v3 03/24] drm/amdkfd: enable pc sampling query

2023-12-15 Thread James Zhu
From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++- 1 file changed, 53 insertions(+), 1 deletion(-) diff --git a

[PATCH v3 02/24] drm/amdkfd: add pc sampling support

2023-12-15 Thread James Zhu
From: David Yat Sin Add pc sampling functions in amdkfd. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/Makefile | 3 +- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++ drivers/gpu/drm/amd/amdkfd

[PATCH v3 05/24] drm/amdkfd: enable pc sampling create

2023-12-15 Thread James Zhu
From: David Yat Sin Enable pc sampling create. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 53 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 2 files changed, 62 insertions

[PATCH v3 01/24] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-12-15 Thread James Zhu
From: David Yat Sin Add pc sampling support in kfd_ioctl. The user mode code which uses this new kfd_ioctl is linked to https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface with master branch. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- include

Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-13 Thread James Zhu
On 2023-12-13 11:23, Felix Kuehling wrote: On 2023-12-13 10:24, James Zhu wrote: Ping ... On 2023-12-08 18:01, James Zhu wrote: When application tries to allocate all system memory and cause memory to swap out. Needs more time for hmm_range_fault to validate the remaining page for

Re: [PATCH v2 03/23] drm/amdkfd: enable pc sampling query

2023-12-13 Thread James Zhu
/amdkfd: enable pc sampling query From: David Yat Sin Enable pc sampling to query system capability. Co-developed-by: James Zhu Signed-off-by: James Zhu Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++- 1 file changed, 53 insertions(+), 1

Re: [PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-13 Thread James Zhu
Ping ... On 2023-12-08 18:01, James Zhu wrote: When application tries to allocate all system memory and cause memory to swap out. Needs more time for hmm_range_fault to validate the remaining page for allocation. To be safe, increase timeout value to 1 second for 64MB range. Signed-off-by

[PATCH v2 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-11 Thread James Zhu
Only schedule when hmm_range_fault returns error. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c index b24eb5821fd1

[PATCH v3 00/23] Support Host Trap Sampling for gfx941/gfx942

2023-12-11 Thread James Zhu
/zhums/ROCT-Thunk-Interface/tree/zhums/ROCT-Thunk. David Yat Sin (4): drm/amdkfd/kfd_ioctl: add pc sampling support drm/amdkfd: add pc sampling support drm/amdkfd: enable pc sampling query drm/amdkfd: enable pc sampling create James Zhu (19): drm/amdkfd: add pc sampling mutex drm/amdkfd

[PATCH v3 07/23] drm/amdkfd: check pcs_entry valid

2023-12-11 Thread James Zhu
Check pcs_entry valid for pc sampling ioctl. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++-- 1 file changed, 30 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd

Re: [PATCH v2 00/23] Support Host Trap Sampling for gfx941/gfx942

2023-12-11 Thread James Zhu
Ping ... On 2023-12-07 17:53, James Zhu wrote: PC sampling is a form of software profiling, where the threads of an application are periodically interrupted and the program counter that the threads are currently attempting to execute is saved out for profiling. David Yat Sin (4): drm

Re: [PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-11 Thread James Zhu
On 2023-12-11 05:38, Christian König wrote: On 09.12.23 at 00:01, James Zhu wrote: Needn't do schedule for each hmm_range_fault, and use cond_resched to replace schedule. cond_resched() is usually NAKed upstream since it is a NO-OP in most situations. [JZ] then let me change ba

[PATCH 1/2] drm/amdgpu: increase hmm range get pages timeout

2023-12-08 Thread James Zhu
When an application tries to allocate all system memory and causes memory to swap out, hmm_range_fault needs more time to validate the remaining pages for allocation. To be safe, increase the timeout value to 1 second for a 64MB range. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu

[PATCH 2/2] drm/amdgpu: make an improvement on amdgpu_hmm_range_get_pages

2023-12-08 Thread James Zhu
Needn't do schedule for each hmm_range_fault, and use cond_resched to replace schedule. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_hmm.c b/drivers/gpu/drm/amd/a

[PATCH v2 22/23] drm/amdkfd: add pc sampling release when process release

2023-12-07 Thread James Zhu
Add pc sampling release on process release; it forces all active sessions of this process to stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 1 + drivers/gpu/drm/amd/amdkfd

[PATCH v2 23/23] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2023-12-07 Thread James Zhu
Bump the minor version to declare pc sampling feature is now available. Signed-off-by: James Zhu --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h index 1bd1347effea

[PATCH v2 21/23] drm/amdkfd: add pc sampling thread to trigger trap

2023-12-07 Thread James Zhu
Add a kthread to trigger pc sampling trap. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 1 + 2 files changed, 68 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd

[PATCH v2 12/23] drm/amdgpu: use trapID 4 for host trap

2023-12-07 Thread James Zhu
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, use TTMP1 (bit 24: HT) and (bit 16-23: trapID) to identify the host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 + .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 + .../dr

[PATCH v2 19/23] drm/amdkfd: add queue remapping

2023-12-07 Thread James Zhu
the queues either waits for the waves to drain, or preempts them with CWSR, which itself executes a trap and waits for previous traps to finish. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++ drivers/gpu/drm/amd/amdkfd

[PATCH v2 16/23] drm/amdkfd: use bit operation set debug trap

2023-12-07 Thread James Zhu
The 2nd byte of the 1st level TMA is used for the trap type setting; use bit operations to change the selected bit only. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +--- 1 file changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/a

[PATCH v2 18/23] drm/amdkfd: enable pc sampling stop

2023-12-07 Thread James Zhu
Enable pc sampling stop. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +--- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 4 +++ 2 files changed, 29 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b

[PATCH v2 15/23] drm/amdkfd: trigger pc sampling trap for aldebaran

2023-12-07 Thread James Zhu
Implement trigger pc sampling trap for aldebaran. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++ 1 file changed, 11 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu

[PATCH v2 17/23] drm/amdkfd: add setting trap pc sampling flag

2023-12-07 Thread James Zhu
Add setting trap pc sampling flag. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 ++ drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 + 2 files changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd

[PATCH v2 20/23] drm/amdkfd: enable pc sampling start

2023-12-07 Thread James Zhu
Enable pc sampling start. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +--- 1 file changed, 23 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c index

[PATCH v2 11/23] drm/amdkfd/gfx9: enable host trap

2023-12-07 Thread James Zhu
Enable host trap. Signed-off-by: James Zhu --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++ .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 --- 2 files changed, 52 insertions(+), 35 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b
