[PATCH] drm/amdgpu: Clear overflow for SRIOV

2025-04-11 Thread Emily Deng
A VF doesn't have permission to clear the overflow bit, so clear it by reset. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 10 -- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 1 + drivers/gpu/drm/amd/amdgpu/ih_v6_0.c | 6 +- drivers/gpu/drm/amd/amdgpu/ve

[PATCH] drm/amd/amdgpu: Fix out of bounds warning in amdgpu_hw_ip_info

2025-04-11 Thread jesse.zh...@amd.com
Fix an array index out of bounds warning in the DMA IP case of amdgpu_hw_ip_info() where it was incorrectly checking adev->gfx.gfx_ring[i].no_user_submission instead of adev->sdma.instance[i].ring.no_user_submission. The mismatch caused UBSAN to report an array bounds violation since it was access
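The class of bug described in this patch can be sketched in isolation. The structures below are simplified, hypothetical stand-ins (not the real amdgpu definitions), and the array sizes are assumptions chosen only so that the SDMA loop bound exceeds the GFX array length. The point is that a loop over SDMA instances must read the flag from the SDMA ring, not index a differently sized GFX array with the same counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sizes: the SDMA array is longer than the GFX array. */
#define MAX_GFX_RINGS      2
#define MAX_SDMA_INSTANCES 8

struct ring {
    bool ready;
    bool no_user_submission;
};

struct sdma_instance {
    struct ring ring;
};

/* Simplified stand-in for the device structure (assumption). */
struct device {
    struct ring gfx_ring[MAX_GFX_RINGS];
    struct sdma_instance sdma_instance[MAX_SDMA_INSTANCES];
    size_t num_sdma_instances;
};

/* Count SDMA rings that are ready and open to user submissions. */
static unsigned int count_user_sdma_rings(const struct device *dev)
{
    unsigned int n = 0;

    assert(dev->num_sdma_instances <= MAX_SDMA_INSTANCES);

    for (size_t i = 0; i < dev->num_sdma_instances; i++) {
        const struct ring *r = &dev->sdma_instance[i].ring;

        /*
         * The buggy pattern would test dev->gfx_ring[i].no_user_submission
         * here, which reads past gfx_ring[] once i >= MAX_GFX_RINGS.
         */
        if (r->ready && !r->no_user_submission)
            n++;
    }
    return n;
}
```

Taking a pointer to the ring once per iteration, as above, is one way to make this kind of copy-paste mismatch harder to write.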

Re: [PATCH 2/2 v2] drm/amdgpu: Add fw minimum version check for usermode queue

2025-04-11 Thread Pierre-Eric Pelloux-Prayer
Hi, On 11/04/2025 at 06:54, Yadav, Arvind wrote: Alex, This is v2 of 2/2 patch. Please review this. ~arvind On 4/10/2025 8:27 PM, Arvind Yadav wrote: This patch loads the usermode queue based on FW support for gfx12. CP Ucode FW Version: [PFP = 2840, ME = 2780, MEC = 3050, MES = 123] v2: Addr

[v5 1/6] drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.h

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces new function pointers in the amdgpu_sdma structure to handle queue stop, start and soft reset operations. These will replace the older callback mechanism. The new functions are: - stop_kernel_queue: Stops a specific SDMA queue - start_kernel_queu

[v5 2/6] drm/amdgpu: Register the new sdma function pointers for each sdma IP version that needs them

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Register stop/start/soft_reset queue functions for SDMA IP versions v4.4.2, v5.0 and v5.2. Suggested-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 22 +++--- drivers/gpu/drm/amd/amdgpu/sdma_v5_0.c | 76 ++--

[v5 3/6] drm/amdgpu: switch amdgpu_sdma_reset_engine to use the new sdma function pointers

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Replace old callback mechanism with direct calls to stop/start functions. Suggested-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 34 +++- 1 file changed, 4 insertions(+), 30 deletions(-) diff --git

[v5 4/6] drm/amdgpu: optimize queue reset and stop logic

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.x queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_x_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stopping specific SDMA

[v5 5/6] drm/amdgpu: Implement SDMA soft reset directly for v5.x

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly, rather than relying on the DPM interface. 1. **New `amdgpu_sdma_soft_reset` Function**: - Implements a soft reset for SDMA engines by directly writing to the hardwa

[v5 6/6] drm/amdgpu:remove old sdma reset callback mechanism

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch removes the deprecated SDMA reset callback mechanism, which was previously used to register pre-reset and post-reset callbacks for SDMA engine resets. The callback mechanism has been replaced with a more direct and efficient approach using `stop_queue` a

[v2] drm/amd/amdgpu: Fix array bounds check in amdgpu_hw_ip_info

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Fix an array index out of bounds warning in the DMA IP case of amdgpu_hw_ip_info() where it was incorrectly checking adev->gfx.gfx_ring[i].no_user_submission instead of adev->sdma.instance[i].ring.no_user_submission. The mismatch caused UBSAN to report an array bounds

[PATCH] drm: function to get process name and pid

2025-04-11 Thread Sunil Khatri
Add a helper function which gets the process information for the drm_file and updates the user-provided character buffer with the process name and pid as a string. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/drm_file.c | 30 ++ include/drm/drm_file.h

Re: [PATCH] drm: function to get process name and pid

2025-04-11 Thread Jani Nikula
On Fri, 11 Apr 2025, Sunil Khatri wrote: > Add helper function which get the process information for > the drm_file and updates the user provided character buffer > with the information of process name and pid as a string. Where's the user for this function? BR, Jani. > > Signed-off-by: Sunil K

Re: [PATCH] drm: function to get process name and pid

2025-04-11 Thread Christian König
On 11.04.25 at 13:26, Sunil Khatri wrote: > Add helper function which get the process information for > the drm_file and updates the user provided character buffer > with the information of process name and pid as a string. Hi Sunil, you need to send this out together with the patch which makes

[PATCH 1/2] drm/amdgpu: Use generic hdp flush function

2025-04-11 Thread Lijo Lazar
Except HDP v5.2 all use a common logic for HDP flush. Use a generic function. HDP v5.2 forces NO_KIQ logic, revisit it later. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c | 21 + drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.h | 2 ++ drivers/gpu/drm/amd/amd

[PATCH 2/2] drm/amdgpu: Use the right function for hdp flush

2025-04-11 Thread Lijo Lazar
There are a few prechecks made before HDP flush like a flush is not required on APU bare metal. Using hdp callback directly bypasses those checks. Use amdgpu_device_flush_hdp which takes care of prechecks. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 8 driver
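The difference between calling a callback directly and going through a wrapper that performs prechecks can be illustrated with a minimal sketch. Everything below is hypothetical: the structure, the flag names, and the `device_flush_hdp` helper are illustrative stand-ins, and the only precheck modeled is the one the message mentions (no flush needed on APU bare metal).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dev;

/* Per-IP callback table (hypothetical, reduced to one entry). */
struct hdp_funcs {
    void (*flush_hdp)(struct dev *d);
};

struct dev {
    bool is_apu;          /* assumption: integrated GPU */
    bool is_virtualized;  /* assumption: running as a guest, not bare metal */
    const struct hdp_funcs *hdp;
    unsigned int flushes; /* counts flushes actually issued */
};

/* Raw callback: always flushes, knows nothing about prechecks. */
static void raw_flush(struct dev *d)
{
    d->flushes++;
}

static const struct hdp_funcs example_hdp_funcs = {
    .flush_hdp = raw_flush,
};

/* Wrapper: applies the precheck, then dispatches to the callback. */
static void device_flush_hdp(struct dev *d)
{
    assert(d != NULL);

    /* Precheck modeled on the message: skip on APU bare metal. */
    if (d->is_apu && !d->is_virtualized)
        return;

    if (d->hdp && d->hdp->flush_hdp)
        d->hdp->flush_hdp(d);
}
```

A caller that invokes `d->hdp->flush_hdp(d)` directly would flush even on the APU bare-metal device in this sketch, which is the bypass the patch description warns about.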

Re: [PATCH 4/9] drm/amdgpu/userq: properly clean up userq fence driver on failure

2025-04-11 Thread Khatri, Sunil
On 4/10/2025 11:41 PM, Alex Deucher wrote: If userq creation fails, we need to properly unwind and free the user queue fence driver. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdg

RE: [PATCH] drm: function to get process name and pid

2025-04-11 Thread Khatri, Sunil
[AMD Official Use Only - AMD Internal Distribution Only] Sure, I will send the patch for the user too. Regards Sunil Khatri -Original Message- From: Koenig, Christian Sent: Friday, April 11, 2025 5:40 PM To: Khatri, Sunil ; dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org

Re: amdgpu_dm_connector_mode_valid regression

2025-04-11 Thread Marek Marczykowski-Górecki
Hi, On Wed, Apr 02, 2025 at 04:35:05PM +0200, Gergo Koteles wrote: > Hi Dmitry, > > But the code would start to become quite untraceable. > duplicate mode in amdgpu_dm_connector_mode_valid() > call drm_mode_set_crtcinfo() in amdgpu_dm_connector_mo

Re: [PATCH 04/19] drm: Pass the format info to .fb_create()

2025-04-11 Thread Geert Uytterhoeven
On Thu, 10 Apr 2025 at 18:33, Ville Syrjala wrote: > From: Ville Syrjälä > > Pass long the format information from the top to .fb_create() s/long/along/ > so that we can avoid redundant (and somewhat expensive) lookups > in the drivers. [...] > Signed-off-by: Ville Syrjälä > drivers/gpu/dr

[PATCH next] drm/amdgpu: Fix double free in amdgpu_userq_fence_driver_alloc()

2025-04-11 Thread Dan Carpenter
The goto frees "fence_drv" so this is a double free bug. There is no need to call amdgpu_seq64_free(adev, fence_drv->va) since the seq64 allocation failed so change the goto to goto free_fence_drv. Also propagate the error code from amdgpu_seq64_alloc() instead of hard coding it to -ENOMEM. Fixe
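The unwind rule this fix relies on (jump to the label that releases only what has been acquired so far, and return the callee's error code) can be sketched as follows. All names here are hypothetical; `slot_alloc` merely stands in for an allocation step that can fail, and the counters exist only so the behavior can be checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fence_drv {
    int slot;
};

static int slot_frees; /* how often the slot was released */
static int drv_frees;  /* how often the driver object was freed */

/* Hypothetical allocation step: returns 0 or the given negative errno. */
static int slot_alloc(int fail_with, int *slot)
{
    if (fail_with)
        return fail_with;
    *slot = 1;
    return 0;
}

static void slot_free(int *slot)
{
    *slot = 0;
    slot_frees++;
}

static int fence_drv_alloc(int slot_error, int late_error,
                           struct fence_drv **out)
{
    struct fence_drv *drv;
    int r;

    assert(out != NULL);

    drv = calloc(1, sizeof(*drv));
    if (!drv)
        return -ENOMEM;

    r = slot_alloc(slot_error, &drv->slot);
    if (r)
        goto free_drv;  /* slot was never allocated: do not release it */

    if (late_error) {   /* a later step failed: release the slot too */
        r = late_error;
        goto free_slot;
    }

    *out = drv;
    return 0;

free_slot:
    slot_free(&drv->slot);
free_drv:
    free(drv);
    drv_frees++;
    return r;           /* propagate the error instead of hard-coding one */
}
```

Each label undoes exactly one acquisition and falls through to the next, so the jump target alone decides how much is unwound.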

Re: [PATCH 2/2] drm/amdgpu: Use the right function for hdp flush

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 8:16 AM Lijo Lazar wrote: > > There are a few prechecks made before HDP flush like a flush is not > required on APU bare metal. Using hdp callback directly bypasses those > checks. Use amdgpu_device_flush_hdp which takes care of prechecks. > > Signed-off-by: Lijo Lazar Re

[PATCH] drm/amdgpu: fix no_user_submission check for SDMA

2025-04-11 Thread Alex Deucher
Copy paste typo. Use the flag from the sdma structure. Fixes: 4310acd4464b ("drm/amdgpu: add ring flag for no user submissions") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH v1 1/3] drm: function to get process name and pid

2025-04-11 Thread Sunil Khatri
Add a helper function which gets the process information for the drm_file and updates the user-provided character buffer with the process name and pid as a string. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/drm_file.c | 30 ++ include/drm/drm_file.h

Re: [PATCH 1/9] drm/amdgpu/userq: rename suspend/resume callbacks

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/10/2025 11:41 PM, Alex Deucher wrote: Rename to map and unmap to better align with what is happening at the firmware level and remove the extra level of indirection in the MES userq code. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userque

Re: [PATCH] drm/amdgpu: Clear overflow for SRIOV

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:07 AM Emily Deng wrote: > > For VF, it doesn't have the permission to clear overflow, clear the bit > by reset. > > Signed-off-by: Emily Deng > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.c | 10 -- > drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 1 + > drivers/gpu/d

Re: [PATCH 2/9] drm/amdgpu/userq: rework front end call sequence

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/10/2025 11:41 PM, Alex Deucher wrote: Split out the queue map from the mqd create call and split out the queue unmap from the mqd destroy call. This splits the queue setup and teardown with the actual enablement in the firmware. Signed-off-by: Alex Deucher ---

[PATCH v1 3/3] drm/amdgpu: update the error logging for more information

2025-04-11 Thread Sunil Khatri
Add process and pid information to the userqueue error logging to make the logs more useful when resolving errors. Sample log: [ 42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] *ERROR* Timed out waiting for fence f=1c74d978 for comm:Xwayland pid:3427 [ 42.444669] [drm:am

Re: [PATCH] drm/amd/amdgpu: Fix out of bounds warning in amdgpu_hw_ip_info

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:23 AM jesse.zh...@amd.com wrote: > > Fix an array index out of bounds warning in the DMA IP case of > amdgpu_hw_ip_info() where it was incorrectly checking > adev->gfx.gfx_ring[i].no_user_submission instead of > adev->sdma.instance[i].ring.no_user_submission. > > The mism

Re: [PATCH 1/2] drm/amdgpu: Use generic hdp flush function

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 8:42 AM Lijo Lazar wrote: > > Except HDP v5.2 all use a common logic for HDP flush. Use a generic > function. HDP v5.2 forces NO_KIQ logic, revisit it later. > > Signed-off-by: Lijo Lazar Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_hdp.c | 21 ++

Re: [PATCH] drm/amdkfd: add smi events for process start and end

2025-04-11 Thread Eric Huang
Ping ... On 2025-04-07 16:52, Eric Huang wrote: rocm-smi will be able to show the events for KFD process start/end; this is the implementation of that feature. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdkfd/kfd_process.c | 4 drivers/gpu/drm/amd/amdkfd/kfd_smi_events.c | 21

[PATCH] drm/amdgpu: cleanup amdgpu_vm_flush v6

2025-04-11 Thread Christian König
This reverts commit c2cc3648ba517a6c270500b5447d5a1efdad5936. Turned out that this has some negative consequences for some workloads. Instead check if the cleaner shader should run directly. While at it remove amdgpu_vm_need_pipeline_sync(), we also check again if the VMID has seen a GPU reset sin

[PATCH 1/9] drm/amdgpu/userq: rename suspend/resume callbacks

2025-04-11 Thread Alex Deucher
Rename to map and unmap to better align with what is happening at the firmware level and remove the extra level of indirection in the MES userq code. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 10 ++-- drivers/gpu/drm/amd/amdgpu/amdg

[PATCH 3/9] drm/amdgpu/userq: move some code around

2025-04-11 Thread Alex Deucher
Move some userq fence handling code into amdgpu_userq_fence.c. This matches the other code in that file. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 26 +++ .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 1 + driver

[PATCH 8/9] drm/amdgpu/userq: add helpers to start/stop scheduling

2025-04-11 Thread Alex Deucher
This will be used to stop/start user queue scheduling for example when switching between kernel and user queues when enforce isolation is enabled. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 68

[PATCH 9/9] drm/amdgpu/userq: integrate with enforce isolation

2025-04-11 Thread Alex Deucher
Enforce isolation serializes access to the GFX IP. User queues are isolated in the MES scheduler, but we still need to serialize between kernel queues and user queues. For enforce isolation, group KGD user queues with KFD user queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/a

[PATCH 7/9] drm/amdgpu: don't swallow errors in amdgpu_userqueue_resume_all()

2025-04-11 Thread Alex Deucher
Since we loop through the queues, |= the errors. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c index f
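The one-line rationale above can be illustrated with a small sketch. The callback type and both function names are hypothetical; the idea is only that a plain assignment inside the loop keeps the last queue's result, while `|=` keeps the fact that any queue failed. Note that OR-ing several different negative error codes yields a nonzero value that is not necessarily a meaningful errno, so callers of such a helper should treat the result as pass/fail.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-queue resume callback: 0 on success, negative on error. */
typedef int (*resume_fn)(int queue_id);

/* Example callback (assumption): only queue 2 fails. */
static int fail_on_queue_2(int queue_id)
{
    return queue_id == 2 ? -5 : 0;
}

static int resume_all(const int *queue_ids, size_t n, resume_fn resume)
{
    int ret = 0;

    assert(resume != NULL);

    for (size_t i = 0; i < n; i++) {
        /* "ret = resume(...)" would let a later success hide this failure. */
        ret |= resume(queue_ids[i]);
    }
    return ret;
}
```

With queue IDs 1, 2, 3 and the example callback, the failure on queue 2 survives the successful resume of queue 3.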

[PATCH 5/9] drm/amdgpu/userq: add suspend and resume helpers

2025-04-11 Thread Alex Deucher
Add helpers to unmap and map user queues on suspend and resume. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h | 3 ++ 2 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgp

[PATCH V2 4/9] drm/amdgpu/userq: properly clean up userq fence driver on failure

2025-04-11 Thread Alex Deucher
If userq creation fails, we need to properly unwind and free the user queue fence driver. v2: free idr as well (Sunil) Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueu

Re: [PATCH 3/9] drm/amdgpu/userq: move some code around

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/10/2025 11:41 PM, Alex Deucher wrote: Move some userq fence handling code into amdgpu_userq_fence.c. This matches the other code in that file. Signed-off-by: Alex Deucher --- .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 26 +++ .../gpu/drm/

Re: [PATCH v1 3/3] drm/amdgpu: update the error logging for more information

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 9:05 AM Sunil Khatri wrote: > > add process and pid information in the userqueue error > logging to make it more useful in resolving the error > by logs. > > Sample log: > [ 42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] *ERROR* Timed > out waiting for fence

[PATCH] drm/amdgpu/gfx: replace a comma with a semicolon

2025-04-11 Thread Alex Deucher
Not technically wrong, but I think a semicolon was intended here. Fixes: 6cc6e61788f7 ("drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v3") Cc: Christian König Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 2 +- 1 file changed, 1 insertion(+), 1 d

[PATCH 2/9] drm/amdgpu/userq: rework front end call sequence

2025-04-11 Thread Alex Deucher
Split out the queue map from the mqd create call and split out the queue unmap from the mqd destroy call. This splits the queue setup and teardown with the actual enablement in the firmware. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c

Re: [PATCH] drm/amdgpu/gfx: replace a comma with a semicolon

2025-04-11 Thread SRINIVASAN SHANMUGAM
On 4/11/2025 7:50 PM, Alex Deucher wrote: Not technically wrong, but I think a semicolon was intended here. Fixes: 6cc6e61788f7 ("drm/amdgpu: use a dummy owner for sysfs triggered cleaner shaders v3") Cc: Christian König Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gf

RE: [PATCH] drm/amdkfd: add smi events for process start and end

2025-04-11 Thread Russell, Kent
[Public] Reviewed-by: Kent Russell > -Original Message- > From: amd-gfx On Behalf Of Eric Huang > Sent: Friday, April 11, 2025 9:45 AM > To: Huang, JinHuiEric ; amd- > g...@lists.freedesktop.org > Subject: Re: [PATCH] drm/amdkfd: add smi events for process start and end > > Ping ... >

[PATCH] drm/amdgpu: Add NULL check for 'bo_va' in update_bo_mapping

2025-04-11 Thread Srinivasan Shanmugam
This change adds a check to ensure that 'bo_va' is not null before dereferencing it. If 'bo_va' is null, the function returns early, preventing any potential crashes or undefined behavior. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c:139 amdgpu_gem_update_bo_mapping() e

Re: [PATCH] drm/amdgpu: Add NULL check for 'bo_va' in update_bo_mapping

2025-04-11 Thread Christian König
On 11.04.25 at 17:00, Srinivasan Shanmugam wrote: > This change adds a check to ensure that 'bo_va' is not null before > dereferencing it. If 'bo_va' is null, the function returns early, > preventing any potential crashes or undefined behavior > > Fixes the below: > drivers/gpu/drm/amd/amdgp

[PATCH 6/9] drm/amdgpu/userq: handle system suspend and resume

2025-04-11 Thread Alex Deucher
Unmap user queues on suspend and map them on resume. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_devic

RE: [PATCH] drm/amdkfd: Add rec SDMA engines support with limited XGMI

2025-04-11 Thread Kim, Jonathan
[Public] > -Original Message- > From: amd-gfx On Behalf Of Shane > Xiao > Sent: Thursday, April 10, 2025 12:40 AM > To: amd-gfx@lists.freedesktop.org; Kim, Jonathan > Cc: Xiao, Shane > Subject: [PATCH] drm/amdkfd: Add rec SDMA engines support with limited XGMI > > This patch adds recomm

Re: [v5 1/6] drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.h

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:29 AM jesse.zh...@amd.com wrote: > > From: "jesse.zh...@amd.com" > > This patch introduces new function pointers in the amdgpu_sdma structure > to handle queue stop, start and soft reset operations. These will replace > the older callback mechanism. > > The new functions

Re: [v5 3/6] drm/amdgpu: switch amdgpu_sdma_reset_engine to use the new sdma function pointers

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:30 AM jesse.zh...@amd.com wrote: > > From: "jesse.zh...@amd.com" > > Replace old callback mechanism with direct calls to stop/start functions. > > Suggested-by: Alex Deucher > Signed-off-by: Jesse Zhang Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/

Re: [v5 4/6] drm/amdgpu: optimize queue reset and stop logic

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:42 AM jesse.zh...@amd.com wrote: > > From: "jesse.zh...@amd.com" > > This patch refactors the SDMA v5.x queue reset and stop logic to improve > code readability, maintainability, and performance. The key changes include: > > 1. **Generalized `sdma_v5_x_gfx_stop` Function

Re: [v5 5/6] drm/amdgpu: Implement SDMA soft reset directly for v5.x

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:57 AM jesse.zh...@amd.com wrote: > > From: "jesse.zh...@amd.com" > > This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA > soft resets directly, > rather than relying on the DPM interface. > > 1. **New `amdgpu_sdma_soft_reset` Function**: >-

Re: [v5 6/6] drm/amdgpu:remove old sdma reset callback mechanism

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:30 AM jesse.zh...@amd.com wrote: > > From: "jesse.zh...@amd.com" > > This patch removes the deprecated SDMA reset callback mechanism, which was > previously used to register pre-reset and post-reset callbacks for SDMA > engine resets. > The callback mechanism has been

Re: [v5 2/6] drm/amdgpu: Register the new sdma function pointers for each sdma IP version that needs them

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 4:37 AM jesse.zh...@amd.com wrote: > > From: "jesse.zh...@amd.com" > > Register stop/start/soft_reset queue functions for SDMA IP versions > v4.4.2, v5.0 and v5.2. > > Suggested-by: Alex Deucher > Signed-off-by: Jesse Zhang Might want to split this per IP? Either way:

Re: [PATCH v1 3/3] drm/amdgpu: update the error logging for more information

2025-04-11 Thread Khatri, Sunil
On 4/11/2025 7:54 PM, Alex Deucher wrote: On Fri, Apr 11, 2025 at 9:05 AM Sunil Khatri wrote: add process and pid information in the userqueue error logging to make it more useful in resolving the error by logs. Sample log: [ 42.444297] [drm:amdgpu_userqueue_wait_for_signal [amdgpu]] *ERRO

Re: [PATCH V2 4/9] drm/amdgpu/userq: properly clean up userq fence driver on failure

2025-04-11 Thread Khatri, Sunil
LGTM, thanks Alex Reviewed-by: Sunil Khatri On 4/11/2025 7:42 PM, Alex Deucher wrote: If userq creation fails, we need to properly unwind and free the user queue fence driver. v2: free idr as well (Sunil) Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 4 +++

Re: [PATCH 5/9] drm/amdgpu/userq: add suspend and resume helpers

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 7:42 PM, Alex Deucher wrote: Add helpers to unmap and map user queues on suspend and resume. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h | 3 ++

Re: [PATCH 6/9] drm/amdgpu/userq: handle system suspend and resume

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 7:42 PM, Alex Deucher wrote: Unmap user queues on suspend and map them on resume. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/

Re: [PATCH 7/9] drm/amdgpu: don't swallow errors in amdgpu_userqueue_resume_all()

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 7:42 PM, Alex Deucher wrote: since we loop through the queues |= the errors. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/a

Re: [PATCH 8/9] drm/amdgpu/userq: add helpers to start/stop scheduling

2025-04-11 Thread Khatri, Sunil
On 4/11/2025 7:42 PM, Alex Deucher wrote: This will be used to stop/start user queue scheduling for example when switching between kernel and user queues when enforce isolation is enabled. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm

[PATCH 1/2] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER_9_0 for Cleaner Shader execution

2025-04-11 Thread Srinivasan Shanmugam
This commit introduces the PACKET3_RUN_CLEANER_SHADER_9_0 definition, which is a command packet utilized to instruct the GPU to execute the cleaner shader for the GFX9.0 graphics architecture. The cleaner shader is a piece of GPU code that is responsible for clearing or initializing essential GPU

Cleaner Shader Management for GFX v9.0 Architecture

2025-04-11 Thread Srinivasan Shanmugam
This patch series enhances the management of the cleaner shader within the AMDGPU driver for GFX v9.0 architecture. The first patch introduces a new packet definition, PACKET3_RUN_CLEANER_SHADER_9_0, to ensure proper execution of the cleaner shader for specific GFX versions. The second patch ref

[PATCH 2/2] drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture

2025-04-11 Thread Srinivasan Shanmugam
This commit modifies the gfx_v9_0_ring_emit_cleaner_shader function to use a switch statement for cleaner shader emission based on the specific GFX IP version. The function now distinguishes between different IP versions, using PACKET3_RUN_CLEANER_SHADER_9_0 for the versions 9.0.1, 9.1.0, 9.2.1, 9

Re: [PATCH] drm/amdgpu: fix no_user_submission check for SDMA

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 6:20 PM, Alex Deucher wrote: Copy paste typo. Use the flag from the sdma structure. Fixes: 4310acd4464b ("drm/amdgpu: add ring flag for no user submissions") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 2 +- 1 file chang

Re: [v2 2/2] drm/amdgpu: Enable TMZ support for GC 11.0.0

2025-04-11 Thread Deucher, Alexander
[AMD Official Use Only - AMD Internal Distribution Only] GFX11 is a dGPU. We don't currently have a way to deal with migration of encrypted buffers to system ram. Alex From: jesse.zh...@amd.com Sent: Tuesday, April 8, 2025 2:32 AM To: amd-gfx@lists.freedesktop.

Re: [PATCH] drm/amdgpu/userq/mes: remove unused header

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/10/2025 11:48 PM, Alex Deucher wrote: This is unused so remove it. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/

Re: [PATCH 2/2] drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 12:37 PM Srinivasan Shanmugam wrote: > > This commit modifies the gfx_v9_0_ring_emit_cleaner_shader function > to use a switch statement for cleaner shader emission based on the > specific GFX IP version. > > The function now distinguishes between different IP versions, usi

Re: [PATCH 02/13] drm/amdgpu/userq: add UAPI for setting queue priority

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Allow the user to set queue priority levels: 0 - normal low - most apps (maps to MES AMD_PRIORITY_LEVEL_NORMAL) 1 - low - background jobs (maps to MES AMD_PRIORITY_LEVEL_LOW) 2 - normal high - apps that need relative high (m

Re: [PATCH 01/13] drm/amdgpu: convert userq UAPI _pad to flags

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Reuse the _pad field for flags. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 4 ++-- include/uapi/drm/amdgpu_drm.h | 5 - 2 files changed, 6 insertions(+), 3 deletio

Re: [PATCH 03/13] drm/amdgpu/mes11: add conversion for priority levels

2025-04-11 Thread Khatri, Sunil
Do you expect priority levels in MES11, 12 and probably 13 too? If they are the same then we should be using the same conversion function for all versions of MES. For now it's fine. Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Convert driver priority levels to MES11 priority

[PATCH] drm/amdkfd: Add rec SDMA engines support with limited XGMI

2025-04-11 Thread Shane Xiao
This patch adds recommended SDMA engines with limited XGMI SDMA engines. It will help improve overall performance for device to device copies with this optimization. v2: Update the formatting issues and data type Signed-off-by: Shane Xiao Suggested-by: Jonathan Kim Reviewed-by: Jonathan Kim --

Re: [PATCH 05/13] drm/amdgpu/user: add priorty to user queue structure

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: So we can track this when we create user queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueu

Re: [PATCH 06/13] drm/amdgpu/userq/mes: handle user queue priority

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Handle the queue priority set by the user. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 17 - 1 file changed, 16 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/a

Re: [PATCH 8/9] drm/amdgpu/userq: add helpers to start/stop scheduling

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 12:17 PM Khatri, Sunil wrote: > > > On 4/11/2025 7:42 PM, Alex Deucher wrote: > > This will be used to stop/start user queue scheduling for > > example when switching between kernel and user queues when > > enforce isolation is enabled. > > > > Signed-off-by: Alex Deucher

RE: [PATCH 2/2] drm/amdgpu: Enable doorbell for JPEG5_0_1

2025-04-11 Thread Liu, Leo
[AMD Official Use Only - AMD Internal Distribution Only] The series is: Reviewed-by: Leo Liu > -Original Message- > From: Sundararaju, Sathishkumar > Sent: April 10, 2025 9:01 AM > To: amd-gfx@lists.freedesktop.org > Cc: Liu, Leo ; Zhang, Hawking > ; Sundararaju, Sathishkumar > > Subje

Re: [PATCH 07/13] drm/amdgpu/userq: enable support for queue priorities

2025-04-11 Thread Khatri, Sunil
A small comment otherwise it looks great. Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Enable users to create queues at different priority levels. The highest level is restricted to drm master. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userque

Re: [PATCH 08/13] drm/amdgpu/userq: add UAPI for setting up secure queues

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: If the queue needs to access TMZ surfaces, it must be set up as secure. Signed-off-by: Alex Deucher --- include/uapi/drm/amdgpu_drm.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/include/uapi/drm/amdgpu_drm.h b/

Re: [PATCH 10/13] drm/amdgpu/gfx11: add support for TMZ queues to mqd_init

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Set up TMZ for queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gf

Re: [PATCH 12/13] drm/amdgpu/userq/mes: pass the secure flag to mqd init

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:24 AM, Alex Deucher wrote: So that we initialize the MQD as a secure queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.

Re: [PATCH 9/9] drm/amdgpu/userq: integrate with enforce isolation

2025-04-11 Thread Khatri, Sunil
Are we replacing the kfx user queue with KGD userqueue names here? Also, this looks like KFD user queues and KGD user queues are both treated on par? Looks good in general if the above understanding is correct. Someone with a better understanding of isolation should review. Acked-by: Sunil Khatri

Re: [PATCH 13/13] drm/amdgpu/userq: enable support for secure queues

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:24 AM, Alex Deucher wrote: Enable users to create secure GFX/compute queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/d

Re: [PATCH] drm/amdgpu/gfx: replace a comma with a semicolon

2025-04-11 Thread Christian König
On 11.04.25 at 16:20, Alex Deucher wrote: > Not technically wrong, but I think a semicolon was > intended here. > > Fixes: 6cc6e61788f7 ("drm/amdgpu: use a dummy owner for sysfs triggered > cleaner shaders v3") > Cc: Christian König > Signed-off-by: Alex Deucher Reviewed-by: Christian König

Re: [PATCH 09/13] drm/amdgpu: add tmz queue parameter to mqd props

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Use this to track whether we want TMZ for queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/

Re: [PATCH 11/13] drm/amdgpu/gfx12: add support for TMZ queues to mqd_init

2025-04-11 Thread Khatri, Sunil
Reviewed-by: Sunil Khatri On 4/11/2025 12:24 AM, Alex Deucher wrote: Set up TMZ for queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gf

Re: [PATCH 8/9] drm/amdgpu/userq: add helpers to start/stop scheduling

2025-04-11 Thread Khatri, Sunil
On 4/11/2025 10:22 PM, Alex Deucher wrote: On Fri, Apr 11, 2025 at 12:17 PM Khatri, Sunil wrote: On 4/11/2025 7:42 PM, Alex Deucher wrote: This will be used to stop/start user queue scheduling for example when switching between kernel and user queues when enforce isolation is enabled. Sign

Re: [PATCH 04/13] drm/amdgpu/mes12: add conversion for priority levels

2025-04-11 Thread Khatri, Sunil
Same comment here as for MES11: once we have confirmation, we might plan to use the same function for all. Reviewed-by: Sunil Khatri On 4/11/2025 12:23 AM, Alex Deucher wrote: Convert driver priority levels to MES11 priority levels. At the moment they are the same, but they may not always be. Sign
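
The conversion idea in this message can be sketched in standalone C. This is only an illustration under stated assumptions, not the patch's code: the enum names, the four levels, and convert_to_fw_prio() are all hypothetical. The point is that an explicit mapping keeps callers correct even if the driver and firmware numberings later diverge.

```c
#include <assert.h>

/* Hypothetical driver-side priority levels (names invented for this sketch). */
enum drv_prio { DRV_PRIO_LOW, DRV_PRIO_NORMAL, DRV_PRIO_MEDIUM, DRV_PRIO_HIGH };

/* Hypothetical firmware-side priority levels; identical today by assumption. */
enum fw_prio { FW_PRIO_LOW, FW_PRIO_NORMAL, FW_PRIO_MEDIUM, FW_PRIO_HIGH };

/* Map each driver level explicitly instead of casting between the enums. */
enum fw_prio convert_to_fw_prio(enum drv_prio prio)
{
    switch (prio) {
    case DRV_PRIO_LOW:
        return FW_PRIO_LOW;
    case DRV_PRIO_NORMAL:
        return FW_PRIO_NORMAL;
    case DRV_PRIO_MEDIUM:
        return FW_PRIO_MEDIUM;
    case DRV_PRIO_HIGH:
        return FW_PRIO_HIGH;
    default:
        /* Unknown input falls back to the normal level in this sketch. */
        return FW_PRIO_NORMAL;
    }
}

/* Small usage example: every known level maps to its counterpart. */
void prio_self_check(void)
{
    assert(convert_to_fw_prio(DRV_PRIO_LOW) == FW_PRIO_LOW);
    assert(convert_to_fw_prio(DRV_PRIO_HIGH) == FW_PRIO_HIGH);
}
```

A plain cast would also work while the two enums match, but the switch makes any future divergence a one-line change.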

[PATCH v2 0/2] Cleaner Shader Management for GFX v9.0 Architecture

2025-04-11 Thread Srinivasan Shanmugam
v2: Simplified logic in second patch (Alex). Srinivasan Shanmugam (2): drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER_9_0 for Cleaner Shader execution drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture v2 drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 8 +++- drivers/gpu/d

[PATCH v2 2/2] drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture v2

2025-04-11 Thread Srinivasan Shanmugam
This commit modifies the gfx_v9_0_ring_emit_cleaner_shader function to use a switch statement for cleaner shader emission based on the specific GFX IP version. The function now distinguishes between different IP versions, using PACKET3_RUN_CLEANER_SHADER_9_0 for the versions 9.0.1, 9.1.0, 9.2.1, 9
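
The switch-based dispatch described in this message can be illustrated with a short standalone C sketch. It is not the code from the patch: the IP_VER() encoding, the opcode values, and pick_cleaner_shader_opcode() are hypothetical stand-ins, and only the version numbers fully visible in the preview above are listed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packing of a major.minor.revision IP version into one integer. */
#define IP_VER(maj, min, rev) \
    (((uint32_t)(maj) << 16) | ((uint32_t)(min) << 8) | (uint32_t)(rev))

/* Placeholder opcode values, chosen only for this sketch. */
enum {
    OPCODE_CLEANER_SHADER_9_0 = 1,
    OPCODE_CLEANER_SHADER_DEFAULT = 2
};

/* Select the opcode for a given IP version; unlisted versions use the default. */
int pick_cleaner_shader_opcode(uint32_t ip_version)
{
    switch (ip_version) {
    case IP_VER(9, 0, 1):
    case IP_VER(9, 1, 0):
    case IP_VER(9, 2, 1):
        return OPCODE_CLEANER_SHADER_9_0;
    default:
        return OPCODE_CLEANER_SHADER_DEFAULT;
    }
}

/* Small usage example. */
void cleaner_shader_self_check(void)
{
    assert(pick_cleaner_shader_opcode(IP_VER(9, 0, 1)) == OPCODE_CLEANER_SHADER_9_0);
    assert(pick_cleaner_shader_opcode(IP_VER(1, 0, 0)) == OPCODE_CLEANER_SHADER_DEFAULT);
}
```

Grouping the case labels keeps the version list in one place, so supporting another version is a single added line.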

Re: [PATCH 8/9] drm/amdgpu/userq: add helpers to start/stop scheduling

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 1:26 PM Khatri, Sunil wrote: > > > On 4/11/2025 10:22 PM, Alex Deucher wrote: > > On Fri, Apr 11, 2025 at 12:17 PM Khatri, Sunil wrote: > >> > >> On 4/11/2025 7:42 PM, Alex Deucher wrote: > >>> This will be used to stop/start user queue scheduling for > >>> example when sw

[PATCH v2 1/2] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER_9_0 for Cleaner Shader execution

2025-04-11 Thread Srinivasan Shanmugam
This commit introduces the PACKET3_RUN_CLEANER_SHADER_9_0 definition, which is a command packet utilized to instruct the GPU to execute the cleaner shader for the GFX9.0 graphics architecture. The cleaner shader is a piece of GPU code that is responsible for clearing or initializing essential GPU

Re: [PATCH 9/9] drm/amdgpu/userq: integrate with enforce isolation

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 12:38 PM Khatri, Sunil wrote: > > Are we replacing the KFD user queue with KGD userqueue names here? > Also this looks like KFD user queue and KGD userqueue are both treated > on par? Yeah, I could split this into two patches, one to rename the variables because they are

Re: [PATCH] drm/amdgpu: cleanup amdgpu_vm_flush v6

2025-04-11 Thread SRINIVASAN SHANMUGAM
On 4/11/2025 7:24 PM, Christian König wrote: This reverts commit c2cc3648ba517a6c270500b5447d5a1efdad5936. Turned out that this has some negative consequences for some workloads. Instead check if the cleaner shader should run directly. While at it remove amdgpu_vm_need_pipeline_sync(), we also

Re: [PATCH 1/2] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER_9_0 for Cleaner Shader execution

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 12:28 PM Srinivasan Shanmugam wrote: > > This commit introduces the PACKET3_RUN_CLEANER_SHADER_9_0 definition, > which is a command packet utilized to instruct the GPU to execute the > cleaner shader for the GFX9.0 graphics architecture. > > The cleaner shader is a piece of

Re: [PATCH v2 2/2] drm/amdgpu: Enhance Cleaner Shader Handling in GFX v9.0 Architecture v2

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 1:40 PM Srinivasan Shanmugam wrote: > > This commit modifies the gfx_v9_0_ring_emit_cleaner_shader function > to use a switch statement for cleaner shader emission based on the > specific GFX IP version. > > The function now distinguishes between different IP versions, usin

Re: [PATCH 07/13] drm/amdgpu/userq: enable support for queue priorities

2025-04-11 Thread Alex Deucher
On Fri, Apr 11, 2025 at 1:18 PM Khatri, Sunil wrote: > > A small comment, otherwise it looks great. > Reviewed-by: Sunil Khatri > > On 4/11/2025 12:23 AM, Alex Deucher wrote: > > Enable users to create queues at different priority levels. > > The highest level is restricted to drm master. > > > >

[PATCH 04/10] drm/amdgpu/userq: properly clean up userq fence driver on failure

2025-04-11 Thread Alex Deucher
If userq creation fails, we need to properly unwind and free the user queue fence driver. v2: free idr as well (Sunil) Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm
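
The unwind-on-failure pattern this message refers to is commonly written in kernel-style C with goto labels that release resources in reverse order of acquisition. The sketch below is a userspace illustration only: struct queue, struct fence_drv, queue_create(), and the live_fence_drivers counter are hypothetical and do not come from the patch.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-ins for a queue and its fence driver. */
struct fence_drv { int placeholder; };
struct queue { struct fence_drv *fence_drv; };

/* Counts fence drivers currently allocated, so leaks are observable here. */
int live_fence_drivers;

/*
 * Create a queue. If fail_late is true, simulate a failure after the
 * fence driver has been allocated, which exercises the unwind path.
 * Returns 0 on success and -1 on failure; *out is set only on success.
 */
int queue_create(bool fail_late, struct queue **out)
{
    struct queue *q = calloc(1, sizeof(*q));

    if (!q)
        return -1;

    q->fence_drv = calloc(1, sizeof(*q->fence_drv));
    if (!q->fence_drv)
        goto free_queue;
    live_fence_drivers++;

    if (fail_late)
        goto free_fence_drv;

    *out = q;
    return 0;

free_fence_drv:
    /* Release in reverse order of acquisition. */
    free(q->fence_drv);
    live_fence_drivers--;
free_queue:
    free(q);
    return -1;
}
```

Calling queue_create(true, &q) leaves live_fence_drivers at zero and q untouched, which is the property the unwind path is meant to guarantee.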

[PATCH 05/10] drm/amdgpu/userq: add suspend and resume helpers

2025-04-11 Thread Alex Deucher
Add helpers to unmap and map user queues on suspend and resume. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 39 +++ drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h | 3 ++ 2 files changed, 42 insertions(+) diff --git

[PATCH 06/10] drm/amdgpu/userq: handle system suspend and resume

2025-04-11 Thread Alex Deucher
Unmap user queues on suspend and map them on resume. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/

[PATCH 08/10] drm/amdgpu/userq: track the xcp_id associated with the queue

2025-04-11 Thread Alex Deucher
Track this to align with KFD for enforce isolation handling. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.h index 381b

[PATCH 03/10] drm/amdgpu/userq: move some code around

2025-04-11 Thread Alex Deucher
Move some userq fence handling code into amdgpu_userq_fence.c. This matches the other code in that file. Reviewed-by: Sunil Khatri Signed-off-by: Alex Deucher --- .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 26 +++ .../gpu/drm/amd/amdgpu/amdgpu_userq_fence.h | 1 + driver
