[PATCH] drm/amd/amdgpu: Fix out of bounds warning in amdgpu_hw_ip_info

2025-04-11 Thread jesse.zh...@amd.com
Fix an array index out of bounds warning in the DMA IP case of amdgpu_hw_ip_info() where it was incorrectly checking adev->gfx.gfx_ring[i].no_user_submission instead of adev->sdma.instance[i].ring.no_user_submission. The mismatch caused UBSAN to report an array bounds violation since it was access
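The mismatch described above can be sketched in a few lines of standalone C. This is a simplified illustration and not the driver code: the struct layouts, the array sizes, and the helper name `count_user_sdma_rings` are hypothetical. Only the two `no_user_submission` fields mirror names from the patch description.

```c
#include <stdbool.h>

/* Hypothetical sizes: fewer gfx rings than SDMA instances. */
#define NUM_GFX_RINGS      2
#define NUM_SDMA_INSTANCES 8

struct ring { bool no_user_submission; };
struct sdma_instance { struct ring ring; };

struct device {
    struct ring gfx_ring[NUM_GFX_RINGS];
    struct sdma_instance sdma_instance[NUM_SDMA_INSTANCES];
    unsigned num_sdma_instances;
};

/* Count the SDMA rings that accept user submissions. */
unsigned count_user_sdma_rings(const struct device *dev)
{
    unsigned n = 0;

    for (unsigned i = 0; i < dev->num_sdma_instances; i++) {
        /*
         * The buggy form would test dev->gfx_ring[i].no_user_submission
         * here, which indexes past gfx_ring[] once i >= NUM_GFX_RINGS.
         * The loop bound belongs to the SDMA array, so the check must
         * read from the SDMA array as well.
         */
        if (!dev->sdma_instance[i].ring.no_user_submission)
            n++;
    }
    return n;
}
```

The point of the sketch is that the loop bound and the indexed array have to belong to the same object. A bounds sanitizer flags the access as soon as the index exceeds the size of the smaller array.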

[v5 1/6] drm/amdgpu: Add the new sdma function pointers for amdgpu_sdma.h

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces new function pointers in the amdgpu_sdma structure to handle queue stop, start and soft reset operations. These will replace the older callback mechanism. The new functions are: - stop_kernel_queue: Stops a specific
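A table of per-IP function pointers for stop, start and soft reset might look roughly like the sketch below. Only `stop_kernel_queue` is named in the snippet above. The other member names, the signatures, the `demo_*` helpers, and the stop, soft reset, start ordering in `reset_engine` are assumptions made for illustration.

```c
#include <stddef.h>

/* Hypothetical stand-in for a ring; not the driver's struct. */
struct ring {
    int running;
    int resets;
};

/* Per-IP-version operations, filled in by each SDMA implementation. */
struct sdma_funcs {
    int (*stop_kernel_queue)(struct ring *ring);
    int (*start_kernel_queue)(struct ring *ring);       /* assumed name */
    int (*soft_reset_kernel_queue)(struct ring *ring);  /* assumed name */
};

static int demo_stop(struct ring *ring)       { ring->running = 0; return 0; }
static int demo_start(struct ring *ring)      { ring->running = 1; return 0; }
static int demo_soft_reset(struct ring *ring) { ring->resets++;    return 0; }

/* One table per IP version; this one uses the demo helpers. */
static const struct sdma_funcs demo_funcs = {
    .stop_kernel_queue       = demo_stop,
    .start_kernel_queue      = demo_start,
    .soft_reset_kernel_queue = demo_soft_reset,
};

/* Generic reset path: call straight through the table, no callback list. */
int reset_engine(const struct sdma_funcs *funcs, struct ring *ring)
{
    int r;

    if (!funcs || !funcs->stop_kernel_queue ||
        !funcs->start_kernel_queue || !funcs->soft_reset_kernel_queue)
        return -1;

    r = funcs->stop_kernel_queue(ring);
    if (r)
        return r;
    r = funcs->soft_reset_kernel_queue(ring);
    if (r)
        return r;
    return funcs->start_kernel_queue(ring);
}
```

With a table like this, the common reset code needs no registration step: each IP version supplies its own table once, and the reset path returns an error if a required entry is missing.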

[v5 2/6] drm/amdgpu: Register the new sdma function pointers for each sdma IP version that needs them

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Register stop/start/soft_reset queue functions for SDMA IP versions v4.4.2, v5.0 and v5.2. Suggested-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 22 +++--- drivers/gpu/drm/amd/amdgpu/sdma_v5_

[v5 3/6] drm/amdgpu: switch amdgpu_sdma_reset_engine to use the new sdma function pointers

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Replace old callback mechanism with direct calls to stop/start functions. Suggested-by: Alex Deucher Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 34 +++- 1 file changed, 4 insertions(+), 30 deletions(-)

[v5 4/6] drm/amdgpu: optimize queue reset and stop logic

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.x queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_x_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stopping spe
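The snippet is cut off, but an `inst_mask` parameter presumably selects which SDMA instances a stop call touches. A minimal sketch of that bitmask pattern follows. The `engine` struct, the instance count, and the function name `gfx_stop` are hypothetical and do not come from the patch.

```c
#include <stdint.h>

#define NUM_INSTANCES 4  /* hypothetical instance count */

/* Hypothetical engine state: one enable flag per instance. */
struct engine {
    int enabled[NUM_INSTANCES];
};

/* Stop only the instances whose bit is set in inst_mask. */
void gfx_stop(struct engine *e, uint32_t inst_mask)
{
    for (unsigned i = 0; i < NUM_INSTANCES; i++) {
        if (!(inst_mask & (1u << i)))
            continue;           /* instance not selected, leave it running */
        e->enabled[i] = 0;
    }
}
```

Passing `1u << i` stops a single instance, while a mask with every bit set reproduces the older stop-everything behaviour, so one function can serve both the per-queue reset path and a full stop.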

[v5 5/6] drm/amdgpu: Implement SDMA soft reset directly for v5.x

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly, rather than relying on the DPM interface. 1. **New `amdgpu_sdma_soft_reset` Function**: - Implements a soft reset for SDMA engines by directly writ

[v5 6/6] drm/amdgpu:remove old sdma reset callback mechanism

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch removes the deprecated SDMA reset callback mechanism, which was previously used to register pre-reset and post-reset callbacks for SDMA engine resets. The callback mechanism has been replaced with a more direct and efficient approach using `

[v2] drm/amd/amdgpu: Fix array bounds check in amdgpu_hw_ip_info

2025-04-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Fix an array index out of bounds warning in the DMA IP case of amdgpu_hw_ip_info() where it was incorrectly checking adev->gfx.gfx_ring[i].no_user_submission instead of adev->sdma.instance[i].ring.no_user_submission. The mismatch caused UBSAN to

[PATCH V7 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH 2/2] drm/amdgpu: Add SDMA queue start/stop functions and integrate with ring funcs

2025-03-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces two new functions, `amdgpu_sdma_stop_queue` and `amdgpu_sdma_start_queue`, to handle the stopping and starting of SDMA queues during engine reset operations. The changes include: 1. **New Functions**: - `amdgpu_sdma_stop_queue`:

[PATCH 1/2] drm/amdgpu: Add SDMA queue start/stop callbacks to amdgpu_ring_funcs

2025-03-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces two new callbacks, `stop_queue` and `start_queue`, to the `amdgpu_ring_funcs` structure. These callbacks are designed to handle the stopping and starting of SDMA queues during engine reset operations. The changes include: 1. **A

[PATCH 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-02-28 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that al

[PATCH V6 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-02-28 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH v6 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-02-28 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH 6/7] drm/amd/amdgpu: Refactor SDMA v5.2 reset logic into stop_queue and restore_queue functions

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.2 reset logic by splitting the `sdma_v5_2_reset_queue` function into two separate functions: `sdma_v5_2_stop_queue` and `sdma_v5_2_restore_queue`. This change aligns with the new SDMA reset mechanism, where the reset p

[PATCH 3/7] drm/amdgpu: Optimize SDMA v5.0 queue reset and stop logic

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.0 queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_0_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stopping spe

[PATCH V7 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that al

[PATCH v7 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH 7/7] drm/amd/amdgpu: Remove deprecated SDMA reset callback mechanism

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch removes the deprecated SDMA reset callback mechanism, which was previously used to register pre-reset and post-reset callbacks for SDMA engine resets. The callback mechanism has been replaced with a more direct and efficient approach using `

[PATCH 2/7] drm/amd/amdgpu: Implement SDMA soft reset directly for sdma v5

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly, rather than relying on the DPM interface. 1. **New `amdgpu_sdma_soft_reset` Function**: - Implements a soft reset for SDMA engines by directly writ

[PATCH 5/7] drm/amdgpu: Optimize SDMA v5.2 queue reset and stop logic

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.2 queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_2_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stoppin

[PATCH 4/7] drm/amd/amdgpu: Refactor SDMA v5.0 reset logic into stop_queue and restore_queue functions

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.0 reset logic by splitting the `sdma_v5_0_reset_queue` function into two separate functions: `sdma_v5_0_stop_queue` and `sdma_v5_0_restore_queue`. This change aligns with the new SDMA reset mechanism, where the reset p

[PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by removing dynamic callbacks

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Since KFD no longer registers its own callbacks for SDMA resets, and only KGD uses the reset mechanism, we can simplify the SDMA reset flow by directly calling the ring's `stop_queue` and `start_queue` functions. This patch removes the dynamic callbac

[PATCH V4 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-02-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH v4 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-02-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 132 to 148. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH V4 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-02-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that al

[PATCH] drm/amdgpu: drm/amdgpu/job: fix is_guilty logic change (v2)

2025-02-24 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" The is Reviewed-by: Jesse Zhang Incrementing the gpu_reset counter needs to be in the is_guilty block. Also move the fence error before the reset to keep the original ordering. Fixes: f447ba2bbd48 ("drm/amdgpu: Update amdgpu_job_timedout to check

[PATCH 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-02-27 Thread jesse.zh...@amd.com
This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that all relevant page table cache levels (L1 PTEs,

[PATCH 2/3 v5] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-02-27 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH 1/3 v5] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-02-27 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 132 to 148. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH] Revert "drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA"

2025-03-25 Thread jesse.zh...@amd.com
This temporarily reverts commit 47cad92043909928d7260b77e7a996a0ae043f8c. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 77 --- .../gpu/drm/amd/amdgpu/vega10_sdma_pkt_open.h | 70 - 2 files changed, 14 insertions(+), 133 deletions

[PATCH 2/2] drm/amdgpu: Enable TMZ support for GC 11.0.0

2025-04-03 Thread jesse.zh...@amd.com
Add IP_VERSION(11, 0, 0) to the list of GPU generations that support TMZ. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c index 4646252828
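A one-line change of this kind usually amounts to a new `case` label in a version switch. The sketch below shows that shape only. The `IP_VERSION` packing, the function name `gc_supports_tmz`, and the pre-existing entry are assumptions for illustration and do not reproduce the driver's actual list.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed packing for illustration: major, minor, revision in one integer. */
#define IP_VERSION(mj, mn, rv) \
    (((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Return true when the given GC version is on the TMZ allow list. */
bool gc_supports_tmz(uint32_t gc_version)
{
    switch (gc_version) {
    case IP_VERSION(10, 3, 0):  /* placeholder for an existing entry */
    case IP_VERSION(11, 0, 0):  /* the entry this patch adds */
        return true;
    default:
        return false;           /* anything not listed stays disabled */
    }
}
```

Keeping the allow list as explicit `case` labels means an unlisted generation stays disabled by default, which fits a one-line insertion like the one in the diffstat.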

[v3 5/7] drm/amdgpu: Optimize SDMA v5.2 queue reset and stop logic

2025-04-02 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.2 queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_2_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stoppin

[v3 7/7] drm/amd/amdgpu: Remove deprecated SDMA reset callback mechanism

2025-04-02 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch removes the deprecated SDMA reset callback mechanism, which was previously used to register pre-reset and post-reset callbacks for SDMA engine resets. The callback mechanism has been replaced with a more direct and efficient approach using `

[v3 6/7] drm/amd/amdgpu: Refactor SDMA v5.2 reset logic into stop_queue and restore_queue functions

2025-04-02 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.2 reset logic by splitting the `sdma_v5_2_reset_queue` function into two separate functions: `sdma_v5_2_stop_queue` and `sdma_v5_2_restore_queue`. This change aligns with the new SDMA reset mechanism, where the reset p

[v3 2/7] drm/amd/amdgpu: Implement SDMA soft reset directly for sdma v5

2025-04-02 Thread jesse.zh...@amd.com
This patch introduces a new function `amdgpu_sdma_soft_reset` to handle SDMA soft resets directly, rather than relying on the DPM interface. 1. **New `amdgpu_sdma_soft_reset` Function**: - Implements a soft reset for SDMA engines by directly writing to the hardware registers. - Handles SDM

[v3 3/7] drm/amdgpu: Optimize SDMA v5.0 queue reset and stop logic

2025-04-02 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.0 queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_0_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stopping spe
