[PATCH] drm/amdgpu: fix queue reset issue by mmio

2024-09-04 Thread jesse.zh...@amd.com
Initialize the queue type before resetting the queue using mmio. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c index f7d5d4f08a53..10b61

[PATCH] drm/amdkfd: clean up code for interrupt v10

2024-09-05 Thread jesse.zh...@amd.com
Variable hub_inst is unused. Related to commit "bde7ae79ca40": "drm/amdkfd: Drop poison hanlding from gfx v10" Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdkfd/kfd_int_process_v10.c | 13 - 1 file changed, 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_pro

[PATCH] drm/amdkfd: Fix resource leak in criu restore queue

2024-09-08 Thread jesse.zh...@amd.com
To avoid memory leaks, release q_extra_data when exiting the restore queue. v2: Correct the proto (Alex) Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process_queue_manager.

[PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

2024-10-18 Thread jesse.zh...@amd.com
Not all ASICs support the queue reset feature. Therefore, userspace can query this feature via AMDGPU_INFO_QUEUE_RESET before validating a queue reset. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 + include/uapi/drm/amdgpu_drm.h |

[PATCH 1/2] drm/amdgpu: add amdgpu_gfx_sched_mask and amdgpu_compute_sched_mask debugfs

2024-10-17 Thread jesse.zh...@amd.com
compute/gfx may have multiple rings on some hardware. In some cases, userspace wants to run jobs on a specific ring for validation purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two cores in th

[PATCH 2/2] drm/amdgpu: add amdgpu_sdma_sched_mask debugfs

2024-10-17 Thread jesse.zh...@amd.com
Userspace wants to run jobs on a specific sdma ring for verification purposes. This debugfs entry helps to disable or enable submitting jobs to a specific ring. This entry is populated only if there are at least two cores in the sdma ip. Signed-off-by: Jesse Zhang Suggested-by: Alex Deuc
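
As a rough illustration of the mask semantics described in the two sched_mask patches above, the sketch below models a per-ring enable bitmask in plain userspace C. The structure `ring_group`, the function `apply_sched_mask`, and the rule that a mask disabling every ring is rejected are assumptions made for this example; they are not taken from the patches themselves.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stddef.h>

/* Hypothetical model of the rings of one IP block (not a driver structure). */
struct ring_group {
    unsigned int num_rings;   /* number of rings in the block, 1..64 */
    bool ready[64];           /* true if the ring accepts new jobs */
};

/*
 * Apply a scheduling mask: bit i set means ring i stays enabled.
 * Returns 0 on success, or -EINVAL if the ring count is out of range
 * or the mask would leave no ring enabled (an assumed policy).
 */
int apply_sched_mask(struct ring_group *g, uint64_t val)
{
    uint64_t valid;
    unsigned int i;

    assert(g != NULL);
    if (g->num_rings == 0 || g->num_rings > 64)
        return -EINVAL;

    /* Only the low num_rings bits are meaningful. */
    valid = (g->num_rings == 64) ? UINT64_MAX : ((1ULL << g->num_rings) - 1);
    if ((val & valid) == 0)
        return -EINVAL;

    for (i = 0; i < g->num_rings; i++)
        g->ready[i] = (val >> i) & 1;
    return 0;
}
```

With two rings, a mask of 0x2 would leave only the second ring accepting jobs, which matches the "run jobs on a specific ring" use case in the commit messages.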

[PATCH 4/5 V2] drm/amdgpu: Add sysfs interface for vpe reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add the sysfs interface for vpe: vpe_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) Sign

[PATCH 1/5 V2] drm/amdgpu: Add sysfs interface for gc reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add two sysfs interfaces for gfx and compute: gfx_reset_mask and compute_reset_mask. These interfaces are read-only and show the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead

[PATCH 2/5 V2] drm/amdgpu: Add sysfs interface for sdma reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add the sysfs interface for sdma: sdma_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) Signed
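
Since V2 of this series makes the reset-mask nodes return a text string rather than flags, a userspace consumer has to parse words instead of bits. The following is a minimal sketch of such a parser. The token spellings ("soft", "queue", "pipe", "full"), the `RESET_*` constants, and the function name `parse_reset_mask` are assumptions for illustration; the snippets above do not show the exact strings the driver emits.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical bit values used only by this example. */
enum {
    RESET_SOFT  = 1u << 0,
    RESET_QUEUE = 1u << 1,
    RESET_PIPE  = 1u << 2,
    RESET_FULL  = 1u << 3,
};

/*
 * Convert a whitespace-separated list of reset names, as read from a
 * *_reset_mask style text node, into a bitmask. Unknown words are ignored,
 * so a node reporting no supported reset yields 0.
 */
unsigned int parse_reset_mask(const char *text)
{
    char buf[128];
    unsigned int mask = 0;
    char *tok;

    assert(text != NULL);

    /* strtok modifies its input, so work on a bounded local copy. */
    strncpy(buf, text, sizeof(buf) - 1);
    buf[sizeof(buf) - 1] = '\0';

    for (tok = strtok(buf, " \t\n"); tok != NULL; tok = strtok(NULL, " \t\n")) {
        if (strcmp(tok, "soft") == 0)
            mask |= RESET_SOFT;
        else if (strcmp(tok, "queue") == 0)
            mask |= RESET_QUEUE;
        else if (strcmp(tok, "pipe") == 0)
            mask |= RESET_PIPE;
        else if (strcmp(tok, "full") == 0)
            mask |= RESET_FULL;
    }
    return mask;
}
```

A test tool could read the node contents with `fgets` and pass the buffer to `parse_reset_mask` before deciding which reset types to exercise.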

[PATCH 3/5 V2] drm/amdgpu: Add sysfs interface for vcn reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add the sysfs interface for vcn: vcn_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) Signed-o

[PATCH 5/5 V2] drm/amdgpu: Add sysfs interface for jpeg reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add the sysfs interface for jpeg: jpeg_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) Signed

[PATCH 4/5] drm/amdgpu: Add sysfs interface for vpe reset mask

2024-10-22 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for vpe: vpe_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. Signed-off-by: Jesse Zhang Suggest

[PATCH 2/5] drm/amdgpu: Add sysfs interface for sdma reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add the sysfs interface for sdma: sdma_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. Signed-off-by: Jesse Zhang Suggested-by: Alex Deucher --- drivers/gpu/drm/amd/am

[PATCH 1/5] drm/amdgpu: Add sysfs interface for gc reset mask

2024-10-22 Thread jesse.zh...@amd.com
Add two sysfs interfaces for gfx and compute: gfx_reset_mask and compute_reset_mask. These interfaces are read-only and show the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. Signed-off-by: Jesse Zhang Suggested-by: Alex De

[PATCH 3/5] drm/amdgpu: Add sysfs interface for vcn reset mask

2024-10-22 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for vcn: vcn_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. Signed-off-by: Jesse Zhang Suggested-by: Al

[PATCH 5/5] drm/amdgpu: Add sysfs interface for jpeg reset mask

2024-10-22 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for jpeg: jpeg_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. Signed-off-by: Jesse Zhang Suggest

[PATCH V3 1/5] drm/amdgpu: Add sysfs interface for gc reset mask

2024-10-24 Thread jesse.zh...@amd.com
Add two sysfs interfaces for gfx and compute: gfx_reset_mask and compute_reset_mask. These interfaces are read-only and show the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead

[PATCH 1/5 V4 1/5] drm/amdgpu: Add sysfs interface for gc reset mask

2024-10-29 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add two sysfs interfaces for gfx and compute: gfx_reset_mask and compute_reset_mask. These interfaces are read-only and show the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2:

[PATCH 2/5 V4] drm/amdgpu: Add sysfs interface for sdma reset mask

2024-10-29 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for sdma: sdma_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string

[PATCH V4 3/5] drm/amdgpu: Add sysfs interface for vcn reset mask

2024-10-28 Thread jesse.zh...@amd.com
Add the sysfs interface for vcn: vcn_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) V2: the

[PATCH V4 5/5] drm/amdgpu: Add sysfs interface for jpeg reset mask

2024-10-28 Thread jesse.zh...@amd.com
Add the sysfs interface for jpeg: jpeg_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add

[PATCH V4 4/5] drm/amdgpu: Add sysfs interface for vpe reset mask

2024-10-28 Thread jesse.zh...@amd.com
Add the sysfs interface for vpe: vpe_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: a

[PATCH V4 2/5] drm/amdgpu: Add sysfs interface for sdma reset mask

2024-10-28 Thread jesse.zh...@amd.com
Add the sysfs interface for sdma: sdma_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead of some flags (Christian) v3: add

[PATCH V4 1/5] drm/amdgpu: Add sysfs interface for gc reset mask

2024-10-28 Thread jesse.zh...@amd.com
Add two sysfs interfaces for gfx and compute: gfx_reset_mask and compute_reset_mask. These interfaces are read-only and show the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string instead

[PATCH] drm/amdgpu: fix double free vcn ip_dump

2024-11-10 Thread jesse.zh...@amd.com
[ 90.441868] [ cut here ] [ 90.441873] kernel BUG at mm/slub.c:553! [ 90.441885] Oops: invalid opcode: [#1] PREEMPT SMP NOPTI [ 90.441892] CPU: 0 PID: 1523 Comm: amd_pci_unplug Tainted: GE 6.10.0+ #47 [ 90.441900] Hardware name: AMD Splinter/

[PATCH V3 2/5] drm/amdgpu: Add sysfs interface for sdma reset mask

2024-10-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for sdma: sdma_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string

[PATCH V3 3/5] drm/amdgpu: Add sysfs interface for vcn reset mask

2024-10-24 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for vcn: vcn_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string

[PATCH V3 4/5] drm/amdgpu: Add sysfs interface for vpe reset mask

2024-10-24 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for vpe: vpe_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text stri

[PATCH] drm/amdgpu: fix vcn reset sysfs warning

2024-11-12 Thread jesse.zh...@amd.com
sysfs: cannot create duplicate filename '/devices/pci:00/:00:01.1/:01:00.0/:02:00.0/:03:00.0/vcn_reset_mask' [ 562.443738] CPU: 13 PID: 4888 Comm: modprobe Tainted: GE 6.10.0+ #51 [ 562.443740] Hardware name: AMD Splinter/Splinter-RPL, BIOS VS2683299N.FD 05

[PATCH V2] drm/amdgpu: fix vcn reset sysfs warning

2024-11-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" sysfs: cannot create duplicate filename '/devices/pci:00/:00:01.1/:01:00.0/:02:00.0/:03:00.0/vcn_reset_mask' [ 562.443738] CPU: 13 PID: 4888 Comm: modprobe Tainted: GE 6.10.0+ #51 [ 562.443740] Hardwar

[PATCH 2/2] drm/amdgpu: fix vcn sw init failed

2024-11-12 Thread jesse.zh...@amd.com
[ 2875.870277] [drm:amdgpu_device_init [amdgpu]] *ERROR* sw_init of IP block failed -22 [ 2875.880494] amdgpu :01:00.0: amdgpu: amdgpu_device_ip_init failed [ 2875.887689] amdgpu :01:00.0: amdgpu: Fatal error during GPU init [ 2875.894791] amdgpu :01:00.0: amdgpu: amdgpu: finishing de

[PATCH 1/2] drm/amdgpu: fix vcn sw init failed

2024-11-12 Thread jesse.zh...@amd.com
For multiple vcn instances, to avoid creating the reset sysfs node multiple times, add the instance parameter in reset mask init. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 8 drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 4 ++-- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c

[PATCH V3 5/5] drm/amdgpu: Add sysfs interface for jpeg reset mask

2024-10-24 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Add the sysfs interface for jpeg: jpeg_reset_mask. The interface is read-only and shows the resets supported by the IP, for example full adapter reset (mode1/mode2/BACO/etc), soft reset, queue reset, and pipe reset. V2: the sysfs node returns a text string

[PATCH] drm/amdgpu: fix warning when removing sysfs

2024-11-07 Thread jesse.zh...@amd.com
Fix similar warning when running IGT: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp

[PATCH V2] drm/amdgpu: fix warning when removing sysfs

2024-11-08 Thread jesse.zh...@amd.com
Fix the similar warning: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge stp llc overlay n

[PATCH 1/3] Revert "drm/amdgpu: fix a mistake when removing mem_info_preempt_used sysfs"

2024-11-18 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This reverts commit 10aec8943bcc5123288ded8c97e78312bcf17fb1. The dev->unplugged flag is also set to true when the driver is only uninstalled via amdgpu_exit, without actually unplugging the device, and that causes a new issue. Signed-off-by: Jesse Zhang --- dr

[PATCH 3/3 V2] drm/amdgpu: Fix sysfs warning when hotplugging

2024-11-18 Thread jesse.zh...@amd.com
Fix the similar warning when hotplugging: [ 155.585721] kernfs: can not remove 'enforce_isolation', no directory [ 155.592201] WARNING: CPU: 3 PID: 6960 at fs/kernfs/dir.c:1683 kernfs_remove_by_name_ns+0xb9/0xc0 [ 155.601145] Modules linked in: xt_MASQUERADE xt_comment nft_compat veth bridge

[PATCH] drm/amdgpu: Fix sysfs warning when hotplugging

2024-11-14 Thread jesse.zh...@amd.com
Replace the drm_dev_enter check with a check of the sysfs directory entry, because the dev->unplugged flag is also set to true when the driver is only uninstalled via amdgpu_exit, without actually unplugging the device. Signed-off-by: Jesse Zhang Reported-by: Andy Dong --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |

[PATCH 3/3] drm/amdgpu: Fix sysfs warning when hotplugging

2024-11-17 Thread jesse.zh...@amd.com
Replace the drm_dev_enter check with a check of the sysfs directory entry, because the dev->unplugged flag is also set to true when the driver is only uninstalled via amdgpu_exit, without actually unplugging the device. Signed-off-by: Jesse Zhang Reported-by: Andy Dong --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |

[PATCH 2/3] Revert "drm/amdgpu: fix warning when removing sysfs"

2024-11-17 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This reverts commit 330d97e9b14e0c85cc8b63e0092e4abcb9ce99c8. The dev->unplugged flag is also set to true when the driver is only uninstalled via amdgpu_exit, without actually unplugging the device, and that causes a new issue. Signed-off-by: Jesse Zhang --- driver

[PATCH V4] drm/amdkfd: pause autosuspend when creating pdd

2024-12-05 Thread jesse.zh...@amd.com
When using MES, creating a pdd requires talking to the GPU to set up the relevant context. The code here forgot to wake up the GPU in case it was in suspend; this causes KVM to EFAULT for a passthrough GPU, for example. This issue can be masked if the GPU was woken up by other things (e.g. opening t

[PATCH 2/2] drm/amdgpu/gfx12: implement kgq reset via mmio

2025-01-05 Thread jesse.zh...@amd.com
Replace MES kgq reset with MMIO. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index 69941442f00b..ba2ab9296eb4 100644

[PATCH 1/2] drm/amdgpu: enable gfx12 queue reset flag

2025-01-05 Thread jesse.zh...@amd.com
Enable the kgq and kcq queue reset flag. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index 3aa34c4d..6994144

[PATCH 1/4] drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support

2025-02-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces shared SDMA reset functionality between AMDGPU and KFD. The implementation includes the following key changes: 1. Added `amdgpu_sdma_reset_queue`: - Resets a specific SDMA queue by instance ID. - Invokes registered pre-reset and

[PATCH 4/4] drm/amdgpu: Improve SDMA reset logic with guilty queue tracking

2025-02-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit introduces several improvements to the SDMA reset logic: 1. Added `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset, ensuring proper state restoration after reset. 2. Introduced `gfx_guilty` and `page_gui

[PATCH 3/4] drm/amdgpu: Add common lock and reset caller parameter for SDMA reset synchronization

2025-02-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit introduces a caller parameter to the amdgpu_sdma_reset_instance function to differentiate between reset requests originating from the KGD and KFD. This change ensures proper synchronization between KGD and KFD during SDMA resets. If the cal

[PATCH 2/4] drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support

2025-02-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver to improve modularity and support shared usage between AMDGPU and KFD. The changes include: 1. **Refactored SDMA Reset Logic**: - Split the `sdma_v4_4_2_reset_queue` functio

[PATCH v5 4/4] drm/amdgpu: Improve SDMA reset logic with guilty queue tracking

2025-02-10 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit introduces several improvements to the SDMA reset logic: 1. Added `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset, ensuring proper state restoration after reset. 2. Introduced `gfx_guilty` and `page_gui

[PATCH 2/4 v6] drm/amdgpu/sdma: Refactor SDMA reset functionality and add callback support

2025-02-10 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA reset functionality in the `sdma_v4_4_2` driver to improve modularity and support shared usage between AMDGPU and KFD. The changes include: 1. **Refactored SDMA Reset Logic**: - Split the `sdma_v4_4_2_reset_queue` functio

[PATCH 1/4 v6] drm/amdgpu/kfd: Add shared SDMA reset functionality with callback support

2025-02-10 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces shared SDMA reset functionality between AMDGPU and KFD. The implementation includes the following key changes: 1. Added `amdgpu_sdma_reset_queue`: - Resets a specific SDMA queue by instance ID. - Invokes registered pre-reset and

[PATCH 4/4 V6] drm/amdgpu: Improve SDMA reset logic with guilty queue tracking

2025-02-10 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit introduces several improvements to the SDMA reset logic: 1. Added `cached_rptr` to the `amdgpu_ring` structure to store the read pointer before a reset, ensuring proper state restoration after reset. 2. Introduced `gfx_guilty` and `page_gui

[PATCH 3/4 v6] drm/amdgpu: Add common lock and reset caller parameter for SDMA reset synchronization

2025-02-10 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit introduces a caller parameter to the amdgpu_sdma_reset_instance function to differentiate between reset requests originating from the KGD and KFD. This change ensures proper synchronization between KGD and KFD during SDMA resets. If the cal

[PATCH] drm/amdgpu: Add support for page queue scheduling

2025-02-11 Thread jesse.zh...@amd.com
This patch updates the sdma engine to support scheduling for the page queue. The main changes include: - Introduce a new variable `page` to handle the page queue if it exists. - Update the scheduling logic to conditionally set the `sched.ready` flag for both the sdma gfx queue and the page queue

[PATCH 1/5] drm/amdgpu/sdma7: Implement resume function for each instance

2024-12-09 Thread jesse.zh...@amd.com
Extract the per-instance sdma resume sequence from sdma_v7_0_gfx_resume. This function can be used in start or restart scenarios of specific instances. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 259 ++--- 1 file changed, 141 insertions(+), 1

[PATCH 2/5] drm/amdgpu/sdma7: implement queue reset callback for sdma7

2024-12-09 Thread jesse.zh...@amd.com
Implement sdma queue reset callback by mes_reset_queue_mmio. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c inde

[PATCH 4/5] drm/amdgpu/mes12: Implement reset gfx/compute queue function by mmio

2024-12-09 Thread jesse.zh...@amd.com
Reset gfx/compute queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.h | 2 + drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 88 +- 2 files changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/a

[PATCH 3/5] drm/amdgpu/mes12: Implement reset sdmav7 queue function by mmio

2024-12-09 Thread jesse.zh...@amd.com
Reset sdma queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 46 ++ 1 file changed, 46 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index

[PATCH 5/5] drm/amdgpu/sdma7: Add queue reset sysfs for sdmav7

2024-12-09 Thread jesse.zh...@amd.com
sdmav7 queue reset is already supported via mmio; add its sysfs file. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c index 62

[PATCH 6/7 v2] drm/amdgpu/gfx12: clean up kcq reset code

2024-12-10 Thread jesse.zh...@amd.com
Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 18 +- 1 file changed, 5 insertions(+), 13 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/am

[PATCH 1/7 v2] drm/amdgpu/sdma7: Implement resume function for each instance

2024-12-09 Thread jesse.zh...@amd.com
Extract the per-instance sdma resume sequence from sdma_v7_0_gfx_resume. This function can be used in start or restart scenarios of specific instances. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 259 ++--- 1 file changed, 141 insertions(+), 1

[PATCH 4/7 v2] drm/amdgpu/mes12: Implement reset gfx/compute queue function by mmio

2024-12-09 Thread jesse.zh...@amd.com
Reset gfx/compute queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.h | 2 + drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 88 +- 2 files changed, 89 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/a

[PATCH 3/7 v2] drm/amdgpu/mes12: Implement reset sdmav7 queue function by mmio

2024-12-09 Thread jesse.zh...@amd.com
Reset sdma queue through mmio based on me_id and queue_id. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 46 ++ 1 file changed, 46 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index

[PATCH 7/7 v2] drm/amdgpu/gfx11: clean up kcq reset code

2024-12-09 Thread jesse.zh...@amd.com
Replace kcq queue reset with existing function amdgpu_mes_reset_legacy_queue. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 22 +++--- 1 file changed, 3 insertions(+), 19 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/dr

[PATCH 2/7 v2] drm/amdgpu/sdma7: implement queue reset callback for sdma7

2024-12-09 Thread jesse.zh...@amd.com
Implement sdma queue reset callback by mes_reset_queue_mmio. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 26 ++ 1 file changed, 26 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c inde

[PATCH 5/7 v2] drm/amdgpu/sdma7: Add queue reset sysfs for sdmav7

2024-12-09 Thread jesse.zh...@amd.com
sdmav7 queue reset is already supported via mmio; add its sysfs file. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c index 62

[PATCH 2/3] drm/amdgpu/pm: add PPSMC_MSG_ResetSDMA2 definition

2024-12-16 Thread jesse.zh...@amd.com
add the PPSMC_MSG_ResetSDMA2 definition for smu 13.0.6 Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 1 + drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 ++- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 1 + 3 fi

[PATCH 1/3] drm/amdgpu/sdma4.4.2: add apu support in sdma queue reset

2024-12-16 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Remove apu check in sdma queue reset. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdm

[PATCH 3/3] drm/amdgpu/pm: Implement SDMA queue reset for different asic

2024-12-16 Thread jesse.zh...@amd.com
Implement sdma queue reset by SMU_MSG_ResetSDMA2 Suggested-by: Tim Huang Signed-off-by: Jesse Zhang --- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 30 ++- 1 file changed, 22 insertions(+), 8 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c b/

[PATCH] drm/amdkfd: fixed page fault when enable MES shader debugger

2024-12-18 Thread jesse.zh...@amd.com
Initialize the process context address before setting the shader debugger. [ 260.781212] amdgpu :03:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:32 vmid:0 pasid:0) [ 260.781236] amdgpu :03:00.0: amdgpu: in page starting at address 0x from client 10 [ 260.781255]

[PATCH 1/3] drm/amdgpu/sdma4.4.2: add apu support in sdma queue reset

2024-12-13 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Remove apu check in sdma queue reset. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdm

[PATCH 2/3] drm/amd/pm: update 13_0_6 ppsmc header

2024-12-13 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" add the definition PPSMC_MSG_ResetSDMA2. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 1 + drivers/gpu/drm/amd/pm/swsmu/inc/smu_types.h | 3 ++- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13

[PATCH 3/3] drm/amdgpu/pm: Implement SDMA queue for different asic

2024-12-13 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Implement sdma queue reset by SMU_MSG_ResetSDMA2. Signed-off-by: Jesse Zhang --- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 28 ++- 1 file changed, 21 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/s

[PATCH 2/2] drm/amdgpu/gfx10: implement gfx queue reset via MMIO

2025-01-08 Thread jesse.zh...@amd.com
implement gfx10 kgq reset via mmio. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 98 ++ 1 file changed, 70 insertions(+), 28 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 89409c

[PATCH 1/2] drm/amdgpu/gfx10: implement queue reset via MMIO

2025-01-08 Thread jesse.zh...@amd.com
implement gfx10 kcq reset via mmio. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 121 ++--- 1 file changed, 88 insertions(+), 33 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index 88393c

[PATCH 1/2 V2] drm/amdgpu/gfx10: implement queue reset via MMIO

2025-01-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Use mmio to do queue reset. v2: Align this function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu

[PATCH 2/2 V2] drm/amdgpu/gfx10: implement gfx queue reset via MMIO

2025-01-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Use mmio to do queue reset. v2: Align the function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang adev; unsigned i; + uint32_t tmp; /* enter save mode */ amdgpu_gfx_rlc_enter_safe_mode(adev, xcc_id); @@ -3813,7 +3814,25

[PATCH 1/3] revert "drm/amdgpu/pm: add definition PPSMC_MSG_ResetSDMA2"

2025-01-14 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" pmfw now unifies PPSMC_MSG_ResetSDMA definitions for different devices. PPSMC_MSG_ResetSDMA2 is not needed. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 1 - drivers/gpu/drm/amd/pm/swsmu/inc/s

[PATCH 3/3] drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks

2025-01-14 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the firmware version checks in `smu_v13_0_6_reset_sdma` to support multiple SMU programs with different firmware version thresholds. Signed-off-by: Jesse Zhang --- .../gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 14 +---

[PATCH 2/3] revert "drm/amdgpu/pm: Implement SDMA queue reset for different asic"

2025-01-14 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" pmfw unified PPSMC_MSG_ResetSDMA definitions for different devices. PPSMC_MSG_ResetSDMA2 is not needed. Signed-off-by: Jesse Zhang --- .../drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 30 +-- 1 file changed, 8 insertions(+), 22 deletion

[PATCH 3/3 V2] drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks

2025-01-15 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the firmware version checks in `smu_v13_0_6_reset_sdma` to support multiple SMU programs with different firmware version thresholds. V2: return -EOPNOTSUPP for unsupported pmfw Suggested-by: Lazar Lijo Signed-off-by: Jesse Zhang --
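
The idea of per-program firmware thresholds with an -EOPNOTSUPP fallback can be sketched as a small table lookup. This is only an illustration under stated assumptions: the `fw_threshold` structure, the function `sdma_reset_check`, and every program ID or version number a caller passes in are hypothetical, and the real logic in `smu_v13_0_6_reset_sdma` is not visible in the truncated snippet above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pairing of an SMU program ID with its minimum firmware version. */
struct fw_threshold {
    uint32_t program;
    uint32_t min_fw_version;
};

/*
 * Return 0 if SDMA reset may be requested for this program and firmware
 * version, or -EOPNOTSUPP if the program is not listed or the firmware
 * is older than the listed threshold.
 */
int sdma_reset_check(const struct fw_threshold *table, size_t count,
                     uint32_t program, uint32_t fw_version)
{
    size_t i;

    assert(table != NULL || count == 0);

    for (i = 0; i < count; i++) {
        if (table[i].program == program)
            return fw_version >= table[i].min_fw_version ? 0 : -EOPNOTSUPP;
    }
    /* Programs without an entry are treated as unsupported. */
    return -EOPNOTSUPP;
}
```

Keeping the thresholds in a table means a new SMU program needs one more table entry rather than another branch in the check.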

[PATCH] drm/amdgpu: Use -ENODATA for GPU job timeout queue recovery

2025-01-14 Thread jesse.zh...@amd.com
When a GPU job times out, the driver attempts to recover by restarting the scheduler. Previously, the scheduler was restarted with an error code of 0, which does not distinguish between a full GPU reset and a queue reset. This patch changes the error code to -ENODATA for queue resets, while -ECANCE

[PATCH V3] drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks

2025-01-16 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the firmware version checks in `smu_v13_0_6_reset_sdma` to support multiple SMU programs with different firmware version thresholds. V2: return -EOPNOTSUPP for unsupported pmfw V3: except IP_VERSION(13, 0, 12) which is not supported. Su

[PATCH] drm/amd/pm: Refactor SMU 13.0.6 SDMA reset firmware version checks

2025-01-16 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the firmware version checks in `smu_v13_0_6_reset_sdma` to support multiple SMU programs with different firmware version thresholds. V2: return -EOPNOTSUPP for unsupported pmfw V3: except IP_VERSION(13, 0, 12) which is not supported. Su

[PATCH V4 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-02-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH v4 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-02-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 132 to 148. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH V4 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-02-25 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that al

[PATCH] drm/amdgpu: drm/amdgpu/job: fix is_guilty logic change (v2)

2025-02-24 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" The is Reviewed-by: Jesse Zhang Incrementing the gpu_reset counter needs to be in the is_guilty block. Also move the fence error before the reset to keep the original ordering. Fixes: f447ba2bbd48 ("drm/amdgpu: Update amdgpu_job_timedout to check

[PATCH 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-02-27 Thread jesse.zh...@amd.com
This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that all relevant page table cache levels (L1 PTEs,

[PATCH 2/3 v5] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-02-27 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH 1/3 v5] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-02-27 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 132 to 148. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH V7 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH 2/2] drm/amdgpu: Add SDMA queue start/stop functions and integrate with ring funcs

2025-03-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces two new functions, `amdgpu_sdma_stop_queue` and `amdgpu_sdma_start_queue`, to handle the stopping and starting of SDMA queues during engine reset operations. The changes include: 1. **New Functions**: - `amdgpu_sdma_stop_queue`:

[PATCH 1/2] drm/amdgpu: Add SDMA queue start/stop callbacks to amdgpu_ring_funcs

2025-03-11 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch introduces two new callbacks, `stop_queue` and `start_queue`, to the `amdgpu_ring_funcs` structure. These callbacks are designed to handle the stopping and starting of SDMA queues during engine reset operations. The changes include: 1. **A

[PATCH 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-02-28 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that al

[PATCH V6 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-02-28 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and

[PATCH v6 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-02-28 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1

[PATCH 6/7] drm/amd/amdgpu: Refactor SDMA v5.2 reset logic into stop_queue and restore_queue functions

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.2 reset logic by splitting the `sdma_v5_2_reset_queue` function into two separate functions: `sdma_v5_2_stop_queue` and `sdma_v5_2_restore_queue`. This change aligns with the new SDMA reset mechanism, where the reset p

[PATCH 3/7] drm/amdgpu: Optimize SDMA v5.0 queue reset and stop logic

2025-03-12 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This patch refactors the SDMA v5.0 queue reset and stop logic to improve code readability, maintainability, and performance. The key changes include: 1. **Generalized `sdma_v5_0_gfx_stop` Function**: - Added an `inst_mask` parameter to allow stopping spe

[PATCH V7 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that al

[PATCH v7 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1
