Re: [PATCH v8 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-28 Thread Christian König
On 5/28/25 09:46, Lazar, Lijo wrote: > On 5/22/2025 4:10 PM, Samuel Zhang wrote: >> When switching to new GPU index after hibernation and then resume, >> VRAM offset of each VRAM BO will be changed, and the cached gpu >> addresses need to be updated. >> >> This is to enable pdb0 and switch to use pd

Re: [PATCH] drm/amdgpu: only export available rings to mesa for enabling kq|uq

2025-05-28 Thread Christian König
On 5/28/25 10:37, Prike Liang wrote: > The kernel driver only requires exporting available rings to the mesa > when the userq is disabled; otherwise, the userq IP mask will be cleaned > up in the mesa. Hui? That doesn't sound correct to me. That userq is disabled in mesa when kernel queues are av

[PATCH 5/8] drm/amd/kfd: Add comment about possible drm_gem_handle_create() race

2025-05-28 Thread Simona Vetter
I've long ago stopped trying to fully understand all the locking in amdkfd, so maybe this is safe for a contrived reason. It's definitely not how this should be done. Consider this more a request for a proper patch. Cc: Felix Kuehling Cc: amd-gfx@lists.freedesktop.org Signed-off-by: Simona Vette

Re: [PATCH v8 2/4] drm/amdgpu: update GPU addresses for SMU and PSP

2025-05-28 Thread Lazar, Lijo
On 5/22/2025 4:10 PM, Samuel Zhang wrote: > add amdgpu_bo_fb_aper_addr() and update the cached GPU addresses to use > the FB aperture address for SMU and PSP. > > 2 reasons for this change: > 1. when pdb0 is enabled, gpu addr from amdgpu_bo_create_kernel() is GART > aperture address, it is not

[PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Natalie Vock
If we hand out cleared blocks to users, they are expected to write at least some non-zero values somewhere. If we keep the CLEAR bit set on the block, amdgpu_fill_buffer will assume there is nothing to do and incorrectly skip clearing the block. Ultimately, the (still dirty) block will be reused as

[PATCH 1/2] drm/buddy: Add public helper to dirty blocks

2025-05-28 Thread Natalie Vock
Cleared blocks that are handed out to users after allocation cannot be presumed to remain cleared. Thus, allocators using drm_buddy need to dirty all blocks on the allocation success path. Provide a helper for them to use. Fixes: 96950929eb232 ("drm/buddy: Implement tracking clear page feature") C

[PATCH 0/2] Fix AMDGPU VRAM zeroing

2025-05-28 Thread Natalie Vock
Hi all, I've stumbled upon this while investigating why AMDGPU seems to fail at providing cleared VRAM allocations despite being explicitly asked to with AMDGPU_GEM_CREATE_VRAM_CLEARED[1]. After some code inspection, I believe the problem is actually much worse than not providing cleared VRAM. AM

Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Christian König
On 5/28/25 11:29, Natalie Vock wrote: > Hi, > > On 5/28/25 09:07, Christian König wrote: >> On 5/27/25 21:43, Natalie Vock wrote: >>> If we hand out cleared blocks to users, they are expected to write >>> at least some non-zero values somewhere. If we keep the CLEAR bit set on >>> the block, amdgp

Re: [PATCH 10/19] drm/amdgpu: pad ring in amdgpu_ib_schedule

2025-05-28 Thread Christian König
On 5/28/25 06:19, Alex Deucher wrote: > We'll want to include the padding in the wptr count > for resets. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c > b/drivers

Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Christian König
On 5/27/25 21:43, Natalie Vock wrote: > If we hand out cleared blocks to users, they are expected to write > at least some non-zero values somewhere. If we keep the CLEAR bit set on > the block, amdgpu_fill_buffer will assume there is nothing to do and > incorrectly skip clearing the block. Ultimat

Re: [PATCH v8 4/4] drm/amdgpu: fix fence fallback timer expired error

2025-05-28 Thread Lazar, Lijo
On 5/22/2025 4:10 PM, Samuel Zhang wrote: > IH is not working after switching a new gpu index for the first time. > > The msix table in virtual machine is faked. The real msix table will be > programmed by QEMU when guest enable/disable msix interrupt. But QEMU > accessing VF msix table (regist

[PATCH] drm/amdgpu: only export available rings to mesa for enabling kq|uq

2025-05-28 Thread Prike Liang
The kernel driver only requires exporting available rings to the mesa when the userq is disabled; otherwise, the userq IP mask will be cleaned up in the mesa. Signed-off-by: Prike Liang --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 20 ++-- 1 file changed, 10 insertions(+), 10 de

[PATCH 2/3] drm/amdgpu/userq: add client id for each userq_mgr

2025-05-28 Thread Sunil Khatri
Add a client id for each userq_mgr, which is created per fd, to track which fd belongs to which client; this could be used in a debugfs entry to derive information like vm and mqd. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_userq.h

Re: [PATCH v8 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-28 Thread Lazar, Lijo
On 5/22/2025 4:10 PM, Samuel Zhang wrote: > When switching to new GPU index after hibernation and then resume, > VRAM offset of each VRAM BO will be changed, and the cached gpu > addresses need to be updated. > > This is to enable pdb0 and switch to use pdb0-based virtual gpu > address by defaul

Re: [PATCH v8 1/4] drm/amdgpu: update xgmi info and vram_base_offset on resume

2025-05-28 Thread Lazar, Lijo
On 5/22/2025 4:10 PM, Samuel Zhang wrote: > For SRIOV VM env with XGMI enabled systems, XGMI physical node id may > change when hibernate and resume with different VF. > > Update XGMI info and vram_base_offset on resume for gfx444 SRIOV env. > Add amdgpu_virt_xgmi_migrate_enabled() as the featu

Re: [PATCH] drm/amd: Export DMCUB version to sysfs

2025-05-28 Thread Lazar, Lijo
On 5/27/2025 9:29 PM, Mario Limonciello wrote: > For supported ASICs DMCU version is exported, but ASICs that support > DMCUB there is no information exported to sysfs. > > Add an attribute for DMCUB. > > Signed-off-by: Mario Limonciello Reviewed-by: Lijo Lazar Thanks, Lijo > --- > drive

[PATCH] drm/amdgpu/gfx10: Refine Cleaner Shader for GFX10.1.10

2025-05-28 Thread Srinivasan Shanmugam
From: Vitaly Prosyak This patch updates the cleaner shader, which is responsible for initializing GPU resources such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers (SGPRs). Changes include adjustments to register clearing and shader confi

Re: [PATCH 12/19] drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset

2025-05-28 Thread Christian König
On 5/28/25 06:19, Alex Deucher wrote: > Re-emit the unprocessed state after resetting the queue. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 19 --- > 1 file changed, 12 insertions(+), 7 deletions(-) > > diff --git a/drivers/gpu/drm/amd/amdgp

Re: [PATCH 15/19] drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset

2025-05-28 Thread Christian König
On 5/28/25 06:19, Alex Deucher wrote: > Re-emit the unprocessed state after resetting the queue. I don't think we want any of this for compute queues. Regards, Christian. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 13 +++-- > 1 file changed, 11 ins

Re: [PATCH RESEND] drm/amd/display: Adjust prefix of dcn31_apg construct function name

2025-05-28 Thread Leonardo Gomes
Hi Alex, Thanks for your answer! On Mon, 26 May 2025 at 23:50 Alex Hung wrote: > Hi Leonardo, > > Thank you for this patch, but unfortunately some unit test suites depend > on the names. > > On 5/21/25 07:58, Leonardo Gomes wrote: > > From: Leonardo da Silva Gomes > > > > Adjust the dcn31_apg

Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Paneer Selvam, Arunpravin
On 5/28/2025 2:59 PM, Natalie Vock wrote: Hi, On 5/28/25 09:07, Christian König wrote: On 5/27/25 21:43, Natalie Vock wrote: If we hand out cleared blocks to users, they are expected to write at least some non-zero values somewhere. If we keep the CLEAR bit set on the block, amdgpu_fill_b

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-28 Thread Simona Vetter
On Mon, May 26, 2025 at 01:27:28PM +0200, Philipp Stanner wrote: > On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote: > > On 5/26/25 11:34, Philipp Stanner wrote: > > > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote: > > > > On 5/23/25 16:16, Danilo Krummrich wrote: > > > > > On Fr

Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Michel Dänzer
On 2025-05-28 14:14, Paneer Selvam, Arunpravin wrote: > On 5/28/2025 2:59 PM, Natalie Vock wrote: >> On 5/28/25 09:07, Christian König wrote: >>> >>> But the problem rather seems to be that we sometimes don't clear the >>> buffers on release for some reason, but still set it as cleared. >> >> Yes

Re: [PATCH v11 02/10] drm/sched: Store the drm client_id in drm_sched_fence

2025-05-28 Thread Lucas De Marchi
On Mon, May 26, 2025 at 02:54:44PM +0200, Pierre-Eric Pelloux-Prayer wrote: drivers/gpu/drm/xe/xe_sched_job.c| 3 ++- diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index f0a6ce610948..5921293b25db 100644 --- a/drivers/gpu/drm/xe/xe_sched_job.c

Re: [PATCH] Revert "drm/amd/display: no 3D and blnd LUT as DPP color caps for DCN401"

2025-05-28 Thread Alex Hung
There is a work-in-progress supplement patch. Let's wait and see whether it fixes the regressions without reverting 46e68dd5066c. On 5/27/25 11:06, Alex Hung wrote: This reverts commit 46e68dd5066c9831e9695c1756db017bb9c3762c since it breaks color enhancement in another OS, indicating these two

[PATCH 07/16] drm/amdgpu: rework queue reset scheduler interaction

2025-05-28 Thread Alex Deucher
From: Christian König Stopping the scheduler for queue reset is generally a good idea because it prevents any worker from touching the ring buffer. But using amdgpu_fence_driver_force_completion() before restarting it was a really bad idea because it marked fences as failed while the work was po

RE: [PATCH] drm/amdkfd: Fix kfd process ref leaking when userptr unmapping

2025-05-28 Thread Kasiviswanathan, Harish
[Public] From the code, it looks like you want to hold reference to the process to ensure that it doesn't get destroyed while sending the fault event to user. If that is correct, then your commit message is not reflecting that. With commit message updated, this patch is Reviewed-by: Harish K

Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Christian König
On 5/28/25 14:39, Michel Dänzer wrote: > On 2025-05-28 14:14, Paneer Selvam, Arunpravin wrote: >> On 5/28/2025 2:59 PM, Natalie Vock wrote: >>> On 5/28/25 09:07, Christian König wrote: But the problem rather seems to be that we sometimes don't clear the buffers on release for some r

[PATCH] drm/amdgpu: Enable IFWI update support for PSPv14.0.3

2025-05-28 Thread Shiwu Zhang
Make the psp_vbflash and psp_vbflash_status available in sysfs. Signed-off-by: Shiwu Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c index dd37264f1ec9..c2f

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 8:45 AM Simona Vetter wrote: > > On Mon, May 26, 2025 at 01:27:28PM +0200, Philipp Stanner wrote: > > On Mon, 2025-05-26 at 13:16 +0200, Christian König wrote: > > > On 5/26/25 11:34, Philipp Stanner wrote: > > > > On Mon, 2025-05-26 at 11:25 +0200, Christian König wrote: >

Re: [PATCH 2/2] drm/amdgpu: Dirty cleared blocks on allocation

2025-05-28 Thread Natalie Vock
Hi, On 5/28/25 09:07, Christian König wrote: On 5/27/25 21:43, Natalie Vock wrote: If we hand out cleared blocks to users, they are expected to write at least some non-zero values somewhere. If we keep the CLEAR bit set on the block, amdgpu_fill_buffer will assume there is nothing to do and inc

Re: [PATCH v2] drm/amdkfd: Map wptr BO to GART unconditionally

2025-05-28 Thread Felix Kuehling
On 2025-05-27 21:55, Lang Yu wrote: > For simulation C models that don't run CP FW where adev->mes.sched_version > is not populated correctly. This causes NULL dereference in > amdgpu_amdkfd_free_gtt_mem(dev->adev, (void **)&pqn->q->wptr_bo_gart) > and warning on unpinned BO in amdgpu_bo_gpu_offset

Re: [PATCH 12/19] drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 9:48 AM Christian König wrote: > > On 5/28/25 15:38, Alex Deucher wrote: > > On Wed, May 28, 2025 at 7:40 AM Christian König > > wrote: > >> > >> On 5/28/25 06:19, Alex Deucher wrote: > >>> Re-emit the unprocessed state after resetting the queue. > >>> > >>> Signed-off-by:

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-28 Thread Danilo Krummrich
On Wed, May 28, 2025 at 09:29:30AM -0400, Alex Deucher wrote: > On Wed, May 28, 2025 at 8:45 AM Simona Vetter wrote: > > I do occasionally find it useful as a record of different approaches > > considered, which sometimes people fail to adequately cover in their > > commit messages. Also useful in

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-28 Thread Danilo Krummrich
On Wed, May 28, 2025 at 04:39:01PM +0200, Danilo Krummrich wrote: > On Wed, May 28, 2025 at 09:29:30AM -0400, Alex Deucher wrote: > > On Wed, May 28, 2025 at 8:45 AM Simona Vetter > > wrote: > > > I do occasionally find it useful as a record of different approaches > > > considered, which sometim

Re: [PATCH v11 02/10] drm/sched: Store the drm client_id in drm_sched_fence

2025-05-28 Thread Pierre-Eric Pelloux-Prayer
Hi, On 28/05/2025 at 21:07, Lucas De Marchi wrote: On Mon, May 26, 2025 at 02:54:44PM +0200, Pierre-Eric Pelloux-Prayer wrote: drivers/gpu/drm/xe/xe_sched_job.c    |  3 ++- diff --git a/drivers/gpu/drm/xe/xe_sched_job.c b/drivers/gpu/drm/xe/xe_sched_job.c index f0a6ce610948..5921

Re: [PATCH 1/2] drm/amdkfd: remove unused code

2025-05-28 Thread Philip Yang
On 2025-05-28 13:19, James Zhu wrote: upages is assigned under cpages = 0, so it isn't really used in this function. Signed-off-by: James Zhu Reviewed-by: Philip.Yang --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkf

Re: [PATCH 2/2] drm/amdkfd: add svm_migrate_successful_pages

2025-05-28 Thread Philip Yang
On 2025-05-28 13:19, James Zhu wrote: to get migration pages. When migrating pages from system to vram, needn't check bit MIGRATE_PFN_VALID, since the system page could be allocated, but not be accessed. I think the corner case is vram_pages becomes negative value when migrating prange from

Re: [PATCH v2] drm/ttm: Should to return the evict error

2025-05-28 Thread Chen, Xiaogang
On 5/28/2025 1:19 AM, Deng, Emily wrote: [AMD Official Use Only - AMD Internal Distribution Only] *From:*amd-gfx *On Behalf Of *Deng, Emily *Sent:* Monday, May 26, 2025 9:51 AM *To:* Chen, Xiaogang ; amd-gfx@lists.freedesktop.org *Subject:* RE: [PATCH v2] drm/ttm: Should to return the ev

[PATCH v2 2/2] drm/amdkfd: add svm_migrate_successful_pages

2025-05-28 Thread James Zhu
to get migration pages. dst bit MIGRATE_PFN_VALID and src bit MIGRATE_PFN_MIGRATE should always be set when success. -v2 use dst to check MIGRATE_PFN_VALID bit(suggested-by philip) Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 22 ++ 1 file changed,

Re: [PATCH] drm/amdkfd: Fix kfd process ref leaking when userptr unmapping

2025-05-28 Thread Philip Yang
On 2025-05-28 14:24, Kasiviswanathan, Harish wrote: [Public] From the code, it looks like you want to hold reference to the process to ensure that it doesn't get destroyed while sending the fault event to user. If that is correct, then your commit message is not reflecting that. With commi

RE: [PATCH] drm/amdgpu: only export available rings to mesa for enabling kq|uq

2025-05-28 Thread Liang, Prike
[Public] > -Original Message- > From: Alex Deucher > Sent: Wednesday, May 28, 2025 9:11 PM > To: Liang, Prike > Cc: amd-gfx@lists.freedesktop.org; Deucher, Alexander > ; Koenig, Christian > Subject: Re: [PATCH] drm/amdgpu: only export available rings to mesa for > enabling > kq|uq > >

RE: [PATCH] drm/amdgpu: only export available rings to mesa for enabling kq|uq

2025-05-28 Thread Liang, Prike
[Public] > -Original Message- > From: amd-gfx On Behalf Of Liang, > Prike > Sent: Thursday, May 29, 2025 9:48 AM > To: Koenig, Christian ; > amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: RE: [PATCH] drm/amdgpu: only export available rings to mesa for > enabling > k

RE: [PATCH] drm/amdgpu: only export available rings to mesa for enabling kq|uq

2025-05-28 Thread Liang, Prike
[Public] > From: Koenig, Christian > Sent: Wednesday, May 28, 2025 7:21 PM > To: Liang, Prike ; amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: Re: [PATCH] drm/amdgpu: only export available rings to mesa for > enabling > kq|uq > > On 5/28/25 10:37, Prike Liang wrote: > > The k

RE: [PATCH v2] drm/ttm: Should to return the evict error

2025-05-28 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only] From: Chen, Xiaogang Sent: Thursday, May 29, 2025 5:15 AM To: Deng, Emily ; Zhang, Owen(SRDC) Cc: amd-gfx@lists.freedesktop.org Subject: Re: [PATCH v2] drm/ttm: Should to return the evict error On 5/28/2025 1:19 AM, Deng, Emily wrote:

Re: [PATCH 1/4] drm/sched: optimize drm_sched_job_add_dependency

2025-05-28 Thread Christian König
On 5/28/25 14:30, Simona Vetter wrote: >> Yup, I've seen that a few times. I think we, the DRM community, should >> stop that. It's just not useful and makes the commit messages larger, >> both for the human reader while scrolling, as for the hard drive >> regarding storage size > > I do occasiona

Re: [PATCH] drm/amdgpu/gfx10: Refine Cleaner Shader for GFX10.1.10

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 5:30 AM Srinivasan Shanmugam wrote: > > From: Vitaly Prosyak > > This patch updates the cleaner shader, which is responsible for > initializing GPU resources such as Local Data Share (LDS), Vector > General Purpose Registers (VGPRs), and Scalar General Purpose Registers >

Re: [PATCH 12/19] drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset

2025-05-28 Thread Christian König
On 5/28/25 15:38, Alex Deucher wrote: > On Wed, May 28, 2025 at 7:40 AM Christian König > wrote: >> >> On 5/28/25 06:19, Alex Deucher wrote: >>> Re-emit the unprocessed state after resetting the queue. >>> >>> Signed-off-by: Alex Deucher >>> --- >>> drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 19 ++

Re: [PATCH 10/19] drm/amdgpu: pad ring in amdgpu_ib_schedule

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 7:45 AM Christian König wrote: > > On 5/28/25 06:19, Alex Deucher wrote: > > We'll want to include the padding in the wptr count > > for resets. > > > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/amdgpu/amdgpu_ib.c | 2 ++ > > 1 file changed, 2 insertions

Re: [PATCH 10/19] drm/amdgpu: pad ring in amdgpu_ib_schedule

2025-05-28 Thread Christian König
On 5/28/25 15:41, Alex Deucher wrote: > On Wed, May 28, 2025 at 7:45 AM Christian König > wrote: >> >> On 5/28/25 06:19, Alex Deucher wrote: >>> We'll want to include the padding in the wptr count >>> for resets. >>> >>> Signed-off-by: Alex Deucher >>> --- >>> drivers/gpu/drm/amd/amdgpu/amdgpu_i

Re: [PATCH 15/19] drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 7:50 AM Christian König wrote: > > On 5/28/25 06:19, Alex Deucher wrote: > > Re-emit the unprocessed state after resetting the queue. > > I don't think we want any of this for compute queues. Why not? This allows us to do per job resets and can be trivially extended to al

Re: [PATCH] drm/amdgpu: only export available rings to mesa for enabling kq|uq

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 4:38 AM Prike Liang wrote: > > The kernel driver only requires exporting available rings to the mesa > when the userq is disabled; otherwise, the userq IP mask will be cleaned > up in the mesa. The logic should work correctly as is. There are three possible states: 1. KQ

Re: [PATCH 12/19] drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 7:40 AM Christian König wrote: > > On 5/28/25 06:19, Alex Deucher wrote: > > Re-emit the unprocessed state after resetting the queue. > > > > Signed-off-by: Alex Deucher > > --- > > drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 19 --- > > 1 file changed, 12 in

Re: [PATCH 10/19] drm/amdgpu: pad ring in amdgpu_ib_schedule

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 9:42 AM Christian König wrote: > > On 5/28/25 15:41, Alex Deucher wrote: > > On Wed, May 28, 2025 at 7:45 AM Christian König > > wrote: > >> > >> On 5/28/25 06:19, Alex Deucher wrote: > >>> We'll want to include the padding in the wptr count > >>> for resets. > >>> > >>> S

Re: [PATCH 12/19] drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset

2025-05-28 Thread Alex Deucher
On Wed, May 28, 2025 at 9:57 AM Alex Deucher wrote: > > On Wed, May 28, 2025 at 9:48 AM Christian König > wrote: > > > > On 5/28/25 15:38, Alex Deucher wrote: > > > On Wed, May 28, 2025 at 7:40 AM Christian König > > > wrote: > > >> > > >> On 5/28/25 06:19, Alex Deucher wrote: > > >>> Re-emit th

[PATCH 1/2] drm/amdkfd: remove unused code

2025-05-28 Thread James Zhu
upages is assigned under cpages = 0, so it isn't really used in this function. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c b/drivers/gpu/drm/amd/amdkfd/kfd_migrate.c index 79251

[PATCH 2/2] drm/amdkfd: add svm_migrate_successful_pages

2025-05-28 Thread James Zhu
to get migration pages. When migrating pages from system to vram, needn't check bit MIGRATE_PFN_VALID, since the system page could be allocated, but not be accessed. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 25 1 file changed, 13 insertions

[PATCH 05/16] drm/amdgpu/gfx8: drop reset_kgq

2025-05-28 Thread Alex Deucher
It doesn't work reliably and we have soft recover and full adapter reset so drop this. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 71 --- 1 file changed, 71 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c b/drivers/gpu/drm/am

[PATCH 09/16] drm/amdgpu/gfx10: re-emit unprocessed state on kgq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 25 ++--- 1 file changed, 18 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_

[PATCH 01/16] drm/amdgpu/gfx10: enable legacy enforce isolation

2025-05-28 Thread Alex Deucher
Enable legacy enforce isolation (just serialize kernel GC submissions). This way we can reset a ring and only affect the process currently using that ring. This mirrors what Windows does. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 + 1 file changed, 9

[PATCH V4 00/16] Reset improvements for GC10+

2025-05-28 Thread Alex Deucher
This set improves per queue reset support for GC10+. When we reset the queue, the queue is lost so we need to re-emit the unprocessed state from subsequent submissions. To that end, in order to make sure we actually restore unprocessed state, we need to enable legacy enforce isolation so that we ca

[PATCH 15/16] drm/amdgpu/gfx11: re-emit unprocessed state on kcq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 33 +- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gf

[PATCH 08/16] drm/amdgpu: track ring state associated with a job

2025-05-28 Thread Alex Deucher
We need to know the wptr and sequence number associated with a job so that we can re-emit the unprocessed state after a ring reset. Pre-allocate storage space for the ring buffer contents and add a helper to save off the unprocessed state so that it can be re-emitted after the queue is reset. Sig

[PATCH 11/16] drm/amdgpu/gfx12: re-emit unprocessed state on kgq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c

[PATCH 14/16] drm/amdgpu/gfx10: re-emit unprocessed state on kcq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 33 -- 1 file changed, 15 insertions(+), 18 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gf

[PATCH 02/16] drm/amdgpu/gfx11: enable legacy enforce isolation

2025-05-28 Thread Alex Deucher
Enable legacy enforce isolation (just serialize kernel GC submissions). This way we can reset a ring and only affect the process currently using that ring. This mirrors what Windows does. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 9 + 1 file changed, 9

[PATCH 06/16] drm/amdgpu/gfx9: drop reset_kgq

2025-05-28 Thread Alex Deucher
It doesn't work reliably and we have soft recover and full adapter reset so drop this. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 46 --- 1 file changed, 46 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/am

[PATCH 12/16] drm/amdgpu/gfx9: re-emit unprocessed state on kcq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 19 +-- 1 file changed, 17 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c i

[PATCH 04/16] drm/amdgpu/gfx7: drop reset_kgq

2025-05-28 Thread Alex Deucher
It doesn't work reliably and we have soft recover and full adapter reset so drop this. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 71 --- 1 file changed, 71 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c b/drivers/gpu/drm/am

[PATCH 13/16] drm/amdgpu/gfx9.4.3: re-emit unprocessed state on kcq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 21 ++--- 1 file changed, 18 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9

[PATCH 10/16] drm/amdgpu/gfx11: re-emit unprocessed state on kgq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 18 -- 1 file changed, 16 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c

[PATCH 03/16] drm/amdgpu/gfx12: enable legacy enforce isolation

2025-05-28 Thread Alex Deucher
Enable legacy enforce isolation (just serialize kernel GC submissions). This way we can reset a ring and only affect the process currently using that ring. This mirrors what Windows does. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 9 + 1 file changed, 9

[PATCH 16/16] drm/amdgpu/gfx12: re-emit unprocessed state on kcq reset

2025-05-28 Thread Alex Deucher
Re-emit the unprocessed state after resetting the queue. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 33 +- 1 file changed, 16 insertions(+), 17 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gf