[PATCH v4] drm/amdkfd: Fix partial migrate issue

2025-01-09 Thread Emily Deng
For a partial migration from RAM to VRAM, migrate->cpages is not equal to migrate->npages, so migrate->npages should be used to check, for every page that needs migrating, whether it can be copied. Only the pages that can be migrated should be set in migrate->dst[i]; otherwise migrate_vma_pages will migrate the wro

Re: [v4 5/5] drm/amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-09 Thread Gerry Liu
> On Jan 10, 2025, at 14:51, Christian König wrote: > > On 10.01.25 at 03:08, Jiang Liu wrote: >> Function detects initialization status by checking sched->ops, so set >> sched->ops to non-NULL just before return in function >> amdgpu_fence_driver_sw_fini() and amdgpu_device_init_schedulers() >> to avoid

Re: [v4 5/5] drm/amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-09 Thread Christian König
On 10.01.25 at 03:08, Jiang Liu wrote: Function detects initialization status by checking sched->ops, so set sched->ops to non-NULL just before return in function amdgpu_fence_driver_sw_fini() and amdgpu_device_init_schedulers() to avoid possible invalid memory access on the error recovery path. Sig

Re: [PATCH] drm/amdgpu: Add mutex locking to VMID Manager Initialization for Process Isolation

2025-01-09 Thread Christian König
On 10.01.25 at 04:38, Srinivasan Shanmugam wrote: This commit adds mutex locking to the `amdgpu_vmid_mgr_init` function. By acquiring and releasing the `enforce_isolation_mutex`, it now safely allocates reserved VMIDs, which is important for enforcing isolation between different GPU proc

[PATCH 2/2 V2] drm/amdgpu/gfx10: implement gfx queue reset via MMIO

2025-01-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Using MMIO to do queue reset. v2: Align the function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang adev; unsigned i; + uint32_t tmp; /* enter save mode */ amdgpu_gfx_rlc_enter_safe_mode(adev, xcc_id); @@ -3813,7 +3814,25 @@ static

[PATCH 1/2 V2] drm/amdgpu/gfx10: implement iqueue reset via MMIO

2025-01-09 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Using MMIO to do queue reset. v2: Align this function with gfx9/gfx9.4.3. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 34 ++ 1 file changed, 34 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.

[PATCH] drm/amdgpu: Add mutex locking to VMID Manager Initialization for Process Isolation

2025-01-09 Thread Srinivasan Shanmugam
This commit adds mutex locking to the `amdgpu_vmid_mgr_init` function. By acquiring and releasing the `enforce_isolation_mutex`, it now safely allocates reserved VMIDs, which is important for enforcing isolation between different GPU processes. The mutex ensures that the process of allocating

Re: [PATCH] drm/amdgpu: fix gpu recovery disable with per queue reset

2025-01-09 Thread Lazar, Lijo
On 1/9/2025 8:27 PM, Kim, Jonathan wrote: > [Public] > >> -Original Message- >> From: Lazar, Lijo >> Sent: Thursday, January 9, 2025 1:14 AM >> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org >> Cc: Kasiviswanathan, Harish >> Subject: Re: [PATCH] drm/amdgpu: fix gpu recovery disable

Re: [PATCH 1/5] drm/amdgpu/gfx: add ring helpers for setting workload profile

2025-01-09 Thread Lazar, Lijo
On 1/9/2025 10:36 PM, Alex Deucher wrote: > On Thu, Jan 9, 2025 at 12:59 AM Lazar, Lijo wrote: >> >> >> >> On 1/9/2025 4:26 AM, Alex Deucher wrote: >>> Add helpers to switch the workload profile dynamically when >>> commands are submitted. This allows us to switch to >>> the FULLSCREEN3D or CO

Re: [RFC PATCH 03/13] drm/amdgpu: add a flag to track ras debugfs creation status

2025-01-09 Thread Gerry Liu
> On Jan 9, 2025, at 01:19, Mario Limonciello wrote: > > On 1/8/2025 07:59, Jiang Liu wrote: >> Add a flag to track ras debugfs creation status, to avoid possible >> incorrect reference count management for ras block object in function >> amdgpu_ras_aca_is_supported(). > > Rather than taking a marker posi

[PATCH] drm/amdgpu: disable gfxoff with the compute workload on gfx12

2025-01-09 Thread Kenneth Feng
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/d

[v4 5/5] drm/amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-09 Thread Jiang Liu
Function detects initialization status by checking sched->ops, so set sched->ops to non-NULL just before return in function amdgpu_fence_driver_sw_fini() and amdgpu_device_init_schedulers() to avoid possible invalid memory access on the error recovery path. Signed-off-by: Jiang Liu --- drivers/gpu/dr

[v4 1/5] drm/amdgpu: clear adev->in_suspend flag when fails to suspend

2025-01-09 Thread Jiang Liu
Clear the adev->in_suspend flag when suspend fails; otherwise it will cause too many warnings like: [ 1802.212027] [ cut here ] [ 1802.212028] WARNING: CPU: 97 PID: 11282 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:452 amdgpu_bo_free_kernel+0xf9/0x120 [amdgpu] [ 1802.2121

[v4 3/5] drm/amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-09 Thread Jiang Liu
If some GPU device failed to probe, `rmmod amdgpu` will trigger a use after free bug related to amdgpu_driver_release_kms() as: [16002.085540] BUG: kernel NULL pointer dereference, address: [16002.093792] #PF: supervisor read access in kernel mode [16002.03] #PF: error_code(0x0

[v4 2/5] drm/amdxcp: introduce new API amdgpu_xcp_drm_dev_free()

2025-01-09 Thread Jiang Liu
Introduce new interface amdgpu_xcp_drm_dev_free() to free a specific drm_device created by amdgpu_xcp_drm_dev_alloc(), which will be used to do error recovery. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 65 + drivers/gpu/drm/amd/amdxcp/amdgpu_

[v4 4/5] drm/amdgpu: enhance error handling in function amdgpu_pci_probe()

2025-01-09 Thread Jiang Liu
Enhance error handling in function amdgpu_pci_probe() to avoid possible resource leakage. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/d

[v4 0/6] Fix several bugs in error handling during device probe

2025-01-09 Thread Jiang Liu
This patchset tries to fix several memory leaks and invalid memory accesses on the error handling path during GPU driver loading/unloading. They apply to: https://gitlab.freedesktop.org/agd5f/linux.git amd-staging-drm-next v4: 1) drop patch 1 in v3 2) split out amdxcp related change into a dedicated

Recall: [PATCH] drm/amdgpu: disable gfxoff with the compute workload on gfx12

2025-01-09 Thread Wang, Yang(Kevin)
Wang, Yang(Kevin) would like to recall the message, "[PATCH] drm/amdgpu: disable gfxoff with the compute workload on gfx12".

Re: [v3 6/6] drm/amdgpu: get rid of false warnings caused by amdgpu_irq_put()

2025-01-09 Thread Gerry Liu
> On Jan 8, 2025, at 17:05, Christian König wrote: > > On 08.01.25 at 09:56, Jiang Liu wrote: >> If error happens before amdgpu_fence_driver_hw_init() gets called during >> device probe, it will trigger a false warning in amdgpu_irq_put() as >> below: >> [ 1209.300996] [ cut here ]

RE: [PATCH] drm/amdgpu: disable gfxoff with the compute workload on gfx12

2025-01-09 Thread Wang, Yang(Kevin)
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Yang Wang Best Regards, Kevin -Original Message- From: Kenneth Feng Sent: Thursday, January 9, 2025 16:25 To: amd-gfx@lists.freedesktop.org Cc: Wang, Yang(Kevin) ; Feng, Kenneth Subject: [PATCH] drm/amdgpu: disable

Re: kmemleak: Found object by alias at 0xffff888107b65918

2025-01-09 Thread Alex Deucher
On Thu, Jan 9, 2025 at 3:29 PM Borislav Petkov wrote: > > Hi folks, > > this is rc6 + tip/master, machine is Carrizo laptop. Possibly fixed by this patch? https://lore.kernel.org/lkml/CAJZ5v0i=ap+w4QZ8f2DsaHY6D=XUEuSNjyQ-2_=dgolfzjd...@mail.gmail.com/T/ Alex > > full dmesg attached. > > Thx. >

Re: [PATCH] drm/amdgpu: Mark debug KFD module params as unsafe

2025-01-09 Thread Alex Deucher
On Wed, Jan 8, 2025 at 10:18 AM Kent Russell wrote: > > Mark options only meant to be used for debugging as unsafe so that the > kernel is tainted when they are used. > > Signed-off-by: Kent Russell Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 +- > 1

Re: [PATCH] drm/amdgpu: Mark debug KFD module params as unsafe

2025-01-09 Thread Felix Kuehling
On 2025-01-08 10:18, Kent Russell wrote: Mark options only meant to be used for debugging as unsafe so that the kernel is tainted when they are used. Signed-off-by: Kent Russell Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 10 +- 1 file changed, 5 i

Re: [PATCH v4] drm/amdkfd: Have kfd driver use same PASID values from graphic driver

2025-01-09 Thread Felix Kuehling
On 2025-01-08 17:55, Xiaogang.Chen wrote: From: Xiaogang Chen Current kfd driver has its own PASID value for a kfd process and uses it to locate vm at interrupt handler or mapping between kfd process and vm. That design is not working when a physical gpu device has multiple spatial partitions,

[PATCH 2/2] drm/amdkfd: Clear MODE.VSKIP in gfx9 trap handler

2025-01-09 Thread Jay Cornwall
If user shader issues S_SETVSKIP then this state will persist when executing the trap handler, causing vector instructions to be skipped. Restore VSKIP state before resuming the user shader. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2721 +

[PATCH 1/2] drm/amdkfd: Sync trap handler binary with source

2025-01-09 Thread Jay Cornwall
Source and binary have become mismatched during branch activity. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 726 +- 1 file changed, 359 insertions(+), 367 deletions(-) diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handle

Re: [PATCH] drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P

2025-01-09 Thread Simona Vetter
On Thu, 9 Jan 2025 at 17:58, Felix Kuehling wrote: > > From: Christian König > > Try pinning into VRAM to allow P2P with RDMA NICs without ODP > support if all attachments can do P2P. If any attachment can't do > P2P just pin into GTT instead. > > Signed-off-by: Christian König > Signed-off-by:

Re: [PATCH v2] drm/amdgpu: Fix the looply call svm_range_restore_pages issue

2025-01-09 Thread Felix Kuehling
On 2025-01-08 20:11, Philip Yang wrote: On 2025-01-07 22:08, Deng, Emily wrote: [AMD Official Use Only - AMD Internal Distribution Only] Hi Philip, It still has the deadlock, maybe the best way is trying to remove the delayed free pt work. [Wed Jan  8 10:35:44 2025 <    0.00>] INF

Re:

2025-01-09 Thread Mario Limonciello
General note - don't use HTML for mailing list communication. I'm not sure if Apple Mail lets you switch this around. If not, you might try using Thunderbird instead. You can pick to reply in plain text or HTML by holding shift when you hit "reply all" For my reply I'll convert my reply to p

Re: [PATCH 1/5] drm/amdgpu/gfx: add ring helpers for setting workload profile

2025-01-09 Thread Alex Deucher
On Wed, Jan 8, 2025 at 11:17 PM Feng, Kenneth wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > -Original Message- > From: Deucher, Alexander > Sent: Thursday, January 9, 2025 6:56 AM > To: amd-gfx@lists.freedesktop.org > Cc: Pillai, Aurabindo ; Feng, Kenneth > ; De

Re: [PATCH 1/5] drm/amdgpu/gfx: add ring helpers for setting workload profile

2025-01-09 Thread Alex Deucher
On Thu, Jan 9, 2025 at 12:59 AM Lazar, Lijo wrote: > > > > On 1/9/2025 4:26 AM, Alex Deucher wrote: > > Add helpers to switch the workload profile dynamically when > > commands are submitted. This allows us to switch to > > the FULLSCREEN3D or COMPUTE profile when work is submitted. > > Add a del

[PATCH] drm/amdgpu: allow pinning DMA-bufs into VRAM if all importers can do P2P

2025-01-09 Thread Felix Kuehling
From: Christian König Try pinning into VRAM to allow P2P with RDMA NICs without ODP support if all attachments can do P2P. If any attachment can't do P2P just pin into GTT instead. Signed-off-by: Christian König Signed-off-by: Felix Kuehling Reviewed-by: Felix Kuehling Tested-by: Pak Nin Lui

[pull] amdgpu, amdkfd drm-fixes-6.13

2025-01-09 Thread Alex Deucher
Hi Dave, Simona, Fixes for 6.13. The following changes since commit 273b3eb600713a5e71c64b8b403b355dc580f167: Merge tag 'drm-xe-fixes-2025-01-02' of https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes (2025-01-03 10:57:31 +1000) are available in the Git repository at: https://git

Re: [PATCH v2] drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation

2025-01-09 Thread Alex Deucher
On Thu, Jan 9, 2025 at 11:17 AM Srinivasan Shanmugam wrote: > > This commit addresses a circular locking dependency issue within the GFX > isolation mechanism. The problem was identified by a warning indicating > a potential deadlock due to inconsistent lock acquisition order. > > - The `amdgpu_gf

[PATCH v2] drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation

2025-01-09 Thread Srinivasan Shanmugam
This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_

Re: [v3 5/6] drm/amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-09 Thread Christian König
On 08.01.25 at 17:30, Chen, Xiaogang wrote: On 1/8/2025 3:16 AM, Christian König wrote: On 08.01.25 at 09:56, Jiang Liu wrote: Function detects initialization status by checking sched->ops, Where is that done? Inside the scheduler or inside amdgpu? Inside amdgpu set ring->sched.ops to null

RE: [PATCH] drm/amdgpu: fix gpu recovery disable with per queue reset

2025-01-09 Thread Kim, Jonathan
[Public] > -Original Message- > From: Lazar, Lijo > Sent: Thursday, January 9, 2025 1:14 AM > To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org > Cc: Kasiviswanathan, Harish > Subject: Re: [PATCH] drm/amdgpu: fix gpu recovery disable with per queue reset > > > > On 1/9/2025 1:31 AM, Jona

Re: amdgpu 4k@120Hz / HDMI 2.1

2025-01-09 Thread Michel Dänzer
On 2025-01-09 12:00, Mischa Baars wrote: > On Mon, Jan 6, 2025 at 4:41 PM Michel Dänzer wrote: > >> I'm sort of a fan of Michael Abrash, as he inspired me to learn >> programming assembly language a long time ago, but in his Graphics >> Programming Black Boo

Re: [PATCH] drm/amdgpu: disable gfxoff with the compute workload on gfx12

2025-01-09 Thread Alex Deucher
On Thu, Jan 9, 2025 at 3:58 AM Kenneth Feng wrote: > > Disable gfxoff with the compute workload on gfx12. This is a > workaround for the opencl test failure. > > Signed-off-by: Kenneth Feng > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 -- > 1 file changed, 4 insertions(+), 2 deleti

[PATCH v4] drm/amdgpu: Fix the looply call svm_range_restore_pages

2025-01-09 Thread Emily Deng
Because the pt free is delayed, the bo that was meant to be freed gets reused, which causes an unexpected page fault and then a call to svm_range_restore_pages. Details below: 1. It wants to free the pt in the following code, but it is not freed immediately; schedule_work(&vm->pt_free_work) is used instead; [ 92.276838] Call Tra

Re: [PATCH v2 1/2] drm/amdgpu: map doorbell for the requested userq

2025-01-09 Thread Sharma, Shashank
On 09/01/2025 11:34, Saleemkhan Jamadar wrote: Introduce db_info structure to populate the doorbell information that is required to be mapped. Made the doorbell mapping func more generic, by taking parameters that vary based on IPs and/or usecase into db_info structure. v2 - Fi

Re: [PATCH v2 2/2] drm/amdgpu: add db size and offset range for VCN and VPE

2025-01-09 Thread Sharma, Shashank
On 09/01/2025 11:34, Saleemkhan Jamadar wrote: VCN and VPE have different offset ranges, update the doorbell offset range respectively. Doorbell size for VCN and VPE is 32bit. v1 : add gfx switch case and fix checkpatch warnings (Shashank) Signed-off-by: Saleemkhan Jamadar --- drivers/gpu/

Re: amdgpu 4k@120Hz / HDMI 2.1

2025-01-09 Thread Mischa Baars
On Mon, Jan 6, 2025 at 4:41 PM Michel Dänzer wrote: > Yeah, that's not how double-buffering works in GL. The draw buffer is always GL_BACK, SwapBuffers doesn't affect that (it just may internally change which actual buffer GL_BACK refers to). > > I don't see more context about the issue you're in

[PATCH v2 2/2] drm/amdgpu: add db size and offset range for VCN and VPE

2025-01-09 Thread Saleemkhan Jamadar
VCN and VPE have different offset ranges, update the doorbell offset range respectively. Doorbell size for VCN and VPE is 32bit. v1 : add gfx switch case and fix checkpatch warnings (Shashank) Signed-off-by: Saleemkhan Jamadar --- drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 24 +

[PATCH v2 1/2] drm/amdgpu: map doorbell for the requested userq

2025-01-09 Thread Saleemkhan Jamadar
Introduce db_info structure to populate the doorbell information that is required to be mapped. Made the doorbell mapping func more generic, by taking parameters that vary based on IPs and/or usecase into db_info structure. v2 - Fix space alignment and checkpatch warnings(Shashank)

[PATCH v2 0/2] Allocate and map doorbell based on requests

2025-01-09 Thread Saleemkhan Jamadar
Hi, Why: The current implementation of doorbell mapping does not handle the IP specific doorbell size and offset range. Multiple doorbells cannot be allocated when requested, due to the hard-coded use of "struct amdgpu_usermode_queue" parameters. But these parameters vary for each request of doorbell

Re: amdgpu 4k@120Hz / HDMI 2.1

2025-01-09 Thread Mischa Baars
On Mon, Jan 6, 2025 at 4:30 AM Mario Limonciello wrote: > When new specifications are made available it's not like the old one > suddenly becomes "open", so I don't see any reason that a new > specification would change anything. I paid about €3000 for my new PC, including €300 for the graphics

[PATCH] drm/amdgpu: disable gfxoff with the compute workload on gfx12

2025-01-09 Thread Kenneth Feng
Disable gfxoff with the compute workload on gfx12. This is a workaround for the opencl test failure. Signed-off-by: Kenneth Feng --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/

Re: [PATCH v1 2/3] xfs/libxfs: replace kmalloc() and memcpy() with kmemdup()

2025-01-09 Thread Carlos Maiolino
Hi Mirsad. Did you send only this patch, or did I miss patch 1 and 3 of the series? I can't find them anywhere. Carlos On Tue, Dec 17, 2024 at 11:58:12PM +0100, Mirsad Todorovac wrote: > The source static analysis tool gave the following advice: > > ./fs/xfs/libxfs/xfs_dir2.c:382:15-22: WARNING