Re: [PATCH] drm/amdkfd: Fix partial migrate issue

2025-01-03 Thread Felix Kuehling
On 2025-01-02 19:06, Emily Deng wrote: For a partial migration from RAM to VRAM, migrate->cpages is not equal to migrate->npages, so migrate->npages should be used to check all pages that need migrating, whether or not they can be copied. Only the pages that can be migrated need to be set in migrate->dst[i], or

Re: [PATCH] drm/amdgpu: Remove unnecessary NULL check

2025-01-03 Thread Felix Kuehling
On 2025-01-02 10:01, Kent Russell wrote: container_of cannot return NULL, so it is unnecessary to check for NULL after gem_to_amdgpu_bo, which is just a container_of call Signed-off-by: Kent Russell Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++ 1
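A minimal userspace sketch of why a NULL check after a container_of-style helper tells you nothing. The macro below is a simplified stand-in for the kernel's container_of, and struct demo_bo, struct gem_object and gem_to_demo_bo() are hypothetical names made up for illustration; they are not the real amdgpu definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel macro: subtract the member's offset. */
#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types for illustration only. */
struct gem_object {
    int handle;
};

struct demo_bo {
    int flags;
    struct gem_object base; /* embedded member, not a pointer */
};

/* Same shape as a gem_to_*_bo() helper: pure pointer arithmetic, no lookup. */
static struct demo_bo *gem_to_demo_bo(struct gem_object *gobj)
{
    return container_of(gobj, struct demo_bo, base);
}

/* Recovers the enclosing object from its member and returns 1 on a match. */
static int demo_roundtrip(void)
{
    struct demo_bo bo = { .flags = 7, .base = { .handle = 42 } };
    struct demo_bo *back = gem_to_demo_bo(&bo.base);

    assert(back == &bo);
    return back->flags == 7;
}
```

Because the helper only subtracts a compile-time constant from the pointer it is given, there is no code path on which it can report failure. Any validity check has to be made on the member pointer before the conversion, not on the result.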

Re: [PATCH v2] drm/amdgpu: Fix the looply call svm_range_restore_pages issue

2025-01-03 Thread Felix Kuehling
On 2025-01-02 21:26, Emily Deng wrote: Because freeing of the pt is delayed, the bo that was meant to be freed has been reused, which causes an unexpected page fault and then a call to svm_range_restore_pages. Details below: 1. The following code wants to free the pt, but it is not freed immediately; instead it uses “schedule_work(&

Re: [PATCH v3] drm/amdkfd: Have kfd driver use same PASID values from graphic driver

2025-01-03 Thread Felix Kuehling
On 2024-12-03 13:03, Xiaogang.Chen wrote: From: Xiaogang Chen The current kfd driver has its own PASID value for a kfd process and uses it to locate the vm in the interrupt handler or to map between a kfd process and its vm. That design does not work when a physical gpu device has multiple spatial partitions

[PATCH v4] drm/amdkfd: Uninitialized and Unused variables

2025-01-03 Thread Andrew Martin
This patch initializes key variables and removes unused ones. Signed-off-by: Andrew Martin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 8 ++-- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 38 --- .../gpu/drm/amd/a

Re: [PATCH] drm/amd/display: add CEC notifier to amdgpu driver

2025-01-03 Thread Harry Wentland
On 2024-12-30 03:15, Kun Liu wrote: > This patch adds the cec_notifier feature to amdgpu driver. > The changes will allow amdgpu driver code to notify EDID > and HPD changes to an eventual CEC adapter. > > Signed-off-by: Kun Liu > --- > drivers/gpu/drm/amd/display/Kconfig | 2 + >

Re: [PATCH] drm/amdgpu: Enable runtime modification of gpu_recovery parameter with validation

2025-01-03 Thread Koenig, Christian
[AMD Official Use Only - AMD Internal Distribution Only] Hi Shuai, setting gpu_recovery=0 is not even remotely related to RAS. If that option affects RAS behavior in any way then that is a bug. The purpose of setting gpu_recovery=0 is to disable resets after a submission timeout most likely ca

Re: [PATCH 4/6] amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-03 Thread Shuo Liu
On Fri 3.Jan'25 at 15:02:38 +0800, Gerry Liu wrote: On Jan 3, 2025, at 13:58, Chen, Xiaogang wrote: On 1/1/2025 11:36 PM, Jiang Liu wrote: If some GPU device failed to probe, `rmmod amdgpu` will trigger a use after free bug related to amdgpu_driver_release_kms() as: 2024-12-26 16:17:45 [16002.085540]

Re: [PATCH 5/6] amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-03 Thread Gerry Liu
> On Jan 3, 2025, at 07:09, Chen, Xiaogang wrote: > > > On 1/1/2025 11:36 PM, Jiang Liu wrote: >> Function detects initialization status by checking sched->ops, so set >> sched->ops to non-NULL just before return in function drm_sched_init() >> to avoid possible invalid memory access on error recover path.

Re: [PATCH 4/6] amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-03 Thread Gerry Liu
> On Jan 3, 2025, at 13:58, Chen, Xiaogang wrote: > > > > On 1/1/2025 11:36 PM, Jiang Liu wrote: >> If some GPU device failed to probe, `rmmod amdgpu` will trigger a use >> after free bug related to amdgpu_driver_release_kms() as: >> 2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference,

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-03 Thread Gerry Liu
> On Jan 3, 2025, at 13:44, Chen, Xiaogang wrote: > > > > On 1/2/2025 8:22 PM, Gerry Liu wrote: >> >> >>> On Jan 3, 2025, at 07:08, Chen, Xiaogang >> > wrote: >>> >>> >>> >>> On 1/1/2025 11:36 PM, Jiang Liu wrote: On error recover path during device probe, it may trigger invalid

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-03 Thread Gerry Liu
> On Jan 3, 2025, at 07:08, Chen, Xiaogang wrote: > > > > On 1/1/2025 11:36 PM, Jiang Liu wrote: >> On error recover path during device probe, it may trigger invalid >> memory access as below: >> 2024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for >> non-canonical address 0x52445f474

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-03 Thread Gerry Liu
> On Jan 3, 2025, at 14:19, Chen, Xiaogang wrote: > > > > On 1/2/2025 11:55 PM, Gerry Liu wrote: >> >> >>> On Jan 3, 2025, at 13:44, Chen, Xiaogang >> > wrote: >>> >>> >>> >>> On 1/2/2025 8:22 PM, Gerry Liu wrote: > On Jan 3, 2025, at 07:08, Chen, Xiaogang

RE: [PATCH 1/6] amdgpu: add flags to track sysfs initialization status

2025-01-03 Thread Russell, Kent
[AMD Official Use Only - AMD Internal Distribution Only] > -----Original Message----- > From: Chen, Xiaogang > Sent: Thursday, January 2, 2025 6:08 PM > To: Jiang Liu ; amd-gfx@lists.freedesktop.org; > Russell, > Kent ; shuox@linux.alibaba.com > Subject: Re: [PATCH 1/6] amdgpu: add flags to

Re: [PATCH 4/6] amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-03 Thread Chen, Xiaogang
On 1/3/2025 1:43 AM, Shuo Liu wrote: On Fri 3.Jan'25 at 15:02:38 +0800, Gerry Liu wrote: On Jan 3, 2025, at 13:58, Chen, Xiaogang wrote: On 1/1/2025 11:36 PM, Jiang Liu wrote: If some GPU device failed to probe, `rmmod amdgpu` will trigger a use after free bug related to amdgpu_driver_release_kms(

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-03 Thread Chen, Xiaogang
On 1/3/2025 1:05 AM, Gerry Liu wrote: On Jan 3, 2025, at 14:19, Chen, Xiaogang wrote: On 1/2/2025 11:55 PM, Gerry Liu wrote: On Jan 3, 2025, at 13:44, Chen, Xiaogang wrote: On 1/2/2025 8:22 PM, Gerry Liu wrote: On Jan 3, 2025, at 07:08, Chen, Xiaogang wrote: On 1/1/2025 11:36 PM, Jiang Liu wrote: On error recover

Re: [PATCH v2] drm/ci: uprev IGT

2025-01-03 Thread Dmitry Baryshkov
On Tue, Dec 17, 2024 at 09:36:52PM +0530, Vignesh Raman wrote: > Uprev IGT to the latest version and update expectation files. > > Signed-off-by: Vignesh Raman > --- > > v1: > - Pipeline link - > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1327810 > Will update the flake

[PATCH] drm/amdgpu: Fix Circular Locking Dependency in AMDGPU GFX Isolation

2025-01-03 Thread Srinivasan Shanmugam
This commit addresses a circular locking dependency issue within the GFX isolation mechanism. The problem was identified by a warning indicating a potential deadlock due to inconsistent lock acquisition order. - The `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_