[PATCH v2] drm/amdgpu: Fix the looply call svm_range_restore_pages issue

2025-01-02 Thread Emily Deng
As the delayed free pt, the wanted freed bo has been reused which will cause unexpected page fault, and then call svm_range_restore_pages. Detail as below: 1.It wants to free the pt in follow code, but it is not freed immediately and used “schedule_work(&vm->pt_free_work);”. [ 92.276838] Call T

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-02 Thread Chen, Xiaogang
On 1/2/2025 8:22 PM, Gerry Liu wrote: On Jan 3, 2025, at 07:08, Chen, Xiaogang wrote: On 1/1/2025 11:36 PM, Jiang Liu wrote: On error recover path during device probe, it may trigger invalid memory access as below: 024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for non-canonical

Re: [PATCH v2] drm/ci: uprev IGT

2025-01-02 Thread Dmitry Baryshkov
On Tue, Dec 17, 2024 at 09:36:52PM +0530, Vignesh Raman wrote: > Uprev IGT to the latest version and update expectation files. > > Signed-off-by: Vignesh Raman > --- > > v1: > - Pipeline link - > https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1327810 > Will update the flake

Re: [PATCH 4/6] amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-02 Thread Chen, Xiaogang
On 1/1/2025 11:36 PM, Jiang Liu wrote: If some GPU device failed to probe, `rmmod amdgpu` will trigger a use after free bug related to amdgpu_driver_release_kms() as: 2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference, address: 2024-12-26 16:17:45 [16002.09

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-02 Thread Chen, Xiaogang
On 1/2/2025 11:55 PM, Gerry Liu wrote: On Jan 3, 2025, at 13:44, Chen, Xiaogang wrote: On 1/2/2025 8:22 PM, Gerry Liu wrote: On Jan 3, 2025, at 07:08, Chen, Xiaogang wrote: On 1/1/2025 11:36 PM, Jiang Liu wrote: On error recover path during device probe, it may trigger invalid memory access as below: 024-12

Re: [drm/amdgpu]: user queue doorbell allocation for IP reqs

2025-01-02 Thread Saleemkhan Jamadar
Hi Shashank, Replied inline [Saleem] Regards, Salem On 02/01/25 18:58, Sharma, Shashank wrote: + (amd-gfx) On 01/01/2025 07:03, Saleemkhan Jamadar wrote: #resending  patch  From 79cd62f882197505dbf9c489ead2b0bcab98209f Mon Sep 17 00:00:00 2001 From: Saleemkhan Jamadar Date: Wed, 18 Dec 2

Re: [drm/amdgpu]: user queue doorbell allocation for IP reqs

2025-01-02 Thread Sharma, Shashank
On 03/01/2025 07:34, Saleemkhan Jamadar wrote: Hi Shashank, Replied inline [Saleem] Regards, Salem On 02/01/25 18:58, Sharma, Shashank wrote: + (amd-gfx) On 01/01/2025 07:03, Saleemkhan Jamadar wrote: #resending  patch  From 79cd62f882197505dbf9c489ead2b0bcab98209f Mon Sep 17 00:00:00

[PATCH] drm/amdkfd: Fix partial migrate issue

2025-01-02 Thread Emily Deng
For partial migrate from ram to vram, the migrate->cpages is not equal to migrate->npages, should use migrate->npages to check all needed migrate pages which could be copied or not. And only need to set those pages could be migrated to migrate->dst[i], or the migrate_vma_pages will migrate the wro
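
The partial-migration point can be illustrated with a small userspace sketch. It assumes a hypothetical flag `SKETCH_PFN_MIGRATE` standing in for the kernel's per-page "can migrate" bit and a made-up destination encoding (a plain VRAM page number, 0 meaning "leave this page in place"). It is not the patch's code. The point it shows: when only some pages were collected (cpages < npages), the collected pages need not be the first cpages entries, so the loop must walk all npages and fill dst[i] only for the collectable ones.

```c
#include <assert.h>

/* Hypothetical stand-in for the per-page "collected, may migrate" bit. */
#define SKETCH_PFN_MIGRATE 0x1UL

/*
 * Walk every one of the npages entries (not just cpages of them) and
 * assign a destination only to pages that were actually collected.
 * In this sketch dst[i] == 0 means "do not migrate page i".
 * Returns the number of pages given a destination.
 */
static unsigned long migrate_fill_dst_sketch(const unsigned long *src,
					     unsigned long *dst,
					     unsigned long npages,
					     unsigned long first_vram_page)
{
	unsigned long copied = 0;

	for (unsigned long i = 0; i < npages; i++) {
		dst[i] = 0;
		if (!(src[i] & SKETCH_PFN_MIGRATE))
			continue; /* not collected: leave it where it is */
		dst[i] = first_vram_page + copied;
		copied++;
	}
	return copied;
}
```

With src = {1, 0, 1, 1, 0}, three pages are collectable; a loop bounded by cpages (3) would wrongly consider index 1 and miss index 3, while walking npages assigns destinations to indices 0, 2 and 3 only.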

[PATCH 1/2] drm/amdgpu: Fix the looply call svm_range_restore_pages issue

2025-01-02 Thread Emily Deng
As the delayed free pt, the wanted freed bo has been reused which will cause unexpected page fault, and then call svm_range_restore_pages. Detail as below: 1.It wants to free the pt in follow code, but it is not freed immediately and used “schedule_work(&vm->pt_free_work);”. [ 92.276838] Call T

Re: [PATCH 1/6] amdgpu: add flags to track sysfs initialization status

2025-01-02 Thread Chen, Xiaogang
On 1/1/2025 11:36 PM, Jiang Liu wrote: Add flags to track sysfs initialization status, so we can correctly clean them up on error recover paths. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34

Re: [PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-02 Thread Chen, Xiaogang
On 1/1/2025 11:36 PM, Jiang Liu wrote: On error recover path during device probe, it may trigger invalid memory access as below: 024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for non-canonical address 0x52445f4749464e4f: [#1] SMP NOPTI 2024-12-25 12:00:53 [ 2703.7851

Re: [PATCH 5/6] amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-02 Thread Chen, Xiaogang
On 1/1/2025 11:36 PM, Jiang Liu wrote: Function detects initialization status by checking sched->ops, so set sched->ops to non-NULL just before return in function drm_sched_init() to avoid possible invalid memory access on error recover path. Signed-off-by: Jiang Liu --- drivers/gpu/drm/sch

Re: [PATCH 4/6] amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-02 Thread Lazar, Lijo
On 1/2/2025 11:06 AM, Jiang Liu wrote: > If some GPU device failed to probe, `rmmod amdgpu` will trigger a use > after free bug related to amdgpu_driver_release_kms() as: > 2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference, > address: > 2024-12-26 16:17:45

RE: [PATCH] drm/amdgpu: reduce RLC safe mode request for gfx clock gating

2025-01-02 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only] Ping... Regards, Prike > -Original Message- > From: Liang, Prike > Sent: Tuesday, December 24, 2024 2:16 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Lazar, Lijo > ; Liang, Prike > Subject: [PATCH] drm/

RE: [PATCH] drm/amdkfd: test release process eviction fence before signal

2025-01-02 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only] Thanks for the information. I draft this patch to resolve the HIP stream test that complained about the KFD process signals an invalidate fence on the latest drm-next branch. BTW, it looks like your patch still hasn't landed in the drm-ne

Re: [PATCH 1/6] amdgpu: add flags to track sysfs initialization status

2025-01-02 Thread Lazar, Lijo
On 1/2/2025 11:06 AM, Jiang Liu wrote: > Add flags to track sysfs initialization status, so we can correctly > clean them up on error recover paths. > > Signed-off-by: Jiang Liu > --- > drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++ > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 +

[PATCH] drm/amdgpu: Remove unnecessary NULL check

2025-01-02 Thread Kent Russell
container_of cannot return NULL, so it is unnecessary to check for NULL after gem_to_amdgpu_bo, which just is a container_of call Signed-off-by: Kent Russell --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++ 1 file changed, 2 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/
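
Why the NULL check is dead code can be seen from a simplified userspace version of `container_of()`: it only subtracts a member's offset from the member pointer. The structs `gem_object` and `bo` below are hypothetical stand-ins for `drm_gem_object` and `amdgpu_bo`, and the macro omits the kernel version's type checking. This is a sketch of the idea, not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified container_of(): step back from the member to its parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-ins for the GEM object and the driver BO. */
struct gem_object {
	int handle;
};

struct bo {
	unsigned long size;
	struct gem_object base; /* embedded GEM object */
};

static struct bo *gem_to_bo(struct gem_object *gobj)
{
	/*
	 * Pure pointer arithmetic: for a valid gobj the result is the
	 * enclosing bo, never NULL, so checking the result for NULL
	 * tests nothing.
	 */
	return container_of(gobj, struct bo, base);
}
```

Given a `struct bo b`, `gem_to_bo(&b.base)` yields `&b`; any NULL check belongs on the input pointer, not on the result.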

RE: [PATCH v2] drm/amdgpu: Fix error handling in amdgpu_ras_add_bad_pages

2025-01-02 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] > -Original Message- > From: SHANMUGAM, SRINIVASAN > Sent: Tuesday, December 17, 2024 5:39 PM > To: Koenig, Christian ; Deucher, Alexander > ; Chai, Thomas > Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN > ; Dan Carpente

Re: [PATCH 09/12] drm/amdgpu: Optimise amdgpu_ring_write()

2025-01-02 Thread Christian König
Am 27.12.24 um 12:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin There are more than 2000 calls to amdgpu_ring_write() in the driver and the majority is multiple sequential calls which the compiler cannot optimise much. Lets make this helper variadic via some pre-processor magic which allows
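
One way to make a ring-write helper variadic with the preprocessor is to turn the arguments into a compound-literal array and let `sizeof` count them. This is a hypothetical sketch of that general technique (the `struct ring` layout and all function names are assumptions); the snippet is truncated, so it is not the macro from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring: power-of-two size, write pointer wraps via mask. */
struct ring {
	uint32_t buf[8];
	uint32_t wptr;
	uint32_t mask; /* number of dwords - 1 */
};

/* Single-dword write, the shape of the existing helper. */
static void ring_write(struct ring *r, uint32_t v)
{
	r->buf[r->wptr & r->mask] = v;
	r->wptr++;
}

/* Write n dwords in one call so the compiler sees the whole batch. */
static void ring_write_array(struct ring *r, const uint32_t *v, size_t n)
{
	for (size_t i = 0; i < n; i++)
		r->buf[(r->wptr + i) & r->mask] = v[i];
	r->wptr += (uint32_t)n;
}

/*
 * Variadic front end: the arguments become a compound-literal array
 * and sizeof yields their count, replacing several sequential
 * ring_write() calls with one.
 */
#define ring_write_multi(r, ...)                                  \
	ring_write_array((r), (const uint32_t[]){ __VA_ARGS__ },  \
			 sizeof((const uint32_t[]){ __VA_ARGS__ }) /  \
				 sizeof(uint32_t))
```

For example, `ring_write_multi(&r, 0xA, 0xB, 0xC)` stores three dwords and advances `wptr` by three, wrapping around the end of the buffer when needed.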

Re: [drm/amdgpu]: user queue doorbell allocation for IP reqs

2025-01-02 Thread Sharma, Shashank
+ (amd-gfx) On 01/01/2025 07:03, Saleemkhan Jamadar wrote: #resending patch From 79cd62f882197505dbf9c489ead2b0bcab98209f Mon Sep 17 00:00:00 2001 From: Saleemkhan Jamadar Date: Wed, 18 Dec 2024 19:30:22 +0530 Subject: [PATCH] drm/amdgpu: user queue doorbell allocation for IP reqs Currenlty

Re: [PATCH 01/12] drm/amdgpu: Use memset32 for IB padding

2025-01-02 Thread Christian König
Am 27.12.24 um 12:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Use memset32 instead of open coding it, just because it is that bit nicer. In general looks mostly good, my only concern is that we already had to switch to memset_io() on some platforms for clearing buffers. Now an IB sh
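
IB padding with a 32-bit fill can be sketched as follows: round the IB length up to the ring's alignment and fill the tail with a NOP dword in one call instead of an open-coded loop at the call site. `memset32_sketch()` is a userspace stand-in for the kernel's `memset32()`, and `pad_ib()`, `align_mask` and the NOP value are hypothetical. This operates on ordinary memory only; as the reply notes, buffers on some platforms need `memset_io()` instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for memset32(): fill count 32-bit words with v. */
static uint32_t *memset32_sketch(uint32_t *s, uint32_t v, size_t count)
{
	for (size_t i = 0; i < count; i++)
		s[i] = v;
	return s;
}

/*
 * Hypothetical IB padding: round length_dw up to a multiple of
 * (align_mask + 1), filling the gap with the NOP value.
 * Returns the padded length in dwords.
 */
static uint32_t pad_ib(uint32_t *ib, uint32_t length_dw,
		       uint32_t align_mask, uint32_t nop)
{
	uint32_t pad = (align_mask + 1 - (length_dw & align_mask)) & align_mask;

	memset32_sketch(ib + length_dw, nop, pad);
	return length_dw + pad;
}
```

With an 8-dword alignment (`align_mask = 7`), a 3-dword IB gains five NOPs and becomes 8 dwords long; an already aligned IB is left unchanged.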

Re: [PATCH 04/12] drm/amdgpu: Consolidate a bunch of similar sdma insert nop vfuncs

2025-01-02 Thread Christian König
Am 27.12.24 um 12:19 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin A lot of the hardware generations apparently uses the same nop insertion logic, just with different masks and shifts. We can consolidate if we store those shifts and mask in the ring and shrink both the source and binary. The
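
The consolidation idea (same NOP insertion logic, differing only in masks and shifts) can be sketched by storing the per-generation count-field mask and shift in the ring and using one shared function. Every name below, and the burst-NOP behaviour of putting the count into the first packet, is an illustrative assumption rather than the patch's code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical ring carrying per-generation NOP packet parameters. */
struct sdma_ring_sketch {
	uint32_t buf[16];
	uint32_t wptr;
	uint32_t nop;         /* NOP packet header */
	uint32_t count_mask;  /* width of the NOP count field */
	uint32_t count_shift; /* position of the NOP count field */
	int burst_nop;        /* engine accepts one NOP covering a run */
};

static void sketch_write(struct sdma_ring_sketch *r, uint32_t v)
{
	r->buf[r->wptr++] = v; /* no wrap handling: demo only */
}

/* One shared insert_nop() instead of a near-identical copy per generation. */
static void sketch_insert_nop(struct sdma_ring_sketch *r, uint32_t count)
{
	for (uint32_t i = 0; i < count; i++) {
		if (r->burst_nop && i == 0)
			sketch_write(r, r->nop | (((count - 1) & r->count_mask)
						  << r->count_shift));
		else
			sketch_write(r, r->nop);
	}
}
```

Each hardware generation would then only set `count_mask` and `count_shift` when initializing its ring, which is where the source and binary size savings would come from.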

Re: [PATCH] drm/amdgpu: Add null pointer check before task_info get and put

2025-01-02 Thread Sharma, Shashank
Lgtm, Reviewed-by: Shashank Sharma Regards Shashank On 01/01/2025 02:58, Lu Yao wrote: This patch add null pointer check for amdgpu_vm_put_task_info and amdgpu_vm_get_task_info_vm, because there is only a warning if create task_info failed in amdgpu_vm_init. Fixes: b8f67b9ddf4f ("drm/amdgpu

[PATCH] Revert "drm/amd/display: Optimize cursor position updates"

2025-01-02 Thread Aurabindo Pillai
This reverts commit 742d670b416b272e42f6674e30e393bbb8ffa6d1. SW and HW state are not always matching in some cases causing cursor to be disabled. --- drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c | 7 +++ .../gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c | 6 ++ drivers/g

RE: [PATCH] drm/amdkfd: test release process eviction fence before signal

2025-01-02 Thread Zhu, Lingshan
[AMD Official Use Only - AMD Internal Distribution Only] There is currently a bug in CI (requiring Intel Graphic Card which is totally nonsense ) blocking this merge request. Thanks Lingshan -Original Message- From: Liang, Prike Sent: Thursday, January 2, 2025 4:04 PM To: Zhu, Lingshan

Re: [PATCH] drm/amdgpu: reduce RLC safe mode request for gfx clock gating

2025-01-02 Thread Lazar, Lijo
On 12/24/2024 11:46 AM, Prike Liang wrote: > The driver can only request one time for the power safe mode instead of > polling and disabling the power feature each time prior to program the > GFX clock gating control registers. This update will reduce the latency > on the GFX clock gating entry.
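
The latency argument can be modelled with a hypothetical counter of safe-mode requests: entering safe mode once around the whole clock-gating register sequence, rather than around each register write, cuts the number of handshakes. The struct and functions below are illustrative assumptions, not amdgpu's RLC interfaces.

```c
#include <assert.h>

/* Hypothetical model of the GFX block; not amdgpu's structures. */
struct gfx_sketch {
	int safe_mode_requests; /* how many times safe mode was requested */
	int in_safe_mode;       /* registers may only be written while set */
	unsigned int regs[4];   /* stand-ins for clock-gating registers */
};

static void enter_safe_mode(struct gfx_sketch *g)
{
	g->safe_mode_requests++; /* each request costs a handshake */
	g->in_safe_mode = 1;
}

static void exit_safe_mode(struct gfx_sketch *g)
{
	g->in_safe_mode = 0;
}

static void program_cg_reg(struct gfx_sketch *g, int i, unsigned int v)
{
	assert(g->in_safe_mode); /* programming outside safe mode is a bug */
	g->regs[i] = v;
}

/* Before: one safe-mode round trip per register. */
static void update_cg_per_reg(struct gfx_sketch *g, unsigned int v)
{
	for (int i = 0; i < 4; i++) {
		enter_safe_mode(g);
		program_cg_reg(g, i, v);
		exit_safe_mode(g);
	}
}

/* After: request safe mode once around the whole sequence. */
static void update_cg_batched(struct gfx_sketch *g, unsigned int v)
{
	enter_safe_mode(g);
	for (int i = 0; i < 4; i++)
		program_cg_reg(g, i, v);
	exit_safe_mode(g);
}
```

Both variants leave the registers in the same state; the batched one issues a single safe-mode request instead of one per register.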

Re: [PATCH v2] drm/amdgpu: Fix Illegal opcode in command stream Error

2025-01-02 Thread Christian König
Am 23.12.24 um 16:34 schrieb Arvind Yadav: When applications closes, it triggers the drm_file_free function which subsequently releases all allocated buffer objects. Concurrently, the resume_worker thread will attempt to map the usermode queue. However, since the wptr buffer object has already be

[PATCH 5/6] amdgpu: fix invalid memory access in amdgpu_fence_driver_sw_fini()

2025-01-02 Thread Jiang Liu
Function detects initialization status by checking sched->ops, so set sched->ops to non-NULL just before return in function drm_sched_init() to avoid possible invalid memory access on error recover path. Signed-off-by: Jiang Liu --- drivers/gpu/drm/scheduler/sched_main.c | 3 +++ 1 file changed,
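
The "publish the initialized marker last" pattern described here can be sketched in userspace: the init function assigns `ops` only after every step has succeeded, so the teardown function can use `ops != NULL` to decide whether there is anything to free. The struct, the `fail` parameter that simulates an error, and the function names are hypothetical; this is not `drm_sched_init()` itself.

```c
#include <assert.h>
#include <stdlib.h>

struct sched_ops_sketch {
	int dummy;
};

struct sched_sketch {
	const struct sched_ops_sketch *ops; /* non-NULL only once fully set up */
	int *run_queue;
};

/* fail != 0 simulates an error partway through initialization. */
static int sketch_sched_init(struct sched_sketch *s,
			     const struct sched_ops_sketch *ops, int fail)
{
	s->run_queue = calloc(4, sizeof(*s->run_queue));
	if (!s->run_queue || fail) {
		free(s->run_queue);
		s->run_queue = NULL;
		return -1; /* ops stays NULL: fini will skip teardown */
	}
	s->ops = ops; /* set last, just before the successful return */
	return 0;
}

static void sketch_sched_fini(struct sched_sketch *s)
{
	if (!s->ops)
		return; /* never initialized: nothing to tear down */
	free(s->run_queue);
	s->run_queue = NULL;
	s->ops = NULL;
}
```

If `ops` were assigned before a step that can fail, the fini path would see a "initialized" scheduler whose other fields were never set up, which is the kind of invalid access the patch describes.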

[PATCH 1/6] amdgpu: add flags to track sysfs initialization status

2025-01-02 Thread Jiang Liu
Add flags to track sysfs initialization status, so we can correctly clean them up on error recover paths. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 +- 2 files changed, 30 insertions(+), 7
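
Tracking initialization with per-object flags can be sketched as follows: each sysfs group gets a flag that is set only after it was created, and the cleanup routine removes only flagged groups, so it is safe to call from any error path and more than once. The device struct, the `live_groups` counter standing in for real sysfs objects, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device with two independently created sysfs groups. */
struct dev_sketch {
	bool attrs_inited;    /* set only after group A exists */
	bool fw_attrs_inited; /* set only after group B exists */
	int live_groups;      /* stands in for real sysfs objects */
};

static int create_group(struct dev_sketch *d, bool *flag, bool fail)
{
	if (fail)
		return -1;
	d->live_groups++;
	*flag = true;
	return 0;
}

/* fail_fw simulates the second group failing to be created. */
static int dev_sysfs_init(struct dev_sketch *d, bool fail_fw)
{
	if (create_group(d, &d->attrs_inited, false))
		return -1;
	return create_group(d, &d->fw_attrs_inited, fail_fw);
}

/* Tears down only what was actually created; idempotent. */
static void dev_sysfs_fini(struct dev_sketch *d)
{
	if (d->fw_attrs_inited) {
		d->live_groups--;
		d->fw_attrs_inited = false;
	}
	if (d->attrs_inited) {
		d->live_groups--;
		d->attrs_inited = false;
	}
}
```

After a partial failure only the first group is flagged, so cleanup removes exactly that one and never touches the group that was never created.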

[PATCH 6/6] amdgpu: get rid of false warnings caused by amdgpu_irq_put()

2025-01-02 Thread Jiang Liu
If error happens before amdgpu_fence_driver_hw_init() gets called during device probe, it will trigger a false warning in amdgpu_irq_put() as below: [ 1209.300996] [ cut here ] [ 1209.301061] WARNING: CPU: 48 PID: 293 at /tmp/amd.Rc9jFrl7/amd/amdgpu/amdgpu_irq.c:633 amdgpu_

[PATCH 4/6] amdgpu: fix use after free bug related to amdgpu_driver_release_kms()

2025-01-02 Thread Jiang Liu
If some GPU device failed to probe, `rmmod amdgpu` will trigger a use after free bug related to amdgpu_driver_release_kms() as: 2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference, address: 2024-12-26 16:17:45 [16002.093792] #PF: supervisor read access in kerne

[PATCH 3/6] amdgpu: clear adev->in_suspend flag when fails to suspend

2025-01-02 Thread Jiang Liu
Clear adev->in_suspend flag when fails to suspend, otherwise it will cause too much warnings like: [ 1802.212027] [ cut here ] [ 1802.212028] WARNING: CPU: 97 PID: 11282 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:452 amdgpu_bo_free_kernel+0xf9/0x120 [amdgpu] [ 1802.2121

[PATCH 0/6] Fix several bugs in error handling during device

2025-01-02 Thread Jiang Liu
This patchset tries to fix several memory leakages/invalid memory accesses on error handling path during GPU driver loading/unloading. They applies to: https://github.com/ROCm/ROCK-Kernel-Driver/tree/master/drivers Jiang Liu (6): amdgpu: add flags to track sysfs initialization status amdgpu: f

[PATCH 2/6] amdgpu: fix invalid memory access in kfd_cleanup_nodes()

2025-01-02 Thread Jiang Liu
On error recover path during device probe, it may trigger invalid memory access as below: 024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for non-canonical address 0x52445f4749464e4f: [#1] SMP NOPTI 2024-12-25 12:00:53 [ 2703.785199] CPU: 157 PID: 151951 Comm: rmmod Kdump

Re: [PATCH] Revert "drm/amd/display: Optimize cursor position updates"

2025-01-02 Thread Leo Li
On 2025-01-02 13:16, Aurabindo Pillai wrote: This reverts commit 742d670b416b272e42f6674e30e393bbb8ffa6d1. SW and HW state are not always matching in some cases causing cursor to be disabled. With your SOB, this is Reviewed-by: Leo Li --- drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10