Because the pt is freed with a delay, the bo that was meant to be freed has
been reused, which causes an unexpected page fault and then a call to
svm_range_restore_pages.
Details below:
1. The following code wants to free the pt, but the pt is not freed
immediately; instead it uses "schedule_work(&vm->pt_free_work);".
[ 92.276838] Call T
On 1/2/2025 8:22 PM, Gerry Liu wrote:
On 2025-01-03 07:08, Chen, Xiaogang wrote:
On 1/1/2025 11:36 PM, Jiang Liu wrote:
On the error recovery path during device probe, an invalid memory access
may be triggered, as shown below:
2024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for
non-canonical
On Tue, Dec 17, 2024 at 09:36:52PM +0530, Vignesh Raman wrote:
> Uprev IGT to the latest version and update expectation files.
>
> Signed-off-by: Vignesh Raman
> ---
>
> v1:
> - Pipeline link -
> https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1327810
> Will update the flake
On 1/1/2025 11:36 PM, Jiang Liu wrote:
If a GPU device fails to probe, `rmmod amdgpu` triggers a use-after-free
bug related to amdgpu_driver_release_kms(), as shown below:
2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference,
address:
2024-12-26 16:17:45 [16002.09
On 1/2/2025 11:55 PM, Gerry Liu wrote:
On 2025-01-03 13:44, Chen, Xiaogang wrote:
On 1/2/2025 8:22 PM, Gerry Liu wrote:
On 2025-01-03 07:08, Chen, Xiaogang wrote:
On 1/1/2025 11:36 PM, Jiang Liu wrote:
On the error recovery path during device probe, an invalid memory access
may be triggered, as shown below:
024-12
Hi Shashank,
Replied inline [Saleem]
Regards,
Salem
On 02/01/25 18:58, Sharma, Shashank wrote:
+ (amd-gfx)
On 01/01/2025 07:03, Saleemkhan Jamadar wrote:
#resending patch
From 79cd62f882197505dbf9c489ead2b0bcab98209f Mon Sep 17 00:00:00 2001
From: Saleemkhan Jamadar
Date: Wed, 18 Dec 2
On 03/01/2025 07:34, Saleemkhan Jamadar wrote:
Hi Shashank,
Replied inline [Saleem]
Regards,
Salem
On 02/01/25 18:58, Sharma, Shashank wrote:
+ (amd-gfx)
On 01/01/2025 07:03, Saleemkhan Jamadar wrote:
#resending patch
From 79cd62f882197505dbf9c489ead2b0bcab98209f Mon Sep 17 00:00:00
For a partial migration from RAM to VRAM, migrate->cpages is not equal to
migrate->npages, so migrate->npages should be used to check all pages that
need migration, whether or not they could be copied.
Only the pages that can be migrated need to be set in migrate->dst[i];
otherwise
the migrate_vma_pages will migrate the wro
Because the pt is freed with a delay, the bo that was meant to be freed has
been reused, which causes an unexpected page fault and then a call to
svm_range_restore_pages.
Details below:
1. The following code wants to free the pt, but the pt is not freed
immediately; instead it uses "schedule_work(&vm->pt_free_work);".
[ 92.276838] Call T
On 1/1/2025 11:36 PM, Jiang Liu wrote:
Add flags to track sysfs initialization status, so that the sysfs entries
can be cleaned up correctly on error recovery paths.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34
On 1/1/2025 11:36 PM, Jiang Liu wrote:
On the error recovery path during device probe, an invalid memory access
may be triggered, as shown below:
2024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for
non-canonical address 0x52445f4749464e4f: [#1] SMP NOPTI
2024-12-25 12:00:53 [ 2703.7851
On 1/1/2025 11:36 PM, Jiang Liu wrote:
The function detects initialization status by checking sched->ops, so set
sched->ops to a non-NULL value just before returning from drm_sched_init()
to avoid a possible invalid memory access on the error recovery path.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/sch
On 1/2/2025 11:06 AM, Jiang Liu wrote:
> If a GPU device fails to probe, `rmmod amdgpu` triggers a use-after-free
> bug related to amdgpu_driver_release_kms(), as shown below:
> 2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference,
> address:
> 2024-12-26 16:17:45
[AMD Official Use Only - AMD Internal Distribution Only]
Ping...
Regards,
Prike
> -Original Message-
> From: Liang, Prike
> Sent: Tuesday, December 24, 2024 2:16 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Lazar, Lijo
> ; Liang, Prike
> Subject: [PATCH] drm/
[AMD Official Use Only - AMD Internal Distribution Only]
Thanks for the information. I drafted this patch to fix the HIP stream test,
which complained that the KFD process signals an invalidate fence on the latest
drm-next branch. BTW, it looks like your patch still hasn't landed in the
drm-ne
On 1/2/2025 11:06 AM, Jiang Liu wrote:
> Add flags to track sysfs initialization status, so that the sysfs entries
> can be cleaned up correctly on error recovery paths.
>
> Signed-off-by: Jiang Liu
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 +
container_of cannot return NULL, so it is unnecessary to check for
NULL after gem_to_amdgpu_bo, which is just a container_of call.
Signed-off-by: Kent Russell
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 6 ++
1 file changed, 2 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/
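
The container_of point made in the commit message above can be illustrated
with a small userspace sketch. Everything here is an assumption made for
illustration: the macro is a simplified stand-in for the kernel's
container_of(), and struct gem_object, struct demo_bo and gem_to_demo_bo()
are hypothetical names, not the amdgpu definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical types for illustration only, not the amdgpu structures. */
struct gem_object {
	int handle;
};

struct demo_bo {
	int flags;
	struct gem_object base;	/* embedded at a non-zero offset */
};

/*
 * Same shape as a gem_to_*_bo() helper: pure pointer arithmetic.
 * The result is the input minus a constant offset, so the only check
 * that can mean anything is one on the input pointer.
 */
static struct demo_bo *gem_to_demo_bo(struct gem_object *gobj)
{
	assert(gobj != NULL);
	return container_of(gobj, struct demo_bo, base);
}

/* Returns 1 when the round trip recovers the enclosing object. */
static int demo_round_trip(void)
{
	struct demo_bo bo = { .flags = 7, .base = { .handle = 42 } };
	struct demo_bo *back = gem_to_demo_bo(&bo.base);

	return back == &bo && back->flags == 7;
}
```

Since the conversion only subtracts a fixed offset, a NULL test on its result
cannot detect a missing object; if a NULL test is needed at all, it belongs
on the pointer passed in.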
[AMD Official Use Only - AMD Internal Distribution Only]
> -Original Message-
> From: SHANMUGAM, SRINIVASAN
> Sent: Tuesday, December 17, 2024 5:39 PM
> To: Koenig, Christian ; Deucher, Alexander
> ; Chai, Thomas
> Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN
> ; Dan Carpente
On 27.12.24 at 12:19, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
There are more than 2000 calls to amdgpu_ring_write() in the driver, and
the majority are multiple sequential calls which the compiler cannot
optimise much.
Let's make this helper variadic via some pre-processor magic which allows
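
One possible way to make a write helper variadic with the preprocessor is
sketched below. This is not the macro from the patch, whose text is cut off
here; struct demo_ring, demo_ring_write_multiple() and demo_ring_write() are
hypothetical names, and the approach (a C99 compound literal plus sizeof to
count the arguments) is only one of several that would fit the description.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical ring for illustration, not the amdgpu_ring structure. */
struct demo_ring {
	uint32_t buf[16];	/* slot count must be a power of two */
	uint32_t wptr;		/* next slot to write */
	uint32_t mask;		/* slot count minus one */
};

/* Copy count dwords into the ring, wrapping the write pointer. */
static void demo_ring_write_multiple(struct demo_ring *ring,
				     const uint32_t *v, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		ring->buf[ring->wptr] = v[i];
		ring->wptr = (ring->wptr + 1) & ring->mask;
	}
}

/*
 * Variadic front end: the arguments become a temporary array (a C99
 * compound literal), and sizeof yields the argument count at compile
 * time. The sizeof operand is not evaluated, so each argument is
 * evaluated exactly once. At least one value must be passed.
 */
#define demo_ring_write(ring, ...)					\
	demo_ring_write_multiple((ring),				\
		(const uint32_t[]){ __VA_ARGS__ },			\
		sizeof((uint32_t[]){ __VA_ARGS__ }) / sizeof(uint32_t))
```

A caller would then replace several sequential single-dword calls with one
call such as demo_ring_write(&ring, header, addr_lo, addr_hi), giving the
compiler a single loop with a known trip count to work with.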
+ (amd-gfx)
On 01/01/2025 07:03, Saleemkhan Jamadar wrote:
#resending patch
From 79cd62f882197505dbf9c489ead2b0bcab98209f Mon Sep 17 00:00:00 2001
From: Saleemkhan Jamadar
Date: Wed, 18 Dec 2024 19:30:22 +0530
Subject: [PATCH] drm/amdgpu: user queue doorbell allocation for IP reqs
Currently
On 27.12.24 at 12:19, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
Use memset32 instead of open coding it, just because it is
that bit nicer.
In general looks mostly good, my only concern is that we already had to
switch to memset_io() on some platforms for clearing buffers.
Now an IB sh
On 27.12.24 at 12:19, Tvrtko Ursulin wrote:
From: Tvrtko Ursulin
A lot of the hardware generations apparently use the same nop insertion
logic, just with different masks and shifts.
We can consolidate if we store those shifts and masks in the ring and
shrink both the source and binary.
The
Lgtm, Reviewed-by: Shashank Sharma
Regards
Shashank
On 01/01/2025 02:58, Lu Yao wrote:
This patch adds a NULL pointer check to amdgpu_vm_put_task_info and
amdgpu_vm_get_task_info_vm, because amdgpu_vm_init only emits a warning
if creating task_info fails.
Fixes: b8f67b9ddf4f ("drm/amdgpu
This reverts commit 742d670b416b272e42f6674e30e393bbb8ffa6d1.
SW and HW state do not always match, which in some cases causes the cursor
to be disabled.
---
drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10_dpp.c | 7 +++
.../gpu/drm/amd/display/dc/dpp/dcn401/dcn401_dpp_cm.c | 6 ++
drivers/g
[AMD Official Use Only - AMD Internal Distribution Only]
There is currently a bug in CI (it requires an Intel graphics card, which is
total nonsense) blocking this merge request.
Thanks
Lingshan
-Original Message-
From: Liang, Prike
Sent: Thursday, January 2, 2025 4:04 PM
To: Zhu, Lingshan
On 12/24/2024 11:46 AM, Prike Liang wrote:
> The driver can request the power safe mode just once instead of
> polling and disabling the power feature each time before programming the
> GFX clock gating control registers. This update will reduce the latency
> of GFX clock gating entry.
On 23.12.24 at 16:34, Arvind Yadav wrote:
When an application closes, it triggers the drm_file_free
function which subsequently releases all allocated buffer
objects. Concurrently, the resume_worker thread will attempt
to map the usermode queue. However, since the wptr buffer
object has already be
The function detects initialization status by checking sched->ops, so set
sched->ops to a non-NULL value just before returning from drm_sched_init()
to avoid a possible invalid memory access on the error recovery path.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file changed,
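
The ordering described in the commit message above can be sketched in
userspace C. This is only an illustration under stated assumptions:
struct demo_sched, demo_sched_init() and demo_sched_fini() are hypothetical
stand-ins, not the DRM scheduler API, and treating a queue count of zero as
the failure case is an arbitrary choice made to keep the example small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins, not the DRM scheduler types. */
struct demo_ops {
	int unused;
};

struct demo_sched {
	const struct demo_ops *ops;	/* non-NULL only after a successful init */
	int *queues;
};

static const struct demo_ops demo_default_ops = { 0 };

/* Returns 0 on success, -1 on failure; ops stays NULL on every failure. */
static int demo_sched_init(struct demo_sched *s, size_t nqueues)
{
	assert(s != NULL);
	s->ops = NULL;
	s->queues = NULL;

	if (nqueues == 0)
		return -1;

	s->queues = calloc(nqueues, sizeof(*s->queues));
	if (!s->queues)
		return -1;

	/* Last step: publish ops only once nothing else can fail. */
	s->ops = &demo_default_ops;
	return 0;
}

/* Safe to call after a failed init because it checks ops first. */
static void demo_sched_fini(struct demo_sched *s)
{
	assert(s != NULL);
	if (!s->ops)
		return;

	free(s->queues);
	s->queues = NULL;
	s->ops = NULL;
}
```

Because ops is assigned last, the teardown function never touches state that
a failed init left half-built, and calling it twice is harmless.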
Add flags to track sysfs initialization status, so that the sysfs entries
can be cleaned up correctly on error recovery paths.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 +-
2 files changed, 30 insertions(+), 7
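
The flag-tracking idea in the commit message above can be sketched as
follows. The names are hypothetical (struct demo_dev, demo_sysfs_init(),
demo_sysfs_fini()), the two "nodes" are plain counters standing in for real
sysfs entries, and the fail_b parameter exists only to simulate a failure;
none of this is taken from the amdgpu patch itself.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical device state, not struct amdgpu_device. */
struct demo_dev {
	bool sysfs_a_ready;	/* set once entry A has been created */
	bool sysfs_b_ready;	/* set once entry B has been created */
	int live_nodes;		/* stand-in for entries that currently exist */
};

/* Stand-in for creating an entry; fail simulates an error. */
static int demo_create_node(struct demo_dev *d, bool fail)
{
	if (fail)
		return -1;
	d->live_nodes++;
	return 0;
}

static void demo_remove_node(struct demo_dev *d)
{
	d->live_nodes--;
}

/* Removes only what was created, in reverse order; safe to call twice. */
static void demo_sysfs_fini(struct demo_dev *d)
{
	if (d->sysfs_b_ready) {
		demo_remove_node(d);
		d->sysfs_b_ready = false;
	}
	if (d->sysfs_a_ready) {
		demo_remove_node(d);
		d->sysfs_a_ready = false;
	}
}

/* Returns 0 on success, -1 after undoing any partial setup. */
static int demo_sysfs_init(struct demo_dev *d, bool fail_b)
{
	if (demo_create_node(d, false))
		goto err;
	d->sysfs_a_ready = true;

	if (demo_create_node(d, fail_b))
		goto err;
	d->sysfs_b_ready = true;
	return 0;

err:
	demo_sysfs_fini(d);
	return -1;
}
```

With one flag per entry, the same cleanup routine serves both the normal
unload path and every partial-failure path, and it never removes an entry
that was not created.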
If an error happens before amdgpu_fence_driver_hw_init() is called during
device probe, a false warning is triggered in amdgpu_irq_put(), as shown
below:
[ 1209.300996] [ cut here ]
[ 1209.301061] WARNING: CPU: 48 PID: 293 at
/tmp/amd.Rc9jFrl7/amd/amdgpu/amdgpu_irq.c:633 amdgpu_
If a GPU device fails to probe, `rmmod amdgpu` triggers a use-after-free
bug related to amdgpu_driver_release_kms(), as shown below:
2024-12-26 16:17:45 [16002.085540] BUG: kernel NULL pointer dereference,
address:
2024-12-26 16:17:45 [16002.093792] #PF: supervisor read access in kerne
Clear the adev->in_suspend flag when suspend fails; otherwise it will
cause too many warnings like:
[ 1802.212027] [ cut here ]
[ 1802.212028] WARNING: CPU: 97 PID: 11282 at
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:452 amdgpu_bo_free_kernel+0xf9/0x120
[amdgpu]
[ 1802.2121
This patchset tries to fix several memory leaks and invalid memory
accesses on the error handling path during GPU driver loading/unloading.
The patches apply to:
https://github.com/ROCm/ROCK-Kernel-Driver/tree/master/drivers
Jiang Liu (6):
amdgpu: add flags to track sysfs initialization status
amdgpu: f
On the error recovery path during device probe, an invalid memory access
may be triggered, as shown below:
2024-12-25 12:00:53 [ 2703.773040] general protection fault, probably for
non-canonical address 0x52445f4749464e4f: [#1] SMP NOPTI
2024-12-25 12:00:53 [ 2703.785199] CPU: 157 PID: 151951 Comm: rmmod Kdump
On 2025-01-02 13:16, Aurabindo Pillai wrote:
This reverts commit 742d670b416b272e42f6674e30e393bbb8ffa6d1.
SW and HW state do not always match, which in some cases causes the cursor
to be disabled.
With your SOB, this is
Reviewed-by: Leo Li
---
drivers/gpu/drm/amd/display/dc/dpp/dcn10/dcn10
35 matches