Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Zhang, GuoQing (Sam)
On 2025/5/21 20:00, Christian König wrote: > On 5/21/25 13:55, Zhang, GuoQing (Sam) wrote: >> On 2025/5/21 16:06, Christian König wrote: >>> On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote: >> +if (amdgpu_virt_xgmi_migrate_enabled(adev)) { >> +/* set mc->vram_start to 0 to swi

[PATCH v7 2/4] drm/amdgpu: update GPU addresses for SMU and PSP

2025-05-21 Thread Samuel Zhang
Add amdgpu_bo_fb_aper_addr() and update the cached GPU addresses to use the FB aperture address for SMU and PSP. There are two reasons for this change: 1. When pdb0 is enabled, the GPU address from amdgpu_bo_create_kernel() is a GART aperture address, which is not compatible with SMU and PSP, so it needs to be updated to use

[PATCH v7 0/4] enable xgmi node migration support for hibernate on SRIOV.

2025-05-21 Thread Samuel Zhang
In SRIOV and VM environments, customers may need to switch to new vGPU indexes after hibernating and then resuming the VM. For GPUs with XGMI, `vram_start` will change in this case, and the FB aperture GPU addresses of VRAM BOs will also change. These GPU addresses need to be updated on resume. But these a

[PATCH v7 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Samuel Zhang
When switching to a new GPU index after hibernation and then resuming, the VRAM offset of each VRAM BO will change, and the cached GPU addresses need to be updated. This is to enable pdb0 and switch to using pdb0-based virtual GPU addresses by default in amdgpu_bo_create_reserved(). Since the virtual addre

RE: [PATCH v2] drm/ttm: Should to return the evict error

2025-05-21 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only] Ping.. Emily Deng Best Wishes >-Original Message- >From: Emily Deng >Sent: Wednesday, May 21, 2025 11:57 AM >To: amd-gfx@lists.freedesktop.org >Cc: Deng, Emily >Subject: [PATCH v2] drm/ttm: Should to return the evict error

[PATCH v7 4/4] drm/amdgpu: fix fence fallback timer expired error

2025-05-21 Thread Samuel Zhang
IH is not working after switching to a new GPU index for the first time. The MSI-X table in the virtual machine is faked. The real MSI-X table will be programmed by QEMU when the guest enables/disables the MSI-X interrupt. But QEMU accessing the VF MSI-X table (register GFXMSIX_VECT0_ADDR_LO) is blocked by nBIF protection

Re: [PATCH v10 08/10] drm: Get rid of drm_sched_job.id

2025-05-21 Thread kernel test robot
Hi Pierre-Eric, kernel test robot noticed the following build errors: [auto build test ERROR on drm-xe/drm-xe-next] [also build test ERROR on next-20250521] [cannot apply to lwn/docs-next linus/master v6.15-rc7] [If your patch is applied to the wrong git tree, kindly drop us a note. And when

[PATCH v7 1/4] drm/amdgpu: update xgmi info and vram_base_offset on resume

2025-05-21 Thread Samuel Zhang
In SRIOV VM environments on XGMI-enabled systems, the XGMI physical node ID may change when hibernating and resuming with a different VF. Update XGMI info and vram_base_offset on resume for gfx444 SRIOV env. Add amdgpu_virt_xgmi_migrate_enabled() as the feature flag. Signed-off-by: Jiang Liu Signed-off-by: Samu

RE: [PATCH] Revert "drm/amd/display: pause the workload setting in dm"

2025-05-21 Thread Feng, Kenneth
[AMD Official Use Only - AMD Internal Distribution Only] Will figure out another way to fix the MALL idle power issue previously. Reviewed-by: Kenneth Feng -Original Message- From: Alex Deucher Sent: Thursday, May 22, 2025 1:47 AM To: Zuo, Jerry Cc: amd-gfx@lists.freedesktop.org; Feng

Re: [PATCH v5] drm/amd/amdgpu: Add GPIO resources required for amdisp

2025-05-21 Thread Mario Limonciello
On 5/21/2025 3:49 PM, Pratap Nirujogi wrote: ISP is a child device to GFX, and its device specific information is not available in ACPI. Adding the 2 GPIO resources required for ISP_v4_1_1 in amdgpu_isp driver. - GPIO 0 to allow sensor driver to enable and disable sensor module. - GPIO 85 to all

[PATCH v3 8/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v5.0.0 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_0.c b

[PATCH v5] drm/amd/amdgpu: Add GPIO resources required for amdisp

2025-05-21 Thread Pratap Nirujogi
ISP is a child device to GFX, and its device specific information is not available in ACPI. Adding the 2 GPIO resources required for ISP_v4_1_1 in amdgpu_isp driver. - GPIO 0 to allow sensor driver to enable and disable sensor module. - GPIO 85 to allow ISP driver to enable and disable ISP RGB str

[PATCH] Revert "drm/amd/display: more liberal vmin/vmax update for freesync"

2025-05-21 Thread Aurabindo Pillai
This reverts commit 219898d29c438d8ec34a5560fac4ea8f6b8d4f20 since it causes regressions on certain configs. Revert until the issue can be isolated and debugged. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4238 Signed-off-by: Aurabindo Pillai --- .../gpu/drm/amd/display/amdgpu_dm/amd

[PATCH v3 6/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.3 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 16 1 file changed, 16 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c b/dri

[PATCH v3 3/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v2.5 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c | 19 +++ 1 file changed, 19 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_5.c b/driver

[PATCH v3 9/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v5.0.1 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c | 15 +++ 1 file changed, 15 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v5_0_1.c b/driv

[PATCH v3 7/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.5 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c | 10 ++ 1 file changed, 10 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0_5.c b/drivers/g

[PATCH v3 4/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v3.0 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v3_0.c b/drive

[PATCH v3 5/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v4.0.0 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c | 20 1 file changed, 20 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v4_0.c b/dri

[PATCH v3 2/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
The addition of register read-back in VCN v2.0 is intended to prevent potential race conditions. Signed-off-by: David (Ming Qiang) Wu --- drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vcn_v2_0.c b/driv

[PATCH v3 1/9] drm/amdgpu: read back register after written

2025-05-21 Thread David (Ming Qiang) Wu
V3: drop changes where readbacks have been implemented. This patch set is to add readbacks only. V2: use common register UVD_STATUS for readback (standard PCI MMIO behavior, i.e. a readback posts all writes so that they hit the hardware); add read-back in ..._stop() for more coverage.
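The write-then-read-back pattern mentioned in this changelog can be sketched in plain C. This is only an illustration that runs in user space against an ordinary array; `reg_write_and_post()` and its parameters are hypothetical names, not the accessors the patch uses. On real hardware the read of a status register forces earlier posted MMIO writes to reach the device before execution continues.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): write val to regs[reg], then
 * read regs[status_reg] back. On a real MMIO mapping the read cannot
 * complete until earlier posted writes have reached the device.
 */
static uint32_t reg_write_and_post(volatile uint32_t *regs, unsigned int reg,
                                   uint32_t val, unsigned int status_reg)
{
    assert(regs != 0);

    regs[reg] = val;            /* the write that may still be in flight */
    return regs[status_reg];    /* read-back that posts the write */
}
```

The value returned by the read-back is usually discarded; the read exists for its ordering effect, not for its result.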

Re: [PATCH] Revert "drm/amd/display: more liberal vmin/vmax update for freesync"

2025-05-21 Thread Alex Deucher
On Wed, May 21, 2025 at 4:22 PM Aurabindo Pillai wrote: > > This reverts commit 219898d29c438d8ec34a5560fac4ea8f6b8d4f20 since it > causes regressions on certain configs. Revert until the issue can be > isolated and debugged. > > Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4238 > Signe

[PATCH] drm/amd/display: Add some missing register headers for DCN401

2025-05-21 Thread Aurabindo Pillai
Add some HDCP related register headers for future use. Signed-off-by: Aurabindo Pillai --- .../include/asic_reg/dcn/dcn_4_1_0_offset.h | 26 +++ .../include/asic_reg/dcn/dcn_4_1_0_sh_mask.h | 16 2 files changed, 42 insertions(+) diff --git a/drivers/gpu/drm/amd/

Re: [PATCH V9 00/43] Color Pipeline API w/ VKMS

2025-05-21 Thread Harry Wentland
On 2025-05-17 07:51, Xaver Hugl wrote: > Am Do., 15. Mai 2025 um 22:00 Uhr schrieb Leandro Ribeiro > : >> >> >> >> On 5/15/25 15:39, Daniel Stone wrote: >>> Hi, >>> >>> On Thu, 15 May 2025 at 19:02, Harry Wentland wrote: On 2025-05-15 13:19, Daniel Stone wrote: > Yeah, the Weston patch

Re: [PATCH V8 40/43] drm/colorop: Add 3D LUT support to color pipeline

2025-05-21 Thread Harry Wentland
On 2025-05-20 16:13, Harry Wentland wrote: > > > On 2025-05-19 19:43, Simon Ser wrote: >> On Sunday, May 18th, 2025 at 00:32, Xaver Hugl wrote: >> We can always make the property mutable on drivers that support it in >>> the future, much like the zpos property. I think we should kee

Re: [PATCH] Revert "drm/amd/display: pause the workload setting in dm"

2025-05-21 Thread Alex Deucher
On Wed, May 21, 2025 at 1:12 PM Fangzhi Zuo wrote: > > This reverts commit 243678df7a058f65f5f43e8026b359bcc91e0b69. > > Reason for revert: cause corruption on Dell U3224KB DP2 display. Missing your signed-off-by. Reverting this could result in higher power usage because I think the display idle

Re: 6.15-rc6/regression/bisected - after commit f1c6be3999d2 error appeared: *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error

2025-05-21 Thread Pillai, Aurabindo
[AMD Official Use Only - AMD Internal Distribution Only] Hi Mike, Thanks for the details. We tried to repro the issue at our end on 9000 and 7000 series dgpu, but we're not seeing the dmub errors. We were on Ubuntu, so we'll try Fedora. -- Regards, Jay From: M

[PATCH] Revert "drm/amd/display: pause the workload setting in dm"

2025-05-21 Thread Fangzhi Zuo
This reverts commit 243678df7a058f65f5f43e8026b359bcc91e0b69. Reason for revert: cause corruption on Dell U3224KB DP2 display. --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_crtc.c| 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/drivers/gpu/drm/amd/display/amdg

Re: 6.15-rc6/regression/bisected - after commit f1c6be3999d2 error appeared: *ERROR* dc_dmub_srv_log_diagnostic_data: DMCUB error

2025-05-21 Thread Mikhail Gavrilov
On Tue, May 20, 2025 at 9:22 PM Mikhail Gavrilov wrote: > > > Could you more details about your setup, and how you were able to repro it ? > > Hi, Were you able to reproduce the issue? I’ve prepared a step-by-step guide that may help: 1. Set up a system with a Radeon 6900XT and an LG TV connecte

[PATCH] Revert "drm/amd/display: [FW Promotion] Release 0.1.11.0"

2025-05-21 Thread Aurabindo Pillai
This reverts commit 572193a6e3a842204757a6fa2944125811b29f70 since it introduces incompatibility with older firmware Signed-off-by: Aurabindo Pillai --- .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h | 34 ++- 1 file changed, 2 insertions(+), 32 deletions(-) diff --git a/drivers/g

[PATCH v10 10/10] drm/amdgpu: update trace format to match gpu_scheduler_trace

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
Log fences using the same format for coherency. Signed-off-by: Pierre-Eric Pelloux-Prayer Reviewed-by: Christian König Reviewed-by: Arvind Yadav --- drivers/gpu/drm/amd/amdgpu/amdgpu_trace.h | 22 ++ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/drivers/gp

[PATCH v10 08/10] drm: Get rid of drm_sched_job.id

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
Its only purpose was for trace events, but jobs can already be uniquely identified using their fence. The downside of using the fence is that it's only available after 'drm_sched_job_arm' was called which is true for all trace events that used job.id so they can safely switch to using it. Suggest

[PATCH v10 02/10] drm/sched: Store the drm client_id in drm_sched_fence

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
This will be used in a later commit to trace the drm client_id in some of the gpu_scheduler trace events. This requires changing all the users of drm_sched_job_init to add an extra parameter. The newly added drm_client_id field in the drm_sched_fence is a bit of a duplicate of the owner one. One

[PATCH v10 00/10] Improve gpu_scheduler trace events + UAPI

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
Hi, The initial goal of this series was to improve the drm and amdgpu trace events to be able to expose more of the inner workings of the scheduler and drivers to developers via tools. Then, the series evolved to become focused only on gpu_scheduler. The changes around vblank events will be part

[PATCH 09/10] drm/amdgpu/gfx10: enable legacy enforce isolation

2025-05-21 Thread Alex Deucher
Enable legacy enforce isolation (just serialize kernel GC submissions). This way we can reset a ring and only affect the process currently using that ring. This mirrors what Windows does. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 9 + 1 file changed, 9

[PATCH 10/10] drm/amdgpu/gfx10: adjust ring reset sequences

2025-05-21 Thread Alex Deucher
Write the fence after we reset the ring and use an IB test to validate the reset. This is safe since we have enforce isolation legacy enabled by default. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletio

[PATCH 03/10] drm/amdgpu: adjust ring reset behavior

2025-05-21 Thread Alex Deucher
The idea here is to enable enforce isolation legacy behavior for gfx 10+ and as such, we can adjust the behavior to better suit that. This aligns with how Windows handles resets and seems to work reliably in my testing on GFX 10+. For older chips, or if enforce isolation is disabled, the soft rec

[PATCH 00/10] Reset improvements for GC10+

2025-05-21 Thread Alex Deucher
This set improves per queue reset support for GC10+. This enables the legacy enforce isolation behavior to serialize access to GC for kernel queues so that only one process uses the queue at a time. When we reset the queue, only that process is affected, which improves the user experience when a qu

[PATCH 08/10] drm/amdgpu/gfx12: adjust ring reset sequences

2025-05-21 Thread Alex Deucher
Write the fence after we reset the ring and use an IB test to validate the reset. This is safe since we have enforce isolation legacy enabled by default. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 20 +--- 1 file changed, 17 insertions(+), 3 deletio

[PATCH 04/10] drm/amdgpu: add AMDGPU_QUEUE_RESET_TIMEOUT

2025-05-21 Thread Alex Deucher
Add a new define for queue reset timeout. This will be used for the IB tests used when validating ring resets. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amd

[PATCH 05/10] drm/amdgpu/gfx11: enable legacy enforce isolation

2025-05-21 Thread Alex Deucher
Enable legacy enforce isolation (just serialize kernel GC submissions). This way we can reset a ring and only affect the process currently using that ring. This mirrors what Windows does. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 9 + 1 file changed, 9

[PATCH 01/10] Revert "drm/amd/amdgpu: add pipe1 hardware support"

2025-05-21 Thread Alex Deucher
This reverts commit b7a1a0ef12b81957584fef7b61e2d5ec049c7209. A user reported stuttering under heavy gfx load with this commit. I suspect it's due to the fact that the gfx contexts are shared between the pipes so if there is a lot of load on one pipe, we could end up stalling waiting for a context.

[PATCH 06/10] drm/amdgpu/gfx11: adjust ring reset sequences

2025-05-21 Thread Alex Deucher
Write the fence after we reset the ring and use an IB test to validate the reset. This is safe since we have enforce isolation legacy enabled by default. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 20 +--- drivers/gpu/drm/amd/amdgpu/nvd.h | 1

[PATCH 07/10] drm/amdgpu/gfx12: enable legacy enforce isolation

2025-05-21 Thread Alex Deucher
Enable legacy enforce isolation (just serialize kernel GC submissions). This way we can reset a ring and only affect the process currently using that ring. This mirrors what Windows does. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 9 + 1 file changed, 9

[PATCH 02/10] drm/amdgpu: rework queue reset scheduler interaction

2025-05-21 Thread Alex Deucher
From: Christian König Stopping the scheduler for queue reset is generally a good idea because it prevents any worker from touching the ring buffer. But using amdgpu_fence_driver_force_completion() before restarting it was a really bad idea because it marked fences as failed while the work was po

Re: [PATCH] Revert "drm/amd/display: [FW Promotion] Release 0.1.11.0"

2025-05-21 Thread Alex Deucher
On Wed, May 21, 2025 at 10:25 AM Aurabindo Pillai wrote: > > This reverts commit 572193a6e3a842204757a6fa2944125811b29f70 since it > introduces incompatibility with older firmware > > Signed-off-by: Aurabindo Pillai Acked-by: Alex Deucher > --- > .../gpu/drm/amd/display/dmub/inc/dmub_cmd.h |

Re: [PATCH 2/8] drm/amdgpu: rework queue reset scheduler interaction

2025-05-21 Thread Alex Deucher
On Wed, May 21, 2025 at 5:03 AM Christian König wrote: > > On 5/20/25 18:38, Alex Deucher wrote: > > On Tue, May 20, 2025 at 9:49 AM Christian König > > wrote: > >> > >> On 5/20/25 15:09, Alex Deucher wrote: > >>> On Mon, May 19, 2025 at 2:30 PM Alex Deucher > >>> wrote: > > From: Chr

Re: [PATCH v1 1/7] drm/amdgpu: make devcoredump reading fast

2025-05-21 Thread Christian König
On 5/21/25 11:49, Pierre-Eric Pelloux-Prayer wrote: > Update the way drm_coredump_printer is used based on its documentation > and Xe's code: the main idea is to generate the final version in one go > and then use memcpy to return the chunks requested by the caller of > amdgpu_devcoredump_read. >

Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Christian König
On 5/21/25 13:55, Zhang, GuoQing (Sam) wrote: > > On 2025/5/21 16:06, Christian König wrote: >> On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote: > +    if (amdgpu_virt_xgmi_migrate_enabled(adev)) { > +    /* set mc->vram_start to 0 to switch the returned GPU > address of > + 

Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Zhang, GuoQing (Sam)
On 2025/5/21 16:06, Christian König wrote: > On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote: +if (amdgpu_virt_xgmi_migrate_enabled(adev)) { +/* set mc->vram_start to 0 to switch the returned GPU address of + * amdgpu_bo_create_reserved() from FB apert

Re: [PATCH v1 5/7] drm/amdgpu: port some debugfs function to drm_printer

2025-05-21 Thread Christian König
On 5/21/25 11:49, Pierre-Eric Pelloux-Prayer wrote: > Using the drm_printer interface will allow us to use these functions > when generating the coredump. This change in general is harmless, but you must be extremely careful to not grab locks in the core dump which somebody could hold while wait

Re: [PATCH v1 2/7] drm/amdgpu: don't report stale vm_fault info in devcoredump

2025-05-21 Thread Christian König
On 5/21/25 11:49, Pierre-Eric Pelloux-Prayer wrote: > The coredump needs to contain accurate data and reporting a page > fault from a previous issue is incorrect. > > Signed-off-by: Pierre-Eric Pelloux-Prayer > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 13 - > drivers/

Re: [PATCH v4] PCI: Prevent power state transition of erroneous device

2025-05-21 Thread Rafael J. Wysocki
On Wed, May 21, 2025 at 10:54 AM Raag Jadav wrote: > > On Tue, May 20, 2025 at 01:56:28PM -0500, Mario Limonciello wrote: > > On 5/20/2025 1:42 PM, Raag Jadav wrote: > > > On Tue, May 20, 2025 at 12:39:12PM -0500, Mario Limonciello wrote: > > > > On 5/20/2025 12:22 PM, Denis Benato wrote: > > > >

[PATCH] amd/amdkfd: fix a kfd_process ref leak

2025-05-21 Thread Yifan Zhang
This patch is to fix a kfd_process ref leak. Signed-off-by: Yifan Zhang --- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_events.c b/drivers/gpu/drm/amd/amdkfd/kfd_events.c index e54e708ed82d..2b294ada3ec0 100644 --- a/

[PATCH v1 6/7] drm/amdgpu: add VA ranges to amdgpu_bo_print_info output

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
This information is useful when investigating page faults. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c| 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 25 +- drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 3 ++- drivers/gpu/dr

[PATCH v1 7/7] drm/amdgpu: include BO dump into coredump

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
It can be useful when getting a page fault. Signed-off-by: Pierre-Eric Pelloux-Prayer --- .../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 36 +++ .../gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h | 1 + 2 files changed, 37 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgp

[PATCH v1 5/7] drm/amdgpu: port some debugfs function to drm_printer

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
Using the drm_printer interface will allow us to use these functions when generating the coredump. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 5 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 5 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c

[PATCH v1 4/7] Revert "drm/amdgpu: add the evf attached gem obj resv dump"

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
This reverts commit b75818d0d28f1e06ca396cc2b8a38601b44c4788. dma_resv_describe outputs new lines so it breaks the "one line per BO" formatting. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 6 +- 1 file changed, 1 insertion(+), 5 deletions(-) d

[PATCH v1 2/7] drm/amdgpu: don't report stale vm_fault info in devcoredump

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
The coredump needs to contain accurate data and reporting a page fault from a previous issue is incorrect. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 13 - drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.h | 1 + drivers/gpu/drm/a

[PATCH v1 3/7] drm/amdgpu: always keep the latest coredump

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
Coredumps are automatically removed after 5 minutes, but if a new one is created while one exists already, the new one is discarded silently. Signed-off-by: Pierre-Eric Pelloux-Prayer --- drivers/gpu/drm/amd/amdgpu/amdgpu_dev_coredump.c | 5 + 1 file changed, 5 insertions(+) diff --git a/dr

[PATCH v1 1/7] drm/amdgpu: make devcoredump reading fast

2025-05-21 Thread Pierre-Eric Pelloux-Prayer
Update the way drm_coredump_printer is used based on its documentation and Xe's code: the main idea is to generate the final version in one go and then use memcpy to return the chunks requested by the caller of amdgpu_devcoredump_read. This cuts the time to copy the dump from 40s to ~0s on my mach
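The "render once, then memcpy chunks" idea described here can be sketched as follows. This is a minimal user-space illustration under stated assumptions, not the driver's code: `struct dump_cache` and `dump_cache_read()` are hypothetical names, and the dump is assumed to be fully rendered into memory before the first read.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a dump that was fully rendered once. */
struct dump_cache {
    const char *data;   /* complete dump text */
    size_t len;         /* number of valid bytes in data */
};

/*
 * Copy up to count bytes starting at offset into buf.
 * Returns the number of bytes copied; 0 signals the end of the dump.
 */
static size_t dump_cache_read(const struct dump_cache *c, char *buf,
                              size_t offset, size_t count)
{
    assert(c != NULL && buf != NULL);

    if (offset >= c->len)
        return 0;
    if (count > c->len - offset)
        count = c->len - offset;    /* clamp the final chunk */

    memcpy(buf, c->data + offset, count);
    return count;
}
```

Each read then costs a single bounded copy, regardless of how expensive the dump was to generate.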

Re: [PATCH 2/8] drm/amdgpu: rework queue reset scheduler interaction

2025-05-21 Thread Christian König
On 5/20/25 18:38, Alex Deucher wrote: > On Tue, May 20, 2025 at 9:49 AM Christian König > wrote: >> >> On 5/20/25 15:09, Alex Deucher wrote: >>> On Mon, May 19, 2025 at 2:30 PM Alex Deucher >>> wrote: From: Christian König Stopping the scheduler for queue reset is generally

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-21 Thread Christian König
On 5/15/25 18:17, Tvrtko Ursulin wrote: > > On 15/05/2025 16:00, Christian König wrote: >> Sometimes drivers need to be able to submit multiple jobs which depend on >> each other to different schedulers at the same time, but using >> drm_sched_job_add_dependency() can't fail any more after the fir

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-21 Thread Christian König
Sorry for the delayed reply. On 5/19/25 11:04, Philipp Stanner wrote: >>> > > Also, if someone preallocates and does not consume the > slot > will that > confuse the iteration in drm_sched_job_dependency()? No it doesn't. The xarray is filtering NULL an

[PATCH v5 3/3] drm/amdgpu: Make use of drm_wedge_task_info

2025-05-21 Thread André Almeida
To notify userspace about which task (if any) caused the device to enter a wedged state, make use of drm_wedge_task_info parameter, filling it with the task PID and name. Signed-off-by: André Almeida --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +-- drivers/gpu/drm/amd/amdgpu/a

[PATCH v5 2/3] drm/doc: Add a section about "Task information" for the wedge API

2025-05-21 Thread André Almeida
Add a section about "Task information" for the wedge API. Reviewed-by: Krzysztof Karas Reviewed-by: Raag Jadav Signed-off-by: André Almeida --- v5: - Change app to task in the text as well v4: - Change APP to TASK v3: - Change "app that caused ..." to "app involved ..." - Clarify that devco

[PATCH v5 1/3] drm: Create a task info option for wedge events

2025-05-21 Thread André Almeida
When a device gets wedged, it might be caused by a guilty application. For userspace, knowing which task was the cause can be useful for some situations, like for implementing a policy, logs or for giving a chance for the compositor to let the user know what task caused the problem. This is an optio

Re: [PATCH 1/3] drm/sched: add drm_sched_prealloc_dependency_slots v3

2025-05-21 Thread Philipp Stanner
On Tue, 2025-05-20 at 17:15 +0100, Tvrtko Ursulin wrote: > > On 19/05/2025 10:04, Philipp Stanner wrote: > > On Mon, 2025-05-19 at 09:51 +0100, Tvrtko Ursulin wrote: > > > > > > On 16/05/2025 18:16, Philipp Stanner wrote: > > > > On Fri, 2025-05-16 at 15:30 +0100, Tvrtko Ursulin wrote: > > > > >

[PATCH v5 0/3] drm: Create a tas info option for wedge events

2025-05-21 Thread André Almeida
This patchset implements a request made by Xaver Hugl about wedge events: "I'd really like to have the PID of the client that triggered the GPU reset, so that we can kill it if multiple resets are triggered in a row (or switch to software rendering if it's KWin itself) and show a user-friendly not

Re: [PATCH v4 1/3] drm: Create an app info option for wedge events

2025-05-21 Thread Raag Jadav
On Mon, May 19, 2025 at 07:03:30PM -0300, André Almeida wrote: > When a device gets wedged, it might be caused by a guilty application. > For userspace, knowing which app was the cause can be useful for some > situations, like for implementing a policy, logs or for giving a chance > for the composit

Re: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Christian König
On 5/20/25 07:10, Zhang, GuoQing (Sam) wrote: >>> +    if (amdgpu_virt_xgmi_migrate_enabled(adev)) { >>> +    /* set mc->vram_start to 0 to switch the returned GPU address >>> of >>> + * amdgpu_bo_create_reserved() from FB aperture to GART >>> aperture. >>> + */ >>

RE: [PATCH v6 3/4] drm/amdgpu: enable pdb0 for hibernation on SRIOV

2025-05-21 Thread Zhang, Owen(SRDC)
[AMD Official Use Only - AMD Internal Distribution Only] Ping... @Koenig, Christian kindly pls review and feedback... thanks you very much! Rgds/Owen From: Zhang, GuoQing (Sam) Sent: Tuesday, May 20, 2025 1:11 PM To: Koenig, Christian ; Zhang, GuoQing (Sam) ;

Re: [PATCH] drm/amd/display: Add a new dcdebugmask to allow skip detection LT

2025-05-21 Thread Chung, ChiaHsuan (Tom)
Patch looks good to me. Just some nitpick may need in commit messages. dcdebugmas -> dcdebugmask Reviewed-by: Tom Chung On 5/21/2025 2:39 PM, Wayne Lin wrote: Under specific embedded scenarios, we might still use DP interface rather than eDP interface. Under such case, detection link training

Re: [PATCH 1/3] drm/amdgpu: seq64 memory unmap uses uninterruptible lock

2025-05-21 Thread Christian König
On 5/14/25 19:10, Philip Yang wrote: > To unmap and free seq64 memory when drm node close to free vm, if there > is signal accepted, then taking vm lock failed and leaking seq64 va > mapping, and then dmesg has error log "still active bo inside vm". > > Change to use uninterruptible lock fix th