RE: [PATCH] drm/amdgpu: fix random data corruption for sdma 7

2024-10-15 Thread Min, Frank
[AMD Official Use Only - AMD Internal Distribution Only] Hi Lijo, Thanks a lot for your review and suggestions. Here are the updated patches. Best Regards, Frank From: Frank Min Date: Thu, 10 Oct 2024 16:41:32 +0800 Subject: [PATCH 1/2] drm/amdgpu: fix random data corruption for sdma 7 There

[PATCH v4 3/3] drm/amdgpu/sdma4.4.2: implement ring reset callback for sdma4.4.2

2024-10-15 Thread jiadong.zhu
From: Jiadong Zhu Implement sdma queue reset callback via SMU interface. v2: Leverage inst_stop/start functions in reset sequence. Use GET_INST for physical SDMA instance. Disable apu for sdma reset. v3: Rephrase error prints. v4: Remove redundant prints. Remove setting PREEMPT registers

[PATCH v4 2/3] drm/amd/pm: implement dpm sdma reset function

2024-10-15 Thread jiadong.zhu
From: Jiadong Zhu Implement sdma soft reset by sending MSG_ResetSDMA on smu 13.0.6. v2: Add firmware version for the reset message. v3: Add ip version check. Print inst_mask on failure. Signed-off-by: Jiadong Zhu --- drivers/gpu/drm/amd/pm/amdgpu_dpm.c | 15 drivers/gpu

[PATCH v4 1/3] drm/amd/pm: update smu_v13_0_6 smu header

2024-10-15 Thread jiadong.zhu
From: Jiadong Zhu update smu header for sdma soft reset. Signed-off-by: Jiadong Zhu --- drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/pmfw_if/smu_v13_0_6_ppsmc.h b/drivers/gpu

Re: [PATCH] drm/amdgpu: fix random data corruption for sdma 7

2024-10-15 Thread Lazar, Lijo
On 10/16/2024 8:29 AM, Min, Frank wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > [AMD Official Use Only - AMD Internal Distribution Only] > > From: Frank Min > > There is random data corruption caused by const fill, this is caused by write > compression mode not corre

[PATCH] drm/amdgpu: fix random data corruption for sdma 7

2024-10-15 Thread Min, Frank
[AMD Official Use Only - AMD Internal Distribution Only] From: Frank Min There is random data corruption caused by const fill; this is caused by the write compression mode not being correctly configured. So correct the compression mode for const fill. Signed-off-by: Frank Min --- drivers/gpu/drm/amd/amdgp

RE: [PATCH] Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"

2024-10-15 Thread Zhu, Jiadong
[AMD Official Use Only - AMD Internal Distribution Only] Acked-by: Jiadong Zhu > -Original Message- > From: Deucher, Alexander > Sent: Wednesday, October 16, 2024 4:00 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Kim, Jonathan > ; Zhu, Jiadong > Subject: [PATCH] R

Re: [PATCH v4 09/09] drm/amdgpu: Add separate array of read and write for BO handles

2024-10-15 Thread Marek Olšák
On Tue, Oct 15, 2024 at 3:58 AM Arunpravin Paneer Selvam wrote: > > Drop AMDGPU_USERQ_BO_WRITE as this should not be a global option > of the IOCTL, It should be option per buffer. Hence adding separate > array for read and write BO handles. > > Signed-off-by: Arunpravin Paneer Selvam > Acked-by:

[PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-15 Thread Xiaogang . Chen
From: Xiaogang Chen The purpose of this patch is to have the kfd driver function as expected during AMD gpu device plug/unplug. When an AMD gpu device is unplugged, the kfd driver stops all queues from this device. If there are user processes that still reference the render node, this device is marked as invalid. kfd d

[PATCH] Revert "drm/amdgpu/gfx9: put queue resets behind a debug option"

2024-10-15 Thread Alex Deucher
This reverts commit 7c1a2d8aba6cadde0cc542b2d805edc0be667e79. Extended validation has completed successfully, so enable these features by default. Signed-off-by: Alex Deucher Cc: Jonathan Kim Cc: Jiadong Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 4 drivers/gpu/drm/amd/a

Re: [PATCH] drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih

2024-10-15 Thread Alex Deucher
On Tue, Oct 15, 2024 at 2:23 PM Victor Lu wrote: > > Port this change to vega20_ih.c: > "89ae318001e5 drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts" Might be helpful to quote the commit message here just so it's clear why that change is needed. With that, the patch is: Reviewed-by:

[PATCH] drm/amdgpu: add ring reset messages

2024-10-15 Thread Alex Deucher
Add messages to make it clear when a per ring reset happens. This is helpful for debugging and aligns with other reset methods. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c

[PATCH] drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts for vega20_ih

2024-10-15 Thread Victor Lu
Port this change to vega20_ih.c: "89ae318001e5 drm/amdgpu: clear RB_OVERFLOW bit when enabling interrupts" Signed-off-by: Victor Lu --- drivers/gpu/drm/amd/amdgpu/vega20_ih.c | 27 ++ 1 file changed, 27 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/vega20_ih.c b

RE: [PATCH] drm/amdkfd: add/remove kfd queues through on stop/start KFD scheduling

2024-10-15 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only] Ping -Original Message- From: Liu, Shaoyun Sent: Friday, October 4, 2024 12:08 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Shaoyun Subject: [PATCH] drm/amdkfd: add/remove kfd queues through on stop/start KFD scheduling Add bac

[PATCH] drm/amd/display: Add debug option to disable idle optimizations

2024-10-15 Thread Aurabindo Pillai
For debugging purposes, add a runtime override to disable display scanout from MALL cache (MALL Static Screen) by disallowing the driver from triggering the idle power optimizations when desktop is idle. Signed-off-by: Aurabindo Pillai --- drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 4 ++

Re: [PATCH] drm/amdgpu: set MES GFX HQD mask

2024-10-15 Thread Sharma, Shashank
On 15/10/2024 18:50, Alex Deucher wrote: On Tue, Oct 15, 2024 at 12:33 PM Shashank Sharma wrote: This patch sets MES HQD mask to setup GFX queues for MES and KIQ operations. We are using one queue each for KIQ operations, and setting rest of the queues for MES scheduling. This also fixes a r

Re: [PATCH] drm/amdgpu: set MES GFX HQD mask

2024-10-15 Thread Alex Deucher
On Tue, Oct 15, 2024 at 12:33 PM Shashank Sharma wrote: > > This patch sets MES HQD mask to setup GFX queues for MES and KIQ > operations. We are using one queue each for KIQ operations, and > setting rest of the queues for MES scheduling. > > This also fixes a regression for missing Navi 4x MES m

Re: [PATCH] drm/amdgpu: enable userqueue support for GFX12

2024-10-15 Thread Sharma, Shashank
On 15/10/2024 16:58, Alex Deucher wrote: On Tue, Oct 15, 2024 at 6:13 AM Sharma, Shashank wrote: Hello Alex, On 14/10/2024 22:29, Deucher, Alexander wrote: [AMD Official Use Only - AMD Internal Distribution Only] -Original Message- From: Sharma, Shashank Sent: Thursday, October 10

[PATCH] drm/amdgpu: set MES GFX HQD mask

2024-10-15 Thread Shashank Sharma
This patch sets MES HQD mask to setup GFX queues for MES and KIQ operations. We are using one queue each for KIQ operations, and setting rest of the queues for MES scheduling. This also fixes a regression for missing Navi 4x MES mask from usermode queue series. V2: Rebase on staging, accommodate

Re: [PATCH] drm/amdgpu: enable userqueue support for GFX12

2024-10-15 Thread Alex Deucher
On Tue, Oct 15, 2024 at 6:13 AM Sharma, Shashank wrote: > > Hello Alex, > > On 14/10/2024 22:29, Deucher, Alexander wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > -Original Message- > From: Sharma, Shashank > Sent: Thursday, October 10, 2024 2:08 PM > To: amd-gfx@

Re: [PATCH] drm/amd: Guard against bad data for ATIF ACPI method

2024-10-15 Thread Alex Deucher
On Fri, Oct 11, 2024 at 1:33 PM Mario Limonciello wrote: > > If a BIOS provides bad data in response to an ATIF method call > this causes a NULL pointer dereference in the caller. > > ``` > ? show_regs (arch/x86/kernel/dumpstack.c:478 (discriminator 1)) > ? __die (arch/x86/kernel/dumpstack.c:423 a

RE: [PATCH 00/16] DC Patches October 9, 2024

2024-10-15 Thread Wheeler, Daniel
[Public] Hi all, This week this patchset was tested on the following systems: * Lenovo ThinkBook T13s Gen4 with AMD Ryzen 5 6600U * MSI Gaming X Trio RX 6800 * Gigabyte Gaming OC RX 7900 XTX These systems were tested on the following display/connection types: * eD

Re: [PATCH 01/10] drm/amd/display: temp w/a for dGPU to enter idle optimizations

2024-10-15 Thread Mario Limonciello
On 10/15/2024 03:17, Wayne Lin wrote: From: Aurabindo Pillai [Why&How] vblank immediate disable currently does not work for all asics. On DCN401, the vblank interrupts never stop coming, and hence we never get a chance to trigger idle optimizations. Add a workaround to enable immediate disable

Re: [PATCH] drm/amdgpu: enable userqueue support for GFX12

2024-10-15 Thread Sharma, Shashank
Hello Alex, On 14/10/2024 22:29, Deucher, Alexander wrote: [AMD Official Use Only - AMD Internal Distribution Only] -Original Message- From: Sharma, Shashank Sent: Thursday, October 10, 2024 2:08 PM To:amd-gfx@lists.freedesktop.org Cc: Somalapuram, Amaranath; Deucher, Alexander; Koenig

[PATCH 10/10] drm/amd/display: 3.2.306

2024-10-15 Thread Wayne Lin
From: Aric Cyr This version brings along following fixes: - Fix dcn401 idle optimization problem - Fix cursor corruption on dcn35 - Fix DP LL compliance failures - Fix SubVP Phantom VBlank End calculation Acked-by: Tom Chung Signed-off-by: Aric Cyr Signed-off-by: Wayne Lin --- drivers/gpu/dr

[PATCH 09/10] drm/amd/display: To change dcn301_init.h guard.

2024-10-15 Thread Wayne Lin
From: Bhuvanachandra Pinninti [why & How] The original guard was wrongly set as for dcn30. Changed it from 30 to 301. Reviewed-by: Dillon Varone Signed-off-by: Bhuvanachandra Pinninti Signed-off-by: Wayne Lin --- drivers/gpu/drm/amd/display/dc/hwss/dcn301/dcn301_init.h | 4 ++-- 1 file

[PATCH 08/10] drm/amd/display: update fullscreen status to SPL

2024-10-15 Thread Wayne Lin
From: Samson Tam [Why] Current fullscreen check in SPL using dm_helpers is out-of-sync with dc state. This causes an issue during minimal transition where we pick an invalid intermediate state because the pre and post fullscreen status are different. [How] Add sharpening_required flag to dc_stre

[PATCH 07/10] drm/amd/display: Add a Precise Delay Routine

2024-10-15 Thread Wayne Lin
From: Fangzhi Zuo Fix DP compliance failures 4.2.2.12, 4.3.1.21, 4.9.1.19 caused by imprecise delay on fsleep(). Reviewed-by: Aric Cyr Signed-off-by: Fangzhi Zuo Signed-off-by: Wayne Lin --- .../gpu/drm/amd/display/dc/link/protocols/link_dp_training.c| 2 +- 1 file changed, 1 insertion(+

[PATCH 06/10] drm/amd/display: Recalculate SubVP Phantom VBlank End in dml21

2024-10-15 Thread Wayne Lin
From: Dillon Varone [WHY] The phantom stream timing is copied from the main stream as most parameters are identical, however some need to be recalculated. Currently VBlank End is not recalculated and copied from the main incorrectly. [HOW] Recalculate VBlank End for phantom stream timing. Revie

[PATCH 05/10] drm/amd/display: temp w/a for DP Link Layer compliance

2024-10-15 Thread Wayne Lin
From: Aurabindo Pillai [Why&How] Disabling P-State support on full updates for DCN401 results in introducing additional communication with SMU. A UCLK hard min message to SMU takes 4 seconds to go through, which was due to DCN not allowing pstate switch, which was caused by incorrect value for TT

[PATCH 04/10] drm/amd/display: Adding array index check to prevent memory corruption

2024-10-15 Thread Wayne Lin
From: Leo Chen [Why & How] Array indices out of bound caused memory corruption. Adding checks to ensure that array index stays in bound. Reviewed-by: Charlene Liu Reviewed-by: Nicholas Kazlauskas Signed-off-by: Leo Chen Signed-off-by: Wayne Lin --- .../gpu/drm/amd/display/dc/clk_mgr/dcn35/d

[PATCH 03/10] drm/amd/display: Reuse subvp enable check for DCN401

2024-10-15 Thread Wayne Lin
From: Aurabindo Pillai Reuse subvp enable check from DCN32 for IGT testing of Sub-Viewport feature on DCN401 Reviewed-by: Rodrigo Siqueira Signed-off-by: Aurabindo Pillai Signed-off-by: Wayne Lin --- .../gpu/drm/amd/display/dc/resource/dcn401/dcn401_resource.c | 3 ++- 1 file changed, 2 in

[PATCH 02/10] drm/amd/display: w/a to program DISPCLK_R_GATE_DISABLE DCN35

2024-10-15 Thread Wayne Lin
From: Yihan Zhu [WHY & HOW] Cursor corruption was observed on a USBC display with a specific system setup after a reboot. Cursor memory might still be in the lightsleep state due to a voltage issue, so we need to program DISPCLK_R_GATE_DISABLE to avoid this issue only on DCN35. Reviewed-by: Nicholas Kazlauskas Sig

[PATCH 01/10] drm/amd/display: temp w/a for dGPU to enter idle optimizations

2024-10-15 Thread Wayne Lin
From: Aurabindo Pillai [Why&How] vblank immediate disable currently does not work for all asics. On DCN401, the vblank interrupts never stop coming, and hence we never get a chance to trigger idle optimizations. Add a workaround to enable immediate disable only on APUs for now. This adds a 2-fra

[PATCH 00/10] DC Patches October 14, 2024

2024-10-15 Thread Wayne Lin
This DC patchset brings improvements in multiple areas. In summary, we have: - Fix dcn401 idle optimization problem - Fix cursor corruption on dcn35 - Fix DP LL compliance failures - Fix SubVP Phantom VBlank End calculation Cc: Daniel Wheeler --- Aric Cyr (1): drm/amd/display: 3.2.306 Aura

[PATCHES] drm/radeon issues

2024-10-15 Thread Christian Zigotzky
On 14 October 2024 at 3:11pm, Christian Zigotzky wrote: >> On 14 October 2024 at 3:00pm, Alex Deucher wrote: >> >> Can whoever wrote this send it out as a proper patch? >> >> Alex >> > Patch source: https://lists.freedesktop.org/archives/dri-devel/2024-October/473314.html + ville.syrjala

Re: [PATCH] drm/amd/display: Disable PSR-SU on Parade 08-01 TCON too

2024-10-15 Thread Stuart
> Is this on a mainline 6.11.y or 6.12-rc3 kernel? Can you please open up a > new issue with all the details? You can ping it back here. Currently a Debian 6.11.2 kernel, but I did reproduce it with a mainline 6.10 and earlier versions in the past. Issue link: https://gitlab.freedesktop.org/dr

[PATCHES] drm/radeon issues

2024-10-15 Thread Christian Zigotzky
On 14 October 2024 at 3:00pm, Alex Deucher wrote: Can whoever wrote this send it out as a proper patch? Alex Patch source: https://lists.freedesktop.org/archives/dri-devel/2024-October/473314.html

[PATCH v4 09/09] drm/amdgpu: Add separate array of read and write for BO handles

2024-10-15 Thread Arunpravin Paneer Selvam
Drop AMDGPU_USERQ_BO_WRITE as this should not be a global option of the IOCTL; it should be an option per buffer. Hence adding separate arrays for read and write BO handles. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Christian König Suggested-by: Marek Olšák Suggested-by: Christian König --

[PATCH v4 08/09] drm/amdgpu: add vm root BO lock before accessing the vm

2024-10-15 Thread Arunpravin Paneer Selvam
Add a vm root BO lock before accessing the userqueue VM. v1:(Christian) - Keep the VM locked until you are done with the mapping. - Grab a temporary BO reference, drop the VM lock and acquire the BO. When you are done with everything just drop the BO lock and then the temporary BO

[PATCH v4 07/09] drm/amdgpu: Add the missing error handling for xa_store() call

2024-10-15 Thread Arunpravin Paneer Selvam
Add the missing error handling for xa_store() call in the function amdgpu_userq_fence_driver_alloc(). Signed-off-by: Arunpravin Paneer Selvam Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_userq_fence.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git

[PATCH v4 06/09] drm/amdgpu: Few optimization and fixes for userq fence driver

2024-10-15 Thread Arunpravin Paneer Selvam
A few optimizations and fixes for the userq fence driver. v1:(Christian): - Remove unnecessary comments. - In the drm_exec_init call, give num_bo_handles as the last parameter; it would make allocation of the array more efficient. - Handle return value of __xa_store() and improve the error handling of

[PATCH v4 05/09] drm/amdgpu: Remove the MES self test

2024-10-15 Thread Arunpravin Paneer Selvam
Remove the MES self test as this conflicts with the userqueue fence interrupts. v2:(Christian) - remove the amdgpu_mes_self_test() function and any now unused code. Signed-off-by: Arunpravin Paneer Selvam Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 3 - drivers/gpu/dr

[PATCH v4 01/09] drm/amdgpu: Implement userqueue signal/wait IOCTL

2024-10-15 Thread Arunpravin Paneer Selvam
This patch introduces new IOCTL for userqueue secure semaphore. The signal IOCTL called from userspace application creates a drm syncobj and array of bo GEM handles and passed in as parameter to the driver to install the fence into it. The wait IOCTL gets an array of drm syncobjs, finds the fence

[PATCH v4 04/09] drm/amdgpu: Enable userq fence interrupt support

2024-10-15 Thread Arunpravin Paneer Selvam
Add support to handle the userqueue protected fence signal hardware interrupt. Create an xarray which maps the doorbell index to the fence driver address. This would help to retrieve the fence driver information when a userq fence interrupt is triggered. Firmware sends the doorbell offset value an

[PATCH v4 02/09] drm/amdgpu: screen freeze and userq driver crash

2024-10-15 Thread Arunpravin Paneer Selvam
Screen freeze and userq fence driver crash while playing Xonotic v2: (Christian) - There is a chance that the fence might signal in between testing and grabbing the lock. Hence we can move the lock above the if..else check and use the dma_fence_is_signaled_locked(). Signed-off-by: Arunp

[PATCH v4 03/09] drm/amdgpu: Add wait IOCTL timeline syncobj support

2024-10-15 Thread Arunpravin Paneer Selvam
Add user fence wait IOCTL timeline syncobj support. v2:(Christian) - handle dma_fence_wait() return value. - shorten the variable name syncobj_timeline_points a bit. - move num_points up to avoid padding issues. v3:(Christian) - Handle timeline drm_syncobj_find_fence() call error hand

Re: [PATCH v3 3/3] drm/amdgpu/sdma4.4.2: implement ring reset callback for sdma4.4.2

2024-10-15 Thread Lazar, Lijo
On 10/15/2024 1:08 AM, Deucher, Alexander wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > >> -Original Message- >> From: Zhu, Jiadong >> Sent: Wednesday, October 9, 2024 5:23 AM >> To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org >> Cc: Deucher, Alexander >> Subject