AW: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Koenig, Christian
[AMD Official Use Only - AMD Internal Distribution Only] Sorry if I mangled the reply. AMD's mail servers seem to have a hiccup this morning and I have to use OWA. From: Matthew Brost Sent: Friday, 19 July 2024 19:39 To: Christian König Cc: Tvrtko Ur

[PATCH] drm/amdgpu: Fix gfx10 kiq ring_lock warning on full reset

2024-07-22 Thread Jesse Zhang
Fix warning about kiq ring. Unlock kiq ring when queue reset fails. [ 285.999224] amdgpu :03:00.0: amdgpu: GPU reset begin! [ 312.018425] watchdog: BUG: soft lockup - CPU#11 stuck for 26s! [kworker/u64:2:878] [ 312.018428] Modules linked in: amdgpu(E) amdxcp drm_exec gpu_sched drm_buddy d

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Christian König
Thanks, but in that case this patch won't help either; it just mitigates the problem. Can you try to reduce num_hw_submission for the MES ring? Thanks, Christian. On 22.07.24 at 05:27, Xiao, Jack wrote: [AMD Official Use Only - AMD Internal Distribution Only] >> I think we rather n

[PATCH] drm/scheduler: remove full_recover from drm_sched_start

2024-07-22 Thread Christian König
This was basically just another one of amdgpu's hacks. The parameter allowed restarting the scheduler without turning fence signaling on again. That this is absolutely not a good idea should be obvious by now, since the fences will then just sit there and never signal. While at it clean up the code

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-07-22 Thread Christian König
That's a known issue and we are already working on it. Regards, Christian. On 20.07.24 at 19:08, Mikhail Gavrilov wrote: Hi, I spotted "MES failed to respond to msg=MISC (WAIT_REG_MEM)" messages in my kernel log since 6.10-rc5. After this message, usually follow "[drm:amdgpu_mes_reg_write_reg_

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Xiao, Jack
[AMD Official Use Only - AMD Internal Distribution Only] >> Can you try to reduce num_hw_submission for the MES ring? A smaller num_hw_submission should not help with this issue, because MES works without the DRM scheduler, like the legacy KIQ. A smaller num_hw_submission will result in a smaller MES ring size and

[PATCH v2] drm/amd/display: Add NULL check for clk_mgr and clk_mgr->funcs in dcn30_init_hw

2024-07-22 Thread Srinivasan Shanmugam
This commit addresses a potential null pointer dereference issue in the `dcn30_init_hw` function. The issue could occur when `dc->clk_mgr` or `dc->clk_mgr->funcs` is null. The fix adds a check to ensure `dc->clk_mgr` and `dc->clk_mgr->funcs` are not null before accessing their functions. This prevent
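The guard described in this snippet can be illustrated with a minimal, self-contained sketch. It is not the actual patch: the structures below are simplified stand-ins for the driver's types, and `init_clocks_checked`, `demo_init_clocks`, and the `clocks_initialized` field are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the driver structures (assumption: the real
 * definitions carry many more members). */
struct clk_mgr;

struct clk_mgr_funcs {
	void (*init_clocks)(struct clk_mgr *clk_mgr);
};

struct clk_mgr {
	const struct clk_mgr_funcs *funcs;
	int clocks_initialized; /* hypothetical field, for the demo only */
};

struct dc {
	struct clk_mgr *clk_mgr;
};

/* Hypothetical callback used to show that the guarded call is reached. */
void demo_init_clocks(struct clk_mgr *clk_mgr)
{
	clk_mgr->clocks_initialized = 1;
}

/*
 * Check every pointer on the path before calling through it.
 * Returns true if the callback ran, false if it was skipped.
 */
bool init_clocks_checked(struct dc *dc)
{
	assert(dc != NULL); /* the caller must pass a valid dc */

	if (!dc->clk_mgr || !dc->clk_mgr->funcs ||
	    !dc->clk_mgr->funcs->init_clocks)
		return false;

	dc->clk_mgr->funcs->init_clocks(dc->clk_mgr);
	return true;
}
```

The checks run left to right, so each pointer is tested before the next dereference depends on it.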

[PATCH v2] drm/amd/display: Add NULL check for clk_mgr in dcn32_init_hw

2024-07-22 Thread Srinivasan Shanmugam
This commit addresses a potential null pointer dereference issue in the `dcn32_init_hw` function. The issue could occur when `dc->clk_mgr` is null. The fix adds a check to ensure `dc->clk_mgr` is not null before accessing its functions. This prevents a potential null pointer dereference. Reported

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Christian König
What I meant is that the MES ring is now too small for the number of packets written to it. Either reduce the num_hw_submission or increase the MES ring size. E.g. see this code here in amdgpu_ring_init:     if (ring->funcs->type == AMDGPU_RING_TYPE_KIQ)     sched_hw_submission

[PATCH v2] drm/amd/display: Add NULL check for clk_mgr and clk_mgr->funcs in dcn401_init_hw

2024-07-22 Thread Srinivasan Shanmugam
This commit addresses a potential null pointer dereference issue in the `dcn401_init_hw` function. The issue could occur when `dc->clk_mgr` or `dc->clk_mgr->funcs` is null. The fix adds a check to ensure `dc->clk_mgr` and `dc->clk_mgr->funcs` are not null before accessing their functions. This preven

Re: [PATCH] drm/buddy: Add start address support to trim function

2024-07-22 Thread Paneer Selvam, Arunpravin
Hi Matthew, On 7/19/2024 4:01 PM, Matthew Auld wrote: On 17/07/2024 16:02, Paneer Selvam, Arunpravin wrote: On 7/16/2024 3:34 PM, Matthew Auld wrote: On 16/07/2024 10:50, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/10/2024 6:20 PM, Matthew Auld wrote: On 10/07/2024 07:03, Paneer Sel

[PATCH v2] drm/amd/display: Add null check for set_output_gamma in dcn30_set_output_transfer_func

2024-07-22 Thread Srinivasan Shanmugam
This commit adds a null check for the set_output_gamma function pointer in the dcn30_set_output_transfer_func function. Previously, set_output_gamma was being checked for nullity at line 386, but then it was being dereferenced without any nullity check at line 401. This could potentially lead to a

Re: [PATCH AUTOSEL 6.9 11/22] drm/amd/display: Reset freesync config before update new state

2024-07-22 Thread Sasha Levin
On Tue, Jul 16, 2024 at 10:48:03AM -0400, Hamza Mahfooz wrote: Hi Sasha, On 7/16/24 10:24, Sasha Levin wrote: From: Tom Chung [ Upstream commit 6b8487cdf9fc7bae707519ac5b5daeca18d1e85b ] [Why] Sometimes the new_crtc_state->vrr_infopacket did not sync up with the current state. It will affect

Re: [PATCH v4 3/6] drm/i915: Make I2C terminology more inclusive

2024-07-22 Thread Andi Shyti
Hi Easwar, merged to drm-intel-next. Thanks! On Thu, Jul 11, 2024 at 05:27:31AM +, Easwar Hariharan wrote: > I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" > with more appropriate terms. Inspired by Wolfram's series to fix drivers/i2c/, > fix the terminology for

Re: [PATCH v2 0/3] drm: backlight quirk infrastructure and lower minimum for Framework AMD 13

2024-07-22 Thread Hans de Goede
Hi Thomas, On 7/20/24 9:31 AM, Thomas Weißschuh wrote: > Hi Hans, > > On 2024-07-18 10:25:18+, Hans de Goede wrote: >> On 6/24/24 6:15 PM, Thomas Weißschuh wrote: >>> On 2024-06-24 11:11:40+, Hans de Goede wrote: On 6/23/24 10:51 AM, Thomas Weißschuh wrote: > The value of "min_in

Re: [BUG] HID: amd_sfh (drivers/hid/amd-sfh-hid/): memory/page corruption

2024-07-22 Thread Chris Hixon
On 7/21/24 00:20, Basavaraj Natikar wrote: > On 7/17/2024 4:51 PM, Linux regression tracking (Thorsten Leemhuis) wrote: >> On 15.07.24 06:39, Chris Hixon wrote: >>> System: HP ENVY x360 Convertible 15-ds1xxx; AMD Ryzen 7 4700U with >>> Radeon Graphics >>> >>> Problem commits (introduced in v6.9-rc

Re: [PATCH v9-resend 00/54] fix CONFIG_DRM_USE_DYNAMIC_DEBUG=y

2024-07-22 Thread Łukasz Bartosik
On Tue, Jul 16, 2024 at 8:58 PM Jim Cromie wrote: > > resending to fix double-copies of a dozen patches. > added 2 squash-ins to address Ville's designated-initializer comment. > > This fixes dynamic-debug support for DRM.debug, added via classmaps. > commit bb2ff6c27bc9 (drm: Disable dynamic debu

Re: [PATCH AUTOSEL 6.1 13/14] drm/amdgpu: fix dereference null return value for the function amdgpu_vm_pt_parent

2024-07-22 Thread Sasha Levin
On Tue, Jun 18, 2024 at 01:42:56PM +0200, Christian König wrote: On 18.06.24 at 11:11, Pavel Machek wrote: Hi! [ Upstream commit a0cf36546cc24ae1c95d72253c7795d4d2fc77aa ] The pointer parent may be NULLed by the function amdgpu_vm_pt_parent. To make the code more robust, check the pointer pa

RE: [PATCH 00/22] DC Patches for 15 July, 2024

2024-07-22 Thread Wheeler, Daniel
[Public] Hi all, This week this patchset was tested on the following systems: * Lenovo ThinkBook T13s Gen4 with AMD Ryzen 5 6600U * MSI Gaming X Trio RX 6800 * Gigabyte Gaming OC RX 7900 XTX These systems were tested on the following display/connection types: * eD

[PATCH] drm/amd/display: Add kdoc entry for 'bs_coeffs_updated' in dpp401_dscl_program_isharp

2024-07-22 Thread Srinivasan Shanmugam
Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn401/dcn401_dpp_dscl.c:961: warning: Function parameter or struct member 'bs_coeffs_updated' not described in 'dpp401_dscl_program_isharp' Fixes: 431ae65ea4e1 ("drm/amd/display: ensure EASF and ISHARP coefficients are

Re: [PATCH 1/7] drm/amdgpu/gfx7: enable wave kill for compute queues

2024-07-22 Thread Christian König
On 17.07.24 at 22:37, Alex Deucher wrote: It should work the same for compute as well as gfx. Signed-off-by: Alex Deucher Reviewed-by: Christian König for the whole series. --- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/a

Re: [PATCH 1/4] drm/amdgpu/gfx10: properly handle error ints on all pipes

2024-07-22 Thread Christian König
On 17.07.24 at 22:38, Alex Deucher wrote: Need to handle the interrupt enables for all pipes. v2: fix indexing (Jessie) Signed-off-by: Alex Deucher Acked-by: Christian König for the whole series. --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 130 + 1 file change

Re: [PATCH 2/6] drm/amdgpu/gfx11: Enable bad opcode interrupt

2024-07-22 Thread Christian König
On 17.07.24 at 22:40, Alex Deucher wrote: From: Jesse Zhang For the bad opcode case, it will cause CP/ME hang. The firmware will prevent the ME side from hanging by raising a bad opcode interrupt. And the driver needs to perform a vmid reset when receiving the interrupt. v2: update irq namin

Re: [PATCH 2/6] drm/amdgpu/gfx11: Enable bad opcode interrupt

2024-07-22 Thread Alex Deucher
On Mon, Jul 22, 2024 at 9:55 AM Christian König wrote: > > On 17.07.24 at 22:40, Alex Deucher wrote: > > From: Jesse Zhang > > > > For the bad opcode case, it will cause CP/ME hang. > > The firmware will prevent the ME side from hanging by raising a bad opcode > > interrupt. > > And the driver

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Christian König
On 22.07.24 at 15:52, Tvrtko Ursulin wrote: On 19/07/2024 16:18, Christian König wrote: On 19.07.24 at 15:02, Christian König wrote: On 19.07.24 at 11:47, Tvrtko Ursulin wrote: From: Tvrtko Ursulin A long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change wa

Re: [PATCH] drm/amd/display: Add kdoc entry for 'bs_coeffs_updated' in dpp401_dscl_program_isharp

2024-07-22 Thread Rodrigo Siqueira Jordao
On 7/22/24 7:15 AM, Srinivasan Shanmugam wrote: Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/../display/dc/dpp/dcn401/dcn401_dpp_dscl.c:961: warning: Function parameter or struct member 'bs_coeffs_updated' not described in 'dpp401_dscl_program_isharp' Fixes: 431ae65ea4e1 ("drm/a

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-22 Thread Christian König
On 22.07.24 at 16:43, Tvrtko Ursulin wrote: On 22/07/2024 15:06, Christian König wrote: On 22.07.24 at 15:52, Tvrtko Ursulin wrote: On 19/07/2024 16:18, Christian König wrote: On 19.07.24 at 15:02, Christian König wrote: On 19.07.24 at 11:47, Tvrtko Ursulin wrote: From: Tvrtko Ursulin

Re: [PATCH v4 3/6] drm/i915: Make I2C terminology more inclusive

2024-07-22 Thread Andi Shyti
Hi Easwar, On Mon, Jul 22, 2024 at 09:15:08AM -0700, Easwar Hariharan wrote: > On 7/22/2024 5:50 AM, Andi Shyti wrote: > > Hi Easwar, > > > > merged to drm-intel-next. Thanks! > > > > On Thu, Jul 11, 2024 at 05:27:31AM +, Easwar Hariharan wrote: > >> I2C v7, SMBus 3.2, and I3C 1.1.1 specific

Re: [BUG] HID: amd_sfh (drivers/hid/amd-sfh-hid/): memory/page corruption

2024-07-22 Thread Benjamin Tissoires
On Jul 21 2024, Chris Hixon wrote: > On 7/21/24 00:20, Basavaraj Natikar wrote: > > > On 7/17/2024 4:51 PM, Linux regression tracking (Thorsten Leemhuis) wrote: > >> On 15.07.24 06:39, Chris Hixon wrote: > >>> System: HP ENVY x360 Convertible 15-ds1xxx; AMD Ryzen 7 4700U with > >>> Radeon Graphics

Re: [PATCH v2] drm/amd/display: Add NULL check for clk_mgr and clk_mgr->funcs in dcn401_init_hw

2024-07-22 Thread Alex Hung
Reviewed-by: Alex Hung On 2024-07-22 05:28, Srinivasan Shanmugam wrote: This commit addresses a potential null pointer dereference issue in the `dcn401_init_hw` function. The issue could occur when `dc->clk_mgr` or `dc->clk_mgr->funcs` is null. The fix adds a check to ensure `dc->clk_mgr` and

[PATCH] drm/amdkfd: fix debug watchpoints for logical devices

2024-07-22 Thread Jonathan Kim
The number of watchpoints should be set and constrained per logical partition device, not by the socket device. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_debug.c | 20 ++-- drivers/gpu/drm/amd/amdkfd/kfd_device.c | 4 ++-- drivers/gpu/drm/amd/amdkfd/kfd_pri

Re: [PATCH] drm/amdgpu: skip kfd init if GFX is not ready.

2024-07-22 Thread Deucher, Alexander
[Public] Acked-by: Alex Deucher From: Zhang, Yifan Sent: Sunday, July 21, 2024 10:25 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Zhang, Yifan Subject: [PATCH] drm/amdgpu: skip kfd init if GFX is not ready. avoid kfd init crash in that case.

Re: [PATCH] drm/amdgpu: Fix gfx10 kiq ring_lock warning on full reset

2024-07-22 Thread Alex Deucher
On Mon, Jul 22, 2024 at 4:16 AM Jesse Zhang wrote: > > Fix warning about kiq ring. > Unlock kiq ring when queue reset fails. > > [ 285.999224] amdgpu :03:00.0: amdgpu: GPU reset begin! > [ 312.018425] watchdog: BUG: soft lockup - CPU#11 stuck for 26s! > [kworker/u64:2:878] > [ 312.018428]

[PATCH] drm/amdgpu: set sched_hw_submission higher for MES

2024-07-22 Thread Alex Deucher
Apply KIQ logic to MES. MES doesn't really use the GPU scheduler. The base drivers generally use the MES ring directly rather than submitting IBs. However, amdgpu_sched_hw_submission (which defaults to 2) limits the number of outstanding fences to 2. KFD uses the MES for TLB flushes and the 2 f

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Alex Deucher
Does this patch fix it? https://patchwork.freedesktop.org/patch/605437/ Alex On Mon, Jul 22, 2024 at 7:21 AM Christian König wrote: > > What I meant is that the MES ring is now too small for the number of packets > written to it. > > Either reduce the num_hw_submission or increase the MES ring s

Re: [PATCH v2] drm/amd/display: Add NULL check for clk_mgr and clk_mgr->funcs in dcn30_init_hw

2024-07-22 Thread Alex Hung
Reviewed-by: Alex Hung On 2024-07-22 04:51, Srinivasan Shanmugam wrote: This commit addresses a potential null pointer dereference issue in the `dcn30_init_hw` function. The issue could occur when `dc->clk_mgr` or `dc->clk_mgr->funcs` is null. The fix adds a check to ensure `dc->clk_mgr` and `

Re: 6.10/bisected/regression - Since commit e356d321d024 in the kernel log appears the message "MES failed to respond to msg=MISC (WAIT_REG_MEM)" which were never seen before

2024-07-22 Thread Alex Deucher
On Mon, Jul 22, 2024 at 4:50 AM Christian König wrote: > > That's a known issue and we are already working on it. Do either of these patches help? https://patchwork.freedesktop.org/patch/605437/ https://patchwork.freedesktop.org/patch/605201/ Alex > > Regards, > Christian. > > Am 20.07.24 um 19

Re: [PATCH v2] drm/amd/display: Add NULL check for clk_mgr in dcn32_init_hw

2024-07-22 Thread Alex Hung
Reviewed-by: Alex Hung On 2024-07-22 05:14, Srinivasan Shanmugam wrote: This commit addresses a potential null pointer dereference issue in the `dcn32_init_hw` function. The issue could occur when `dc->clk_mgr` is null. The fix adds a check to ensure `dc->clk_mgr` is not null before accessing

[PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-22 Thread ZhenGuo Yin
[Why] The page table of a compute VM in VRAM will be lost after a GPU reset. VRAM won't be restored since a compute VM has no shadows. [How] Use the higher 32 bits of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not equal to the re-generation token. v2: Check vm->
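The token scheme described in this snippet (a VRAM-lost counter stored in the upper 32 bits of a 64-bit generation value, with a state reset on mismatch) might look roughly like the sketch below. It is an illustration under stated assumptions, not the patch itself: `make_generation`, `struct demo_vm`, and `demo_vm_revalidate` are hypothetical names, and the use made of the lower 32 bits is assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack a VRAM-lost counter into the upper 32 bits
 * of a 64-bit token and keep another counter in the lower 32 bits.
 */
uint64_t make_generation(uint32_t vram_lost_counter, uint32_t low_generation)
{
	return ((uint64_t)vram_lost_counter << 32) | low_generation;
}

/* Simplified stand-in for a VM object; not the driver's structure. */
struct demo_vm {
	uint64_t generation;
	bool needs_state_reset;
};

/*
 * Compare the VM's stored token with the current one. On mismatch, flag
 * the VM for a state-machine reset and adopt the new token.
 * Returns true if a reset was requested.
 */
bool demo_vm_revalidate(struct demo_vm *vm, uint64_t current_token)
{
	assert(vm != NULL);

	if (vm->generation == current_token)
		return false;

	vm->needs_state_reset = true;
	vm->generation = current_token;
	return true;
}
```

Because the counter occupies the upper half of the token, any change to it makes the two 64-bit values differ, so a single comparison covers both halves.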

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-22 Thread Xiao, Jack
[AMD Official Use Only - AMD Internal Distribution Only] > Does this patch fix it? https://patchwork.freedesktop.org/patch/605437/ No, please do not check in the patch; it will stop my fix from working. Regards, Jack From: Alex Deucher Sent: Tuesday, 23 July 2024

Re: 6.10/bisected/regression - commits bc87d666c05 and 6d4279cb99ac cause appearing green flashing bar on top of screen on Radeon 6900XT and 120Hz

2024-07-22 Thread Eric Biggers
On Tue, Jul 16, 2024 at 01:10:37PM -0400, Alex Deucher wrote: > From 8aaf8da07a8b542c0a0f4da2601da07beddfdeb0 Mon Sep 17 00:00:00 2001 > From: Alex Deucher > Date: Tue, 16 Jul 2024 12:49:25 -0400 > Subject: [PATCH] drm/amd/display: fix corruption with high refresh rates on > DCN 3.0 > > This rev

Re: [PATCH v2] drm/amd/display: Add null check for set_output_gamma in dcn30_set_output_transfer_func

2024-07-22 Thread Chung, ChiaHsuan (Tom)
Reviewed-by: Tom Chung On 7/22/2024 7:48 PM, Srinivasan Shanmugam wrote: This commit adds a null check for the set_output_gamma function pointer in the dcn30_set_output_transfer_func function. Previously, set_output_gamma was being checked for nullity at line 386, but then it was being derefer