Re: [PATCH 08/43] drm/amd/display: FEC overhead should be checked once for mst slot nums

2024-07-18 Thread Jiri Slaby
On 28. 03. 24, 20:50, roman...@amd.com wrote: From: Hersen Wu [Why] The MST slot count equals pbn / pbn_div. Today, pbn_div refers to dm_mst_get_pbn_divider -> dc_link_bandwidth_kbps. dp_link_bandwidth_kbps already includes the effect of FEC overhead. As a result, we should not include effec

RE: [PATCH Review V2 1/1] drm/amdgpu: Fix eeprom max record count

2024-07-18 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Hawking Zhang Thanks for the clarification. Regards, Hawking -Original Message- From: Yang, Stanley Sent: Thursday, July 18, 2024 12:06 To: Zhang, Hawking ; amd-gfx@lists.freedesktop.org Subject: RE: [PATCH Review V2

Re: [PATCH] drm/amd/display: Add 'pstate_keepout' kdoc entry in 'optc1_program_timing'

2024-07-18 Thread Chung, ChiaHsuan (Tom)
Reviewed-by: Tom Chung On 7/18/2024 12:03 PM, Srinivasan Shanmugam wrote: Fixes the below with gcc W=1: Function parameter or struct member 'pstate_keepout' not described in 'optc1_program_timing' Cc: Tom Chung Cc: Rodrigo Siqueira Cc: Roman Li Cc: Alex Hung Cc: Aurabindo Pillai Cc: Harr

Re: [PATCH v4 3/6] drm/i915: Make I2C terminology more inclusive

2024-07-18 Thread Andi Shyti
Hi Easwar, On Thu, Jul 11, 2024 at 05:27:31AM +, Easwar Hariharan wrote: > I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" > with more appropriate terms. Inspired by Wolfram's series to fix drivers/i2c/, > fix the terminology for users of I2C_ALGOBIT bitbanging int

[PATCH v6 1/2] drm/buddy: Add start address support to trim function

2024-07-18 Thread Arunpravin Paneer Selvam
- Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disab

[PATCH v6 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-18 Thread Arunpravin Paneer Selvam
Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nea

Re: [PATCH v6 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-18 Thread Christian König
Am 18.07.24 um 12:32 schrieb Arunpravin Paneer Selvam: Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers.

Re: [PATCH v1 6/6] drm/amdgpu: add print support for sdma_v_4_4_2 ip_dump

2024-07-18 Thread Deucher, Alexander
[Public] Series is: Reviewed-by: Alex Deucher From: Sunil Khatri Sent: Thursday, July 18, 2024 12:42 AM To: Deucher, Alexander ; Koenig, Christian Cc: amd-gfx@lists.freedesktop.org ; Khatri, Sunil Subject: [PATCH v1 6/6] drm/amdgpu: add print support for sdma

Re: [PATCH] drm: Fix documentation warning for read_mpcc_state in mpc.h

2024-07-18 Thread Abhishek Tamboli
On Mon, Jul 15, 2024 at 05:46:38PM -0400, Aurabindo Pillai wrote: > > > On 7/12/24 1:45 PM, Abhishek Tamboli wrote: > > Add detail description for the read_mpcc_state function in the > > mpc_funcs struct to resolve the documentation warning. > > > > A kernel-doc warning was addressed: > > ./driv

Re: [PATCH v9 00/53] fix CONFIG_DRM_USE_DYNAMIC_DEBUG=y

2024-07-18 Thread Łukasz Bartosik
On Mon, Jul 15, 2024 at 8:00 PM wrote: > > On Mon, Jul 15, 2024 at 4:05 AM Łukasz Bartosik wrote: > > > > On Sat, Jul 13, 2024 at 11:45 PM wrote: > > > > > > On Fri, Jul 12, 2024 at 9:44 AM Łukasz Bartosik > > > wrote: > > > > > > > > On Wed, Jul 3, 2024 at 12:14 AM wrote: > > > > > > > > > >

Re: [PATCH v2 0/3] drm: backlight quirk infrastructure and lower minimum for Framework AMD 13

2024-07-18 Thread Hans de Goede
Hi Thomas, On 6/24/24 6:15 PM, Thomas Weißschuh wrote: > Hi Hans! > > thanks for your feedback! > > On 2024-06-24 11:11:40+, Hans de Goede wrote: >> On 6/23/24 10:51 AM, Thomas Weißschuh wrote: >>> The value of "min_input_signal" returned from ATIF on a Framework AMD 13 >>> is "12". This lea

Re: [BUG] HID: amd_sfh (drivers/hid/amd-sfh-hid/): memory/page corruption

2024-07-18 Thread Linux regression tracking (Thorsten Leemhuis)
On 15.07.24 06:39, Chris Hixon wrote: > System: HP ENVY x360 Convertible 15-ds1xxx; AMD Ryzen 7 4700U with > Radeon Graphics > > Problem commits (introduced in v6.9-rc1): > 6296562f30b1 HID: amd_sfh: Extend MP2 register access to SFH > 2105e8e00da4 HID: amd_sfh: Improve boot time when SFH is avail

Re: [PATCH v2] drm/radeon: fix null pointer dereference in radeon_add_common_modes

2024-07-18 Thread Alex Deucher
Applied. Thanks! Alex On Thu, Jul 18, 2024 at 9:13 AM Ma Ke wrote: > > In radeon_add_common_modes(), the return value of drm_cvt_mode() is > assigned to mode, which will lead to a possible NULL pointer dereference > on failure of drm_cvt_mode(). Add a check to avoid npd. > > Cc: sta...@vger.ker

[PATCH] drm/amdgpu/gfx10: handle SDMA in KIQ map/unmap

2024-07-18 Thread Alex Deucher
Add support for SDMA to the KIQ map/unmap functions. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 85 +- drivers/gpu/drm/amd/amdgpu/nvd.h | 2 + 2 files changed, 73 insertions(+), 14 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/

[PATCH] drm/amdgpu/mes11: handle second gfx pipe

2024-07-18 Thread Alex Deucher
Handle the second pipe in mes_v11_0_set_hw_resources(). Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index 8ce51b9236c1..27d54ec82208 1

[PATCH 00/34] GC per queue reset

2024-07-18 Thread Alex Deucher
This adds preliminary support for GC per queue reset. In this case, only the jobs currently in the queue are lost. If this fails, we fall back to a full adapter reset. Alex Deucher (19): drm/amdgpu/mes: add API for legacy queue reset drm/amdgpu/mes11: add API for legacy queue reset drm/amd

[PATCH 02/34] drm/amdgpu/mes11: add API for legacy queue reset

2024-07-18 Thread Alex Deucher
Add API for resetting kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 33 ++ 1 file changed, 33 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index 27d54ec82208..f611183

[PATCH 03/34] drm/amdgpu/mes12: add API for legacy queue reset

2024-07-18 Thread Alex Deucher
Add API for resetting kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 33 ++ 1 file changed, 33 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index c9f74231ad59..14b8c88

[PATCH 06/34] drm/amdgpu/mes12: add API for user queue reset

2024-07-18 Thread Alex Deucher
Add API for resetting user queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c index 14b8c88fb0e0..aea6225df539 1

[PATCH 01/34] drm/amdgpu/mes: add API for legacy queue reset

2024-07-18 Thread Alex Deucher
Add API for resetting kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 24 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 16 2 files changed, 40 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/dr

[PATCH 08/34] drm/amdgpu: add per ring reset support (v2)

2024-07-18 Thread Alex Deucher
If a specific job is hung, try and reset just the ring associated with the job. v2: move to amdgpu_job.c Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 17 + 1 file changed, 17 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/driv

[PATCH 05/34] drm/amdgpu/mes11: add API for user queue reset

2024-07-18 Thread Alex Deucher
Add API for resetting user queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 21 + 1 file changed, 21 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c index f611183e1ebf..bf8fb6a1becb 1

[PATCH 09/34] drm/amdgpu: increase the reset counter for the queue reset

2024-07-18 Thread Alex Deucher
From: Prike Liang Update the reset counter for the amdgpu_cs_query_reset_state() Signed-off-by: Prike Liang Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_

[PATCH 12/34] drm/amdgpu/gfx11: rename gfx_v11_0_gfx_init_queue()

2024-07-18 Thread Alex Deucher
Rename to gfx_v11_0_kgq_init_queue() to better align with the other naming in the file. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/a

[PATCH 07/34] drm/amdgpu: add new ring reset callback

2024-07-18 Thread Alex Deucher
Use this to reset just a single ring. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h index 582053f1cd56..c7f15edeb367 100644 --- a/

[PATCH 16/34] drm/amdgpu/gfx10: wait for reset done before remap

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu There is a race condition in which the CP firmware modifies the MQD during the reset sequence after the driver updates it for remapping. We have to wait until CP_HQD_ACTIVE becomes false and then remap the queue. v2: fix KIQ locking (Alex) Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-o

[PATCH 14/34] drm/amdgpu/gfx10: add ring reset callbacks

2024-07-18 Thread Alex Deucher
Add ring reset callbacks for gfx and compute. v2: fix gfx handling v3: wait for KIQ to complete Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 91 ++ 1 file changed, 91 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/driver

[PATCH 11/34] drm/amdgpu/gfx11: fallback to driver reset compute queue directly (v2)

2024-07-18 Thread Alex Deucher
From: Prike Liang MES FW resets of the kernel compute queue always fail, which may be caused by the KIQ failing to process the KCQ unmap. So, until the MES FW works properly, fall back to the driver executing the dequeue and resetting SPI directly. Besides, rework the ring reset function and make the busy ri

[PATCH 19/34] drm/amdgpu/gfx9: remap queue after reset successfully

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 36 --- 1 file change

[PATCH 10/34] drm/amdgpu/gfx11: add ring reset callbacks

2024-07-18 Thread Alex Deucher
Add ring reset callbacks for gfx and compute. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index ce5cb60b8628..56606c

[PATCH 13/34] drm/amdgpu/gfx11: wait for reset done before remap

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu There is a race condition in which the CP firmware modifies the MQD during the reset sequence after the driver updates it for remapping. We have to wait until CP_HQD_ACTIVE becomes false and then remap the queue. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- dr

[PATCH 23/34] drm/amdgpu/gfx_9.4.3: wait for reset done before remap

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu There is a race condition in which the CP firmware modifies the MQD during the reset sequence after the driver updates it for remapping. We have to wait until CP_HQD_ACTIVE becomes false and then remap the queue. v2: fix KIQ locking (Alex) Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-o

[PATCH 04/34] drm/amdgpu/mes: add API for user queue reset

2024-07-18 Thread Alex Deucher
Add API for resetting user queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 43 + drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 9 ++ 2 files changed, 52 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/d

[PATCH 21/34] drm/amdgpu/gfx9.4.3: add ring reset callback

2024-07-18 Thread Alex Deucher
Add ring reset callback for compute. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 38 + 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c index 98fe6c40da64..6cf90

[PATCH 30/34] drm/amdgpu/mes: implement amdgpu_mes_reset_hw_queue_mmio

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu The reset_queue api could be used from kfd or kgd. v2: add use_mmio parameter for mes_reset_legacy_queue. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 20 1 file changed,

[PATCH 22/34] drm/amdgpu/gfx9.4.3: remap queue after reset successfully

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 36 ++--- 1 file change

[PATCH 20/34] drm/amdgpu/gfx9: wait for reset done before remap

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu There is a race condition in which the CP firmware modifies the MQD during the reset sequence after the driver updates it for remapping. We have to wait until CP_HQD_ACTIVE becomes false and then remap the queue. v2: fix KIQ locking (Alex) Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-o

[PATCH 25/34] drm/amdgpu/gfx12: fallback to driver reset compute queue directly

2024-07-18 Thread Alex Deucher
MES FW resets of the kernel compute queue always fail, which may be caused by the KIQ failing to process the KCQ unmap. So, until the MES FW works properly, fall back to the driver executing the dequeue and resetting SPI directly. Besides, rework the ring reset function and make the busy ring type reset in eac

[PATCH 15/34] drm/amdgpu/gfx10: remap queue after reset successfully

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Kiq command unmap_queues only does the dequeueing action. We have to map the queue back with clean mqd. v2: fix up error handling (Alex) Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 46 +++

[PATCH 32/34] drm/amdgpu/gfx11: add a mutex for the gfx semaphore

2024-07-18 Thread Alex Deucher
This will be used in more places in the future so add a mutex. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h| 2 ++ drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 10 +++--- 3 files changed, 10 insertions(+), 3 d

[PATCH 27/34] drm/amdgpu/gfx9: implement reset_hw_queue for gfx9

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Using mmio to do queue reset. Enter safe mode when writing registers. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 37 +++ 1 file changed, 37 insertions(+) diff --git a/

[PATCH 18/34] drm/amdgpu/gfx9: add ring reset callback

2024-07-18 Thread Alex Deucher
Add ring reset callback for compute. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 38 +++ 1 file changed, 38 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index 675a1a8e2515..78495df29

[PATCH 24/34] drm/amdgpu/gfx12: add ring reset callbacks

2024-07-18 Thread Alex Deucher
Add ring reset callbacks for gfx and compute. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 18 ++ 1 file changed, 18 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index 63b073fd4dc7..9ed6c8

[PATCH 31/34] drm/amdgpu/gfx11: enter safe mode before touching CP_INT_CNTL

2024-07-18 Thread Alex Deucher
Need to enter safe mode before touching GC MMIO. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 348bc1b1784a..9bd

[PATCH 28/34] drm/amdgpu/gfx9.4.3: implement reset_hw_queue for gfx9.4.3

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Using mmio to do queue reset. Enter safe mode before writing mmio registers. v2: set register instance offset according to xcc id. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 36

[PATCH 26/34] drm/amdgpu/gfx: add a new kiq_pm4_funcs callback for reset_hw_queue

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Add reset_hw_queue in kiq_pm4_funcs callbacks. Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h b/dri

[PATCH 17/34] drm/amdgpu/gfx10: rework reset sequence

2024-07-18 Thread Alex Deucher
To match other GFX IPs. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 26 +++--- 1 file changed, 19 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index e9d93bf909db..b833

[PATCH 34/34] drm/amdgpu/mes11: implement mmio queue reset for gfx11

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Implement queue reset for graphic and compute queue. v2: use amdgpu_gfx_rlc funcs to enter/exit safe mode. v3: use gfx_v11_0_request_gfx_index_mutex() Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.

[PATCH 29/34] drm/amdgpu/mes: modify mes api for mmio queue reset

2024-07-18 Thread Alex Deucher
From: Jiadong Zhu Add me/pipe/queue parameters for queue reset input. v2: fix build (Alex) Signed-off-by: Jiadong Zhu Reviewed-by: Alex Deucher Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 3 ++- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 14 +- driv

[PATCH 33/34] drm/amdgpu/gfx11: export gfx_v11_0_request_gfx_index_mutex()

2024-07-18 Thread Alex Deucher
It will be used by the queue reset code. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.h | 3 +++ 2 files changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/a

[PATCH 1/4] drm/amdgpu/gfx9: per queue reset only on bare metal

2024-07-18 Thread Alex Deucher
It's not supported under SR-IOV at the moment. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 +++ drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 3 +++ 2 files changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_

[PATCH 3/4] drm/amdgpu/gfx11: per queue reset only on bare metal

2024-07-18 Thread Alex Deucher
It's not supported under SR-IOV at the moment. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c index 20be1b9ecdc3..021f7394b252 100644

[PATCH 2/4] drm/amdgpu/gfx10: per queue reset only on bare metal

2024-07-18 Thread Alex Deucher
It's not supported under SR-IOV at the moment. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c index b833943faa53..a8f26a311faf 100644

[PATCH 4/4] drm/amdgpu/gfx12: per queue reset only on bare metal

2024-07-18 Thread Alex Deucher
It's not supported under SR-IOV at the moment. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c index ba121491f5a7..4ee36a172471 100644

Re: [PATCH v3] drm/amdgpu: fix a possible null pointer dereference

2024-07-18 Thread Alex Deucher
Applied. Thanks! On Thu, Jul 18, 2024 at 10:12 AM Ma Ke wrote: > > In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode() > is assigned to mode, which will lead to a NULL pointer dereference on > failure of drm_cvt_mode(). Add a check to avoid npd. > > Cc: sta...@vger.kernel.

Re: [PATCH v2] drm/amd/amdgpu: Fix uninitialized variable warnings

2024-07-18 Thread Alex Deucher
Applied. Thanks! Alex On Thu, Jul 18, 2024 at 10:17 AM Ma Ke wrote: > > Return 0 to avoid returning an uninitialized variable r. > > Cc: sta...@vger.kernel.org > Fixes: 230dd6bb6117 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") > Signed-off-by: Ma Ke > --- > Changes in v2: > - adde

Re: [PATCH 00/34] GC per queue reset

2024-07-18 Thread Friedrich Vock
Hi, On 18.07.24 16:06, Alex Deucher wrote: This adds preliminary support for GC per queue reset. In this case, only the jobs currently in the queue are lost. If this fails, we fall back to a full adapter reset. First of all, thank you so much for working on this! It's great to finally see pr

Re: [PATCH 00/34] GC per queue reset

2024-07-18 Thread Alex Deucher
On Thu, Jul 18, 2024 at 10:15 AM Alex Deucher wrote: > > This adds preliminary support for GC per queue reset. In this > case, only the jobs currently in the queue are lost. If this > fails, we fall back to a full adapter reset. Also available here via git: https://gitlab.freedesktop.org/agd5f/

[PATCH 2/2] drm/amdkfd: support the debugger during per-queue reset

2024-07-18 Thread Jonathan Kim
In order to allow ROCm GDB to handle reset queues, raise an EC_QUEUE_RESET exception so that the debugger can subscribe and query this exception. Reset queues should still be considered suspendable with a status flag of KFD_DBG_QUEUE_RESET_MASK. However they should not be resumable since user spac

[PATCH 1/2] drm/amdkfd: support per-queue reset on gfx9

2024-07-18 Thread Jonathan Kim
Support per-queue reset for GFX9. The recommendation is for the driver to target reset the HW queue via a SPI MMIO register write. Since this requires pipe and HW queue info and MEC FW is limited to doorbell reports of hung queues after an unmap failure, scan the HW queue slots defined by SET_RES

Re: [PATCH 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping

2024-07-18 Thread Philip Yang
On 2024-07-17 16:10, Felix Kuehling wrote: @@ -603,8 +606,6 @@ struct queue {   void *gang_ctx_bo;   uint64_t gang_ctx_gpu_addr;   void *gang_ctx_cpu_ptr; -

Re: [PATCH 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping

2024-07-18 Thread Philip Yang
On 2024-07-17 16:16, Felix Kuehling wrote: Sorry, I see that this patch still doesn't propagate errors returned from kfd_queue_release_buffers correctly. And the later patches in the series don't seem to fix it either. See inline.

[PATCH 2/3] drm/amdgpu/gfx8: add ring reset callback for gfx

2024-07-18 Thread Alex Deucher
Add ring reset callback for gfx. Untested. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 75 ++- drivers/gpu/drm/amd/amdgpu/vid.h | 1 + 2 files changed, 75 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c

[PATCH 1/3] drm/amdgpu/gfx9: add ring reset callback for gfx

2024-07-18 Thread Alex Deucher
Add ring reset callback for gfx. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 47 +++ 1 file changed, 47 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c index ae23a7848237..5c4b0c8669b6

[PATCH 3/3] drm/amdgpu/gfx7: add ring reset callback for gfx

2024-07-18 Thread Alex Deucher
Add ring reset callback for gfx. Untested. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/cikd.h | 1 + drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 76 ++- 2 files changed, 76 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/cikd.h b/dr

Re: [PATCH 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping

2024-07-18 Thread Felix Kuehling
On 2024-07-18 15:57, Philip Yang wrote: > > On 2024-07-17 16:16, Felix Kuehling wrote: >> Sorry, I see that this patch still doesn't propagate errors returned from >> kfd_queue_release_buffers correctly. And the later patches in the series >> don't seem to fix it either. See inline. > kfd_qu

Re: [PATCH v2] drm/amdkfd: Correct svm prange overlapping handling at svm_range_set_attr ioctl

2024-07-18 Thread Felix Kuehling
On 2024-07-18 1:25, Chen, Xiaogang wrote: > > On 7/17/2024 6:02 PM, Felix Kuehling wrote: >> >> On 2024-06-26 11:06, Xiaogang.Chen wrote: >>> From: Xiaogang Chen >>> >>> When user adds new vm range that has overlapping with existing svm pranges >>> current kfd creates a cloned prange and split

[PATCH v2 0/9] KFD user queue validation

2024-07-18 Thread Philip Yang
This patch series does additional queue buffer validation in the queue creation IOCTLs and fails the queue creation if the buffers are not mapped on the GPU with the expected size. Ensure queue buffers residency by tracking the GPUVM virtual addresses for queue buffers to return error if the user tries to free

[PATCH v2 1/9] drm/amdkfd: kfd_bo_mapped_dev support partition

2024-07-18 Thread Philip Yang
Change amdgpu_amdkfd_bo_mapped_to_dev to use drm_priv as parameter instead of adev, to support spatial partition. This is only used by CRIU checkpoint restore now. No functional change. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |

[PATCH v2 7/9] drm/amdkfd: Validate user queue update

2024-07-18 Thread Philip Yang
Ensure update queue new ring buffer is mapped on GPU with correct size. Decrease queue old ring_bo queue_refcount and increase new ring_bo queue_refcount. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- .../amd/amdkfd/kfd_process_queue_manager.c| 32 ++- 1 file c

[PATCH v2 5/9] drm/amdkfd: Ensure user queue buffers residency

2024-07-18 Thread Philip Yang
Add an atomic queue_refcount to struct bo_va, and return -EBUSY to fail unmapping the BO from the GPU if the bo_va queue_refcount is not zero. Creating a queue increases the bo_va queue_refcount and destroying a queue decreases it, to ensure the queue buffers stay mapped on the GPU while the queue is active.
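The reference-counting rule described in this message can be sketched in plain user-space C. This is only an illustration under stated assumptions, not the patch's code: `struct mapping`, `mapping_queue_get`, `mapping_queue_put` and `mapping_unmap` are hypothetical names standing in for the kernel's bo_va handling, and C11 atomics replace the kernel's atomic_t.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a GPU mapping; not the kernel's struct bo_va. */
struct mapping {
    atomic_int queue_refcount; /* number of queues using this mapping */
    bool mapped;               /* true while the buffer is mapped */
};

/* Queue creation takes a reference on each buffer mapping it uses. */
void mapping_queue_get(struct mapping *m)
{
    atomic_fetch_add(&m->queue_refcount, 1);
}

/* Queue destruction drops that reference again. */
void mapping_queue_put(struct mapping *m)
{
    atomic_fetch_sub(&m->queue_refcount, 1);
}

/*
 * Unmap is refused with -EBUSY while any queue still holds a reference.
 * This sketch does not serialize the check against a concurrent
 * mapping_queue_get(); real code would need a lock for that.
 */
int mapping_unmap(struct mapping *m)
{
    if (atomic_load(&m->queue_refcount) != 0)
        return -EBUSY;
    m->mapped = false;
    return 0;
}
```

In this sketch an unmap attempted between a get and the matching put fails with -EBUSY and leaves the mapping intact; once the queue is destroyed the same call succeeds.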

[PATCH v2 9/9] drm/amdkfd: Validate queue cwsr area and eop buffer size

2024-07-18 Thread Philip Yang
When creating a KFD user compute queue, check that the queue eop buffer size, cwsr area size, and ctl stack size equal the sizes in the KFD node properties. Check the entire cwsr area, which may be split into multiple svm ranges aligned to the granularity boundary. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehl

[PATCH v2 6/9] drm/amdkfd: Validate user queue svm memory residency

2024-07-18 Thread Philip Yang
The queue CWSR area may be registered to the GPU as svm memory; queue creation ensures the svm range is mapped to the GPU with the KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED flag. Add queue_refcount to struct svm_range to track queue CWSR area usage. Because the unmap mmu notifier callback return value is ignored, if application unmap

[PATCH v2 4/9] drm/amdkfd: Validate user queue buffers

2024-07-18 Thread Philip Yang
Find user queue rptr, ring buf, eop buffer and cwsr area BOs, and check BOs are mapped on the GPU with correct size and take the BO reference. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 +++ drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 38

[PATCH v2 2/9] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer

2024-07-18 Thread Philip Yang
Pass a pointer reference to amdgpu_bo_unref to clear the correct pointer; otherwise amdgpu_bo_unref clears only the local variable, the original pointer is not set to NULL, and this could cause a use-after-free bug. Signed-off-by: Philip Yang Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_amd
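The pointer-clearing problem this message describes is a general C pitfall and can be shown in a small user-space sketch. Everything here is hypothetical and for illustration only: `struct buf`, `buf_new`, `buf_unref`, `free_mem_buggy` and `free_mem_fixed` are made-up names, not the amdgpu functions, and the refcounting is deliberately simplified.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted buffer object; not the kernel's amdgpu_bo. */
struct buf {
    int refs;
};

/* Allocates a buffer holding one reference; returns NULL on failure. */
struct buf *buf_new(void)
{
    struct buf *b = calloc(1, sizeof(*b));

    if (b)
        b->refs = 1;
    return b;
}

/* Drops one reference and sets the pointer that pb points at to NULL. */
void buf_unref(struct buf **pb)
{
    struct buf *b = *pb;

    if (!b)
        return;
    if (--b->refs == 0)
        free(b);
    *pb = NULL;
}

/* Buggy shape: the pointer arrives by value, so only the local copy is cleared. */
void free_mem_buggy(struct buf *b)
{
    buf_unref(&b);
}

/* Fixed shape: the caller's pointer arrives by reference and is set to NULL. */
void free_mem_fixed(struct buf **pb)
{
    buf_unref(pb);
}
```

With the by-value variant the caller keeps a non-NULL pointer even after the last reference is gone, which is the use-after-free risk the message mentions; the by-reference variant leaves the caller holding NULL.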

[PATCH v2 8/9] drm/amdkfd: Store queue cwsr area size to node properties

2024-07-18 Thread Philip Yang
Use the queue eop buffer size, cwsr area size, ctl stack size calculation from Thunk, store the value to KFD node properties. Those will be used to validate queue eop buffer size, cwsr area size, ctl stack size when creating KFD user compute queue. Those will be exposed to user space via sysfs KF

[PATCH v2 3/9] drm/amdkfd: Refactor queue wptr_bo GART mapping

2024-07-18 Thread Philip Yang
Add helper function kfd_queue_acquire_buffers to get queue wptr_bo reference from queue write_ptr if it is mapped to the KFD node with expected size. Add wptr_bo to structure queue_properties because structure queue is allocated after queue buffers are validated, then we can remove wptr_bo paramet

Re: [PATCH v2 0/9] KFD user queue validation

2024-07-18 Thread Felix Kuehling
On 2024-07-18 17:05, Philip Yang wrote: > This patch series does additional queue buffer validation in the queue > creation IOCTLs and fails the queue creation if the buffers are not mapped on the GPU > with the expected size. > > Ensure queue buffers residency by tracking the GPUVM virtual addresses > for q

[pull] amdgpu drm-fixes-6.11

2024-07-18 Thread Alex Deucher
Hi Dave, Sima, Fixes for 6.11. The following changes since commit 1cff1010bef6f325d895db0306b59dc7232ed9b7: drm/amdgpu/mes12: add missing opcode string (2024-07-12 11:46:46 -0400) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-6.1

[PATCH] drm/amdkfd: allow users to target recommended SDMA engines

2024-07-18 Thread Jonathan Kim
Certain GPUs have better copy performance over xGMI on specific SDMA engines depending on the source and destination GPU. Allow users to create SDMA queues on these recommended engines. Close to 2x overall performance has been observed with this optimization. Signed-off-by: Jonathan Kim --- driv