Re: [PATCH] drm/amd/amdgpu: Add Annotations to Process Isolation functions

2024-12-02 Thread Alex Deucher
On Wed, Nov 27, 2024 at 2:57 PM Srinivasan Shanmugam wrote: > > This update adds explanations to key functions that manage how the > Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the > GPU. > > amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period > for KFD to

[PATCH v2] drm/amdkfd: Dereference null return value

2024-12-02 Thread Andrew Martin
In the function pqm_uninit there is a call-assignment of "pdd = kfd_get_process_device_data" which could be null, and this value was later dereferenced without checking. Fixes: fb91065851cd ("drm/amdkfd: Refactor queue wptr_bo GART mapping") Signed-off-by: Andrew Martin --- drivers/gpu/drm/amd/a

[PATCH] drm/amdgpu: warn on DCC in fallback copy path

2024-12-02 Thread Alex Deucher
If SDMA is not available, warn if we try and copy a GFX12 DCC buffer with the CPU. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdg

Re: [PATCH 00/10] drm/connector: add eld_mutex to protect connector->eld

2024-12-02 Thread Maxime Ripard
On Sun, 1 Dec 2024 01:55:17 +0200, Dmitry Baryshkov wrote: > The connector->eld is accessed by the .get_eld() callback. This access > can collide with the drm_edid_to_eld() updating the data at the same > time. Add drm_connector.eld_mutex to protect the data from concurrent > access. > > > [ ...

[PATCH 1/2] drm/amdgpu: split ras_eeprom_init into init and check functions

2024-12-02 Thread Tao Zhou
Init function is for ras table header read and check function is responsible for the validation of the header. Call them in different stages. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 15 ++ .../gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c| 20

[PATCH 2/2] drm/amdgpu: correct the calculation of RAS bad page

2024-12-02 Thread Tao Zhou
After the introduction of NPS RAS, one bad page record on eeprom may be related to 1 or 16 bad pages, so a bad page record and a bad page are two different concepts. Define a new variable to store the bad page number. Signed-off-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 10 +--
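
The distinction between records and pages in the snippet above can be sketched as simple arithmetic. This is only an illustration under stated assumptions, not the patch's code: the function name `bad_page_count` and its parameters are hypothetical, and the 1-or-16 factor is taken from the snippet's wording.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): convert a number of EEPROM
 * bad page records into a number of bad pages. Per the snippet, one
 * record may correspond to 1 or 16 bad pages, so the caller passes
 * that factor in as pages_per_record.
 */
static inline uint32_t bad_page_count(uint32_t record_count,
                                      uint32_t pages_per_record)
{
    return record_count * pages_per_record;
}
```

With three records, the page count is 3 when each record maps to one page and 48 when each maps to 16, which is why a single counter cannot serve both meanings.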

Re: [PATCH 01/10] drm/connector: add mutex to protect ELD from concurrent access

2024-12-02 Thread Jani Nikula
On Sun, 01 Dec 2024, Dmitry Baryshkov wrote: > The connector->eld is accessed by the .get_eld() callback. This access > can collide with the drm_edid_to_eld() updating the data at the same > time. Add drm_connector.eld_mutex to protect the data from concurrent > access. Individual drivers are not

[PATCH 3/3] drm/amdgpu: add ACA support for jpeg v4.0.3

2024-12-02 Thread Yang Wang
Add ACA support for jpeg v4.0.3. Signed-off-by: Yang Wang Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 86 1 file changed, 86 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c in

[PATCH 2/3] drm/amdgpu: add ACA support for vcn v4.0.3

2024-12-02 Thread Yang Wang
v1: Add ACA support for vcn v4.0.3. v2: - split VCN ACA(v1) to 2 parts: vcn and jpeg. - move mmSMNAID_AID0_MCA_SMU to amdgpu_aca.h file. v3: - split JPEG ACA to another patch. Signed-off-by: Yang Wang Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c | 85

[PATCH 1/3] drm/amdgpu: move common ACA ipid defines into amdgpu_aca.h

2024-12-02 Thread Yang Wang
move common ACA ipid defines into amdgpu_aca.h file. Signed-off-by: Yang Wang Reviewed-by: Hawking Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h | 5 + drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 4 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c | 1 - 3 files changed, 5 insertions(+), 5 de

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-02 Thread Raag Jadav
On Fri, Nov 29, 2024 at 10:40:14AM -0300, André Almeida wrote: > Hi Raag, > > Em 28/11/2024 12:37, Raag Jadav escreveu: > > Introduce device wedged event, which notifies userspace of 'wedged' > > (hanged/unusable) state of the DRM device through a uevent. This is > > useful especially in cases whe

[PATCH] drm: amd: Fix potential NULL pointer dereference in atomctrl_get_smc_sclk_range_table

2024-12-02 Thread Ivan Stepchenko
The function atomctrl_get_smc_sclk_range_table() does not check the return value of smu_atom_get_data_table(). If smu_atom_get_data_table() fails to retrieve SMU_Info table, it returns NULL which is later dereferenced. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: a23ee
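
The class of bug described above (using a lookup result without checking it for NULL) can be sketched in a few lines. This is a generic illustration, assuming nothing about the actual driver code: `struct info_table`, `lookup_table()` and `read_entry_count()` are hypothetical stand-ins, not the real `smu_atom_get_data_table()` interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical table type standing in for a firmware data table. */
struct info_table {
    int entry_count;
};

/* Hypothetical lookup that, like many table getters, may return NULL. */
static const struct info_table *lookup_table(const struct info_table *src)
{
    return src;
}

/*
 * Check the lookup result before dereferencing it and report failure
 * to the caller instead of crashing.
 */
static int read_entry_count(const struct info_table *src, int *out)
{
    const struct info_table *table = lookup_table(src);

    if (!table)
        return -EINVAL;

    *out = table->entry_count;
    return 0;
}
```

Returning a negative errno value follows the usual kernel convention; which error code the real fix returns is not visible in the truncated snippet.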

Re: [PATCH v10 1/4] drm: Introduce device wedged event

2024-12-02 Thread André Almeida
Hi Raag, Em 28/11/2024 12:37, Raag Jadav escreveu: Introduce device wedged event, which notifies userspace of 'wedged' (hanged/unusable) state of the DRM device through a uevent. This is useful especially in cases where the device is no longer operating as expected and has become unrecoverable f

Re: [PATCH 00/10] drm/connector: add eld_mutex to protect connector->eld

2024-12-02 Thread Dmitry Baryshkov
On Mon, Dec 02, 2024 at 10:19:41AM +, Maxime Ripard wrote: > On Sun, 1 Dec 2024 01:55:17 +0200, Dmitry Baryshkov wrote: > > The connector->eld is accessed by the .get_eld() callback. This access > > can collide with the drm_edid_to_eld() updating the data at the same > > time. Add drm_connector

Re: [PATCH] drm/amdgpu: rework resume handling for display (v2)

2024-12-02 Thread Zhang, George
[Public] From: amd-gfx on behalf of Christian König Sent: Monday, December 2, 2024 2:19 PM To: Deucher, Alexander; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: rework resume handling for display (v2) > >Am 02.12.24 um 20:18 schrieb Chri

[PATCH 06/11] drm/amdgpu: Apply gc v9_5_0 golden settings

2024-12-02 Thread Alex Deucher
From: Hawking Zhang Apply gc v9_5_0 golden settings. Signed-off-by: Hawking Zhang Reviewed-by: Asad Kamal Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/g

[PATCH 08/11] drm/amdkfd: update buffer_{store, load}_* modifiers for gfx940

2024-12-02 Thread Alex Deucher
From: Lancelot SIX Instruction modifiers of the untyped vector memory buffer instructions (MUBUF encoded) changed in gfx940. The slc, scc and glc modifiers have been replaced with sc0, sc1 and nt. The current CWSR trap handler is written using pre-gfx940 modifier names, making the source incomp

[PATCH 03/11] drm/amdgpu: add initial support for gfx950

2024-12-02 Thread Alex Deucher
From: Le Ma add gfx950 basic support Signed-off-by: Le Ma Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 5 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 3 +- drivers/g

[PATCH 11/11] drm/amdkfd: udpate the cwsr area size for gfx950

2024-12-02 Thread Alex Deucher
From: Le Ma Update cwsr area size for gfx950 to fit the new user queue buffer validation. The size of LDS calculation is referred from gfx950 thunk implementation. Signed-off-by: Le Ma Acked-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_queue.c | 10 ++-

[PATCH 10/11] drm/amdkfd: Handle save/restore of lds allocated in 1280B blocks

2024-12-02 Thread Alex Deucher
From: Lancelot SIX The gfx-9 trap handler is reading the LDS allocation size in 256-byte granularity (from SQ_WAVE_LDS_ALLOC), but it assumes that this value is always even (i.e. the LDS allocation is really done in multiples of 512 bytes). This was true so far, but gfx-950 allocates LD

[PATCH 01/11] drm/amd: define gc ip version local variable

2024-12-02 Thread Alex Deucher
From: Alex Sierra For better readability. Also leftover orphaned code. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 +++- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 5 +++

[PATCH 04/11] drm/amdgpu: Set proper MTYPE for GC 9.5.0

2024-12-02 Thread Alex Deucher
From: Alex Sierra GC 9.5.0 local memory MTYPE default should be set as RW. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/d

[PATCH 07/11] drm/amdkfd: add gc 9.5.0 support on kfd

2024-12-02 Thread Alex Deucher
From: Alex Sierra Initial support for GC 9.5.0. v2: squash in pqm_clean_queue_resource() fix from Lijo Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 1 + drivers/gpu/drm/amd/amdkfd/kfd_debug.h|

[PATCH 02/11] drm/amdgpu/gfx: add gfx950 microcode

2024-12-02 Thread Alex Deucher
From: Le Ma Add firmware declarations. Signed-off-by: Le Ma Reviewed-by: Hawking Zhang Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_

[PATCH 05/11] drm/amd: update mtype flags for gfx 9.5.0

2024-12-02 Thread Alex Deucher
From: Alex Sierra Update mtype flags to meet gfx 9.5.0 requirements for remote GPU memory and system memory. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling Reviewed-by: Harish Kasiviswanathan Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 8 driver

[PATCH 09/11] drm/amdkfd: Adjust CWSR trap handler for gfx950

2024-12-02 Thread Alex Deucher
From: Lancelot SIX In gfx950, the SQ_WAVE_LDS_ALLOC.LDS_SIZE field is extended to bits 12 to 22. The LDS_SIZE granularity remains unchanged (units of 64 dwords, or 256 bytes). This patch adjusts the CWSR trap handler to read the full extent of LDS_SIZE. Signed-off-by: Lancelot SIX Reviewed-by
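
As a worked example of the field layout described above, the following sketch decodes an LDS size in bytes from a register value. It assumes only what the snippet states (LDS_SIZE occupies bits 12 to 22, in units of 256 bytes); the macro and function names are hypothetical, and the real trap handler is written in shader assembly, not C.

```c
#include <stdint.h>

/* Assumed from the snippet: LDS_SIZE spans bits 12..22 inclusive. */
#define LDS_SIZE_SHIFT     12u
#define LDS_SIZE_WIDTH     11u
/* Assumed from the snippet: one unit is 64 dwords, i.e. 256 bytes. */
#define LDS_GRANULE_BYTES  256u

/* Hypothetical helper: extract LDS_SIZE and convert it to bytes. */
static inline uint32_t lds_alloc_bytes(uint32_t sq_wave_lds_alloc)
{
    uint32_t mask = (1u << LDS_SIZE_WIDTH) - 1u;
    uint32_t units = (sq_wave_lds_alloc >> LDS_SIZE_SHIFT) & mask;

    return units * LDS_GRANULE_BYTES;
}
```

A field value of 5 decodes to 1280 bytes, an odd multiple of 256, which is the kind of allocation size that patch 10/11 in this series refers to.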

Re: [PATCH 09/10] drm/sti: hdmi: use eld_mutex to protect access to connector->eld

2024-12-02 Thread Raphaël Gallais-Pou
Le 01/12/2024 à 00:55, Dmitry Baryshkov a écrit : Reading access to connector->eld can happen at the same time the drm_edid_to_eld() updates the data. Take the newly added eld_mutex in order to protect connector->eld from concurrent access. Signed-off-by: Dmitry Baryshkov Hi Dmitry, Acked

Re: [PATCH] drm: amd: Fix potential NULL pointer dereference in atomctrl_get_smc_sclk_range_table

2024-12-02 Thread Alex Deucher
On Mon, Dec 2, 2024 at 3:27 AM Ivan Stepchenko wrote: > > The function atomctrl_get_smc_sclk_range_table() does not check the return > value of smu_atom_get_data_table(). If smu_atom_get_data_table() fails to > retrieve SMU_Info table, it returns NULL which is later dereferenced. > > Found by Linu

Re: [PATCH] drm/amdgpu: device: fix spellos and punctuation

2024-12-02 Thread Alex Deucher
Applied. Thanks! On Wed, Nov 27, 2024 at 11:17 PM Randy Dunlap wrote: > > Make spelling and punctuation changes to ease reading of the comments. > > Signed-off-by: Randy Dunlap > Cc: Alex Deucher > Cc: Christian König > Cc: Xinhui Pan > Cc: amd-gfx@lists.freedesktop.org > Cc: David Airlie >

Re: [PATCH] drm/amdkfd: hard-code cacheline for gc943,gc944

2024-12-02 Thread Alex Deucher
On Wed, Nov 27, 2024 at 1:22 PM David Yat Sin wrote: > > Cacheline size is not available in IP discovery for gc943,gc944. > > Signed-off-by: David Yat Sin > Reviewed-by: Harish Kasiviswanathan Acked-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++ > 1 file changed,

Re: Kernel warning in dcn30_dpp.c; short freezing, crashes in KWin

2024-12-02 Thread Li Jiajun
I have the exact same issue as you. I'm using a laptop with a 7840HS. I encountered this warning after upgrading the kernel to 6.11.10. Maybe some backported patches broke something; I think there was no warning when I booted kernel 6.11.0. The warning also exists in 6.13-rc1. Maybe you can p

Re: [PATCH] drm/amdgpu: fix UVD contiguous CS mapping problem

2024-12-02 Thread Paneer Selvam, Arunpravin
Hi Christian, Thank you. Reviewed-by: Arunpravin Paneer Selvam Regards, Arun. On 11/29/2024 8:38 PM, Christian König wrote: When starting the mpv player, Radeon R9 users are observing the below error in dmesg. [drm:amdgpu_uvd_cs_pass2 [amdgpu]] *ERROR* msg/fb buffer ff00f7c000-ff00f7e000 out

Re: [PATCH 00/10] drm/connector: add eld_mutex to protect connector->eld

2024-12-02 Thread Maxime Ripard
On Mon, Dec 02, 2024 at 01:03:07PM +0200, Dmitry Baryshkov wrote: > On Mon, Dec 02, 2024 at 10:19:41AM +, Maxime Ripard wrote: > > On Sun, 1 Dec 2024 01:55:17 +0200, Dmitry Baryshkov wrote: > > > The connector->eld is accessed by the .get_eld() callback. This access > > > can collide with the d

[PATCH v2] drm/amd/amdgpu: Add Annotations to Process Isolation functions

2024-12-02 Thread Srinivasan Shanmugam
This update adds explanations to key functions that manage how the Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the GPU. amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period for KFD to ensure it takes turns with KGD in using the GPU. It uses a mutex to safely

Re: [PATCH] drm/amdgpu: rework resume handling for display (v2)

2024-12-02 Thread Christian König
Am 02.12.24 um 17:52 schrieb Alex Deucher: Split resume into a 3rd step to handle displays when DCC is enabled on DCN 4.0.1. Move display after the buffer funcs have been re-enabled so that the GPU will do the move and properly set the DCC metadata for DCN. v2: fix fence irq resume ordering Si

Re: [PATCH] drm/amdgpu: rework resume handling for display (v2)

2024-12-02 Thread Christian König
Am 02.12.24 um 20:18 schrieb Christian König: Am 02.12.24 um 17:52 schrieb Alex Deucher: Split resume into a 3rd step to handle displays when DCC is enabled on DCN 4.0.1.  Move display after the buffer funcs have been re-enabled so that the GPU will do the move and properly set the DCC metadata

RE: [PATCH] drm/amdgpu: Avoid to release the FW twice in the validated error

2024-12-02 Thread Liang, Prike
[AMD Official Use Only - AMD Internal Distribution Only] Thanks for the input. As discussed offline, I will add this remark to the description of the function amdgpu_ucode_request(). Regards, Prike > -----Original Message----- > From: Lazar, Lijo > Sent: Monday, December 2, 2024 3:46 PM > To:

RE: [PATCH 1/3] drm/amdgpu: move common ACA ipid defines into amdgpu_aca.h

2024-12-02 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] The series is: Reviewed-by: Tao Zhou > -----Original Message----- > From: Wang, Yang(Kevin) > Sent: Monday, December 2, 2024 4:49 PM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Zhou1, Tao > ; Yang, Stanley ; Zhang, > Hawki

RE: [PATCH 2/2] drm/amdgpu: correct the calculation of RAS bad page

2024-12-02 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Series is Reviewed-by: Hawking Zhang Regards, Hawking -----Original Message----- From: amd-gfx On Behalf Of Tao Zhou Sent: Monday, December 2, 2024 18:25 To: amd-gfx@lists.freedesktop.org Cc: Zhou1, Tao Subject: [PATCH 2/2] drm/amdgpu:

[PATCH] drm/amdgpu: rework resume handling for display (v2)

2024-12-02 Thread Alex Deucher
Split resume into a 3rd step to handle displays when DCC is enabled on DCN 4.0.1. Move display after the buffer funcs have been re-enabled so that the GPU will do the move and properly set the DCC metadata for DCN. v2: fix fence irq resume ordering Signed-off-by: Alex Deucher --- drivers/gpu/d

Re: [PATCH 07/10] drm/msm/dp: use eld_mutex to protect access to connector->eld

2024-12-02 Thread Abhinav Kumar
On 11/30/2024 3:55 PM, Dmitry Baryshkov wrote: Reading access to connector->eld can happen at the same time the drm_edid_to_eld() updates the data. Take the newly added eld_mutex in order to protect connector->eld from concurrent access. Signed-off-by: Dmitry Baryshkov --- drivers/gpu/drm/