[PATCH] drm/amdgpu: Driver needn't request RLC safe mode for gfx MGCG

2024-12-17 Thread Prike Liang
In accordance with the MGCG HW sequence, there is no need for the driver to request safe mode before enabling GFX MGCG. For GFX10 and later versions, maintaining safe mode is acceptable for GFX MGCG; otherwise, there will be an increased overhead during safe mode entry and exit when enabling other

RE: [PATCH v2] drm/amd/display: Fix NULL pointer dereference in dmub_tracebuffer_show

2024-12-17 Thread Li, Roman
[Public] Reviewed-by: Roman Li > -----Original Message----- > From: SHANMUGAM, SRINIVASAN > Sent: Thursday, December 12, 2024 6:08 AM > To: Siqueira, Rodrigo; Pillai, Aurabindo > Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN; Li, Sun peng (Leo); Chung, ChiaHsuan (Tom); L

[PATCH 1/3] drm/amd/display: fix page fault due to max surface definition mismatch

2024-12-17 Thread Melissa Wen
The DC driver uses two different values to define the maximum number of surfaces: MAX_SURFACES and MAX_SURFACE_NUM. Consolidate MAX_SURFACES as the unique definition for surface updates across DC. This fixes a page fault faced by Cosmic users on AMD display versions that support two overlay planes, si

[PATCH 3/3] drm/amd/display: fix divide error in DM plane scale calcs

2024-12-17 Thread Melissa Wen
dm_get_plane_scale doesn't account for a scaled plane size of zero, leading to a kernel oops due to division by zero. Fix by setting the out-scale size to zero when the dst size is zero, similar to what is done by drm_calc_scale(). This issue started with the introduction of cursor overlay mod
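The guard described in this snippet can be sketched in a few lines of C. This is only an illustration of the idea, not the code from the patch: the helper name `scale_ratio_x1000` and the choice of thousandths as the unit are assumptions made for the example.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not the real dm_get_plane_scale()): returns the
 * source/destination ratio in thousandths, or 0 when the destination
 * size is 0, so the division below can never trap.
 */
static int scale_ratio_x1000(uint32_t src_size, uint32_t dst_size)
{
	if (dst_size == 0)
		return 0;	/* zero-sized destination: report scale 0 */

	/* Widen before multiplying so src_size * 1000 cannot overflow. */
	return (int)((uint64_t)src_size * 1000 / dst_size);
}
```

Under these assumptions, a caller asking for the ratio of a 1920-pixel source to a 0-pixel destination gets 0 back instead of triggering a divide error.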

[PATCH 2/3] drm/amd/display: increase MAX_SURFACES to the value supported by hw

2024-12-17 Thread Melissa Wen
As the hw supports up to 4 surfaces, increase the maximum number of surfaces to prevent the DC error when trying to use more than three planes. [drm:dc_state_add_plane [amdgpu]] *ERROR* Surface: can not attach plane_state 3e2cb82c! Maximum is: 3 Link: https://gitlab.freedesktop.org/drm/a

[PATCH 0/3] drm/amd/display: fixes for kernel crashes since cursor overlay mode

2024-12-17 Thread Melissa Wen
Hi, Some issues have been found by Cosmic users of AMD display since the introduction of cursor overlay mode: page fault and divide errors causing interface freezes. Both are 100% reproducible and affect multiple HW versions. Patch 1 addresses the page fault error by resolving the definition mis

[PATCH] drm/amdgpu: Handle NULL bo->tbo.resource (again) in amdgpu_vm_bo_update

2024-12-17 Thread Michel Dänzer
From: Michel Dänzer Third time's the charm, I hope? Fixes: d3116756a710 ("drm/ttm: rename bo->mem and make it a pointer") Issue: https://gitlab.freedesktop.org/drm/amd/-/issues/3837 Signed-off-by: Michel Dänzer --- Or should amdgpu_vm_bo_evicted be called in the !bo->tbo.resource case as well?

[PATCH v2] drm/ci: uprev IGT

2024-12-17 Thread Vignesh Raman
Uprev IGT to the latest version and update expectation files. Signed-off-by: Vignesh Raman --- v1: - Pipeline link - https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1327810 Will update the flake bug report link after v1 is reviewed. v2: - Pipeline link - https://gitlab.f

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2024-12-17 Thread André Almeida
On 17/12/2024 05:42, Raag Jadav wrote: On Thu, Dec 12, 2024 at 03:50:29PM -0300, André Almeida wrote: On 28/11/2024 12:37, Raag Jadav wrote: Add documentation for device wedged event in a new 'Device wedging' chapter. This describes basic definitions, prerequisites and consumer expectation

[PATCH v6 8/9] LoongArch: Convert unreachable() to BUG()

2024-12-17 Thread Tiezhu Yang
When compiling on LoongArch, there exists the following objtool warning in arch/loongarch/kernel/machine_kexec.o: kexec_reboot() falls through to next function crash_shutdown_secondary() Avoid unreachable() as it can (and will in the absence of UBSAN) generate fallthrough code. Use BUG() so we

[PATCH v6 0/9] Add jump table support for objtool on LoongArch

2024-12-17 Thread Tiezhu Yang
This version is based on the tip/tip.git objtool/core branch [1]. It adds some weak and arch-specific functions to make the generic code more readable, and was tested with the latest upstream mainline Binutils, GCC and Clang. The first 6 patches are preparation for patch #7 to enable jump table for objtool on Loo

[PATCH v6 7/9] LoongArch: Enable jump table for objtool

2024-12-17 Thread Tiezhu Yang
It is now time to remove -fno-jump-tables to enable jump tables for objtool if the compiler has -mannotate-tablejump; otherwise it is better to keep -fno-jump-tables for compatibility with older compilers. Signed-off-by: Tiezhu Yang --- arch/loongarch/Kconfig | 3 +++ arch/loongarch/M

[PATCH v3 0/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
Raag Jadav is introducing a new DRM API for generating "device wedged events", to notify userspace when the device needs userspace intervention after a GPU reset[1]. I did a simple patch to add support for it for amdgpu for the telemetry aspect of the event. Tested in Steam Deck. This patch require

[PATCH v6 3/9] objtool: Handle PC relative relocation type

2024-12-17 Thread Tiezhu Yang
For the most part, an absolute relocation type is used for rodata. In the case of STT_SECTION, reloc->sym->offset is always zero; for the other symbol types, reloc_addend(reloc) is always zero. Thus the simple statement "reloc->sym->offset + reloc_addend(reloc)" can be used to obtain the symbol offset
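The reasoning in this snippet (one of the two terms is always zero, so their sum covers both cases) can be illustrated with a small C sketch. The struct layouts below are deliberately simplified stand-ins, not objtool's real `struct reloc` and `struct symbol`, and `reloc_sym_offset` is a hypothetical name chosen for the example.

```c
#include <stdint.h>

/* Simplified stand-ins for objtool's types (assumption: only the
 * fields needed for the illustration are modelled here). */
struct symbol {
	uint64_t offset;	/* zero for STT_SECTION symbols */
};

struct reloc {
	struct symbol *sym;
	int64_t addend;		/* zero for the non-section symbol types */
};

/* Mirrors the accessor named in the snippet, on the simplified type. */
static int64_t reloc_addend(const struct reloc *reloc)
{
	return reloc->addend;
}

/* Hypothetical wrapper: because one term is always zero, the sum gives
 * the destination offset for every symbol type without branching. */
static uint64_t reloc_sym_offset(const struct reloc *reloc)
{
	return reloc->sym->offset + reloc_addend(reloc);
}
```

For a section symbol the result comes entirely from the addend; for a function or object symbol it comes entirely from the symbol offset.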

[PATCH v6 5/9] objtool/LoongArch: Add support for switch table

2024-12-17 Thread Tiezhu Yang
The objtool program needs to analyze the control flow of each object file generated by the compiler toolchain. It needs to know all the locations that a branch instruction may jump to; if a jump table is used, objtool has to correlate the jump instruction with the table. On x86 (which is the only po

[PATCH v6 2/9] objtool: Handle different entry size of rodata

2024-12-17 Thread Tiezhu Yang
In most cases, the entry size of rodata is 8 bytes because the relocation type is 64-bit. There are also 32-bit relocation types, for which the entry size of rodata should be 4 bytes. Add an arch-specific function arch_reloc_size() to assign the entry size of rodata for x86, powerpc and Loo
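A minimal sketch of the size selection described here is shown below. It is not the arch_reloc_size() implementation from the patch: the enum `reloc_width` and the function `sketch_reloc_size` are hypothetical names, and real architectures would switch on their own ELF relocation type constants instead.

```c
/* Hypothetical classification of a relocation by its width; a real
 * implementation would inspect the arch-specific relocation type. */
enum reloc_width {
	RELOC_WIDTH_32,
	RELOC_WIDTH_64,
};

/* Returns the size in bytes of one rodata table entry for the given
 * relocation width, defaulting to 8 as in the common 64-bit case. */
static unsigned int sketch_reloc_size(enum reloc_width width)
{
	switch (width) {
	case RELOC_WIDTH_32:
		return 4;
	case RELOC_WIDTH_64:
	default:
		return 8;
	}
}
```

A table walker would then advance by `sketch_reloc_size(...)` bytes per entry instead of assuming a fixed stride of 8.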

[PATCH v6 4/9] objtool: Handle unreachable entry of rodata

2024-12-17 Thread Tiezhu Yang
When compiling with Clang on LoongArch, rodata contains an unreachable entry that points to a position after the function return instruction. The compiler generates it to fill the non-existent switch case; just skip the entry when parsing the relocation section of rodata. Signed-off-by: T

Re: [PATCH v10 2/4] drm/doc: Document device wedged event

2024-12-17 Thread Raag Jadav
On Thu, Dec 12, 2024 at 03:50:29PM -0300, André Almeida wrote: > On 28/11/2024 12:37, Raag Jadav wrote: > > Add documentation for device wedged event in a new 'Device wedging' > > chapter. This describes basic definitions, prerequisites and consumer > > expectations along with an example. > > >

[PATCH v6 1/9] objtool: Handle various symbol types of rodata

2024-12-17 Thread Tiezhu Yang
In the relocation section ".rela.rodata" of each .o file compiled with the LoongArch toolchain, there are various symbol types such as STT_NOTYPE, STT_OBJECT and STT_FUNC in addition to the usual STT_SECTION, so the reloc symbol offset must be used instead of the reloc addend to find the destination instruction i

[PATCH v6 9/9] drm/amd/display: Mark dc_fixpt_from_fraction() noinline

2024-12-17 Thread Tiezhu Yang
When compiling with Clang on LoongArch, there exists the following objtool warning in drivers/gpu/drm/amd/display/dc/basics/fixpt31_32.o: dc_fixpt_recip() falls through to next function dc_fixpt_sinc() This is because dc_fixpt_from_fraction() is inlined in dc_fixpt_recip() by Clang, given dc_fi

[PATCH v3 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma Signed-off-by: André Almeida --- v3: fix if co

Re: [PATCH v2 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
On 16/12/2024 12:27, Christian König wrote: On 16.12.24 at 16:02, André Almeida wrote: Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the res

[PATCH v6 6/9] objtool/LoongArch: Add support for goto table

2024-12-17 Thread Tiezhu Yang
The objtool program needs to analyze the control flow of each object file generated by the compiler toolchain. It needs to know all the locations that a branch instruction may jump to; if a jump table is used, objtool has to correlate the jump instruction with the table. On x86 (which is the only po

[PATCH v2 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma Signed-off-by: André Almeida --- v2: Only repo

[PATCH v2 0/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread André Almeida
Raag Jadav is introducing a new DRM API for generating "device wedged events", to notify userspace when the device needs userspace intervention after a GPU reset[1]. I did a simple patch to add support for it for amdgpu for the telemetry aspect of the event. Tested in Steam Deck. This patch require

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2024-12-17 Thread Brian Starkey
On Tue, Dec 17, 2024 at 11:13:05AM +, Michel Dänzer wrote: > On 2024-12-17 10:14, Brian Starkey wrote: > > On Sun, Dec 15, 2024 at 03:53:14PM +, Marek Olšák wrote: > >> The comment explains the problem with DRM_FORMAT_MOD_LINEAR. > >> > >> Signed-off-by: Marek Olšák > >> > >> diff --git a/

Re: [PATCH] drm/amdgpu: Make the submission path memory reclaim safe

2024-12-17 Thread Philipp Stanner
[+cc Krzysztof, who I think witnessed a possibly related Kernel crash in the wild] P. On Wed, 2024-11-13 at 13:48 +, Tvrtko Ursulin wrote: > From: Tvrtko Ursulin > > As commit 746ae46c1113 ("drm/sched: Mark scheduler work queues with > WQ_MEM_RECLAIM") > points out, ever since > a6149f03936

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2024-12-17 Thread Michel Dänzer
On 2024-12-17 10:14, Brian Starkey wrote: > On Sun, Dec 15, 2024 at 03:53:14PM +, Marek Olšák wrote: >> The comment explains the problem with DRM_FORMAT_MOD_LINEAR. >> >> Signed-off-by: Marek Olšák >> >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h >> index 78abd8

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2024-12-17 Thread Michel Dänzer
On 2024-12-16 22:54, Marek Olšák wrote: > On Mon, Dec 16, 2024 at 5:46 AM Lucas Stach > wrote: > > On Monday, 16.12.2024 at 10:27 +0100, Michel Dänzer wrote: > > On 2024-12-15 21:53, Marek Olšák wrote: > > > The comment explains the problem with DRM

Re: [PATCH v3 1/1] drm/amdgpu: Use device wedged event

2024-12-17 Thread Christian König
On 16.12.24 at 17:21, André Almeida wrote: Use DRM's device wedged event to notify userspace that a reset had happened. For now, only use `none` method meant for telemetry capture. In the future we might want to report a recovery method if the reset didn't succeed. Acked-by: Shashank Sharma S

Re: [PATCH v2 4/7] drm/amdgpu: Fix out-of-bounds issue in user fence

2024-12-17 Thread Christian König
Hi Arun, On 17.12.24 at 07:20, Paneer Selvam, Arunpravin wrote: Hi Christian, On 12/13/2024 6:29 PM, Christian König wrote: On 13.12.24 at 12:24, Paneer Selvam, Arunpravin wrote: Hi Christian, On 12/13/2024 4:13 PM, Christian König wrote: On 12.12.24 at 15:25, Arunpravin Paneer S

[PATCH v2] drm/amdgpu: Fix error handling in amdgpu_ras_add_bad_pages

2024-12-17 Thread Srinivasan Shanmugam
Ensure that appropriate error codes are returned when an error condition is detected. Fixes the below: drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2849 amdgpu_ras_add_bad_pages() warn: missing error code here? 'amdgpu_umc_pages_in_a_row()' failed. drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c:2884 amdgp

Fwd: [PATCH] drm/amdgpu: Make the submission path memory reclaim safe

2024-12-17 Thread Christian König
Sending it out to the mailing lists once more because AMD mail servers tried to convert it to HTML :( On 17.12.24 at 01:26, Matthew Brost wrote: On Fri, Nov 22, 2024 at 02:36:59PM +, Tvrtko Ursulin wrote: [SNIP] Do we have system wide workqueues for that? It seems a bit overkill that amd

Re: [PATCH] drm/amdgpu: Make the submission path memory reclaim safe

2024-12-17 Thread Christian König
On 17.12.24 at 01:26, Matthew Brost wrote: On Fri, Nov 22, 2024 at 02:36:59PM +, Tvrtko Ursulin wrote: [SNIP] Do we have system wide workqueues for that? It seems a bit overkill that amdgpu has to allocate one on its own. I wondered the same but did not find any. Only ones I am aware of a

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2024-12-17 Thread Brian Starkey
Hi, On Sun, Dec 15, 2024 at 03:53:14PM +, Marek Olšák wrote: > The comment explains the problem with DRM_FORMAT_MOD_LINEAR. > > Signed-off-by: Marek Olšák > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > index 78abd819fd62e..8ec4163429014 100644 > --- a/inclu

Re: [PATCH] drm/fourcc: add LINEAR modifiers with an exact pitch alignment

2024-12-17 Thread Michel Dänzer
On 2024-12-16 22:29, Marek Olšák wrote: > On Mon, Dec 16, 2024 at 4:27 AM Michel Dänzer > wrote: > > On 2024-12-15 21:53, Marek Olšák wrote: > > The comment explains the problem with DRM_FORMAT_MOD_LINEAR. > > > > Signed-off-by: Marek Olšák

RE: [PATCH] drm/amdgpu: Refine ip detection log message

2024-12-17 Thread Kamal, Asad
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Asad Kamal Thanks & Regards Asad -----Original Message----- From: Lazar, Lijo Sent: Tuesday, December 17, 2024 12:20 PM To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking; Deucher, Alexander; Kamal, Asad Subject: [PATC