Re: [PATCH] drm/amdgpu: Fix the runtime pm mode error

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 12:28 PM, Ma, Jun wrote: > > > On 3/20/2024 9:38 PM, Lazar, Lijo wrote: >> >> >> On 3/20/2024 6:54 PM, Alex Deucher wrote: >>> On Wed, Mar 20, 2024 at 6:17 AM Ma Jun wrote: Because of the logic error, Arcturus and vega20 currently use the AMDGPU_RUNPM_NONE for runt

Re: [PATCH v2] drm/amdgpu: Fix format character cut-off issues in amdgpu_vcn_early_init()

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 10:29 AM, Srinivasan Shanmugam wrote: > Reducing the size of ucode_prefix to 25 in the amdgpu_vcn_early_init > function. This would ensure that the total number of characters being > written into fw_name does not exceed its size of 40. > > Fixes the below with gcc W=1: > drivers/gp

Re: [PATCH] drm/amdgpu: Fix 'fw_name' buffer size to prevent truncations in amdgpu_mes_init_microcode

2024-03-21 Thread Lazar, Lijo
On 3/21/2024 11:16 AM, Srinivasan Shanmugam wrote: > The snprintf function is used to write a formatted string into fw_name. > The format of the string is "amdgpu/%s_mes%s.bin", where %s is replaced > by the string in ucode_prefix and the second %s is replaced by either > "_2" or "1" depending o
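
The truncation concern described in this message can be sketched in plain user-space C. This is only an illustration, not the patch itself: the helper name `build_mes_fw_name` is hypothetical, and only the format string "amdgpu/%s_mes%s.bin" is taken from the message. It relies on the standard behaviour that snprintf returns the length it would have written, excluding the terminating NUL.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from the patch): formats the firmware path
 * "amdgpu/<prefix>_mes<suffix>.bin" into buf.
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 */
static int build_mes_fw_name(char *buf, size_t len,
                             const char *prefix, const char *suffix)
{
    int n = snprintf(buf, len, "amdgpu/%s_mes%s.bin", prefix, suffix);

    /* snprintf reports the length it wanted to write, excluding the NUL. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return 0;
}
```

A caller with a 40-byte buffer and a short prefix gets 0 back; a buffer that is too small gets -1 instead of a silently truncated file name.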

[PATCH] drm/amd: Flush GFXOFF requests in prepare stage

2024-03-21 Thread Mario Limonciello
From: Mario Limonciello If the system hasn't entered GFXOFF when suspend starts it can cause hangs accessing GC and RLC during the suspend stage. Cc: # 6.1.y: 5095d5418193 ("drm/amd: Evict resources during PM ops prepare() callback") Cc: # 6.1.y: cb11ca3233aa ("drm/amd: Add concept of runnin

[PATCH] drm/amdgpu: Fix use after free in trace_amdgpu_bo_move

2024-03-21 Thread Tvrtko Ursulin
From: Tvrtko Ursulin Pipelined object migration will free up the old bo->resource, meaning the tracepoint added in 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") will trigger a use-after-free when it dereferences the cached old_mem. Fix it by caching the m
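
The general hazard described here, reading through a pointer to a record that a move has already freed, can be shown with a small user-space sketch. All types and functions below (`struct resource`, `struct buffer`, `move_buffer`, `move_and_trace`) are hypothetical stand-ins, not amdgpu or TTM code, and the sketch only assumes that the value a tracepoint needs is a plain scalar that can be copied before the move.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a buffer object and its placement record. */
struct resource { int mem_type; };
struct buffer   { struct resource *res; };

/* Simulated move: frees the old placement record and installs a new one. */
static int move_buffer(struct buffer *bo, int new_type)
{
    struct resource *new_res = malloc(sizeof(*new_res));

    if (!new_res)
        return -1;
    new_res->mem_type = new_type;
    free(bo->res);              /* the old record is invalid from here on */
    bo->res = new_res;
    return 0;
}

/* Reports (old, new) types without ever touching the freed record. */
static int move_and_trace(struct buffer *bo, int new_type,
                          int *old_out, int *new_out)
{
    int old_type = bo->res->mem_type;   /* copy the scalar before the move */

    if (move_buffer(bo, new_type))
        return -1;
    *old_out = old_type;                /* safe: a copy, not a stale pointer */
    *new_out = bo->res->mem_type;
    return 0;
}
```

Holding on to the old `struct resource *` and dereferencing it after `move_buffer()` would be the use-after-free; copying the field first avoids it.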

Re: [PATCH] drm/amdgpu: Remove pci address checks from acpi_vfct_bios

2024-03-21 Thread Kurt Kartaltepe
On Wed, Mar 20, 2024 at 6:31 AM Christian König wrote: > Can you provide the full output of lspci -. As far as I can see that > doesn't looks so invalid to me. I've added the relevant pci probing debug output without assign-busses and the lspci - for a boot with all devices visible. htt

RE: [PATCH 2/2] drm/amdgpu: simplify convert_error_address interface for UMC v12

2024-03-21 Thread Yang, Stanley
[AMD Official Use Only - General] The series is Reviewed-by: Stanley.Yang Regards, Stanley > -Original Message- > From: amd-gfx On Behalf Of Tao > Zhou > Sent: Thursday, March 21, 2024 11:30 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhou1, Tao > Subject: [PATCH 2/2] drm/amdgpu: simp

Re: [PATCH] drm/amdgpu: refactor code to split devcoredump code

2024-03-21 Thread Christian König
On 20.03.24 at 20:44, Sunil Khatri wrote: Refactor devcoredump code into new files, since its functionality is being expanded further and it is better to split it out so that devcoredump has its own file. Signed-off-by: Sunil Khatri Acked-by: Christian König --- drivers/gpu/drm/amd/amdgpu/Makefile

Re: [PATCH v2] drm/amdgpu: Fix use after free in trace_amdgpu_bo_move

2024-03-21 Thread Christian König
On 20.03.24 at 18:12, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Pipelined object migration will free up the old bo->resource, meaning the tracepoint added in 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap") will trigger a use-after-free when it dereferenc

Re: [PATCH v2] drm/amdgpu: Fix use after free in trace_amdgpu_bo_move

2024-03-21 Thread Tvrtko Ursulin
Hi Christian, On 21/03/2024 10:25, Christian König wrote: On 20.03.24 at 18:12, Tvrtko Ursulin wrote: From: Tvrtko Ursulin Pipelined object migration will free up the old bo->resource, meaning the tracepoint added in 94aeb4117343 ("drm/amdgpu: fix ftrace event amdgpu_bo_move always move on

Re: [PATCH] drm/amdgpu: remove invalid resource->start check

2024-03-21 Thread Pierre-Eric Pelloux-Prayer
On 20/03/2024 at 13:49, Christian König wrote: The majority of those were removed in the patch aed01a68047b ("drm/amdgpu: Remove TTM resource->start visible VRAM condition v2"). But this one was missed because it's working on the resource and not the BO. Since we also no longer use a fake start

[PATCH] drm/amdgpu: Refine IB schedule error logging

2024-03-21 Thread Lijo Lazar
Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 +-- 1 file changed, 5 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm

Re: [PATCH] drm/amdgpu: Refine IB schedule error logging

2024-03-21 Thread Christian König
On 21.03.24 at 13:36, Lijo Lazar wrote: Downgrade to debug information when IBs are skipped. Also, use dev_* to identify the device. Signed-off-by: Lijo Lazar Reviewed-by: Christian König --- drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 7 +-- 1 file changed, 5 insertions(+), 2 deleti

[PATCH] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move()

2024-03-21 Thread Christian König
This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move notification before actually moving things because

RE: [PATCH] drm/amdgpu: Refine IB schedule error logging

2024-03-21 Thread Kamal, Asad
[AMD Official Use Only - General] Reviewed-by: Asad Kamal Thanks & Regards Asad -Original Message- From: Lazar, Lijo Sent: Thursday, March 21, 2024 6:06 PM To: amd-gfx@lists.freedesktop.org Cc: Zhang, Hawking ; Deucher, Alexander ; Koenig, Christian ; Kamal, Asad Subject: [PATCH] dr

Re: [PATCH] drm/amdgpu: Fix the runtime pm mode error

2024-03-21 Thread Alex Deucher
On Thu, Mar 21, 2024 at 2:52 AM Ma, Jun wrote: > > > > On 3/20/2024 9:24 PM, Alex Deucher wrote: > > On Wed, Mar 20, 2024 at 6:17 AM Ma Jun wrote: > >> > >> Because of the logic error, Arcturus and vega20 currently > >> use the AMDGPU_RUNPM_NONE for runtime pm even though they > >> support BACO.

Re: [PATCH] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move()

2024-03-21 Thread Tvrtko Ursulin
On 21/03/2024 12:43, Christian König wrote: This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggested, but essentially we should call the move no

Re: [PATCH v8] drm/amdgpu: sync page table freeing with tlb flush

2024-03-21 Thread Sharma, Shashank
On 18/03/2024 16:24, Christian König wrote: On 18.03.24 at 16:22, Sharma, Shashank wrote: On 18/03/2024 16:01, Christian König wrote: On 18.03.24 at 15:44, Shashank Sharma wrote: The idea behind this patch is to delay the freeing of PT entry objects until the TLB flush is done. This patch

[PATCH] drm/amdgpu: Add a NULL check for freeing root PT

2024-03-21 Thread Shashank Sharma
This patch adds a NULL check to fix this crash reported during the freeing of root PT entry: [ 06:55] BUG: unable to handle page fault for address: c9002d637aa0 [ +0.007689] #PF: supervisor write access in kernel mode [ +0.005833] #PF: error_code(0x0002) - not-present page [ +0.005732]

Re: [PATCH] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move()

2024-03-21 Thread Christian König
On 21.03.24 at 15:12, Tvrtko Ursulin wrote: On 21/03/2024 12:43, Christian König wrote: This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move on same heap. The basic problem here is that after the move the old location is simply not available any more. Some fixes were suggeste

RE: [PATCH] drm/amdkfd: range check cp bad op exception interrupts

2024-03-21 Thread Kim, Jonathan
[Public] Ping for review. Thanks, Jon > -Original Message- > From: Kim, Jonathan > Sent: Wednesday, March 13, 2024 10:21 AM > To: amd-gfx@lists.freedesktop.org > Cc: Kuehling, Felix ; Huang, JinHuiEric > ; Kim, Jonathan ; > Kim, Jonathan ; Zhang, Jesse(Jie) > > Subject: [PATCH] drm/am

[linux-next:master] BUILD REGRESSION e7528c088874326d3060a46f572252be43755a86

2024-03-21 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master branch HEAD: e7528c088874326d3060a46f572252be43755a86 Add linux-next specific files for 20240321 Error/Warning reports: https://lore.kernel.org/oe-kbuild-all/202403211931.wvl6ymav-...@intel.com Error

Re: [PATCH] drm/amdkfd: range check cp bad op exception interrupts

2024-03-21 Thread Felix Kuehling
On 2024-03-13 10:21, Jonathan Kim wrote: Due to a CP interrupt bug, bad packet garbage exception codes are raised. Do a range check so that the debugger and runtime do not receive garbage codes. Update the user api to guard exception code type checking as well. Signed-off-by: Jonathan Kim Tes
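
A range check of the kind this message describes could look like the sketch below. The bounds `EC_FIRST_VALID` and `EC_LAST_VALID` and both function names are assumptions made for illustration; the real exception codes and helpers are defined by the KFD headers and are not reproduced here.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical valid range; the real values come from the KFD uapi headers. */
enum { EC_FIRST_VALID = 1, EC_LAST_VALID = 31 };

/* True only for codes inside the assumed valid range. */
static bool ec_is_valid(uint32_t code)
{
    return code >= EC_FIRST_VALID && code <= EC_LAST_VALID;
}

/*
 * Converts an exception code to a single-bit mask.
 * Garbage codes yield 0, so they are never forwarded and never
 * produce an out-of-range shift.
 */
static uint64_t ec_to_mask(uint32_t code)
{
    if (!ec_is_valid(code))
        return 0;
    return UINT64_C(1) << (code - 1);
}
```

Checking the range before the shift matters twice over: it filters garbage codes, and it keeps the shift count within the width of the mask type.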

Re: [PATCH] drm/amdkfd: Cleanup workqueue during module unload

2024-03-21 Thread Felix Kuehling
On 2024-03-20 18:52, Mukul Joshi wrote: Destroy the high priority workqueue that handles interrupts during KFD node cleanup. Signed-off-by: Mukul Joshi Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_interrupt.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/dri

[PATCH] drm/amdgpu: fix function implicit declaration error

2024-03-21 Thread Sunil Khatri
When CONFIG_DEV_COREDUMP is not defined, a call to amdgpu_coredump() does not find its definition and the build fails. This happens as the header is defined without the CONFIG_DEV_COREDUMP ifdef, due to which the header isn't enabled. Pulling the header out of such ifdef so in

Re: [PATCH] drm/amdgpu: fix function implicit declaration error

2024-03-21 Thread Alex Deucher
Acked-by: Alex Deucher On Thu, Mar 21, 2024 at 2:14 PM Sunil Khatri wrote: > > when CONFIG_DEV_COREDUMP is not defined in that case > when amdgpu_coredump() is called it does not find its > definition and the build fails. > > This happens as the header is defined without the > CONFIG_DEV_COREDU

Re: [PATCH] drm/amd: Flush GFXOFF requests in prepare stage

2024-03-21 Thread Alex Deucher
On Thu, Mar 21, 2024 at 5:37 AM Mario Limonciello wrote: > > From: Mario Limonciello > > If the system hasn't entered GFXOFF when suspend starts it can cause > hangs accessing GC and RLC during the suspend stage. > > Cc: # 6.1.y: 5095d5418193 ("drm/amd: Evict resources > during PM ops prepare()

Re: [PATCH] drm/amdgpu: once more fix the call oder in amdgpu_ttm_move()

2024-03-21 Thread Alex Deucher
On Thu, Mar 21, 2024 at 8:52 AM Christian König wrote: > > This reverts drm/amdgpu: fix ftrace event amdgpu_bo_move always move > on same heap. The basic problem here is that after the move the old > location is simply not available any more. > > Some fixes were suggested, but essentially we shou

Re: [PATCH 2/2] drm/amd/display: Move PRIMARY plane zpos higher

2024-03-21 Thread Harry Wentland
On 2024-03-15 13:09, sunpeng...@amd.com wrote: > From: Leo Li > > [Why] > > Compositors have different ways of assigning surfaces to DRM planes for > render offloading. It may decide between various strategies: overlay, > underlay, or a mix of both > > One way for compositors to implement th

Re: [PATCH 1/2] drm/amd/display: Introduce overlay cursor mode

2024-03-21 Thread Harry Wentland
On 2024-03-15 13:09, sunpeng...@amd.com wrote: > From: Leo Li > > [Why] > > DCN is the display hardware for amdgpu. DRM planes are backed by DCN > hardware pipes, which carry pixel data from one end (memory), to the > other (output encoder). > > Each DCN pipe has the ability to blend in a cu

[pull] amdgpu, amdkfd drm-fixes-6.9

2024-03-21 Thread Alex Deucher
Hi Dave, Sima, Fixes for 6.9. Fairly big because it's about 3 weeks of fixes. The following changes since commit 119b225f01e4d3ce974cd3b4d982c76a380c796d: Merge tag 'amd-drm-next-6.9-2024-03-08-1' of https://gitlab.freedesktop.org/agd5f/linux into drm-next (2024-03-11 13:32:12 +1000) are a

[PATCH 1/2] drm/amdgpu/umsch: update UMSCH 4.0 FW interface

2024-03-21 Thread Lang Yu
Align with FW changes. Signed-off-by: Lang Yu Reviewed-by: Veerabadhran Gopalakrishnan --- drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.h | 20 +-- .../drm/amd/include/umsch_mm_4_0_api_def.h| 13 ++-- 2 files changed, 21 insertions(+), 12 deletions(-) diff --git a/dr

[PATCH 2/2] drm/amdgpu: enable UMSCH 4.0.6

2024-03-21 Thread Lang Yu
Share same codes with 4.0.5 and enable collaborate mode for VPE. Signed-off-by: Lang Yu Reviewed-by: Veerabadhran Gopalakrishnan --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_umsch_mm.c | 12 ++-- drivers/gpu/drm/amd/amdgpu/umsch_mm_v4_0.c

[PATCH] drm/amdgpu: Fix truncation issues in gfx_v9_0.c

2024-03-21 Thread Srinivasan Shanmugam
The size of fw_name is increased to ensure that it can accommodate the maximum possible size of the string being written into it. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c: In function ‘gfx_v9_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:1255:52: warning: ‘%s’ dir

[PATCH] drm/amdgpu: Fix truncations in gfx_v11_0_init_microcode()

2024-03-21 Thread Srinivasan Shanmugam
Reducing the size of ucode_prefix to 25 in the gfx_v11_0_init_microcode function. This would ensure that the total number of characters being written into fw_name does not exceed its size of 40. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c: In function ‘gfx_v11_0_early_init

[PATCH] drm/amdgpu: Fix truncation in gfx_v10_0_init_microcode

2024-03-21 Thread Srinivasan Shanmugam
The total size of the fw_name buffer is 8 (for "amdgpu/") + 30 (for ucode_prefix) + 5 (for "_pfp") + 5 (for "_wks") + 5 (for ".bin") = 53 characters. Fixes the below with gcc W=1: drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c: In function ‘gfx_v10_0_early_init’: drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:398
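
The sizing arithmetic in this message can be cross-checked at compile time. The sketch below assumes, as the message states, a 30-byte ucode_prefix buffer and the pieces "amdgpu/", "_pfp", "_wks" and ".bin"; the macro and constant names are hypothetical. Unlike the tally in the message, it counts a single terminating NUL for the whole string rather than one per piece.

```c
#include <stddef.h>

/* Assumed capacity of ucode_prefix, including its terminating NUL. */
#define UCODE_PREFIX_MAX 30

/* sizeof on a string literal includes the NUL, so subtract 1 per piece. */
#define LIT_LEN(s) (sizeof(s) - 1)

enum {
    FW_NAME_WORST_CASE =
        LIT_LEN("amdgpu/") +
        (UCODE_PREFIX_MAX - 1) +    /* longest string the prefix can hold */
        LIT_LEN("_pfp") +
        LIT_LEN("_wks") +
        LIT_LEN(".bin") +
        1                           /* one terminating NUL for the result */
};

/* The 53 bytes quoted in the message must cover the worst case. */
_Static_assert(FW_NAME_WORST_CASE <= 53, "fw_name buffer would be too small");
```

Under these assumptions the worst case is 49 bytes, so the 53 in the message (8 + 30 + 5 + 5 + 5, which appears to count a terminator for every piece) is a safe upper bound.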