[PATCH v2] drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.

2025-02-06 Thread Emily Deng
In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. v2: Add amdgpu_sriov_multi_vf_mode Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h | 2 ++ drivers/gpu/drm/amd/amdgp

Re: [PATCH] drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.

2025-02-06 Thread Lazar, Lijo
On 2/6/2025 1:26 PM, Emily Deng wrote: > In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to > read WPTR from MQD. > > Signed-off-by: Emily Deng > --- > drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- > .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 25

[PATCH] drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.

2025-02-06 Thread Emily Deng
In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 25 +-- 2 files changed, 24 insertions(+), 3 deleti

Re: [PATCH v2] drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.

2025-02-06 Thread Lazar, Lijo
On 2/6/2025 3:11 PM, Emily Deng wrote: > In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to > read WPTR from MQD. > > v2: Add amdgpu_sriov_multi_vf_mode > Signed-off-by: Emily Deng > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- > drivers/gpu/drm/amd/am

[PATCH] drm/amdgpu: Set snoop bit for SDMA for MI series

2025-02-06 Thread Harish Kasiviswanathan
SDMA writes have to probe invalidate RW lines. Set the snoop bit in mmhub for this to happen. v2: Missed a few mmhub_v9_4. Added now. v3: Calculate hub offset once since it doesn't change inside the loop. Modified function names based on review comments. Signed-off-by: Harish Kasiviswanathan ---

Re: [PATCH 43/44] drm/amdgpu/vcn: optimize firmware storage

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: If each instance uses the same fw image, only store one copy in the driver. Signed-off-by: Alex Deucher Acked-by: Boyuan Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 30 + drivers/gpu

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Eric Huang
Ping .. On 2025-01-29 10:20, Eric Huang wrote: In some ASICs L2 cache info may miss in kfd topology, because the first bitmap may be empty, that means the first cu may be inactive, so to find the first active cu will solve the issue. Signed-off-by: Eric Huang --- drivers/gpu/drm/amd/amdk

RE: [PATCH] drm/amd/include : Update MES v12 API for fence update

2025-02-06 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only] ping -Original Message- From: Liu, Shaoyun Sent: Wednesday, February 5, 2025 1:41 PM To: amd-gfx@lists.freedesktop.org Cc: Liu, Shaoyun Subject: [PATCH] drm/amd/include : Update MES v12 API for fence update MES fence_value will b

Re: [PATCH v3 0/3] drm/amdgpu: Explicit sync for GEM VA operations

2025-02-06 Thread Friedrich Vock
On 04.02.25 14:58, Alex Deucher wrote: On Tue, Feb 4, 2025 at 8:37 AM Christian König wrote: Hi Friedrich, adding Alex. Am 04.02.25 um 13:32 schrieb Friedrich Vock: Hi, Bumping this again - it's been quite a while, what became of that KFD bugfix and the userqueue stuff? It'd be nice

Re: [PATCH 04/11] drm/amdgpu/gfx9: use amdgpu_gfx_off_ctrl_immediate() for PG

2025-02-06 Thread Lazar, Lijo
On 2/4/2025 3:13 AM, Alex Deucher wrote: > Use amdgpu_gfx_off_ctrl_immediate() when powergating. > There's no need for the delay in gfx off allow. The > powergating is dynamically disabled/enabled as for > RV/PCO on compute queues and allowing gfx off again as > soon the job is submitted improv

Re: [PATCH 44/44] drm/amdgpu/vcn: use dev_info() for firmware information

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: To properly handle multiple GPUs. Signed-off-by: Alex Deucher Reviewed-by: Boyuan Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers

Re: [PATCH 05/11] drm/amdgpu/sdma5.2: use amdgpu_gfx_off_ctrl_immediate()

2025-02-06 Thread Lazar, Lijo
On 2/4/2025 3:13 AM, Alex Deucher wrote: > In begin_use/end_use use amdgpu_gfx_off_ctrl_immediate() > rather than amdgpu_gfx_off_ctrl() as we don't need the > extra delay before we allow gfxoff again. > > Signed-off-by: Alex Deucher Won't this cause unnecessary GFX allows since sdma jobs coul

RE: [PATCH v2 4/4] drm/amdgpu: Make VBIOS image read optional

2025-02-06 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] Series is Reviewed-by: Hawking Zhang Regards, Hawking -Original Message- From: amd-gfx On Behalf Of Lijo Lazar Sent: Thursday, February 6, 2025 12:23 To: amd-gfx@lists.freedesktop.org; Lazar, Lijo Cc: Zhang, Hawking ; Deucher, A

Re: [PATCH 32/44] drm/amdgpu/vcn: use per instance callbacks for idle work handler

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: Use the vcn instance power gating callbacks rather than the IP powergating callback. This limits power gating to only the instance in use rather than all of the instances. Signed-off-by: Alex Deucher Reviewed-by: Boyuan Zhang

Re: [PATCH 33/44] drm/amdgpu/vcn: add a generic helper for set_power_gating_state

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: It's common for all VCN variants. Signed-off-by: Alex Deucher Patches 33-42 are Reviewed-by: Boyuan Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 23 +++ drivers/gpu/drm/amd/amdgpu/amdg

Re: [PATCH] drm/amdgpu: Set snoop bit for SDMA for MI series

2025-02-06 Thread Philip Yang
On 2025-02-05 22:07, Kasiviswanathan, Harish wrote: [Public] From: Yang, Philip S

Re: [PATCH 22/44] drm/amdgpu/vcn: add new per instance callback for powergating

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: This is per instance so add a new function pointer for it. Signed-off-by: Alex Deucher Reviewed-by: Boyuan Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.h | 2 ++ 1 file changed, 2 insertions(+) diff --git a/d

[v1 3/4] drm/amdgpu: bail out when failed to load fw in psp_init_cap_microcode()

2025-02-06 Thread Jiang Liu
In function psp_init_cap_microcode(), it should bail out when failed to load firmware, otherwise it may cause invalid memory access. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/a

[v1 4/4] drm/amdgpu: simplify invoke of psp_ta_init_shared_buf()

2025-02-06 Thread Jiang Liu
Enhance psp_ta_init_shared_buf() to check whether the shared buffer has already been allocated, and return success if it's allocated. So caller doesn't need to check the initialized flag. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 53 ++--- 1 file

[v1 0/4] Bugfixes and minor improvements to drm/amdgpu/psp

2025-02-06 Thread Jiang Liu
Fix some bugs in error handling path in psp subsystem: 1) fix possible bugs in error handling path in psp_sw_init() 2) fix a bug in error handling path in psp_init_cap_microcode() 3) reduce duplicated code related to psp_ta_init_shared_buf() Jiang Liu (4): drm/amdgpu: reset psp->cmd to NULL afte

[v1 2/4] drm/amdgpu: enhance error handling of psp_sw_init()

2025-02-06 Thread Jiang Liu
Enhance error handling in function psp_sw_init() by: 1) bail out when failed to allocate memory 2) release allocated resource on error 3) introduce helper function psp_bo_init() Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 84 - 1 file changed, 5

Re: [v1 1/4] drm/amdgpu: avoid buffer overflow attach in smu_sys_set_pp_table()

2025-02-06 Thread Lazar, Lijo
On 2/7/2025 12:14 PM, Jiang Liu wrote: > It malicious user provides a small pptable through sysfs and then > a bigger pptable, it may cause buffer overflow attack in function > smu_sys_set_pp_table(). > > Signed-off-by: Jiang Liu Reviewed-by: Lijo Lazar Thanks, Lijo > --- > driver

RE: [PATCH 2/3] drm/amdgpu: Don't modify grace_period in helper function

2025-02-06 Thread Kasiviswanathan, Harish
[Public] From: Chen, Xiaogang Sent: Thursday, February 6, 2025 4:57 PM To: Sakhnovitch, Elena (Elen) ; amd-gfx@lists.freedesktop.org Cc: Kasiviswanathan, Harish Subject: Re: [PATCH 2/3] drm/amdgpu: Don't modify grace_period in helper function On 1/14/2025 1:52 PM, Elena Sakhnovitch wrote: Fr

[PATCH v3 2/2] drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov multiple vf.

2025-02-06 Thread Emily Deng
In sriov multiple vf, Set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 to read WPTR from MQD. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 2 +- .../gpu/drm/amd/amdkfd/kfd_mqd_manager_v9.c | 23 +-- 2 files changed, 22 insertions(+), 3 deleti

[PATCH v3 1/2] drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function

2025-02-06 Thread Emily Deng
Use amdgpu_sriov_multi_vf_mode to replace amdgpu_sriov_vf(adev) && !amdgpu_sriov_is_pp_one_vf(adev). Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 2 +- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h| 2 ++ drivers/gpu/drm/amd/pm/amdgpu_pm.c | 4 ++-- driver
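
The replacement described above can be sketched as a single predicate. This is only an illustration under stated assumptions: struct mock_adev and mock_sriov_multi_vf_mode() are hypothetical stand-ins, and the real helper operates on struct amdgpu_device, which is not modelled here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct amdgpu_device; only the two
 * virtualization facts named in the patch description are modelled. */
struct mock_adev {
    bool is_sriov_vf;  /* models amdgpu_sriov_vf(adev) */
    bool is_pp_one_vf; /* models amdgpu_sriov_is_pp_one_vf(adev) */
};

/* True only for an SR-IOV VF that is not running in one-VF mode,
 * i.e. the combined condition the new helper is meant to replace. */
static bool mock_sriov_multi_vf_mode(const struct mock_adev *adev)
{
    return adev->is_sriov_vf && !adev->is_pp_one_vf;
}
```

Folding the two checks into one named predicate keeps call sites from repeating the negated one-VF test.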

[v1 1/4] drm/amdgpu: reset psp->cmd to NULL after releasing the buffer

2025-02-06 Thread Jiang Liu
Reset psp->cmd to NULL after releasing the buffer in function psp_sw_fini(). Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_

[v1 4/4] drm/amdgpu: minor code style enhancement for smu

2025-02-06 Thread Jiang Liu
Minor code style enhancement for smu. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c| 2 +- drivers/gpu/drm/amd/pm/swsmu/smu13/smu_v13_0_6_ppt.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c b

[v1 0/4] Fix a buffer overflow in drm/amdgpu/smu

2025-02-06 Thread Jiang Liu
Fix several bugs in smu subsystem: 1) a buffer overflow bug in function smu_sys_set_pp_table() 2) tune logic of is_vcn_enabled() 3) enhance handling of gfx_off_entrycount in function smu_suspend() Jiang Liu (4): drm/amdgpu: avoid buffer overflow attach in smu_sys_set_pp_table() drm/amdgpu: acc

[v1 2/4] drm/amdgpu: accumulate gfx_off_entrycount in smu_suspend()

2025-02-06 Thread Jiang Liu
As pwfw resets entrycount when the device is suspended, we should accumulate the gfx_off_entrycount value instead of saving only its last value. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/g
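
The difference between saving and accumulating can be sketched as follows. This is a minimal model, not the driver code: struct mock_smu and mock_suspend() are hypothetical names, and the firmware counter is passed in as a plain argument.

```c
#include <stdint.h>

/* Hypothetical model of the driver-side bookkeeping. */
struct mock_smu {
    uint64_t gfx_off_entrycount; /* running total kept across suspends */
};

/* On suspend, add the firmware's current count to the running total.
 * Assigning instead of adding would drop everything counted before
 * earlier suspends, because the firmware counter restarts afterwards. */
static void mock_suspend(struct mock_smu *smu, uint32_t fw_count)
{
    smu->gfx_off_entrycount += fw_count;
}
```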

[v1 1/4] drm/amdgpu: avoid buffer overflow attach in smu_sys_set_pp_table()

2025-02-06 Thread Jiang Liu
If a malicious user provides a small pptable through sysfs and then a bigger pptable, it may cause a buffer overflow in function smu_sys_set_pp_table(). Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a
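
One way to guard against the small-then-bigger upload sequence is to track the buffer's capacity and reallocate before copying. The sketch below is an assumption about the general technique, not the patch itself: struct mock_table and mock_set_pp_table() are hypothetical, and userspace malloc()/free() stand in for kernel allocators.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a cached, user-supplied table. */
struct mock_table {
    void  *buf;
    size_t cap; /* bytes allocated */
    size_t len; /* bytes currently valid */
};

/* Copy a new table in, growing the buffer first if it is too small.
 * Returns 0 on success, -1 on empty input or allocation failure. */
static int mock_set_pp_table(struct mock_table *t, const void *src, size_t size)
{
    if (size == 0)
        return -1;

    if (t->cap < size) {
        void *nbuf = malloc(size);

        if (nbuf == NULL)
            return -1;
        free(t->buf);
        t->buf = nbuf;
        t->cap = size;
    }

    /* Safe: cap >= size is guaranteed at this point. */
    memcpy(t->buf, src, size);
    t->len = size;
    return 0;
}
```

Without the capacity check, the second, larger memcpy() would write past the buffer sized for the first upload.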

[v1 3/4] drm/amdgpu: treat VCN as enabled if either VCN or JPEC is enabled

2025-02-06 Thread Jiang Liu
Function is_vcn_enabled() returns false if either the VCN or JPEG ip block is disabled, which sounds unreasonable. It should return true when either VCN or JPEG is enabled. Signed-off-by: Jiang Liu --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 6 +++--- 1 file changed, 3 insertions(+), 3 del
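
The intended "either one enabled" semantics can be sketched like this. The types and the enabled flag are hypothetical; the driver's actual IP block bookkeeping is not shown in the snippet and is not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

enum mock_ip_type { MOCK_IP_GFX, MOCK_IP_VCN, MOCK_IP_JPEG };

/* Hypothetical, simplified IP block record. */
struct mock_ip_block {
    enum mock_ip_type type;
    bool enabled;
};

/* Return true as soon as an enabled VCN or JPEG block is found,
 * and false only when neither kind is enabled. */
static bool mock_is_vcn_enabled(const struct mock_ip_block *blocks, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool media = blocks[i].type == MOCK_IP_VCN ||
                     blocks[i].type == MOCK_IP_JPEG;

        if (media && blocks[i].enabled)
            return true;
    }
    return false;
}
```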

RE: [PATCH 3/3] drm/amdgpu: Set lower queue retry timeout for gfx9 family

2025-02-06 Thread Russell, Kent
[AMD Official Use Only - AMD Internal Distribution Only] Ping (plus Jay) Kent > -Original Message- > From: amd-gfx On Behalf Of Elena > Sakhnovitch > Sent: Tuesday, January 14, 2025 2:53 PM > To: amd-gfx@lists.freedesktop.org > Cc: Sakhnovitch, Elena (Elen) ; Kasiviswanathan, > Harish

[PATCH] drm/amdkfd: Fix instruction hazard in gfx12 trap handler

2025-02-06 Thread Jay Cornwall
VALU instructions with SGPR source need wait states to avoid hazard with later instructions. Signed-off-by: Jay Cornwall Cc: Lancelot Six --- .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 406 -- .../amd/amdkfd/cwsr_trap_handler_gfx12.asm| 13 + 2 files changed, 392 inser

Re: [PATCH 1/3] drm/amdkfd: Use asic specific fn to configure grace period

2025-02-06 Thread Chen, Xiaogang
On 1/14/2025 1:52 PM, Elena Sakhnovitch wrote: From: Harish Kasiviswanathan Currently, grace period is modified only for gfx943 APU. In the future this might need to be set for other ASICs too. Either ways, asic specific values should be handled by asic specific functions. Signed-off-by: Haris

Re: [PATCH 2/3] drm/amdgpu: Don't modify grace_period in helper function

2025-02-06 Thread Chen, Xiaogang
On 1/14/2025 1:52 PM, Elena Sakhnovitch wrote: From: Harish Kasiviswanathan build_grace_period_packet_info is asic helper function that fetches the correct format. It is the responsibility of the caller to validate the value. but what is hurt to valid it at asic function? each asic may has its

Re: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

2025-02-06 Thread Lazar, Lijo
On 2/7/2025 5:03 AM, Kim, Jonathan wrote: > [Public] > >> -Original Message- >> From: Lazar, Lijo >> Sent: Thursday, February 6, 2025 8:13 AM >> To: amd-gfx@lists.freedesktop.org; Lazar, Lijo >> Cc: Zhang, Hawking ; Kim, Jonathan >> >> Subject: [PATCH 3/4] drm/amdgpu: Initialize xgmi

Re: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info

2025-02-06 Thread Lazar, Lijo
On 2/7/2025 4:56 AM, Kim, Jonathan wrote: > [Public] > >> -Original Message- >> From: Lazar, Lijo >> Sent: Thursday, February 6, 2025 8:13 AM >> To: amd-gfx@lists.freedesktop.org; Lazar, Lijo >> Cc: Zhang, Hawking ; Kim, Jonathan >> >> Subject: [PATCH 2/4] drm/amdgpu: Add xgmi speed/

Re: [PATCH v2] drm/amd: Refactor find_system_memory()

2025-02-06 Thread Felix Kuehling
On 2025-02-06 16:48, Mario Limonciello wrote: > From: Mario Limonciello > > find_system_memory() pulls out two fields from an SMBIOS type 17 > device and sets them on KFD devices. The data offsets are counted > to find interesting data. > > Instead use a struct representation to access the mem

RE: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

2025-02-06 Thread Kim, Jonathan
[Public] > -Original Message- > From: Lazar, Lijo > Sent: Thursday, February 6, 2025 8:13 AM > To: amd-gfx@lists.freedesktop.org; Lazar, Lijo > Cc: Zhang, Hawking ; Kim, Jonathan > > Subject: [PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery > > Initialize xgmi related stati

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Lazar, Lijo
On 2/6/2025 10:18 PM, Eric Huang wrote: > I understand your concern. KFD currently only reports one L2 instance, > but not every L2 instance. If customers want to have more detail in all > available L2 info, we probably can change the logic in this function, > but it is not related to my change.

Re: [PATCH 3/3] drm/amdgpu: Set lower queue retry timeout for gfx9 family

2025-02-06 Thread Jay Cornwall
On 2/6/2025 16:27, Russell, Kent wrote: [AMD Official Use Only - AMD Internal Distribution Only] Ping (plus Jay) Sorry, I'd need the whole patch chain to review. As a general comment CP_IQ_WAIT_TIME2.QUE_SLEEP is tangential to SCH_WAVE. I'm not sure it's useful to tie these together. SCH_

Re: [PATCH v12 2/2] drm/amdgpu: Enable async flip on overlay planes

2025-02-06 Thread Harry Wentland
On 2025-01-27 14:59, André Almeida wrote: > amdgpu can handle async flips on overlay planes, so allow it for atomic > async checks. > > Signed-off-by: André Almeida Reviewed-by: Harry Wentland Harry > --- > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 10 ++ > 1 file

[PATCH v2] drm/amd: Refactor find_system_memory()

2025-02-06 Thread Mario Limonciello
From: Mario Limonciello find_system_memory() pulls out two fields from an SMBIOS type 17 device and sets them on KFD devices. The data offsets are counted to find interesting data. Instead use a struct representation to access the members and pull out the two specific fields. No intended functi

RE: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info

2025-02-06 Thread Kim, Jonathan
[Public] > -Original Message- > From: Lazar, Lijo > Sent: Thursday, February 6, 2025 8:13 AM > To: amd-gfx@lists.freedesktop.org; Lazar, Lijo > Cc: Zhang, Hawking ; Kim, Jonathan > > Subject: [PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info > > Add APIs to initialize XGMI speed

Re: [PATCH 05/11] drm/amdgpu/sdma5.2: use amdgpu_gfx_off_ctrl_immediate()

2025-02-06 Thread Alex Deucher
On Thu, Feb 6, 2025 at 10:17 AM Lazar, Lijo wrote: > > > > On 2/4/2025 3:13 AM, Alex Deucher wrote: > > In begin_use/end_use use amdgpu_gfx_off_ctrl_immediate() > > rather than amdgpu_gfx_off_ctrl() as we don't need the > > extra delay before we allow gfxoff again. > > > > Signed-off-by: Alex Deuc

Re: [PATCH 05/11] drm/amdgpu/sdma5.2: use amdgpu_gfx_off_ctrl_immediate()

2025-02-06 Thread Lazar, Lijo
[Public] Specifically, was talking of examples like delayed bo deletes (don't know how they could be in real world). That is an example where buffer deletion is queued up and buffer clearing jobs will start sequentially. With the new sequence, allow will be sent and could immediately be followe

Re: [PATCH 05/11] drm/amdgpu/sdma5.2: use amdgpu_gfx_off_ctrl_immediate()

2025-02-06 Thread Alex Deucher
On Thu, Feb 6, 2025 at 10:36 AM Lazar, Lijo wrote: > > [Public] > > > Specifically, was talking of examples like delayed bo deletes (don't know how > they could be in real world). That is an example where buffer deletion is > queued up and buffer clearing jobs will start sequentially. With the n

[PATCH 2/4] drm/amdgpu: Add xgmi speed/width related info

2025-02-06 Thread Lijo Lazar
Add APIs to initialize XGMI speed, width details and get to max bandwidth supported. It is assumed that a device only supports same generation of XGMI links with uniform width. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 41 drivers/gpu/drm/a

[PATCH 1/4] drm/amdgpu: Move xgmi definitions to xgmi header

2025-02-06 Thread Lijo Lazar
Move definitions related to xgmi to amdgpu_xgmi header Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 23 +--- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 8 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h | 35 +--- 3 files changed, 34 i

[PATCH 4/4] drm/amdgpu: Use xgmi APIs to get bandwidth

2025-02-06 Thread Lijo Lazar
Use xgmi API to get max bandwidth details. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c index 2c1b38c5cfc

[PATCH 3/4] drm/amdgpu: Initialize xgmi info during discovery

2025-02-06 Thread Lijo Lazar
Initialize xgmi related static information during discovery. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 20 +-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/a

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Christian König
Am 06.02.25 um 14:53 schrieb Tvrtko Ursulin: On 06/02/2025 13:46, Christian König wrote: Am 06.02.25 um 14:35 schrieb Philipp Stanner: On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: The helper copies code from the existing amdgpu_job_stop_all_jobs_on_sched with the purpose of reduci

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Tvrtko Ursulin
On 06/02/2025 13:35, Philipp Stanner wrote: On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: The helper copies code from the existing amdgpu_job_stop_all_jobs_on_sched with the purpose of reducing the amount of driver code which directly touch scheduler internals. If or when amdgpu ma

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Philipp Stanner
On Thu, 2025-02-06 at 14:46 +0100, Christian König wrote: > Am 06.02.25 um 14:35 schrieb Philipp Stanner: > > On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: > > > The helper copies code from the existing > > > amdgpu_job_stop_all_jobs_on_sched > > > with the purpose of reducing the amount

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Philipp Stanner
On Thu, 2025-02-06 at 13:53 +, Tvrtko Ursulin wrote: > > On 06/02/2025 13:46, Christian König wrote: > > Am 06.02.25 um 14:35 schrieb Philipp Stanner: > > > On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: > > > > The helper copies code from the existing > > > > amdgpu_job_stop_all_job

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Philipp Stanner
On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: > The helper copies code from the existing > amdgpu_job_stop_all_jobs_on_sched > with the purpose of reducing the amount of driver code which directly > touch scheduler internals. > > If or when amdgpu manages to change the approach for hand

Re: [PATCH 3/4] drm/sched: Add internal job peek/pop API

2025-02-06 Thread Philipp Stanner
On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: > Idea is to add helpers for peeking and poppling jobs from entities s/poppling/popping > with > the goal of decoupling the hidden assumption in the code that > queue_node > is the first element in struct drm_sched_job. > > That assumption

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Tvrtko Ursulin
On 06/02/2025 13:46, Christian König wrote: Am 06.02.25 um 14:35 schrieb Philipp Stanner: On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: The helper copies code from the existing amdgpu_job_stop_all_jobs_on_sched with the purpose of reducing the amount of driver code which directly t

Re: [PATCH 1/3] drm/sched: Add internal job peek/pop API

2025-02-06 Thread Danilo Krummrich
On Thu, Feb 06, 2025 at 04:40:29PM +, Tvrtko Ursulin wrote: > Idea is to add helpers for peeking and popping jobs from entities with > the goal of decoupling the hidden assumption in the code that queue_node > is the first element in struct drm_sched_job. > > That assumption usually comes in t
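
The decoupling idea can be illustrated with an offset-based lookup that works wherever the list node sits in the structure. This is a userspace sketch with hypothetical names (mock_node, mock_job, mock_container_of); it is not the proposed scheduler API.

```c
#include <stddef.h>

struct mock_node {
    struct mock_node *next;
};

struct mock_job {
    int id;
    struct mock_node queue_node; /* deliberately not the first member */
};

/* Recover the enclosing structure from a pointer to one of its members,
 * in the spirit of the kernel's container_of(). */
#define mock_container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* A plain cast from node to job is only valid when queue_node is the
 * first member; subtracting the member offset is valid in any position. */
static struct mock_job *mock_job_from_node(struct mock_node *node)
{
    return node ? mock_container_of(node, struct mock_job, queue_node) : NULL;
}
```

Routing every node-to-job conversion through one helper is what allows the struct members to be reordered later.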

Re: [PATCH v3 0/3] drm/sched: Job queue peek/pop helpers and struct job re-order

2025-02-06 Thread Danilo Krummrich
On 2/6/25 5:40 PM, Tvrtko Ursulin wrote: Lets add some helpers for peeking and popping from the job queue which allows us to re-order the fields in struct drm_sched_job and remove one hole. I think you forgot to add the dri-devel list. Can't fetch patches with b4. :( v2: * Add header file

Re: [PATCH 3/3] drm/sched: Remove a hole from struct drm_sched_job

2025-02-06 Thread Danilo Krummrich
On Thu, Feb 06, 2025 at 04:40:31PM +, Tvrtko Ursulin wrote: > We can re-order some struct members and take u32 credits outside of the > pointer sandwich and also for the last_dependency member we can get away > with an unsigned int since for dependency we use xa_limit_32b. > > Pahole report be
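
The effect of such a reorder can be sketched with two toy structs. The member names ptr_a and ptr_b are hypothetical and the real struct drm_sched_job has many more fields; the sizes in the comments assume a typical LP64 ABI.

```c
#include <stdint.h>

/* A 32-bit member between two pointers forces 4 bytes of padding
 * on typical 64-bit ABIs (32 bytes total on LP64). */
struct mock_job_before {
    void         *ptr_a;
    uint32_t      credits;
    void         *ptr_b;
    unsigned long last_dependency;
};

/* Pointers first, then the two 32-bit members side by side, so they
 * share one 8-byte slot (24 bytes total on LP64). */
struct mock_job_after {
    void        *ptr_a;
    void        *ptr_b;
    uint32_t     credits;
    unsigned int last_dependency;
};
```

Comparing sizeof() of the two structs, or running pahole on the compiled object, shows where the padding goes away.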

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Lazar, Lijo
[Public] Yes, the problem is that. If a node has 2 XCCs, it should report the L2 of each separately with the number of CUs sharing each L2. In this, it appears to loop through and find the first non-zero of all XCCs of a node and not based on the first non-zero per XCC basis. It makes a differe

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Eric Huang
On 2025-02-06 10:14, Lazar, Lijo wrote: On 1/29/2025 8:50 PM, Eric Huang wrote: In some ASICs L2 cache info may miss in kfd topology, because the first bitmap may be empty, that means the first cu may be inactive, so to find the first active cu will solve the issue. Signed-off-by: Eric Huang

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Eric Huang
I understand your concern. KFD currently only reports one L2 instance, but not every L2 instance. If customers want to have more detail in all available L2 info, we probably can change the logic in this function, but it is not related to my change. My change is based on current kfd logic and fi

Re: [PATCH 21/44] drm/amdgpu/vcn: adjust pause_dpg_mode function signature

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: Change it to take a vcn instance rather than adev to align with the vcn instance changes. TODO: clean up the function internals to use the vinst state directly rather than accessing it indirectly via adev->vcn.inst[]. Signed-off-by: Alex Deucher Revi

Re: [PATCH 23/44] drm/amdgpu/vcn1.0: add set_pg_state callback

2025-02-06 Thread Boyuan Zhang
On 2025-01-31 11:57, Alex Deucher wrote: Rework the code as a vcn instance callback. Signed-off-by: Alex Deucher Patches 23-32 are Reviewed-by: Boyuan Zhang --- drivers/gpu/drm/amd/amdgpu/vcn_v1_0.c | 32 ++- 1 file changed, 22

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Christian König
Am 06.02.25 um 14:35 schrieb Philipp Stanner: On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: The helper copies code from the existing amdgpu_job_stop_all_jobs_on_sched with the purpose of reducing the amount of driver code which directly touch scheduler internals. If or when amdgpu ma

Re: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Danilo Krummrich
On Thu, Feb 06, 2025 at 02:46:40PM +0100, Christian König wrote: > Am 06.02.25 um 14:35 schrieb Philipp Stanner: > > On Wed, 2025-02-05 at 15:33 +, Tvrtko Ursulin wrote: > > > The helper copies code from the existing > > > amdgpu_job_stop_all_jobs_on_sched > > > with the purpose of reducing the

RE: [PATCH 1/4] drm/scheduler: Add drm_sched_cancel_all_jobs helper

2025-02-06 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only] I agree with the overall approach and support Chris's suggestion. The function amdgpu_job_stop_all_jobs_on_sched is now only applicable to a few older AMD hardware models when they encounter uncorrectable hardware errors. We have discont

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Alex Deucher
Acked-by: Alex Deucher On Wed, Jan 29, 2025 at 10:37 AM Eric Huang wrote: > > In some ASICs L2 cache info may miss in kfd topology, > because the first bitmap may be empty, that means > the first cu may be inactive, so to find the first > active cu will solve the issue. > > Signed-off-by: Eric H

Re: [PATCH] drm/amdkfd: fix missing L2 cache info in topology

2025-02-06 Thread Lazar, Lijo
On 1/29/2025 8:50 PM, Eric Huang wrote: > In some ASICs L2 cache info may miss in kfd topology, > because the first bitmap may be empty, that means > the first cu may be inactive, so to find the first > active cu will solve the issue. > > Signed-off-by: Eric Huang > --- > drivers/gpu/drm/amd/