Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Chen, Xiaogang
On 10/18/2024 5:09 PM, Felix Kuehling wrote: On 2024-10-18 17:31, Chen, Xiaogang wrote: On 10/18/2024 12:57 PM, Felix Kuehling wrote: On 2024-10-18 10:09, Chen, Xiaogang wrote: On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen

Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Chen, Xiaogang
On 10/18/2024 5:07 PM, Felix Kuehling wrote: On 2024-10-18 17:31, Chen, Xiaogang wrote: On 10/18/2024 12:57 PM, Felix Kuehling wrote: On 2024-10-18 10:09, Chen, Xiaogang wrote: On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen

Re: [PATCH] amdgpu: Don't print L2 status if there's nothing to print

2024-10-18 Thread Felix Kuehling
On 2024-10-18 16:21, Kent Russell wrote: If a 2nd fault comes in before the 1st is handled, the 1st fault will clear out the FAULT STATUS registers before the 2nd fault is handled. Thus we get a lot of zeroes. If status=0, just skip the L2 fault status information, to avoid confusion of why som

Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Felix Kuehling
On 2024-10-18 17:31, Chen, Xiaogang wrote: On 10/18/2024 12:57 PM, Felix Kuehling wrote: On 2024-10-18 10:09, Chen, Xiaogang wrote: On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen The purpose of this patch is having kfd driver

Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Felix Kuehling
On 2024-10-18 17:31, Chen, Xiaogang wrote: On 10/18/2024 12:57 PM, Felix Kuehling wrote: On 2024-10-18 10:09, Chen, Xiaogang wrote: On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen The purpose of this patch is having kfd driver

Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation

2024-10-18 Thread Chen, Xiaogang
On 10/18/2024 2:14 PM, Felix Kuehling wrote: On 2024-10-11 10:41, Xiaogang.Chen wrote: From: Xiaogang Chen kfd process kref count (process->ref) is initialized to 1 by kref_init. After it is created there is no need to increase its kref. Instead, add kfd process kref at kfd process mmu notifier allo

Re: [PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr

2024-10-18 Thread Philip Yang
On 2024-10-18 14:28, Felix Kuehling wrote: On 2024-10-17 04:34, Victor Zhao wrote: Make sure KFD_FENCE_INIT is written to fence_addr before pm_send_query_status is called, to avoid qcm fence timeout caused by incorrect ord

Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Chen, Xiaogang
On 10/18/2024 12:57 PM, Felix Kuehling wrote: On 2024-10-18 10:09, Chen, Xiaogang wrote: On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen The purpose of this patch is having kfd driver function as expected during AMD gpu device

[PATCH] amdgpu: Don't print L2 status if there's nothing to print

2024-10-18 Thread Kent Russell
If a 2nd fault comes in before the 1st is handled, the 1st fault will clear out the FAULT STATUS registers before the 2nd fault is handled. Thus we get a lot of zeroes. If status=0, just skip the L2 fault status information, to avoid confusion of why some VM fault status prints in dmesg are all zer

Re: [PATCH v6 42/44] drm/colorop: Add 3D LUT supports to color pipeline

2024-10-18 Thread Alex Hung
On 10/13/24 09:58, Simon Ser wrote: On Thursday, October 3rd, 2024 at 22:01, Harry Wentland wrote: From: Alex Hung It is to be used to enable HDR by allowing userspace to create and pass 3D LUTs to kernel and hardware. 1. new drm_colorop_type: DRM_COLOROP_3D_LUT. 2. 3D LUT modes define h

[PATCH] drm/amdgpu: handle default profile on GC 9.4.1

2024-10-18 Thread Alex Deucher
It does not support fullscreen 3D. Fixes: 336568de918e ("drm/amdgpu/swsmu: default to fullscreen 3D profile for dGPUs") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/pm/swsmu/am

Re: [PATCH v2] drm/amdkfd: change kfd process kref count at creation

2024-10-18 Thread Felix Kuehling
On 2024-10-11 10:41, Xiaogang.Chen wrote: From: Xiaogang Chen kfd process kref count (process->ref) is initialized to 1 by kref_init. After it is created there is no need to increase its kref. Instead, add kfd process kref at kfd process mmu notifier allocation since we decrease the ref at free_notifier

Re: [PATCH] drm/amdgpu: enable userqueue support for GFX12

2024-10-18 Thread Alex Deucher
On Tue, Oct 15, 2024 at 12:37 PM Sharma, Shashank wrote: > > > On 15/10/2024 16:58, Alex Deucher wrote: > > On Tue, Oct 15, 2024 at 6:13 AM Sharma, Shashank > > wrote: > >> Hello Alex, > >> > >> On 14/10/2024 22:29, Deucher, Alexander wrote: > >> > >> [AMD Official Use Only - AMD Internal Distrib

Re: [PATCH] drm/amdgpu: Use SPX as default in partition config

2024-10-18 Thread Felix Kuehling
On 2024-10-14 05:19, Lijo Lazar wrote: In certain cases - ex: when a reset is required on initialization - XCP manager won't have a valid partition mode. In such cases, use SPX as the default selected mode for which partition configuration details are populated. Signed-off-by: Lijo Lazar Repo

RE: [PATCH] amdgpu: Don't print L2 status if there's nothing to print

2024-10-18 Thread Russell, Kent
[Public] > -----Original Message----- > From: Kuehling, Felix > Sent: Friday, October 18, 2024 2:43 PM > To: Russell, Kent ; amd-gfx@lists.freedesktop.org > Cc: Cornwall, Jay > Subject: Re: [PATCH] amdgpu: Don't print L2 status if there's nothing to print > > > On 2024-10-18 11:12, Kent Russell

Re: [PATCH] amdgpu: Don't print L2 status if there's nothing to print

2024-10-18 Thread Felix Kuehling
On 2024-10-18 11:12, Kent Russell wrote: If a 2nd fault comes in before the 1st is handled, the 1st fault will clear out the FAULT STATUS registers before the 2nd fault is handled. Thus we get a lot of zeroes. If status=0, just skip the L2 fault status information, to avoid confusion of why som

Re: [PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr

2024-10-18 Thread Felix Kuehling
On 2024-10-17 04:34, Victor Zhao wrote: Make sure KFD_FENCE_INIT is written to fence_addr before pm_send_query_status is called, to avoid qcm fence timeout caused by incorrect ordering. Signed-off-by: Victor Zhao --- drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 1 + drivers/gpu/drm/amd/

Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Felix Kuehling
On 2024-10-18 10:09, Chen, Xiaogang wrote: On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen The purpose of this patch is having kfd driver function as expected during AMD gpu device plug/unplug. When an AMD gpu device got unplug

Re: [PATCH v11 06/28] drm/amdgpu: create context space for usermode queue

2024-10-18 Thread Alex Deucher
On Mon, Sep 9, 2024 at 4:07 PM Shashank Sharma wrote: > > The MES FW expects us to allocate at least one page as context > space to process gang and process related context data. This > patch creates a joint object for the same, and calculates GPU > space offsets of these spaces. > > V1: Addressed

[PATCH v3] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Xiaogang . Chen
From: Xiaogang Chen The purpose of this patch is having kfd driver function as expected during AMD gpu device plug/unplug. When an AMD gpu device gets unplugged, the kfd driver stops all queues from this device. If there are user processes that still reference the render node, this device is marked as invalid. kfd d

Re: [PATCH v6 0/4] drm: Minimum backlight overrides and implementation for amdgpu

2024-10-18 Thread Alex Deucher
On Wed, Oct 16, 2024 at 1:47 PM Harry Wentland wrote: > > > > On 2024-09-16 14:23, Thomas Weißschuh wrote: > > Hi Harry, Leo and other amdgpu maintainers, > > > > On 2024-08-24 20:33:53+, Thomas Weißschuh wrote: > >> The value of "min_input_signal" returned from ATIF on a Framework AMD 13 > >>

Re: [PATCH] drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling

2024-10-18 Thread Philip Yang
It is safe to access dqm->sched status inside dqm_lock, no race with gpu reset. Reviewed-by: Philip Yang On 2024-10-18 11:10, Shaoyun Liu wrote: From: shaoyunl Add back kfd queues in start scheduling that were originally removed on stop scheduling. Sig

RE: [PATCH v2] drm/amdgpu: Save VCN shared memory with init reset

2024-10-18 Thread Liu, Leo
[AMD Official Use Only - AMD Internal Distribution Only] Reviewed-by: Leo Liu > -----Original Message----- > From: amd-gfx On Behalf Of Lijo > Lazar > Sent: October 18, 2024 2:41 AM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Deucher, Alexander > ; Bhardwaj, Rajneesh > ; Errabolu

Re: [PATCH] drm/amdgpu: add ring reset messages

2024-10-18 Thread Alex Deucher
Ping? On Tue, Oct 15, 2024 at 2:28 PM Alex Deucher wrote: > > Add messages to make it clear when a per ring reset > happens. This is helpful for debugging and aligns with > other reset methods. > > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 3 +++ > 1 file ch

[PATCH] drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling

2024-10-18 Thread Shaoyun Liu
From: shaoyunl Add back kfd queues in start scheduling that were originally removed on stop scheduling. Signed-off-by: Shaoyun Liu Reviewed-by: Felix Kuehling --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 40 +-- 1 file changed, 37 insertions(+), 3 deletions(-) diff --g

Re: [PATCH] drm/amdkfd: fix the hang caused by the write reorder to fence_addr

2024-10-18 Thread Philip Yang
On 2024-10-18 01:31, Zhao, Victor wrote: [AMD Official Use Only - AMD Internal Distribution Only] Ping. Please help review. Thanks, Victor -----Original Message----- From: Victor Zhao Sent: Thursda

[PATCH] amdgpu: Don't print L2 status if there's nothing to print

2024-10-18 Thread Kent Russell
If a 2nd fault comes in before the 1st is handled, the 1st fault will clear out the FAULT STATUS registers before the 2nd fault is handled. Thus we get a lot of zeroes. If status=0, just skip the L2 fault status information, to avoid confusion of why some VM fault status prints in dmesg are all zer

Re: [PATCH] drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr

2024-10-18 Thread Alex Deucher
On Fri, Oct 18, 2024 at 5:46 AM Lang Yu wrote: > > Free sg table when dma_map_sgtable() failed to avoid memory leak. > > Signed-off-by: Lang Yu Reviewed-by: Alex Deucher > --- > drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a

RE: [PATCH] drm/amd/amdkfd: add/remove kfd queues on start/stop KFD scheduling

2024-10-18 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only] Good catch. Thanks. I will send out another review for that. Regards Shaoyun.liu From: Yang, Philip Sent: Thursday, October 17, 2024 3:47 PM To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amd/amdkfd: add/re

RE: [PATCH v3] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2

2024-10-18 Thread Deucher, Alexander
[Public] > -----Original Message----- > From: SHANMUGAM, SRINIVASAN > Sent: Thursday, October 17, 2024 9:56 PM > To: Koenig, Christian ; Deucher, Alexander > > Cc: amd-gfx@lists.freedesktop.org; SHANMUGAM, SRINIVASAN > > Subject: [PATCH v3] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.2 > > T

Re: [PATCH v2] drm/amdkfd: Add kfd driver function to support hot plug/unplug amdgpu devices

2024-10-18 Thread Chen, Xiaogang
On 10/17/2024 4:04 PM, Felix Kuehling wrote: On 2024-10-15 17:21, Xiaogang.Chen wrote: From: Xiaogang Chen The purpose of this patch is having kfd driver function as expected during AMD gpu device plug/unplug. When an AMD gpu device gets unplugged, the kfd driver stops all queues from this devic

Re: [PATCH v6 00/12] validate/clean the functions of ip funcs

2024-10-18 Thread Khatri, Sunil
On 10/18/2024 7:08 PM, Christian König wrote: Patches #2, #3 and #12 are Acked-by: Christian König The rest are Reviewed-by: Christian König Maybe give others till Monday to take a look as well, could be that Alex, Lijo or somebody else point out that we are ignoring the suspend return c

Re: [PATCH] drm/amdgpu: Add gpu_addr support to seq64 allocation

2024-10-18 Thread Christian König
On 18.10.24 at 15:26, Arunpravin Paneer Selvam wrote: Add gpu address support to seq64 alloc function. Looks good to me, but when adding interfaces you should probably have the user of this in the same patch set. Regards, Christian. Signed-off-by: Arunpravin Paneer Selvam --- drivers/

Re: [PATCH v6 00/12] validate/clean the functions of ip funcs

2024-10-18 Thread Christian König
Patches #2, #3 and #12 are Acked-by: Christian König The rest are Reviewed-by: Christian König Maybe give others till Monday to take a look as well, could be that Alex, Lijo or somebody else point out that we are ignoring the suspend return code during XGMI hive reset for a good reason. I

[PATCH v5 4/4] drm/amdgpu: track bo memory stats at runtime

2024-10-18 Thread Yunxiang Li
Before, every time fdinfo is queried we try to lock all the BOs in the VM and calculate memory usage from scratch. This works okay if the fdinfo is rarely read and the VMs don't have a ton of BOs. If either of these conditions is not true, we get a massive performance hit. In this new revision, we

[PATCH v5 3/4] drm/amdgpu: stop tracking visible memory stats

2024-10-18 Thread Yunxiang Li
Since on modern systems all of VRAM can be made visible anyway, drop tracking of how much memory is visible for now to simplify the new implementation. If this is really needed we can add it back on top of the new implementation, or just report all the BOs as visible. Signed-off-by: Yunxiang Li -

[PATCH v5 2/4] drm/amdgpu: make drm-memory-* report resident memory

2024-10-18 Thread Yunxiang Li
The old behavior reports the resident memory usage for this key and the documentation says so as well. However, this was accidentally changed to include buffers that were evicted. Fixes: a2529f67e2ed ("drm/amdgpu: Use drm_print_memory_stats helper from fdinfo") Signed-off-by: Yunxiang Li --- drive

[PATCH v5 1/4] drm/amdgpu: remove unused function parameter

2024-10-18 Thread Yunxiang Li
amdgpu_vm_bo_invalidate doesn't use the adev parameter and not all callers have a reference to adev handy, so remove it for cleanliness. Signed-off-by: Yunxiang Li --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_dma_buf.c | 2 +- drivers/gpu/drm/amd/am

[PATCH v5 0/4] rework bo mem stats tracking

2024-10-18 Thread Yunxiang Li
Right now every time the fdinfo is read, we go through the vm lists and lock all the BOs to calculate the statistics. This causes a lot of lock contention when the VM is actively used. It gets worse if there are a lot of shared BOs or if there are a lot of submissions. We have seen submissions lock-up

[PATCH] drm/amdgpu: Add gpu_addr support to seq64 allocation

2024-10-18 Thread Arunpravin Paneer Selvam
Add gpu address support to seq64 alloc function. Signed-off-by: Arunpravin Paneer Selvam --- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.c | 10 -- drivers/gpu/drm/amd/amdgpu/amdgpu_seq64.h | 3 ++- 2 files changed, 10 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/a

[PATCH v6 05/12] drm/amdgpu: validate resume before function call

2024-10-18 Thread Sunil Khatri
Before making a function call to resume, validate the function pointer like we do in sw_init. Use the helper function amdgpu_ip_block_resume where same checks and calls are repeated. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 13 ++--- drivers/gpu/drm/amd/amdg

[PATCH v6 11/12] drm/amdgpu: Clean the functions pointer set as NULL

2024-10-18 Thread Sunil Khatri
We don't need to set unneeded function pointers to NULL, as global structure members are by default set to zero, or NULL for pointers. Cc: Leo Liu Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 4 drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c |

[PATCH v6 01/12] drm/amdgpu: validate hw_fini before function call

2024-10-18 Thread Sunil Khatri
Before making a function call to hw_fini, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 38 +- 1 file changed, 22 insertions(+), 16 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu

[PATCH v6 06/12] drm/amdgpu: validate wait_for_idle before function call

2024-10-18 Thread Sunil Khatri
Before making a function call to wait_for_idle, validate the function pointer like we do in sw_init. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c

[PATCH v6 02/12] drm/amdgpu: return error if phase2 suspend fails

2024-10-18 Thread Sunil Khatri
In function amdgpu_device_ip_suspend_phase2 if suspend call fails for an IP then abort there and return error to caller. A failed functionality of IP is critical and we should not proceed. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 + 1 file changed, 1 insert

[PATCH v6 03/12] drm/amdgpu: return error on suspend failure

2024-10-18 Thread Sunil Khatri
In function amdgpu_reset_xgmi_reset_on_init_suspend if suspend call fails for an IP then abort there and return error to caller. A failed functionality of IP is critical and we should not proceed. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_reset.c | 1 + 1 file changed, 1

[PATCH v6 09/12] drm/amdgpu: clean the dummy wait_for_idle functions

2024-10-18 Thread Sunil Khatri
Remove the dummy wait_for_idle functions for all ip blocks. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 -- drivers/gpu/drm/amd/a

[PATCH v6 10/12] drm/amdgpu: clean the dummy soft_reset functions

2024-10-18 Thread Sunil Khatri
Remove the dummy soft_reset functions for all ip blocks. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_acp.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 6 -- drivers/gpu/drm/amd/amdg

[PATCH v6 12/12] drm/amdgpu: clean unused functions of uvd/vcn/vce

2024-10-18 Thread Sunil Khatri
Some of the function pointers of amdgpu_ip_funcs are not used and are left commented out. Hence this cleans up those which aren't used. Cc: Leo Liu Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/uvd_v7_0.c | 274 -- drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 273

[PATCH v6 08/12] drm/amdgpu: clean the dummy suspend functions

2024-10-18 Thread Sunil Khatri
Remove the dummy suspend functions for all ip blocks. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 -- drivers/gpu/drm/amd/amdgpu/amdgpu_vkms.c | 3 ++- drivers/gpu/drm/amd/amdgpu/cik.c | 6 -- drivers/gpu/drm/amd/amdgpu/si.c | 6 -- 4

[PATCH v6 07/12] drm/amdgpu: clean the dummy resume functions

2024-10-18 Thread Sunil Khatri
Remove the dummy resume functions for all ip blocks. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_isp.c index 9b98b40ac4db..1383fd1644d

[PATCH v6 00/12] validate/clean the functions of ip funcs

2024-10-18 Thread Sunil Khatri
v6: Fixed the missing return statement on suspend and update the code with V5 comments. v5: Fixed review comments. Dropped hw_fini patch and need to look further why such functions exists. hw_init/hw_fini are mandatory functions and we should have a valid definition. v4: hw_init/hw_fi

[PATCH v6 04/12] drm/amdgpu: validate suspend before function call

2024-10-18 Thread Sunil Khatri
Before making a function call to suspend, validate the function pointer like we do in sw_init. Use the helper function amdgpu_ip_block_suspend where same checks and calls are repeated. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/aldebaran.c | 11 ++ drivers/gpu/drm/amd/a

Re: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

2024-10-18 Thread Alex Deucher
On Fri, Oct 18, 2024 at 7:19 AM Christian König wrote: > > On 18.10.24 at 11:33, Zhang, Jesse(Jie) wrote: > > [AMD Official Use Only - AMD Internal Distribution Only] > > > > Hi Christian, > > > > -----Original Message----- > > From: Koenig, Christian > > Sent: Friday, October 18, 2024 4:47 PM >

Re: [PATCH v7 1/5] drm: Introduce device wedged event

2024-10-18 Thread Christian König
On 18.10.24 at 14:46, Raag Jadav wrote: As far as I can see this makes the enum how to recover the device superfluous because you will most likely always need a bus reset to get out of this again. That depends on the kind of fault the device has encountered and the bus it is sitting on. There c

Re: [PATCH v5 02/12] drm/amdgpu: add helper function amdgpu_ip_block_suspend

2024-10-18 Thread Khatri, Sunil
On 10/18/2024 4:40 PM, Christian König wrote: On 17.10.24 at 18:25, Sunil Khatri wrote: Use the helper function amdgpu_ip_block_suspend where same checks and calls are repeated. I strongly suggest to squash this patch and the next one together. Sure. Noted Signed-off-by: Sunil Khatri ---

Re: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

2024-10-18 Thread Christian König
On 18.10.24 at 11:33, Zhang, Jesse(Jie) wrote: [AMD Official Use Only - AMD Internal Distribution Only] Hi Christian, -----Original Message----- From: Koenig, Christian Sent: Friday, October 18, 2024 4:47 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject

Re: [PATCH v5 00/12] validate/clean the functions of ip funcs

2024-10-18 Thread Christian König
On 17.10.24 at 18:25, Sunil Khatri wrote: v5: Fixed review comments. Dropped hw_fini patch and need to look further why such functions exists. hw_init/hw_fini are mandatory functions and we should have a valid definition. v4: hw_init/hw_fini functions are mandatory and raise error mes

Re: [PATCH v5 04/12] drm/amdgpu: add helper function amdgpu_ip_block_resume

2024-10-18 Thread Christian König
On 17.10.24 at 18:25, Sunil Khatri wrote: Use the helper function amdgpu_ip_block_resume where same checks and calls are repeated. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++ drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 17 + 2 files

Re: [PATCH v5 02/12] drm/amdgpu: add helper function amdgpu_ip_block_suspend

2024-10-18 Thread Christian König
On 17.10.24 at 18:25, Sunil Khatri wrote: Use the helper function amdgpu_ip_block_suspend where same checks and calls are repeated. I strongly suggest to squash this patch and the next one together. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + driver

Re: [PATCH v7 1/5] drm: Introduce device wedged event

2024-10-18 Thread Christian König
On 17.10.24 at 18:43, Rodrigo Vivi wrote: On Thu, Oct 17, 2024 at 09:59:10AM +0200, Christian König wrote: Purpose of this implementation is to provide drivers a generic way to recover with the help of userspace intervention. Different drivers may have different ideas of a "wedged device" depen

[PATCH] drm/amdgpu: refine error handling in amdgpu_ttm_tt_pin_userptr

2024-10-18 Thread Lang Yu
Free sg table when dma_map_sgtable() failed to avoid memory leak. Signed-off-by: Lang Yu --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c index 74a

RE: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

2024-10-18 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - AMD Internal Distribution Only] Hi Christian, -----Original Message----- From: Koenig, Christian Sent: Friday, October 18, 2024 4:47 PM To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: Re: [PATCH] drm/amdgpu: add the command AMDGPU_I

[PATCH] drm/amdgpu: Add nps_mode in RAS init_flag

2024-10-18 Thread Candice Li
Add nps_mode in RAS init_flag. Signed-off-by: Candice Li Reviewed-by: Tao Zhou --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 +++ drivers/gpu/drm/amd/amdgpu/ta_ras_if.h | 9 + 2 files changed, 12 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/a

Re: [PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

2024-10-18 Thread Christian König
On 18.10.24 at 10:19, jesse.zh...@amd.com wrote: Not all ASICs support the queue reset feature. Therefore, userspace can query this feature via AMDGPU_INFO_QUEUE_RESET before validating a queue reset. Why would UMDs need that information? Signed-off-by: Jesse Zhang --- drivers/gpu/drm/am

[PATCH] drm/amdgpu: add the command AMDGPU_INFO_QUEUE_RESET to query queue reset

2024-10-18 Thread jesse.zh...@amd.com
Not all ASICs support the queue reset feature. Therefore, userspace can query this feature via AMDGPU_INFO_QUEUE_RESET before validating a queue reset. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 27 + include/uapi/drm/amdgpu_drm.h |

[PATCH] drm/amdkfd: Workaround fix the multi-VF doorbell corruption issue

2024-10-18 Thread Samuel Zhang
In MI300 series, doorbell will get corrupted in multi-VF scenario. This is a HW bug, see DEGGIGX90-5071 and SWDEV-480706 for details. The fix is to set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 in multi-VF mode. Signed-off-by: Samuel Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c |

[PATCH next] drm/amdgpu: Fix a double lock bug

2024-10-18 Thread Dan Carpenter
This was supposed to be an unlock instead of a lock. The original code will lead to a deadlock. Fixes: ee52489d1210 ("drm/amdgpu: Place NPS mode request on unload") Signed-off-by: Dan Carpenter --- From static analysis, not testing. --- drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 +- 1 file c

6.6.57 has a new WARNING: amdgpu/../display/dc/dcn30/dcn30_dpp.c:501 dpp3_deferred_update+0x106/0x330 [amdgpu]

2024-10-18 Thread Toralf Förster
[ 22.120385] ------------[ cut here ]------------ [ 22.120389] WARNING: CPU: 13 PID: 11 at drivers/gpu/drm/amd/amdgpu/../display/dc/dcn30/dcn30_dpp.c:501 dpp3_deferred_update+0x106/0x330 [amdgpu] [ 22.120484] Modules linked in: fuse michael_mic hid_jabra ip6table_filter ip6_tables xt_LOG nf_

[BUG] drm/amd/display: possible null-pointer dereference or redundant null check in amdgpu_dm.c

2024-10-18 Thread Tuo Li
Hello, Our static analysis tool has identified a potential null-pointer dereference or redundant null check related to the wait-completion synchronization mechanism in amdgpu_dm.c in Linux 6.11. Consider the following execution scenario: dmub_aux_setconfig_callback() //731 if (adev->d