RE: [PATCH] drm/amdgpu/userq: move the header to amdgpu directory

2025-03-04 Thread Liang, Prike
[Public] Reviewed-by: Prike Liang Regards, Prike > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Saturday, March 1, 2025 3:57 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu/userq: move the header to amdgpu d

Re: [PATCH] amdkfd: initialize svm lists at where they are defined

2025-03-04 Thread Felix Kuehling
On 2025-03-04 2:40, Zhu Lingshan wrote: > On 3/4/2025 1:49 PM, Felix Kuehling wrote: >> On 2025-02-21 4:23, Zhu Lingshan wrote: >>> This commit initialized svm lists at where they are >>> defined. This is defensive programming for security >>> and consistency. >>> >>> Initializing variables ensures

Re: [PATCH] drm/amdkfd: remove unnecessary cpu domain validation

2025-03-04 Thread Felix Kuehling
On 2025-03-04 5:38, Christian König wrote: > > Am 03.03.25 um 22:01 schrieb Felix Kuehling: >> On 2025-03-03 13:48, Christian König wrote: >>> Am 03.03.25 um 19:45 schrieb James Zhu: before move to GTT domain. >>> That might not be unnecessary. We sometimes intentionally move BOs to the >>> C

RE: [PATCH 2/2] drm/amdgpu: Add support for CPERs on virtualization

2025-03-04 Thread Luo, Zhigang
[AMD Official Use Only - AMD Internal Distribution Only] The series is: Reviewed-by: Zhigang Luo > -Original Message- > From: Yi, Tony > Sent: Thursday, February 27, 2025 10:12 AM > To: Yi, Tony ; Skvortsov, Victor ; > amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Luo, > Zhigang > C

[PATCH] drm/amdgpu: add initial documentation for debugfs files

2025-03-04 Thread Alex Deucher
Describes what debugfs files are available and what they are used for. v2: fix some typos (Mark Glines) Signed-off-by: Alex Deucher --- Documentation/gpu/amdgpu/debugfs.rst | 202 +++ Documentation/gpu/amdgpu/index.rst | 1 + 2 files changed, 203 insertions(+) creat

[PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Alex Deucher
VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. Use amdgpu_device_ip_set_powergating_state() rather than the per instance set_pg_state() callback. Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: use per instance cal

Re: [PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Lazar, Lijo
On 3/4/2025 7:50 PM, Alex Deucher wrote: > VCN 2.5 uses the PG callback to enable VCN DPM which is > a global state. As such, we need to make sure all instances > are in the same state. Use amdgpu_device_ip_set_powergating_state() > rather than the per instance set_pg_state() callback. > > Fi

Re: [PATCH] drm/amdgpu: Fix missing drain retry fault the last entry

2025-03-04 Thread Philip Yang
On 2025-03-03 19:44, Deng, Emily wrote: [AMD Official Use Only - AMD Internal Distribution Only] Ping.. Emily Deng Best Wishes -Original Message- From: Emily Deng Sent: Monday, March 3, 2025 5:35 PM To: amd-gfx@lists.f

RE: [PATCH] drm/amdgpu: Validate return value of pm_runtime_get_sync

2025-03-04 Thread Khatri, Sunil
[AMD Official Use Only - AMD Internal Distribution Only] -Original Message- From: Limonciello, Mario Sent: Tuesday, March 4, 2025 11:27 PM To: Khatri, Sunil ; Deucher, Alexander ; Koenig, Christian Cc: amd-gfx@lists.freedesktop.org Subject: Re: [PATCH] drm/amdgpu: Validate return value

Re: [PATCH] drm/amdgpu: add initial documentation for debugfs files

2025-03-04 Thread Mark Glines
Hi Alex, I had a few editing notes. On 3/3/25 6:01 PM, Alex Deucher wrote: > Describes what debugfs files are available and what > they are used for. > > Signed-off-by: Alex Deucher > --- > Documentation/gpu/amdgpu/debugfs.rst | 201 +++ > Documentation/gpu/amdgpu/index

Re: [PATCH] drm/amdgpu: Validate return value of pm_runtime_get_sync

2025-03-04 Thread Mario Limonciello
On 3/4/2025 12:03, Khatri, Sunil wrote: For new code can you please use drm_err() instead of DRM_ERROR()? I see drm_err is not used anywhere in amdgpu driver but display is using it. I think it would be better if I take it in a different patch to change instead. Does that sound ok? DRM_ERR

[PATCH v2 1/3] drm/amdgpu: Do not program AGP BAR regs under SRIOV in gfxhub_v1_0.c

2025-03-04 Thread Victor Lu
SRIOV VF does not have write access to AGP BAR regs. Skip the writes to avoid a dmesg warning. Signed-off-by: Victor Lu --- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c b/drivers/

[PATCH v2 3/3] drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV

2025-03-04 Thread Victor Lu
Aldebaran SRIOV VF cannot access the power brake feature regs. The accesses can be skipped to avoid a dmesg warning. v2: Remove redundant asic type check Signed-off-by: Victor Lu --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/driv

Re: [PATCH] drm/amdkfd: Change error handling at prange update in svm_range_set_attr

2025-03-04 Thread Chen, Xiaogang
On 3/3/2025 11:21 PM, Felix Kuehling wrote: On 2025-01-31 11:58, Xiaogang.Chen wrote: From: Xiaogang Chen When registering a vm range at svm, the added vm range may be split into multiple subranges and/or existing pranges may get split. The new pranges need to be validated and mapped. This patch changes e

[PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Alex Deucher
VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: use per instance callbacks for idle work handler") Signed-off-by: Alex Deucher ---

Re: [PATCH v2 3/3] drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV

2025-03-04 Thread Alex Deucher
Series is: Acked-by: Alex Deucher On Tue, Mar 4, 2025 at 11:20 AM Victor Lu wrote: > > Aldebaran SRIOV VF cannot access the power brake feature regs. > The accesses can be skipped to avoid a dmesg warning. > > v2: Remove redundant asic type check > > Signed-off-by: Victor Lu > --- > drivers/gp

Re: [PATCH] drm/amdgpu: add initial documentation for debugfs files

2025-03-04 Thread Rodrigo Siqueira
Hi Alex, I added a few suggestions and questions. On 03/03, Alex Deucher wrote: > Describes what debugfs files are available and what > they are used for. > > Signed-off-by: Alex Deucher > --- > Documentation/gpu/amdgpu/debugfs.rst | 201 +++ > Documentation/gpu/amdgpu/

[PATCH v2 2/3] drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV

2025-03-04 Thread Victor Lu
Aldebaran SRIOV VF does not have write permissions to GRBM_CNTL. This access can be skipped to avoid a dmesg warning. v2: Use GC IP version check instead of asic check Signed-off-by: Victor Lu --- drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) d

Re: [PATCH 0/4] drm/amd/display: move from kzalloc(size * nr, ...) to kcalloc(nr, size, ...)

2025-03-04 Thread Rodrigo Siqueira
On 02/27, Ethan Carter Edwards wrote: > We are trying to get rid of all multiplications from allocation > functions to prevent integer overflows. Here the multiplications are > probably safe, but using kcalloc() is more appropriate and improves > readability. It is also safer. This series contains
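The overflow concern behind this series can be illustrated with a small userspace sketch. It is not the kernel code: `unchecked_bytes()` and `checked_zalloc_array()` are hypothetical stand-ins for the `kzalloc(size * nr, ...)` and `kcalloc(nr, size, ...)` patterns, built on the standard `calloc()`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for the size passed to kzalloc(nr * size, ...):
 * the multiplication can wrap around before the allocator ever sees it.
 */
static size_t unchecked_bytes(size_t nr, size_t size)
{
	return nr * size;
}

/*
 * Hypothetical stand-in for kcalloc(nr, size, ...): refuse the request
 * when nr * size would overflow, otherwise return zeroed memory.
 */
static void *checked_zalloc_array(size_t nr, size_t size)
{
	if (size != 0 && nr > SIZE_MAX / size)
		return NULL;
	return calloc(nr, size);
}
```

With `nr = SIZE_MAX / 2 + 1` and `size = 2`, the unchecked product wraps to 0, while the checked helper returns NULL instead of handing back an undersized buffer.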

RE: [PATCH 2/2] drm/amdkfd: flag per-sdma queue reset supported to user space

2025-03-04 Thread Kasiviswanathan, Harish
[Public] This series Reviewed-by: Harish Kasiviswanathan -Original Message- From: Kim, Jonathan Sent: Wednesday, February 26, 2025 3:58 PM To: amd-gfx@lists.freedesktop.org Cc: Kasiviswanathan, Harish ; Kim, Jonathan Subject: [PATCH 2/2] drm/amdkfd: flag per-sdma queue reset supporte

[PATCH V2] drm/amdgpu: validate return value of pm_runtime_get_sync

2025-03-04 Thread Sunil Khatri
An invalid return value 'r' of the pm_runtime_get_sync is r < 0, so fix the return value check and add proper failure log and exit cleanly. Successful refcount in userqueue creation to make sure device remains in active state. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/mes_userq
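The return-value convention described here can be sketched in userspace as follows. Everything prefixed `fake_` is a hypothetical stub, not the amdgpu code; the stub only mirrors the documented behaviour of pm_runtime_get_sync(), which takes a usage reference in all cases and reports failure as a negative value. Dropping the reference on the error path is an assumption of this sketch, not something the patch summary states.

```c
#include <errno.h>

/* Hypothetical stand-in for a device's runtime-PM state. */
struct fake_dev {
	int usage_count;
	int resume_result;	/* value the stubbed resume call reports */
};

/*
 * Stub modelled on pm_runtime_get_sync(): the usage count is bumped
 * unconditionally; the result is < 0 on error and 0 or 1 on success.
 */
static int fake_pm_runtime_get_sync(struct fake_dev *dev)
{
	dev->usage_count++;
	return dev->resume_result;
}

static void fake_pm_runtime_put(struct fake_dev *dev)
{
	dev->usage_count--;
}

/*
 * Hypothetical caller: only r < 0 is treated as failure. On failure the
 * reference taken above is dropped (sketch assumption) and the error is
 * propagated; on success the reference is kept.
 */
static int fake_queue_create(struct fake_dev *dev)
{
	int r = fake_pm_runtime_get_sync(dev);

	if (r < 0) {
		fake_pm_runtime_put(dev);
		return r;
	}
	return 0;
}
```

A plain `if (r)` check would wrongly treat the "already active" result of 1 as an error, which is why the comparison is against zero.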

[PATCH 19/22] drm/amd/display: Add scoped mutexes for amdgpu_dm_dhcp

2025-03-04 Thread Tom Chung
From: Mario Limonciello [Why] Guards automatically release mutex when it goes out of scope making code easier to follow. [How] Replace all use of mutex_lock()/mutex_unlock() with guard(mutex). Reviewed-by: Alex Hung Signed-off-by: Mario Limonciello Signed-off-by: Tom Chung --- .../amd/displ
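The idea of a scope-bound lock can be sketched outside the kernel with the GCC/Clang `cleanup` variable attribute. This is an illustration only: `struct demo_lock`, `DEMO_GUARD()` and the other `demo_` names are hypothetical, and the toy lock is a plain flag rather than a real mutex.

```c
/* Toy lock: a flag standing in for a real mutex (sketch assumption). */
struct demo_lock {
	int held;
};

static struct demo_lock demo_lock_obj;
static int demo_counter;

static void demo_acquire(struct demo_lock *l)
{
	l->held = 1;
}

/* Cleanup handler: receives a pointer to the guard variable at scope exit. */
static void demo_release(struct demo_lock **l)
{
	(*l)->held = 0;
}

/* Hypothetical macro: acquire now, release automatically when scope ends. */
#define DEMO_GUARD(l) \
	struct demo_lock *demo_guard \
		__attribute__((cleanup(demo_release), unused)) = \
		(demo_acquire(l), (l))

static int demo_increment(int limit)
{
	DEMO_GUARD(&demo_lock_obj);

	if (demo_counter >= limit)
		return -1;	/* early return: the lock is still released */
	return ++demo_counter;
}
```

Both the early return and the normal return leave the lock released without an explicit unlock call on either path, which is the readability gain the commit message points to.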

[PATCH 00/22] DC Patches Mar 10 2025

2025-03-04 Thread Tom Chung
This DC patchset brings improvements in multiple areas. In summary, we highlight: - Fix some Replay/PSR issue - Fix backlight brightness - Fix suspend issue - Fix visual confirm color - Add scoped mutexes for amdgpu_dm_dhcp Cc: Daniel Wheeler Alex Hung (1): drm/amd/display: Assign normalized

[PATCH 01/22] drm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch

2025-03-04 Thread Tom Chung
From: Leon Huang [Why] When switching between PSR/Replay, the DPCD config of previous mode is not cleared, resulting in unexpected behavior in TCON. [How] Initialize the DPCD in setup function Reviewed-by: Robin Chen Signed-off-by: Leon Huang Signed-off-by: Tom Chung --- .../link/protocols/

[PATCH 10/22] drm/amd/display: assume VBIOS supports DSC as default

2025-03-04 Thread Tom Chung
From: Charlene Liu [Why & How] The clear_dsc_setting at boot logic was based on dcn version check. As such new ASIC lost this DSC clear up logic, change the assumption to BIOS support eDP DSC for new ASIC. Reviewed-by: Alvin Lee Signed-off-by: Charlene Liu Signed-off-by: Tom Chung --- driver

[PATCH 11/22] drm/amd/display: Add Support for reg inbox0 for host->DMUB CMDs

2025-03-04 Thread Tom Chung
From: Dillon Varone [WHY] DCN4+ supports a new register based mailbox for sending messages from host to DMCUB. This mailbox supports 64 byte commands, which makes it compatible with the same structure as the frame buffer based mailbox. [HOW] The intention for reg_inbox0 is to be slot in replacem

[PATCH 06/22] drm/amd/display: Add and use new dm_prepare_suspend() callback

2025-03-04 Thread Tom Chung
From: Mario Limonciello [Why] The displays currently don't get turned off until after other IP blocks have been suspended. However turning off the displays first gives a very visible response that the system is on its way down. [How] Turn off displays in a prepare_suspend() callback instead wh

[PATCH 14/22] drm/amd/display: Fix visual confirm color not updating

2025-03-04 Thread Tom Chung
From: Leo Zeng [WHY] Sometimes visual confirm color is updated, but the background color is not changed. This causes visual confirm to show incorrect colors. [HOW] Update background color when visual confirm color changes. Reviewed-by: Dillon Varone Signed-off-by: Leo Zeng Signed-off-by: Tom

[PATCH 18/22] drm/amd/display: Fix slab-use-after-free on hdcp_work

2025-03-04 Thread Tom Chung
From: Mario Limonciello [Why] A slab-use-after-free is reported when HDCP is destroyed but the property_validate_dwork queue is still running. [How] Cancel the delayed work when destroying workqueue. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4006 Fixes: da3fd7ac0bcf ("drm/amd/disp

[PATCH 17/22] drm/amd/display: Prevent VStartup Overflow

2025-03-04 Thread Tom Chung
From: Ryan Seto [Why] For some VR headsets with large blanks, it's possible to overflow the OTG_VSTARTUP_PARAM:VSTARTUP_START register. This can lead to incorrect DML calculations and underflow downstream. [How] Cap the calculated max_vstartup_lines at the max value of the register. Reviewed-
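A minimal sketch of the clamping step, under stated assumptions: the field width below (`DEMO_VSTARTUP_START_BITS`) is hypothetical, since the summary does not give the real width of OTG_VSTARTUP_PARAM:VSTARTUP_START, and both helper functions are illustrative names rather than driver functions.

```c
#include <stdint.h>

/* Hypothetical field width; the real register width is not stated here. */
#define DEMO_VSTARTUP_START_BITS 10u
#define DEMO_VSTARTUP_START_MAX  ((1u << DEMO_VSTARTUP_START_BITS) - 1u)

/* Limit the computed line count to what the register field can hold. */
static uint32_t demo_clamp_vstartup(uint32_t max_vstartup_lines)
{
	return max_vstartup_lines < DEMO_VSTARTUP_START_MAX ?
	       max_vstartup_lines : DEMO_VSTARTUP_START_MAX;
}

/* What an unclamped write into the field would end up storing. */
static uint32_t demo_truncate_to_field(uint32_t value)
{
	return value & DEMO_VSTARTUP_START_MAX;
}
```

With the assumed 10-bit field, a computed value of 1500 is clamped to 1023, whereas writing it unclamped would store 476, a much smaller value than intended.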

[PATCH 21/22] drm/amd/display: remove minimum Dispclk and apply oem panel timing.

2025-03-04 Thread Tom Chung
From: Charlene Liu [why & how] 1. apply oem panel timing (not only on OLED) 2. remove MIN_DPP_DISP_CLK request in driver. This fix will apply for dcn31x but not sync with DML's output. Reviewed-by: Ovidiu Bunea Signed-off-by: Charlene Liu Signed-off-by: Tom Chung --- drivers/gpu/drm/amd/dis

[PATCH 20/22] drm/amd/display: Drop unnecessary ret variable for enable_assr()

2025-03-04 Thread Tom Chung
From: Mario Limonciello [Why] enable_assr() has a res variable that only is changed in one block with no cleanup necessary. [How] Remove variable and return early from failure cases. Reviewed-by: Alex Hung Signed-off-by: Mario Limonciello Signed-off-by: Tom Chung --- drivers/gpu/drm/amd/dis

[PATCH 22/22] drm/amd/display: Promote DAL to 3.2.324

2025-03-04 Thread Tom Chung
From: Taimur Hassan This version brings along following fixes: - Fix some Replay/PSR issue - Fix backlight brightness - Fix suspend issue - Fix visual confirm color - Add scoped mutexes for amdgpu_dm_dhcp Reviewed-by: Ovidiu Bunea Reviewed-by: Nicholas Kazlauskas Signed-off-by: Taimur Hassan

[PATCH 12/22] drm/amd/display: Assign normalized_pix_clk when color depth = 14

2025-03-04 Thread Tom Chung
From: Alex Hung [WHY & HOW] A warning message "WARNING: CPU: 4 PID: 459 at ... /dc_resource.c:3397 calculate_phy_pix_clks+0xef/0x100 [amdgpu]" occurs because the display_color_depth == COLOR_DEPTH_141414 is not handled. This is observed in Radeon RX 6600 XT. It is fixed by assigning pix_clk * (1

[PATCH 05/22] drm/amd/display: Restore correct backlight brightness after a GPU reset

2025-03-04 Thread Tom Chung
From: Mario Limonciello [Why] GPU reset will attempt to restore cached state, but brightness doesn't get restored. It will come back at 100% brightness, but userspace thinks it's the previous value. [How] When running the resume sequence, if the GPU is in reset, restore brightness to the previous value. Acke

[PATCH] drm/amd/amdkfd: Evict all queues even HWS remove queue failed

2025-03-04 Thread Yifan Zha
[Why] If a reset is detected and kfd needs to evict working queues, HWS moving the queue will fail. The remaining queues are then not evicted and stay in the active state. After the reset is done, kfd uses HWS to terminate the remaining active queues, but HWS has been reset, so removing the queues fails again. [How]

[PATCH 16/22] drm/amd/display: Correct timing_adjust_pending flag setting.

2025-03-04 Thread Tom Chung
From: Zhongwei Zhang [Why&How] stream->adjust will be overwritten by update->crtc_timing_adjust. We should set update->crtc_timing_adjust->timing_adjust_pending and then overwrite stream->adjust. Reset update->crtc_timing_adjust->timing_adjust_pending after the assignment. Reviewed-by: Charlene

[PATCH 1/2] drm/amdgpu: retire ip init code specific for A0 rev

2025-03-04 Thread Shiwu Zhang
For aqua_vanjaram, A0 HW is retired so remove the code specific for it in gfx ip init. Signed-off-by: Shiwu Zhang --- drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c | 13 + 1 file changed, 1 insertion(+), 12 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c b/drivers/gpu/d

Re: [PATCH] drm/amdkfd: Change error handling at prange update in svm_range_set_attr

2025-03-04 Thread Chen, Xiaogang
On 3/4/2025 12:32 PM, Felix Kuehling wrote: On 2025-03-04 13:23, Chen, Xiaogang wrote: On 3/3/2025 11:21 PM, Felix Kuehling wrote: On 2025-01-31 11:58, Xiaogang.Chen wrote: From: Xiaogang Chen When register a vm range at svm the added vm range may be split into multiple subranges and/or exi

[PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Alex Deucher
VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) v3: switch to its own idle work handler Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: use per instance callbacks for idle work hand

Re: [PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Alex Deucher
On Tue, Mar 4, 2025 at 11:29 AM Alex Deucher wrote: > > VCN 2.5 uses the PG callback to enable VCN DPM which is > a global state. As such, we need to make sure all instances > are in the same state. Actually ref counting won't work because the gate and ungate calls may not be balanced. I just s

Re: [PATCH] drm/amdgpu: Validate return value of pm_runtime_get_sync

2025-03-04 Thread Alex Deucher
On Tue, Mar 4, 2025 at 1:14 PM Mario Limonciello wrote: > > On 3/4/2025 12:03, Khatri, Sunil wrote: > > For new code can you please use drm_err() instead of DRM_ERROR()? > > I see drm_err is not used anywhere in amdgpu driver but display is using > > it. I think it would be better if I take it in

[PATCH V7 2/3] drm/amdgpu: Optimize VM invalidation engine allocation and synchronize GPU TLB flush

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" - Modify the VM invalidation engine allocation logic to handle SDMA page rings. SDMA page rings now share the VM invalidation engine with SDMA gfx rings instead of allocating a separate engine. This change ensures efficient resource management and avoids the is

[PATCH 03/22] drm/amd/display: Add more debug data to dmub_srv

2025-03-04 Thread Tom Chung
From: Joshua Aberback [Why] When analyzing some crash dumps, not all of the expected DMUB info was available, so we want to add in-object storage for this data. [How] - dmub_srv_debug (renamed to dmub_timeout_info) is already a member of dmub_diagnostic_data, therefore keep a dmub_diagnostic_da

[PATCH 04/22] drm/amd/display: fix default brightness

2025-03-04 Thread Tom Chung
From: Mario Limonciello [Why] To avoid flickering during boot default brightness level set by BIOS should be maintained for as much of the boot as feasible. commit 2fe87f54abdc ("drm/amd/display: Set default brightness according to ACPI") attempted to set the right levels for AC vs DC, but bright

[PATCH 15/22] drm/amd/display: calculate the remain segments for all pipes

2025-03-04 Thread Tom Chung
From: Zhikai Zhai [WHY] In some cases the remain de-tile buffer segments will be greater than zero if we don't add the non-top pipe to calculate, at this time the override de-tile buffer size will be valid and used. But it makes the de-tile buffer segments used finally for all of pipes exceed the

[PATCH 2/2] drm/amdgpu: fix the gb_addr_config_fields init value mismatch

2025-03-04 Thread Shiwu Zhang
For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten in gfx hw_init it is read out to populate the gb_addr_config_fields in the sw_init stage, which causes a mismatch. Fix it temporarily by using the golden value in sw_init as well. The final fix should be by vBIOS/IFWI. Signed-off-

Re: [PATCH] drm/amdkfd: remove unnecessary cpu domain validation

2025-03-04 Thread Christian König
Am 03.03.25 um 22:01 schrieb Felix Kuehling: > On 2025-03-03 13:48, Christian König wrote: >> Am 03.03.25 um 19:45 schrieb James Zhu: >>> before move to GTT domain. >> That might not be unnecessary. We sometimes intentionally move BOs to the >> CPU domain to invalidate all VM mappings. > We dis

Re: [PATCH v4] drm/amdgpu: fix the memleak caused by fence not released

2025-03-04 Thread Christian König
Am 03.03.25 um 15:47 schrieb Arvind Yadav: > Encountering a taint issue during the unloading of gpu_sched > due to the fence not being released/put. In this context, > amdgpu_vm_clear_freed is responsible for creating a job to > update the page table (PT). It allocates kmem_cache for > drm_sched_fe

[PATCH 0/6] drm/amdgpu: Deadcode - the P's

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" Hi, Here's another blob of deadcoding in the amdgpu's; this set is all the symbols I noticed that start with 'P'. Most, as normal are whole function removals, but I also nuke the powerdown_uvd member in one patch. Dave Signed-off-by: Dr. David Alan Gilbert

[PATCH 2/6] drm/amdgpu: Remove phm_powerdown_uvd

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" phm_powerdown_uvd() has been unused since 2017's commit 47047263c527 ("drm/amd/powerplay: delete eventmgr related files.") Remove it. Signed-off-by: Dr. David Alan Gilbert --- .../gpu/drm/amd/pm/powerplay/hwmgr/hardwaremanager.c | 10 -- drivers/gpu/dr

[PATCH 5/6] drm/amdgpu: Remove unused print__rq_dlg_params_st

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" print__rq_dlg_params_st() was added in 2017 by commit 061bfa06a42a ("drm/amdgpu/display: Add dml support for DCN") but has remained unused. Remove it. Signed-off-by: Dr. David Alan Gilbert --- .../drm/amd/display/dc/dml/display_rq_dlg_helpers.c | 11 --

[PATCH 4/6] drm/amdgpu: Remove unused pre_surface_trace

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" pre_surface_trace() has been unused since 2017's commit 745cc746da42 ("drm/amd/display: remove dc_pre_update_surfaces_to_stream from dc use") Remove it. Signed-off-by: Dr. David Alan Gilbert --- .../gpu/drm/amd/display/dc/core/dc_debug.c| 120 ---

[PATCH 1/6] drm/amdgpu: Remove ppatomfwctrl deadcode

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" pp_atomfwctrl_get_pp_assign_pin() and pp_atomfwctrl_get_pp_assign_pin() were added in 2017 by commit 0d2c7569e196 ("drm/amdgpu: add new atomfirmware based helpers for powerplay") but have remained unused. Remove them, and the helper functions they used. Signed-off

[PATCH 3/6] drm/amdgpu: Remove powerdown_uvd member

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" With phm_powerdown_uvd() gone in the previous patch, there's now no longer anything that reads the powerdown_uvd member of the pp_hwmgr_func. Remove it. There are a few assignments to it; a boring NULL which can just go, and two functions, but those functions are

[PATCH 6/6] drm/amdgpu: Remove unused pqm_get_kernel_queue

2025-03-04 Thread linux
From: "Dr. David Alan Gilbert" pqm_get_kernel_queue() has been unused since 2022's commit 5bdd3eb25354 ("drm/amdkfd: Remove unused old debugger implementation") Remove it. Signed-off-by: Dr. David Alan Gilbert --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 2 -- .../gpu/drm/amd/am

RE: [PATCH 2/2] drm/amdgpu: Add support for CPERs on virtualization

2025-03-04 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] The series is: Reviewed-by: Tao Zhou > -Original Message- > From: amd-gfx On Behalf Of Tony Yi > Sent: Thursday, February 27, 2025 11:12 PM > To: Yi, Tony ; Skvortsov, Victor ; > amd-gfx@lists.freedesktop.org; Zhang, Hawking ; Lu

Re: [PATCH v3] drm/amdgpu: Trigger a wedged event for ring reset

2025-03-04 Thread Christian König
Am 25.02.25 um 02:02 schrieb André Almeida: > Instead of only triggering a wedged event for complete GPU resets, > trigger for ring resets. Regardless of the reset, it's useful for > userspace to know that it happened because the kernel will reject > further submissions from that app. > > Signed-of

[PATCH] drm/amdgpu: Validate return value of pm_runtime_get_sync

2025-03-04 Thread Sunil Khatri
An invalid return value 'r' of the pm_runtime_get_sync is r < 0 so fix the return value and add proper logging of failure and exit cleanly. Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/drivers/

Re: [PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Boyuan Zhang
On 2025-03-04 11:22, Alex Deucher wrote: VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: use per instance callbacks for idle work

Re: [PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-04 Thread Alex Deucher
On Tue, Mar 4, 2025 at 4:00 PM Boyuan Zhang wrote: > > > On 2025-03-04 11:22, Alex Deucher wrote: > > VCN 2.5 uses the PG callback to enable VCN DPM which is > > a global state. As such, we need to make sure all instances > > are in the same state. > > > > v2: switch to a ref count (Lijo) > > > >

Re: [PATCH] drm/amdgpu: add initial documentation for debugfs files

2025-03-04 Thread Alex Deucher
On Tue, Mar 4, 2025 at 12:37 PM Rodrigo Siqueira wrote: > > Hi Alex, > > I added a few suggestions and questions. > > On 03/03, Alex Deucher wrote: > > Describes what debugfs files are available and what > > they are used for. > > > > Signed-off-by: Alex Deucher > > --- > > Documentation/gpu/amd

Re: [PATCH V2] drm/amdgpu: validate return value of pm_runtime_get_sync

2025-03-04 Thread Alex Deucher
On Tue, Mar 4, 2025 at 2:32 PM Sunil Khatri wrote: > > An invalid return value 'r' of the pm_runtime_get_sync > is r < 0, so fix the return value check and add proper > failure log and exit cleanly. > > Successful refcount in userqueue creation to make sure > device remains in active state. > Fix

Re: [PATCH] drm/amdgpu: Validate return value of pm_runtime_get_sync

2025-03-04 Thread Mario Limonciello
On 3/4/2025 11:51, Sunil Khatri wrote: An invalid return value 'r' of the pm_runtime_get_sync is r < 0 so fix the return value and add proper logging of failure and exit cleanly. You have an extra space between "failure" and "and". Signed-off-by: Sunil Khatri --- drivers/gpu/drm/amd/amdg

RE: [PATCH V2] drm/amdgpu: validate return value of pm_runtime_get_sync

2025-03-04 Thread Khatri, Sunil
[AMD Official Use Only - AMD Internal Distribution Only] Sure Alex. Will do those changes and push the change. Regards Sunil Khatri -Original Message- From: Alex Deucher Sent: Wednesday, March 5, 2025 1:11 AM To: Khatri, Sunil Cc: Deucher, Alexander ; Koenig, Christian ; amd-gfx@lists

RE: [PATCH] drm/amdgpu: Fix missing drain retry fault the last entry

2025-03-04 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only] >-Original Message- >From: Yang, Philip >Sent: Tuesday, March 4, 2025 11:00 PM >To: Deng, Emily ; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix missing drain retry fault the last entry > > >On 2025-03-03 19:44,

Re: [PATCH] amdkfd: initialize svm lists at where they are defined

2025-03-04 Thread Zhu Lingshan
On 3/4/2025 11:16 PM, Felix Kuehling wrote: > On 2025-03-04 2:40, Zhu Lingshan wrote: >> On 3/4/2025 1:49 PM, Felix Kuehling wrote: >>> On 2025-02-21 4:23, Zhu Lingshan wrote: This commit initialized svm lists at where they are defined. This is defensive programming for security and c

[PATCH V7 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" This commit updates the VM flush implementation for the SDMA engine. - Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush type. This function ensures that all relevant pag

[PATCH v7 1/3] drm/amd/amdgpu: Increase max rings to enable SDMA page ring

2025-03-04 Thread jesse.zh...@amd.com
From: "jesse.zh...@amd.com" Increase the maximum number of rings supported by the AMDGPU driver from 133 to 149. This change is necessary to enable support for the SDMA page ring. Signed-off-by: Jesse Zhang --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 2 +- 1 file changed, 1 insertion(+), 1

[PATCH v2] drm/amdgpu: Fix missing drain retry fault the last entry

2025-03-04 Thread Emily Deng
When the entry obtained in svm_range_unmap_from_cpu is the last entry, and that entry is a page fault, it also needs to be dropped. So in the equal case, it also needs to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng --- drivers/gpu/drm/amd/amdgpu/amdgpu_ih.h | 3 +++ d