[PATCH v4] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Emily Deng
Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continues execution and reaches svm_range_from_addr.

RE: [PATCH v3] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only] >-Original Message- >From: Lazar, Lijo >Sent: Thursday, March 6, 2025 1:12 PM >To: Deng, Emily ; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH v3] drm/amdgpu: Fix the race condition for draining retry >fault > > > >On 3/6/2025

RE: [PATCH] drm/amdgpu: Use unique CPER record id across devices

2025-03-05 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only] > -Original Message- > From: Liu, Xiang(Dean) > Sent: Thursday, March 6, 2025 3:25 PM > To: amd-gfx@lists.freedesktop.org > Cc: Zhang, Hawking ; Zhou1, Tao > ; Liu, Xiang(Dean) > Subject: [PATCH] drm/amdgpu: Use unique CPER record

Re: [PATCH 02/11] drm/amdgpu: add ring flag for no user submissions

2025-03-05 Thread Khatri, Sunil
On 3/6/2025 2:17 AM, Alex Deucher wrote: This would be set by IPs which only accept submissions from the kernel, not userspace, such as when kernel queues are disabled. Don't expose the rings to userspace and reject any submissions in the CS IOCTL. Signed-off-by: Alex Deucher --- drivers/gp

RE: [PATCH 2/2] drm/amdgpu: fix the gb_addr_config_fields init value mismatch

2025-03-05 Thread Zhang, Morris
[AMD Official Use Only - AMD Internal Distribution Only] Thanks @Lazar, Lijo for the review. And initializing the gb_addr_config_fields is part of sw_init although the callback has the name of early_init in it. Will remove the part of " Fix it temporarily by using th

[PATCH] drm/amdgpu: Use unique CPER record id across devices

2025-03-05 Thread Xiang Liu
Encode socket id to CPER record id to be unique across devices. Signed-off-by: Xiang Liu --- drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c inde

Re: [PATCH v2] drm/amdgpu: Fix missing drain retry fault the last entry

2025-03-05 Thread Felix Kuehling
On 2025-03-04 22:54, Emily Deng wrote: When the entry obtained in svm_range_unmap_from_cpu is the last entry and that entry is a page fault, it also needs to be dropped. So in the equal case, it also needs to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-by: Emily Deng --- drive

Re: [PATCH 03/11] drm/amdgpu/gfx: add generic handling for disable_kq

2025-03-05 Thread Felix Kuehling
On 2025-03-05 15:47, Alex Deucher wrote: Add proper checks for disable_kq functionality in gfx helper functions. Add special logic for families that require the clear state setup. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 92 + drive

Re: [PATCH v2] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Felix Kuehling
On 2025-03-05 20:10, Emily Deng wrote: Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continue

[PATCH v3 4/4] drm/amdgpu: fix warning and errors caused by duplicated defines in sid.h

2025-03-05 Thread Alexandre Demers
Let's finish the cleanup in sid.h to calm down things after wiring it into dce_v6_0.c. This is a bigger cleanup. Many defines found under sid.h have already been properly moved into the different "_d.h" and "_sh_mask.h", so they should have been already removed from sid.h and properly linked in wh

RE: [PATCH v2] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only] >-Original Message- >From: Kuehling, Felix >Sent: Thursday, March 6, 2025 9:25 AM >To: amd-gfx@lists.freedesktop.org; Deng, Emily >Subject: Re: [PATCH v2] drm/amdgpu: Fix the race condition for draining retry >fault > > >On 2025-0

Re: [PATCH] drm/amd: Fail initialization earlier when DC is disabled

2025-03-05 Thread Mario Limonciello
On 3/5/2025 17:04, Alex Deucher wrote: On Wed, Mar 5, 2025 at 4:53 PM Mario Limonciello wrote: On 3/5/2025 15:37, Mario Limonciello wrote: Modern APU and dGPU require DC support to be able to light up the display. If DC support has been disabled either by kernel config or by kernel command l

[PATCH] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Emily Deng
Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continues execution and reaches svm_range_from_addr.

Re: [PATCH] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Felix Kuehling
On 2025-03-05 19:49, Emily Deng wrote: Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continue

RE: [PATCH] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only] >-Original Message- >From: Kuehling, Felix >Sent: Thursday, March 6, 2025 8:53 AM >To: Deng, Emily ; amd-gfx@lists.freedesktop.org >Subject: Re: [PATCH] drm/amdgpu: Fix the race condition for draining retry >fault > > >On 2025-03-0

[PATCH v2] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Emily Deng
Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continues execution and reaches svm_range_from_addr.

[PATCH v3 3/4] drm/amdgpu: move and fix X_GB_ADDR_CONFIG_GOLDEN values

2025-03-05 Thread Alexandre Demers
By wiring up sid.h in the previous commit, we ended up with many duplicated defines. Let's clean this up. First and easy cleanup. [TAHITI, VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined in sid.h and under si_enums.h, with different values. Keep the values used under radeon and move them under g

[PATCH v3 1/4] drm/amdgpu: add or move defines for DCE6 in sid.h

2025-03-05 Thread Alexandre Demers
For coherence with DCE8 and DCE10, add or move some values under sid.h. Signed-off-by: Alexandre Demers --- drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 63 ++- drivers/gpu/drm/amd/amdgpu/si_enums.h | 7 --- drivers/gpu/drm/amd/amdgpu/sid.h | 29 +--- 3 files chan

[PATCH v3 0/4] Uniformize defines between DCE6, DCE8 and DCE10

2025-03-05 Thread Alexandre Demers
Keep a uniform way of where and how variables are defined between DCE6, DCE8 and DCE10. It is easier to understand the code, their similarities and their modifications. Since sid.h is being wired up in dce_v6_0.c, duplicated defines need to be cleaned up. Alexandre Demers (5): drm/amdgpu: add o

[PATCH v3 2/4] drm/amdgpu: add defines for pin_offsets in DCE8

2025-03-05 Thread Alexandre Demers
Define pin_offsets values in the same way it is done in DCE8 Signed-off-by: Alexandre Demers --- drivers/gpu/drm/amd/amdgpu/cikd.h | 9 + drivers/gpu/drm/amd/amdgpu/dce_v8_0.c | 14 +++--- 2 files changed, 16 insertions(+), 7 deletions(-) diff --git a/drivers/gpu/drm/amd/am

[PATCH v3] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Emily Deng
Issue: In the scenario where svm_range_restore_pages is called, but svm->checkpoint_ts has not been set and the retry fault has not been drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pages continues execution and reaches svm_range_from_addr.

Re: [PATCH v3] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-05 Thread Lazar, Lijo
On 3/6/2025 7:23 AM, Emily Deng wrote: > Issue: > In the scenario where svm_range_restore_pages is called, but > svm->checkpoint_ts > has not been set and the retry fault has not been drained, > svm_range_unmap_from_cpu > is triggered and calls svm_range_free. Meanwhile, svm_range_restore_pag

[PATCH 09/22] drm/amd/display: Implement PCON regulated autonomous mode handling

2025-03-05 Thread Tom Chung
From: George Shen [Why/How] DP spec has been updated recently to make regulated autonomous mode more well-defined. In case any PCON vendors choose to implement regulated autonomous mode in the future, pre-emptively add handling for the regulated autonomous mode based on current spec. Reviewed-by

[PATCH 08/22] drm/amd/display: not abort link train when bw is low

2025-03-05 Thread Tom Chung
From: Peichen Huang [WHY] DP tunneling should not abort link training even if bandwidth becomes too low after downgrade. Otherwise, it would fail the compliance test. [HOW] Do link training with the downgraded settings even if bandwidth is not enough. Reviewed-by: Cruise Hung Reviewed-by: Meenakshikumar Somasundaram

Re: [PATCH 2/2] drm/amdgpu: fix the gb_addr_config_fields init value mismatch

2025-03-05 Thread Lazar, Lijo
On 3/5/2025 12:14 PM, Shiwu Zhang wrote: > For gfx_v9_4_3 specifically, before regGB_ADDR_CONFIG is overwritten > in gfx hw_init it is read out to populate the gb_addr_config_fields > in the sw_init stage, which causes mismatch. > > Fix it temporarily by using the golden value in sw_init as wel

[PATCH 13/22] drm/amd/display: dml2 soc dscclk use DPM table clk setting.

2025-03-05 Thread Tom Chung
From: Charlene Liu [why & how] The dml2 will calculate the minimum required clocks. Use DPM table clk setting for dml2 soc dscclk. Reviewed-by: Alvin Lee Signed-off-by: Charlene Liu Signed-off-by: Tom Chung --- drivers/gpu/drm/amd/display/dc/dml2/dml2_translation_helper.c | 2 +- 1 file chan

RE: [PATCH] drm/amdgpu: add initial documentation for debugfs files

2025-03-05 Thread Russell, Kent
[Public] > -Original Message- > From: amd-gfx On Behalf Of Alex > Deucher > Sent: Tuesday, March 4, 2025 11:50 AM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander > Subject: [PATCH] drm/amdgpu: add initial documentation for debugfs files > > Describes what debugfs files are a

Re: [PATCH v2 0/2] Uniformize defines between DCE6, DCE8 and DCE10

2025-03-05 Thread Alexandre Demers
Ok, so wiring up sid.h in dce_v6_0.c brought a lot of redefinitions. Fixing them is not the problem, but it spreads out a bit over the two files. I'm having an issue with the following: In si_enums.h, we have: #define TAHITI_GB_ADDR_CONFIG_GOLDEN 0x12011003 #define VERDE_GB_ADDR_CONFIG_GOL

Re: [PATCH] drm/amdgpu: fix inconsistent indenting warning

2025-03-05 Thread Alex Deucher
Applied. Thanks! Alex On Wed, Mar 5, 2025 at 5:41 AM Charles Han wrote: > > Fix below inconsistent indenting smatch warning. > smatch warnings: > drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: > inconsistent indenting > > Signed-off-by: Charles Han > --- > driv

Re: [PATCH v2 0/2] Uniformize defines between DCE6, DCE8 and DCE10

2025-03-05 Thread Alex Deucher
On Wed, Mar 5, 2025 at 1:04 PM Alexandre Demers wrote: > > Ok, so wiring up sid.h in dce_v6_0.c brought a lot of redefinitions. > Fixing them is not the problem, but it spreads out a bit over the two > files. > > I'm having an issue with the following: > In si_enums.h, we have : > #define TAHITI_G

Re: [PATCH] drm/amd/amdgpu: Add missing parameter for amdgpu_sdma_register_on_reset_callbacks

2025-03-05 Thread SRINIVASAN SHANMUGAM
On 2/24/2025 6:54 PM, Christian König wrote: Am 24.02.25 um 12:45 schrieb Srinivasan Shanmugam: This commit updates the documentation for the function amdgpu_sdma_register_on_reset_callbacks to include a description for the 'adev' parameter. The 'adev' parameter is a pointer to the amdgpu_dev

Re: [PATCH 6.1.y] drm/amd/display: fixed integer types and null check locations

2025-03-05 Thread Greg KH
On Thu, Feb 27, 2025 at 11:26:33AM +0800, jianqi.ren...@windriver.com wrote: > From: Sohaib Nadeem > > [ Upstream commit 0484e05d048b66d01d1f3c1d2306010bb57d8738 ] > > [why]: > issues fixed: > - comparison with wider integer type in loop condition which can cause > infinite loops > - pointer der

Re: [PATCH] drm/amdgpu: handle amdgpu_cgs_create_device() errors in amd_powerplay_create()

2025-03-05 Thread Alex Deucher
On Tue, Mar 4, 2025 at 4:29 AM Wentao Liang wrote: > > Add error handling to propagate amdgpu_cgs_create_device() failures > to the caller. When amdgpu_cgs_create_device() fails, immediately > return -EINVAL to stop further processing and prevent null pointer > dereference. > > Signed-off-by: Went
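
The error-propagation pattern described in this message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `mock_cgs_create_device`, `mock_powerplay_create`, and both struct types are hypothetical stand-ins, not the amdgpu definitions, and the real patch may be structured differently.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for illustration; not the real amdgpu types. */
struct mock_cgs_device { int placeholder; };
struct mock_pp_instance { struct mock_cgs_device *device; };

/* Mock creator: returns NULL when asked to simulate a failure. */
struct mock_cgs_device *mock_cgs_create_device(int should_fail)
{
    static struct mock_cgs_device dev;

    return should_fail ? NULL : &dev;
}

/*
 * Check the creator's result before using it and return -EINVAL to the
 * caller on failure, so no later code dereferences a NULL pointer.
 */
int mock_powerplay_create(struct mock_pp_instance *pp, int should_fail)
{
    pp->device = mock_cgs_create_device(should_fail);
    if (!pp->device)
        return -EINVAL;

    return 0;
}
```

The point of the sketch is that the failure is reported at the first place it can be detected, instead of surfacing later as a NULL dereference.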

[PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-05 Thread Alex Deucher
VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) v3: switch to its own idle work handler v4: fix logic in DPG handling Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: use per instance

[PATCH] drm/amdgpu: fix inconsistent indenting warning

2025-03-05 Thread Charles Han
Fix below inconsistent indenting smatch warning. smatch warnings: drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c:582 amdgpu_sdma_reset_engine() warn: inconsistent indenting Signed-off-by: Charles Han --- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) d

Re: [PATCH] drm/amdgpu/vcn: fix idle work handler for VCN 2.5

2025-03-05 Thread Boyuan Zhang
On 2025-03-04 18:01, Alex Deucher wrote: VCN 2.5 uses the PG callback to enable VCN DPM which is a global state. As such, we need to make sure all instances are in the same state. v2: switch to a ref count (Lijo) v3: switch to its own idle work handler Fixes: 4ce4fe27205c ("drm/amdgpu/vcn: us

Re: [PATCH] amdkfd: initialize svm lists at where they are defined

2025-03-05 Thread Felix Kuehling
On 2025-03-04 22:11, Zhu Lingshan wrote: > On 3/4/2025 11:16 PM, Felix Kuehling wrote: >> On 2025-03-04 2:40, Zhu Lingshan wrote: >>> On 3/4/2025 1:49 PM, Felix Kuehling wrote: On 2025-02-21 4:23, Zhu Lingshan wrote: > This commit initialized svm lists at where they are > defined. Th

[PATCH 07/22] drm/amd/display: Do not enable replay when vtotal update is pending.

2025-03-05 Thread Tom Chung
From: Danny Wang [Why&How] Vtotal is not applied to HW when handling the vsync interrupt. Make sure vtotal is aligned before enabling replay. Reviewed-by: Anthony Koo Reviewed-by: Robin Chen Signed-off-by: Danny Wang Signed-off-by: Zhongwei Zhang Signed-off-by: Tom Chung --- drivers/gpu/drm/amd/

[PATCH 02/22] drm/amd/display: Disable unneeded hpd interrupts during dm_init

2025-03-05 Thread Tom Chung
From: Leo Li [Why] It seems HPD interrupts are enabled by default for all connectors, even if the hpd source isn't valid. An eDP for example, does not have a valid hpd source (but does have a valid hpdrx source; see construct_phy()). Thus, eDPs should have their hpd interrupt disabled. In the p

[PATCH] drm/amdgpu: Fix annotation for dce_v6_0_line_buffer_adjust function

2025-03-05 Thread Srinivasan Shanmugam
Updated description for the 'other_mode' parameter. This parameter is used to determine the display mode of another display controller that may be sharing the line buffer. Cc: Ken Wang Cc: Christian König Cc: Alex Deucher Signed-off-by: Srinivasan Shanmugam --- drivers/gpu/drm/amd/amdgpu/dce_

Re: [PATCH] drm/amdkfd: Change error handling at prange update in svm_range_set_attr

2025-03-05 Thread Felix Kuehling
On 2025-03-04 13:23, Chen, Xiaogang wrote: > > > On 3/3/2025 11:21 PM, Felix Kuehling wrote: >> On 2025-01-31 11:58, Xiaogang.Chen wrote: >>> From: Xiaogang Chen >>> >>> When register a vm range at svm the added vm range may be split into >>> multiple >>> subranges and/or existing pranges got s

[PATCH 1/2] drm/amdgpu/gfx11: don't read registers in mqd init

2025-03-05 Thread Alex Deucher
Just use the default values. There's no need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 47 ++ 1 file changed, 32 insertions(+), 15 de

Re: [PATCH] drm/amd: Fail initialization earlier when DC is disabled

2025-03-05 Thread Mario Limonciello
On 3/5/2025 15:37, Mario Limonciello wrote: Modern APUs and dGPUs require DC support to be able to light up the display. If DC support has been disabled either by kernel config or by kernel command line, fail init early so that the system won't freeze with a lack of display. Signed-off-by: Mario L

Re: [PATCH] drm/amd: Fail initialization earlier when DC is disabled

2025-03-05 Thread Alex Deucher
On Wed, Mar 5, 2025 at 4:53 PM Mario Limonciello wrote: > > On 3/5/2025 15:37, Mario Limonciello wrote: > > Modern APU and dGPU require DC support to be able to light up the > > display. If DC support has been disabled either by kernel config > > or by kernel command line fail init early so that

Re: [PATCH] drm/amdkfd: remove unused debug gws support status variable

2025-03-05 Thread Amber Lin
Reviewed-by: Amber Lin Regards, Amber On 2025-02-27 12:31, Jonathan Kim wrote: Remove unused declaration of gws_debug_workaround. Signed-off-by: Jonathan Kim --- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH] drm/amd: Fail initialization earlier when DC is disabled

2025-03-05 Thread Mario Limonciello
Modern APUs and dGPUs require DC support to be able to light up the display. If DC support has been disabled either by kernel config or by kernel command line, fail init early so that the system won't freeze with a lack of display. Signed-off-by: Mario Limonciello --- drivers/gpu/drm/amd/amdgpu/am

[PATCH 10/11] drm/amdgpu/sdma6: add support for disable_kq

2025-03-05 Thread Alex Deucher
When the parameter is set, disable user submissions to kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v6_0.c index 3aa4fec4d9e4

[PATCH 00/11] Add disable kernel queue support

2025-03-05 Thread Alex Deucher
To better evaluate user queues, add a module parameter to disable kernel queues. With this set kernel queues are disabled and only user queues are available. This frees up hardware resources for use in user queues which would otherwise be used by kernel queues and provides a way to validate user

[PATCH 06/11] drm/amdgpu/mes: make more vmids available when disable_kq=1

2025-03-05 Thread Alex Deucher
If we don't have kernel queues, the vmids can be used by the MES for user queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 2 +- 3 files changed, 3 insertions(+), 3 de

[PATCH 08/11] drm/amdgpu/gfx12: add support for disable_kq

2025-03-05 Thread Alex Deucher
Plumb in support for disabling kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 114 ++--- 1 file changed, 65 insertions(+), 49 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c i

[PATCH 02/11] drm/amdgpu: add ring flag for no user submissions

2025-03-05 Thread Alex Deucher
This would be set by IPs which only accept submissions from the kernel, not userspace, such as when kernel queues are disabled. Don't expose the rings to userspace and reject any submissions in the CS IOCTL. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 4 driv
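
A rough userspace sketch of the check this patch describes: a per-ring flag marks rings that only accept kernel submissions, and the submission path rejects userspace work on them. The struct, the field name `no_user_submission`, the function name, and the choice of -EINVAL are all assumptions made for illustration; the truncated message above does not show the actual code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical ring descriptor; the flag name is an assumption. */
struct sketch_ring {
    bool no_user_submission; /* set by IPs that only take kernel submissions */
};

/*
 * Stand-in for the check a command-submission entry point might perform:
 * refuse userspace work on rings flagged as kernel-only.
 */
int sketch_cs_check_ring(const struct sketch_ring *ring)
{
    if (ring->no_user_submission)
        return -EINVAL; /* error code chosen for the sketch only */

    return 0;
}
```

Keeping the decision in a single per-ring flag lets each IP block opt in without the submission path needing to know why a ring is kernel-only.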

[PATCH 09/11] drm/amdgpu/sdma: add flag for tracking disable_kq

2025-03-05 Thread Alex Deucher
For SDMA, we still need kernel queues for paging so they need to be initialized, but we do not want to accept submissions from userspace when disable_kq is set. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.h | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/gp

[PATCH 04/11] drm/amdgpu/mes: centralize gfx_hqd mask management

2025-03-05 Thread Alex Deucher
Move it to amdgpu_mes to align with the compute and sdma hqd masks. No functional change. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 24 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 16 +++- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c

[PATCH 07/11] drm/amdgpu/gfx11: add support for disable_kq

2025-03-05 Thread Alex Deucher
Plumb in support for disabling kernel queues in GFX11. We have to bring up a GFX queue briefly in order to initialize the clear state. After that we can disable it. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 77 +- 1 file changed, 51 insert

[PATCH 01/11] drm/amdgpu: add parameter to disable kernel queues

2025-03-05 Thread Alex Deucher
On chips that support user queues, setting this option disables kernel queues so that user queues can be validated without kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 1 + drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 + 2 files changed, 10 in

[PATCH 11/11] drm/amdgpu/sdma7: add support for disable_kq

2025-03-05 Thread Alex Deucher
When the parameter is set, disable user submissions to kernel queues. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c b/drivers/gpu/drm/amd/amdgpu/sdma_v7_0.c index 92a79296708a

[PATCH 03/11] drm/amdgpu/gfx: add generic handling for disable_kq

2025-03-05 Thread Alex Deucher
Add proper checks for disable_kq functionality in gfx helper functions. Add special logic for families that require the clear state setup. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 92 + drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 + 2 fi

[PATCH 05/11] drm/amdgpu/mes: update hqd masks when disable_kq is set

2025-03-05 Thread Alex Deucher
Make all resources available to user queues. Suggested-by: Sunil Khatri Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_

[PATCH] drm/amdgpu/mes: remove unused functions

2025-03-05 Thread Alex Deucher
Leftover from the MES self tests that were removed previously. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 800 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h | 41 -- 2 files changed, 841 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdg

[PATCH 2/2] drm/amdgpu/gfx12: don't read registers in mqd init

2025-03-05 Thread Alex Deucher
Just use the default values. There's no need to get the value from hardware and it could cause problems if we do that at runtime and gfxoff is active. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 48 ++ 1 file changed, 33 insertions(+), 15 de

Re: [PATCH v2] drm/amdgpu: Fix missing drain retry fault the last entry

2025-03-05 Thread Chen, Xiaogang
Reviewed-by: Xiaogang Chen On 3/4/2025 9:54 PM, Emily Deng wrote: When the entry obtained in svm_range_unmap_from_cpu is the last entry and that entry is a page fault, it also needs to be dropped. So in the equal case, it also needs to be dropped. v2: Only modify the svm_range_restore_pages. Signed-off-

Re: [PATCH] drm: amdkfd: Replace (un)register_chrdev() by (unregister/alloc)_chrdev_region()

2025-03-05 Thread Felix Kuehling
On 2025-03-05 16:08, Salah Triki wrote: Replace (un)register_chrdev() by (unregister/alloc)_chrdev_region() as they are deprecated since kernel 2.6. Where is that information coming from? I see __register_chrdev documented in the current kernel documentation. I see no indication that it's d

Re: [PATCH] drm/amd/amdkfd: Evict all queues even HWS remove queue failed

2025-03-05 Thread Felix Kuehling
On 2025-03-05 00:42, Yifan Zha wrote: [Why] If a reset is detected and kfd needs to evict working queues, HWS removing the queue will fail. The remaining queues are then not evicted and stay in the active state. After the reset is done, kfd uses HWS to terminate the remaining active queues, but HWS has been reset. So