[PATCH] drm/amdgpu/gfx: delete stray tabs
These lines are indented one tab too far.  Delete the extra tabs.

Signed-off-by: Dan Carpenter
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index a194bf3347cb..984e6ff6e463 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -2002,8 +2002,8 @@ void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work)
 		if (adev->kfd.init_complete) {
 			WARN_ON_ONCE(!adev->gfx.kfd_sch_inactive[idx]);
 			WARN_ON_ONCE(adev->gfx.kfd_sch_req_count[idx]);
-				amdgpu_amdkfd_start_sched(adev, idx);
-				adev->gfx.kfd_sch_inactive[idx] = false;
+			amdgpu_amdkfd_start_sched(adev, idx);
+			adev->gfx.kfd_sch_inactive[idx] = false;
 		}
 	}
 	mutex_unlock(&adev->enforce_isolation_mutex);
--
2.47.2
RE: [PATCH 01/11] drm/amdgpu: add parameter to disable kernel queues
[Public]

Reviewed-by: Prike Liang

Regards,
Prike

> -----Original Message-----
> From: amd-gfx On Behalf Of Alex Deucher
> Sent: Friday, March 7, 2025 11:16 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander; Khatri, Sunil
> Subject: [PATCH 01/11] drm/amdgpu: add parameter to disable kernel queues
>
> On chips that support user queues, setting this option will disable kernel
> queues so that user queues can be validated without kernel queues.
>
> Reviewed-by: Sunil Khatri
> Signed-off-by: Alex Deucher
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h     | 1 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 9 +
>  2 files changed, 10 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 87062c1adcdf7..45437a8f29d3b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -270,6 +270,7 @@ extern int amdgpu_user_partt_mode;
>  extern int amdgpu_agp;
>
>  extern int amdgpu_wbrf;
> +extern int amdgpu_disable_kq;
>
>  #define AMDGPU_VM_MAX_NUM_CTX		4096
>  #define AMDGPU_SG_THRESHOLD	(256*1024*1024)
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index b161daa900198..42a7619592ab9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -237,6 +237,7 @@ int amdgpu_agp = -1; /* auto */
>  int amdgpu_wbrf = -1;
>  int amdgpu_damage_clips = -1; /* auto */
>  int amdgpu_umsch_mm_fwlog;
> +int amdgpu_disable_kq = -1;
>
>  DECLARE_DYNDBG_CLASSMAP(drm_debug_classes, DD_CLASS_TYPE_DISJOINT_BITS, 0,
>  			"DRM_UT_CORE",
> @@ -1083,6 +1084,14 @@ MODULE_PARM_DESC(wbrf,
>  	"Enable Wifi RFI interference mitigation (0 = disabled, 1 = enabled, -1 = auto(default)");
>  module_param_named(wbrf, amdgpu_wbrf, int, 0444);
>
> +/**
> + * DOC: disable_kq (int)
> + * Disable kernel queues on systems that support user queues.
> + * (0 = kernel queues enabled, 1 = kernel queues disabled, -1 = auto (default setting))
> + */
> +MODULE_PARM_DESC(disable_kq, "Disable kernel queues (-1 = auto (default), 0 = enable KQ, 1 = disable KQ)");
> +module_param_named(disable_kq, amdgpu_disable_kq, int, 0444);
> +
> /* These devices are not supported by amdgpu.
>  * They are supported by the mach64, r128, radeon drivers
>  */
> --
> 2.48.1
RE: [PATCH 00/22] DC Patches Mar 10 2025
[Public]

Hi all,

This week this patchset was tested on 4 systems, two dGPU and two APU based, and tested across multiple display and connection types.

APU
* Single Display eDP -> 1080p 60hz, 2560x1600 120hz, 1920x1200 165hz
* Single Display DP (SST DSC) -> 4k144hz, 4k240hz
* Multi display -> eDP + DP/HDMI/USB-C -> 1080p 60hz eDP + 4k 144hz, 4k 240hz (Includes USB-C to DP/HDMI adapters)
* Thunderbolt -> LG Ultrafine 5k
* MST DSC -> Cable Matters 101075 (DP to 3x DP) with 3x 4k60hz displays, HP Hook G2 with 2x 4k60hz displays
* USB 4 -> HP Hook G4, Lenovo Thunderbolt Dock, both with 2x 4k60hz DP and 1x 4k60hz HDMI displays
* SST PCON -> Club3D CAC-1085 + 1x 4k 144hz, FRL3, at a max resolution supported by the dongle of 4k 120hz YUV420 12bpc.
* MST PCON -> 1x 4k 144hz, FRL3, at a max resolution supported by the adapter of 4k 120hz RGB 8bpc.

DGPU
* Single Display DP (SST DSC) -> 4k144hz, 4k240hz
* Multiple Display DP -> 4k240hz + 4k144hz
* MST (Startech MST14DP123DP [DP to 3x DP] and 2x 4k 60hz displays)
* MST DSC (with Cable Matters 101075 [DP to 3x DP] with 3x 4k60hz displays)

The testing is a mix of automated and manual tests. Manual testing includes (but is not limited to)
* Changing display configurations and settings
* Video/Audio playback
* Benchmark testing
* Suspend/Resume testing
* Feature testing (Freesync, HDCP, etc.)

Automated testing includes (but is not limited to)
* Script testing (scripts to automate some of the manual checks)
* IGT testing

Testing is mainly run on the following displays, but occasionally there are tests with other displays
* Samsung G8 Neo 4k240hz
* Samsung QN55QN95B 4k 120hz
* Acer XV322QKKV 4k144hz
* HP U27 4k Wireless 4k60hz
* LG 27UD58B 4k60hz
* LG 32UN650WA 4k60hz
* LG Ultrafine 5k 5k60hz
* AU Optronics B140HAN01.1 1080p 60hz eDP
* AU Optronics B160UAN01.J 1920x1200 165hz eDP
* AU Optronics B160QAN02.L 2560x1600 120hz eDP

The patchset consists of the amd-staging-drm-next branch (Head commit - 51cf6a0ec03e66081e2889644f248acab0965430 -> drm/amd/display: Promote DAL to 3.2.323) with new patches added on top of it. Tested on Ubuntu 24.04.1, on Wayland and X11, using KDE Plasma and Gnome.

Tested-by: Daniel Wheeler

Thank you,

Dan Wheeler
Sr. Technologist | AMD SW Display
1 Commerce Valley Dr E, Thornhill, ON L3T 7X6
amd.com

-----Original Message-----
From: Tom Chung
Sent: Wednesday, March 5, 2025 12:14 AM
To: amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry; Li, Sun peng (Leo); Pillai, Aurabindo; Li, Roman; Lin, Wayne; Chung, ChiaHsuan (Tom); Zuo, Jerry; Mohamed, Zaeem; Chiu, Solomon; Wheeler, Daniel; Hung, Alex
Subject: [PATCH 00/22] DC Patches Mar 10 2025

This DC patchset brings improvements in multiple areas. In summary, we highlight:

- Fix some Replay/PSR issues
- Fix backlight brightness
- Fix suspend issue
- Fix visual confirm color
- Add scoped mutexes for amdgpu_dm_hdcp

Cc: Daniel Wheeler

Alex Hung (1):
  drm/amd/display: Assign normalized_pix_clk when color depth = 14

Charlene Liu (3):
  drm/amd/display: assume VBIOS supports DSC as default
  drm/amd/display: dml2 soc dscclk use DPM table clk setting.
  drm/amd/display: remove minimum Dispclk and apply oem panel timing.

Danny Wang (1):
  drm/amd/display: Do not enable replay when vtotal update is pending.

Dillon Varone (1):
  drm/amd/display: Add Support for reg inbox0 for host->DMUB CMDs

George Shen (1):
  drm/amd/display: Implement PCON regulated autonomous mode handling

Joshua Aberback (1):
  drm/amd/display: Add more debug data to dmub_srv

Leo Li (1):
  drm/amd/display: Disable unneeded hpd interrupts during dm_init

Leo Zeng (1):
  drm/amd/display: Fix visual confirm color not updating

Leon Huang (1):
  drm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch

Mario Limonciello (6):
  drm/amd/display: fix default brightness
  drm/amd/display: Restore correct backlight brightness after a GPU reset
  drm/amd/display: Add and use new dm_prepare_suspend() callback
  drm/amd/display: Fix slab-use-after-free on hdcp_work
  drm/amd/display: Add scoped mutexes for amdgpu_dm_hdcp
  drm/amd/display: Drop unnecessary ret variable for enable_assr()

Peichen Huang (1):
  drm/amd/display: not abort link train when bw is low

Ryan Seto (1):
  drm/amd/display: Prevent VStartup Overflow

Taimur Hassan (1):
  drm/amd/display: Promote DAL to 3.2.324

Zhikai Zhai (1):
  drm/amd/display: calculate the remain segments for all pipes

Zhongwei Zhang (1):
  drm/amd/display: Correct timing_adjust_pending flag setting.

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 37 ++-
 .../amd/display/amdgpu_dm/amdgpu_dm_hdcp.c
[PATCH] drm/amd/amdgpu: Fix MES init sequence
When MES is in use, the set_hw_resource_1 API is required to initialize
the MES internal context correctly.

Signed-off-by: Shaoyun Liu
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h  |  6 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  9 ++--
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c   | 59
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c   | 43 -
 4 files changed, 57 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 4391b3383f0c..78362a838212 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -143,9 +143,9 @@ struct amdgpu_mes {
 	const struct amdgpu_mes_funcs *funcs;
 
 	/* mes resource_1 bo*/
-	struct amdgpu_bo	*resource_1;
-	uint64_t		resource_1_gpu_addr;
-	void			*resource_1_addr;
+	struct amdgpu_bo	*resource_1[AMDGPU_MAX_MES_PIPES];
+	uint64_t		resource_1_gpu_addr[AMDGPU_MAX_MES_PIPES];
+	void			*resource_1_addr[AMDGPU_MAX_MES_PIPES];
 
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
index ab7e73d0e7b1..0bb8cbe0dcc0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
@@ -614,10 +614,11 @@ static int amdgpu_virt_write_vf2pf_data(struct amdgpu_device *adev)
 	vf2pf_info->decode_usage = 0;
 
 	vf2pf_info->dummy_page_addr = (uint64_t)adev->dummy_page_addr;
-	vf2pf_info->mes_info_addr = (uint64_t)adev->mes.resource_1_gpu_addr;
-
-	if (adev->mes.resource_1) {
-		vf2pf_info->mes_info_size = adev->mes.resource_1->tbo.base.size;
+	if (amdgpu_sriov_is_mes_info_enable(adev)) {
+		vf2pf_info->mes_info_addr =
+			(uint64_t)(adev->mes.resource_1_gpu_addr[0] + AMDGPU_GPU_PAGE_SIZE);
+		vf2pf_info->mes_info_size =
+			adev->mes.resource_1[0]->tbo.base.size - AMDGPU_GPU_PAGE_SIZE;
 	}
 	vf2pf_info->checksum =
 		amd_sriov_msg_checksum(
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index a569d09a1a74..9cec2bb2f9ca 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -751,10 +751,13 @@ static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes *mes)
 	mes_set_hw_res_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
 	mes_set_hw_res_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
 	mes_set_hw_res_pkt.enable_mes_info_ctx = 1;
-	mes_set_hw_res_pkt.mes_info_ctx_mc_addr = mes->resource_1_gpu_addr;
-	mes_set_hw_res_pkt.mes_info_ctx_size = MES11_HW_RESOURCE_1_SIZE;
-	mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr =
-		mes->resource_1_gpu_addr + MES11_HW_RESOURCE_1_SIZE;
+
+	mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr = mes->resource_1_gpu_addr[0];
+	if (amdgpu_sriov_is_mes_info_enable(adev)) {
+		mes_set_hw_res_pkt.mes_info_ctx_mc_addr =
+			mes->resource_1_gpu_addr[0] + AMDGPU_GPU_PAGE_SIZE;
+		mes_set_hw_res_pkt.mes_info_ctx_size = MES11_HW_RESOURCE_1_SIZE;
+	}
 
 	return mes_v11_0_submit_pkt_and_poll_completion(mes,
 			&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
@@ -1392,7 +1395,7 @@ static int mes_v11_0_mqd_sw_init(struct amdgpu_device *adev,
 static int mes_v11_0_sw_init(struct amdgpu_ip_block *ip_block)
 {
 	struct amdgpu_device *adev = ip_block->adev;
-	int pipe, r;
+	int pipe, r, bo_size;
 
 	adev->mes.funcs = &mes_v11_0_funcs;
 	adev->mes.kiq_hw_init = &mes_v11_0_kiq_hw_init;
@@ -1427,19 +1430,21 @@ static int mes_v11_0_sw_init(struct amdgpu_ip_block *ip_block)
 	if (r)
 		return r;
 
-	if (amdgpu_sriov_is_mes_info_enable(adev) ||
-	    adev->gfx.enable_cleaner_shader) {
-		r = amdgpu_bo_create_kernel(adev,
-				MES11_HW_RESOURCE_1_SIZE + AMDGPU_GPU_PAGE_SIZE,
-				PAGE_SIZE,
-				AMDGPU_GEM_DOMAIN_VRAM,
-				&adev->mes.resource_1,
-				&adev->mes.resource_1_gpu_addr,
-				&adev->mes.resource_1_addr);
-		if (r) {
-			dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", r);
-			return r;
-		}
+	bo_size = AMDGPU_GPU_PAGE_SIZE;
+	if (amdgpu_sriov_is_mes_info_enable(adev))
+		bo_size += MES11_HW_RESOURCE_1_SIZE;
+
+	/* Only needed for AMDGPU_MES_SCHED_PIPE on MES 11 */
+	r = amdgpu_bo_create_kernel(adev,
+			bo_size,
RE: [PATCH V7 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA
[AMD Official Use Only - AMD Internal Distribution Only]

Ping on this series?

-----Original Message-----
From: jesse.zh...@amd.com
Sent: Wednesday, March 5, 2025 11:25 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander; Koenig, Christian; Lazar, Lijo; Zhu, Jiadong; Zhang, Jesse(Jie)
Subject: [PATCH V7 3/3] drm/amdgpu/sdma_v4_4_2: update VM flush implementation for SDMA

From: "jesse.zh...@amd.com"

This commit updates the VM flush implementation for the SDMA engine.

- Added a new function `sdma_v4_4_2_get_invalidate_req` to construct the
  VM_INVALIDATE_ENG0_REQ register value for the specified VMID and flush
  type. This function ensures that all relevant page table cache levels
  (L1 PTEs, L2 PTEs, and L2 PDEs) are invalidated.

- Modified the `sdma_v4_4_2_ring_emit_vm_flush` function to use the new
  `sdma_v4_4_2_get_invalidate_req` function. The updated function emits
  the necessary register writes and waits to perform a VM flush for the
  specified VMID. It updates the PTB address registers and issues a VM
  invalidation request using the specified VM invalidation engine.

- Included the necessary header file `gc/gc_9_0_sh_mask.h` to provide
  access to the required register definitions.

v2: vm flush by the vm invalidation packet (Lijo)
v3: code style and define the macro for the vm invalidation packet (Christian)

Suggested-by: Lijo Lazar
Signed-off-by: Jesse Zhang
---
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c      | 77 +++
 .../gpu/drm/amd/amdgpu/vega10_sdma_pkt_open.h | 55 +
 2 files changed, 118 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
index fd34dc138081..554e14b56c31 100644
--- a/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
+++ b/drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c
@@ -31,6 +31,7 @@
 #include "amdgpu_ucode.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_reset.h"
+#include "gc/gc_9_0_sh_mask.h"
 
 #include "sdma/sdma_4_4_2_offset.h"
 #include "sdma/sdma_4_4_2_sh_mask.h"
@@ -1292,21 +1293,71 @@ static void sdma_v4_4_2_ring_emit_pipeline_sync(struct amdgpu_ring *ring)
 			       seq, 0xffffffff, 4);
 }
 
-
-/**
- * sdma_v4_4_2_ring_emit_vm_flush - vm flush using sDMA
+/*
+ * sdma_v4_4_2_get_invalidate_req - Construct the VM_INVALIDATE_ENG0_REQ register value
+ * @vmid: The VMID to invalidate
+ * @flush_type: The type of flush (0 = legacy, 1 = lightweight, 2 = heavyweight)
  *
- * @ring: amdgpu_ring pointer
- * @vmid: vmid number to use
- * @pd_addr: address
+ * This function constructs the VM_INVALIDATE_ENG0_REQ register value for the specified VMID
+ * and flush type. It ensures that all relevant page table cache levels (L1 PTEs, L2 PTEs, and
+ * L2 PDEs) are invalidated.
+ */
+static uint32_t sdma_v4_4_2_get_invalidate_req(unsigned int vmid,
+					       uint32_t flush_type)
+{
+	u32 req = 0;
+
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ,
+			    PER_VMID_INVALIDATE_REQ, 1 << vmid);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ, FLUSH_TYPE, flush_type);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ, INVALIDATE_L2_PTES, 1);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ, INVALIDATE_L2_PDE0, 1);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ, INVALIDATE_L2_PDE1, 1);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ, INVALIDATE_L2_PDE2, 1);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ, INVALIDATE_L1_PTES, 1);
+	req = REG_SET_FIELD(req, VM_INVALIDATE_ENG0_REQ,
+			    CLEAR_PROTECTION_FAULT_STATUS_ADDR, 0);
+
+	return req;
+}
+
+/*
+ * sdma_v4_4_2_ring_emit_vm_flush - Emit VM flush commands for SDMA
+ * @ring: The SDMA ring
+ * @vmid: The VMID to flush
+ * @pd_addr: The page directory address
  *
- * Update the page table base and flush the VM TLB
- * using sDMA.
+ * This function emits the necessary register writes and waits to perform a VM flush for the
+ * specified VMID. It updates the PTB address registers and issues a VM invalidation request
+ * using the specified VM invalidation engine.
  */
 static void sdma_v4_4_2_ring_emit_vm_flush(struct amdgpu_ring *ring,
-					   unsigned vmid, uint64_t pd_addr)
+					   unsigned int vmid, uint64_t pd_addr)
 {
-	amdgpu_gmc_emit_flush_gpu_tlb(ring, vmid, pd_addr);
+	struct amdgpu_device *adev = ring->adev;
+	uint32_t req = sdma_v4_4_2_get_invalidate_req(vmid, 0);
+	unsigned int eng = ring->vm_inv_eng;
+	struct amdgpu_vmhub *hub = &adev->vmhub[ring->vm_hub];
+
+	amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_lo32 +
+			      (hub->ctx_addr_distance * vmid),
+			      lower_32_bits(pd_addr));
+
+	amdgpu_ring_emit_wreg(ring, hub->ctx0_ptb_addr_hi32 +
Re: [PATCH] drm/amd/amdgpu: Fix MES init sequence
On Mon, Mar 10, 2025 at 1:58 PM Shaoyun Liu wrote:
>
> When MES is in use, the set_hw_resource_1 API is required to
> initialize the MES internal context correctly.
>
> Signed-off-by: Shaoyun Liu
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h  |  6 +--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c |  6 +--
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c   | 52 +++-
>  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c   | 40 --
>  4 files changed, 48 insertions(+), 56 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> index 4391b3383f0c..78362a838212 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
> @@ -143,9 +143,9 @@ struct amdgpu_mes {
>         const struct amdgpu_mes_funcs *funcs;
>
>         /* mes resource_1 bo*/
> -       struct amdgpu_bo        *resource_1;
> -       uint64_t                resource_1_gpu_addr;
> -       void                    *resource_1_addr;
> +       struct amdgpu_bo        *resource_1[AMDGPU_MAX_MES_PIPES];
> +       uint64_t                resource_1_gpu_addr[AMDGPU_MAX_MES_PIPES];
> +       void                    *resource_1_addr[AMDGPU_MAX_MES_PIPES];
>
> };
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> index ab7e73d0e7b1..980dfb8935b6 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c
> @@ -614,10 +614,10 @@ static int amdgpu_virt_write_vf2pf_data(struct amdgpu_device *adev)
>         vf2pf_info->decode_usage = 0;
>
>         vf2pf_info->dummy_page_addr = (uint64_t)adev->dummy_page_addr;
> -       vf2pf_info->mes_info_addr = (uint64_t)adev->mes.resource_1_gpu_addr;
> +       vf2pf_info->mes_info_addr = (uint64_t)adev->mes.resource_1_gpu_addr[0];
>
> -       if (adev->mes.resource_1) {
> -               vf2pf_info->mes_info_size = adev->mes.resource_1->tbo.base.size;
> +       if (adev->mes.resource_1[0]) {
> +               vf2pf_info->mes_info_size = adev->mes.resource_1[0]->tbo.base.size;
>         }
>         vf2pf_info->checksum =
>                 amd_sriov_msg_checksum(
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index a569d09a1a74..299f17868822 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -751,10 +751,10 @@ static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes *mes)
>         mes_set_hw_res_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
>         mes_set_hw_res_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
>         mes_set_hw_res_pkt.enable_mes_info_ctx = 1;
> -       mes_set_hw_res_pkt.mes_info_ctx_mc_addr = mes->resource_1_gpu_addr;
> +       mes_set_hw_res_pkt.mes_info_ctx_mc_addr = mes->resource_1_gpu_addr[0];
>         mes_set_hw_res_pkt.mes_info_ctx_size = MES11_HW_RESOURCE_1_SIZE;
>         mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr =
> -               mes->resource_1_gpu_addr + MES11_HW_RESOURCE_1_SIZE;
> +               mes->resource_1_gpu_addr[0] + MES11_HW_RESOURCE_1_SIZE;

This offset here will need to be adjusted if MES11_HW_RESOURCE_1_SIZE
depends on SR-IOV.  See below.

>
>         return mes_v11_0_submit_pkt_and_poll_completion(mes,
>                         &mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
> @@ -1392,7 +1392,7 @@ static int mes_v11_0_mqd_sw_init(struct amdgpu_device *adev,
>  static int mes_v11_0_sw_init(struct amdgpu_ip_block *ip_block)
>  {
>         struct amdgpu_device *adev = ip_block->adev;
> -       int pipe, r;
> +       int pipe, r, bo_size;
>
>         adev->mes.funcs = &mes_v11_0_funcs;
>         adev->mes.kiq_hw_init = &mes_v11_0_kiq_hw_init;
> @@ -1427,19 +1427,21 @@ static int mes_v11_0_sw_init(struct amdgpu_ip_block *ip_block)
>         if (r)
>                 return r;
>
> -       if (amdgpu_sriov_is_mes_info_enable(adev) ||
> -           adev->gfx.enable_cleaner_shader) {
> -               r = amdgpu_bo_create_kernel(adev,
> -                               MES11_HW_RESOURCE_1_SIZE + AMDGPU_GPU_PAGE_SIZE,
> -                               PAGE_SIZE,
> -                               AMDGPU_GEM_DOMAIN_VRAM,
> -                               &adev->mes.resource_1,
> -                               &adev->mes.resource_1_gpu_addr,
> -                               &adev->mes.resource_1_addr);
> -               if (r) {
> -                       dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", r);
> -                       return r;
> -               }
> +       bo_size = AMDGPU_GPU_PAGE_SIZE;
> +       if (amdgpu_sriov_is_mes_info_enable(adev))
> +               bo_size += MES11_HW_RESOURCE_1_SIZE;

If you make the size depend on amdgpu_sriov_is_mes_info_enable(), it
will break the address for mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr
above when SR
Re: [PATCH] drm/amdgpu/gfx: delete stray tabs
On 3/10/2025 4:17 PM, Dan Carpenter wrote:
> These lines are indented one tab too far.  Delete the extra tabs.
>
> Signed-off-by: Dan Carpenter
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index a194bf3347cb..984e6ff6e463 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -2002,8 +2002,8 @@ void amdgpu_gfx_enforce_isolation_handler(struct work_struct *work)
>  		if (adev->kfd.init_complete) {
>  			WARN_ON_ONCE(!adev->gfx.kfd_sch_inactive[idx]);
>  			WARN_ON_ONCE(adev->gfx.kfd_sch_req_count[idx]);
> -				amdgpu_amdkfd_start_sched(adev, idx);
> -				adev->gfx.kfd_sch_inactive[idx] = false;
> +			amdgpu_amdkfd_start_sched(adev, idx);
> +			adev->gfx.kfd_sch_inactive[idx] = false;
>  		}
>  	}
>  	mutex_unlock(&adev->enforce_isolation_mutex);

Thanks!

Reviewed-by: Srinivasan Shanmugam
[PATCH] drm/amdgpu: Use tabs for indenting in amdgpu_sdma_reset_engine()
This line has a seven space indent instead of a tab.

Signed-off-by: Dan Carpenter
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
index 39669f8788a7..3a4cef896018 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sdma.c
@@ -621,5 +621,5 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device *adev, uint32_t instance_id, b
 	if (suspend_user_queues)
 		amdgpu_amdkfd_resume(adev, false);
 
-       return ret;
+	return ret;
 }
--
2.47.2
[PATCH] drm/amdgpu: correct the runtime deference for mes_userq_mqd_create()
When the runtime resume fails, the goto free_mqd path already decreases
the runtime usage count, so the runtime resume error handler doesn't
need to decrease the runtime usage separately.

Fixes: 4baa0dcac737 ("drm/amdgpu: validate return value of pm_runtime_get_sync")
Signed-off-by: Prike Liang
---
 drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
index 35ae4125cd83..b469b800119f 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
@@ -291,8 +291,7 @@ static int mes_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 	r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
 	if (r < 0) {
 		dev_err(adev->dev, "pm_runtime_get_sync() failed for userqueue mqd create\n");
-		pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
-		goto free_mqd;
+		goto deference_pm;
 	}
 
 	r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
@@ -330,6 +329,7 @@ static int mes_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
 free_mqd:
 	amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
 	pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
+deference_pm:
 	pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
 
 free_props:
--
2.34.1
Re: [PATCH] drm/amdgpu: NULL-check BO's backing store when determining GFX12 PTE flags
Applied.  Thanks!

On Mon, Mar 10, 2025 at 2:03 PM Christian König wrote:
>
> Am 10.03.25 um 18:08 schrieb Natalie Vock:
> > PRT BOs may not have any backing store, so bo->tbo.resource will be
> > NULL. Check for that before dereferencing.
> >
> > Fixes: 0cce5f285d9ae8 ("drm/amdkfd: Check correct memory types for
> > is_system variable")
> > Signed-off-by: Natalie Vock
>
> Reviewed-by: Christian König
>
> > ---
> >  drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > index ea7c32d8380ba..bf8d01da88154 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/gmc_v12_0.c
> > @@ -528,8 +528,9 @@ static void gmc_v12_0_get_vm_pte(struct amdgpu_device *adev,
> >
> >       bo_adev = amdgpu_ttm_adev(bo->tbo.bdev);
> >       coherent = bo->flags & AMDGPU_GEM_CREATE_COHERENT;
> > -     is_system = (bo->tbo.resource->mem_type == TTM_PL_TT) ||
> > -                 (bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT);
> > +     is_system = bo->tbo.resource &&
> > +                 (bo->tbo.resource->mem_type == TTM_PL_TT ||
> > +                  bo->tbo.resource->mem_type == AMDGPU_PL_PREEMPT);
> >
> >       if (bo && bo->flags & AMDGPU_GEM_CREATE_GFX12_DCC)
> >               *flags |= AMDGPU_PTE_DCC;
> > --
> > 2.48.1
> >
>
Re: [PATCH 2/2] drm/amdgpu: Make use of drm_wedge_app_info
On Mon, Mar 10, 2025 at 5:54 PM André Almeida wrote:
>
> Em 01/03/2025 03:04, Raag Jadav escreveu:
> > On Fri, Feb 28, 2025 at 06:49:43PM -0300, André Almeida wrote:
> >> Hi Raag,
> >>
> >> On 2/28/25 11:58, Raag Jadav wrote:
> >>> On Fri, Feb 28, 2025 at 09:13:53AM -0300, André Almeida wrote:
> >>>> To notify userspace about which app (if any) made the device get in a
> >>>> wedge state, make use of drm_wedge_app_info parameter, filling it with
> >>>> the app PID and name.
> >>>>
> >>>> Signed-off-by: André Almeida
> >>>> ---
> >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 19 +--
> >>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c    |  6 +-
> >>>>  2 files changed, 22 insertions(+), 3 deletions(-)
> >>>>
> >>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> index 00b9b87dafd8..e06adf6f34fd 100644
> >>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>> @@ -6123,8 +6123,23 @@ int amdgpu_device_gpu_recover(struct amdgpu_device *adev,
> >>>>
> >>>>  	atomic_set(&adev->reset_domain->reset_res, r);
> >>>>
> >>>> -	if (!r)
> >>>> -		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE, NULL);
> >>>> +	if (!r) {
> >>>> +		struct drm_wedge_app_info aux, *info = NULL;
> >>>> +
> >>>> +		if (job) {
> >>>> +			struct amdgpu_task_info *ti;
> >>>> +
> >>>> +			ti = amdgpu_vm_get_task_info_pasid(adev, job->pasid);
> >>>> +			if (ti) {
> >>>> +				aux.pid = ti->pid;
> >>>> +				aux.comm = ti->process_name;
> >>>> +				info = &aux;
> >>>> +				amdgpu_vm_put_task_info(ti);
> >>>> +			}
> >>>> +		}
> >>>
> >>> Is this guaranteed to be guilty app and not some scheduled worker?
> >>
> >> This is how amdgpu decides which app is the guilty one earlier in the
> >> code as in the print:
> >>
> >> ti = amdgpu_vm_get_task_info_pasid(ring->adev, job->pasid);
> >>
> >> "Process information: process %s pid %d thread %s pid %d\n"
> >>
> >> So I think it's consistent with what the driver thinks it's the guilty
> >> process.
> >
> > Sure, but with something like app_info we're kind of hinting to userspace
> > that an application was _indeed_ involved with reset. Is that also
> > guaranteed?
> >
> > Is it possible that an application needlessly suffers from a false
> > positive scenario (reset due to other factors)?
>
> I asked Alex Deucher in IRC about that and yes, there's a chance that
> this is a false positive. However, for the majority of cases this is the
> right app that caused the hang. This is what amdgpu is doing for GL
> robustness as well and devcoredump, so it's very consistent with how
> amdgpu deals with this scenario even if the mechanism is still not
> perfect.

It's usually the guilty one, but it's not guaranteed.  For example, say
you have a ROCm user queue and a gfx job submitted to a kernel queue.
The actual guilty job may be the ROCm user queue, but the driver may not
detect that the ROCm queue was hung until some other event (e.g., memory
pressure).  However, the timer for the gfx job may timeout before that
happens on the ROCm queue, so in that case the gfx job would be
incorrectly considered guilty.

Alex
Re: [PATCH v3] drm/amdgpu: Trigger a wedged event for ring reset
Applied.  Thanks

On Tue, Mar 4, 2025 at 4:29 AM Christian König wrote:
>
> Am 25.02.25 um 02:02 schrieb André Almeida:
> > Instead of only triggering a wedged event for complete GPU resets,
> > trigger for ring resets. Regardless of the reset, it's useful for
> > userspace to know that it happened because the kernel will reject
> > further submissions from that app.
> >
> > Signed-off-by: André Almeida
>
> Reviewed-by: Christian König
>
> Sorry for the delay, have been on sick leave for nearly two weeks.
>
> Regards,
> Christian.
>
> > ---
> > v3: do only for ring resets, no soft recoveries
> > v2: Keep the wedge event in amdgpu_device_gpu_recover() and add an
> >     extra check to avoid triggering two events.
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > index 698e5799e542..760a720c842e 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> > @@ -150,6 +150,7 @@ static enum drm_gpu_sched_stat amdgpu_job_timedout(struct drm_sched_job *s_job)
> >  		if (amdgpu_ring_sched_ready(ring))
> >  			drm_sched_start(&ring->sched, 0);
> >  		dev_err(adev->dev, "Ring %s reset succeeded\n", ring->sched.name);
> > +		drm_dev_wedged_event(adev_to_drm(adev), DRM_WEDGE_RECOVERY_NONE);
> >  		goto exit;
> >  	}
> >  	dev_err(adev->dev, "Ring %s reset failure\n", ring->sched.name);
>
Re: [PATCH] drm/amdgpu: correct the runtime deference for mes_userq_mqd_create()
On Mon, Mar 10, 2025 at 9:33 AM Prike Liang wrote:
>
> When the runtime resume fails, the goto free_mqd path already decreases
> the runtime usage count, so the runtime resume error handler doesn't
> need to decrease the runtime usage separately.
>
> Fixes: 4baa0dcac737 ("drm/amdgpu: validate return value of pm_runtime_get_sync")
> Signed-off-by: Prike Liang

Reviewed-by: Alex Deucher

> ---
>  drivers/gpu/drm/amd/amdgpu/mes_userqueue.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> index 35ae4125cd83..b469b800119f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_userqueue.c
> @@ -291,8 +291,7 @@ static int mes_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>         r = pm_runtime_get_sync(adev_to_drm(adev)->dev);
>         if (r < 0) {
>                 dev_err(adev->dev, "pm_runtime_get_sync() failed for userqueue mqd create\n");
> -               pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
> -               goto free_mqd;
> +               goto deference_pm;
>         }
>
>         r = mqd_hw_default->init_mqd(adev, (void *)queue->mqd.cpu_ptr, userq_props);
> @@ -330,6 +329,7 @@ static int mes_userq_mqd_create(struct amdgpu_userq_mgr *uq_mgr,
>  free_mqd:
>         amdgpu_userqueue_destroy_object(uq_mgr, &queue->mqd);
>         pm_runtime_mark_last_busy(adev_to_drm(adev)->dev);
> +deference_pm:
>         pm_runtime_put_autosuspend(adev_to_drm(adev)->dev);
>
>  free_props:
> --
> 2.34.1
>