Re: [PATCH] drm/amdgpu: Fix crashes in enforce_isolation sysfs handling on non-supported systems

2025-02-14 Thread Christian König
Am 13.02.25 um 18:50 schrieb Srinivasan Shanmugam:
> By adding these NULL pointer checks and improving error handling, we can
> prevent crashes when the enforce_isolation sysfs file is accessed on
> non-supported systems.
>
> Cc: Christian König 
> Cc: Alex Deucher 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 17 -
>  1 file changed, 12 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 27f5318c3a26..bf0bf6382b65 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -1777,20 +1777,27 @@ static int 
> amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev)
>  {
>   int r;
>  
> + if (!adev->gfx.enable_cleaner_shader)
> + return -EINVAL;
> +

NAK to that, enforce isolation should be available even without the cleaner 
shader.

Christian.

>   r = device_create_file(adev->dev, &dev_attr_enforce_isolation);
>   if (r)
>   return r;
> - if (adev->gfx.enable_cleaner_shader)
> - r = device_create_file(adev->dev, &dev_attr_run_cleaner_shader);
>  
> - return r;
> + r = device_create_file(adev->dev, &dev_attr_run_cleaner_shader);
> + if (r)
> + return r;
> +
> + return 0;
>  }
>  
>  static void amdgpu_gfx_sysfs_isolation_shader_fini(struct amdgpu_device 
> *adev)
>  {
> + if (!adev->gfx.enable_cleaner_shader)
> + return;
> +
>   device_remove_file(adev->dev, &dev_attr_enforce_isolation);
> - if (adev->gfx.enable_cleaner_shader)
> - device_remove_file(adev->dev, &dev_attr_run_cleaner_shader);
> + device_remove_file(adev->dev, &dev_attr_run_cleaner_shader);
>  }
>  
>  static int amdgpu_gfx_sysfs_reset_mask_init(struct amdgpu_device *adev)
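
For illustration only (not part of the patch or of the review above): a minimal
sketch of the shape the NAK points back to, where enforce_isolation is always
exposed and only the cleaner-shader attribute is gated. It mirrors the
pre-patch code visible in the removed lines and is untested.

static int amdgpu_gfx_sysfs_isolation_shader_init(struct amdgpu_device *adev)
{
	int r;

	/* enforce_isolation is meaningful even without a cleaner shader */
	r = device_create_file(adev->dev, &dev_attr_enforce_isolation);
	if (r)
		return r;

	/* only expose run_cleaner_shader when a cleaner shader is present */
	if (adev->gfx.enable_cleaner_shader)
		r = device_create_file(adev->dev, &dev_attr_run_cleaner_shader);

	return r;
}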



Re: [PATCH 2/3] drm/amdgpu: Do not write to GRBM_CNTL if Aldebaran SRIOV

2025-02-14 Thread Lazar, Lijo



On 2/14/2025 5:43 AM, Victor Lu wrote:
> Aldebaran SRIOV VF does not have write permissions to GRBM_CNTL.
> This access can be skipped to avoid a dmesg warning.
> 
> Signed-off-by: Victor Lu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index ccdfe7c37517..569a76835918 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -2637,7 +2637,9 @@ static void gfx_v9_0_constants_init(struct 
> amdgpu_device *adev)
>   u32 tmp;
>   int i;
>  
> - WREG32_FIELD15_RLC(GC, 0, GRBM_CNTL, READ_TIMEOUT, 0xff);
> + if (!amdgpu_sriov_vf(adev) || (adev->asic_type != CHIP_ALDEBARAN)) {

Please switch to IP version checks -
amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 4, 2)

Thanks,
Lijo

> + WREG32_FIELD15_RLC(GC, 0, GRBM_CNTL, READ_TIMEOUT, 0xff);
> + }
>  
>   gfx_v9_0_tiling_mode_table_init(adev);
>  
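
For reference, a hedged sketch of how the guard could look with the IP version
check suggested above (untested; same skip-on-Aldebaran-SRIOV intent as the
original hunk):

	/* Aldebaran (GC 9.4.2) SRIOV VFs cannot write GRBM_CNTL; skip it. */
	if (!amdgpu_sriov_vf(adev) ||
	    amdgpu_ip_version(adev, GC_HWIP, 0) != IP_VERSION(9, 4, 2))
		WREG32_FIELD15_RLC(GC, 0, GRBM_CNTL, READ_TIMEOUT, 0xff);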



[PATCH][next] drm/amd/pm: Avoid multiple -Wflex-array-member-not-at-end warnings

2025-02-14 Thread Gustavo A. R. Silva
-Wflex-array-member-not-at-end was introduced in GCC-14, and we are
getting ready to enable it, globally.

So, in order to avoid ending up with a flexible-array member in the
middle of other structs, we use the `struct_group_tagged()` helper
to create a new tagged `struct NISLANDS_SMC_SWSTATE_HDR` (and `struct
SISLANDS_SMC_SWSTATE_HDR`). These structures group together all the
members of the flexible `struct NISLANDS_SMC_SWSTATE` (and `struct
SISLANDS_SMC_SWSTATE`) except the flexible array.

As a result, the array is effectively separated from the rest of the
members without modifying the memory layout of the flexible structure.
We then change the type of the middle struct members currently causing
trouble from `struct NISLANDS_SMC_SWSTATE` to `struct
NISLANDS_SMC_SWSTATE_HDR` (and from `struct SISLANDS_SMC_SWSTATE` to
`struct SISLANDS_SMC_SWSTATE_HDR`).

We also want to ensure that when new members need to be added to the
flexible structure, they are always included within the newly created
tagged struct. For this, we use `static_assert()`. This ensures that
the memory layout for both the flexible structure and the new tagged
struct is the same after any changes.

This approach avoids having to implement `struct NISLANDS_SMC_SWSTATE_HDR`
(and `struct SISLANDS_SMC_SWSTATE_HDR`) as a completely separate structure,
thus preventing having to maintain two independent but basically identical
structures, closing the door to potential bugs in the future.

We also use `container_of()` whenever we need to retrieve a pointer to
the flexible structure, through which we can access the flexible-array
member, if necessary.
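
For readers unfamiliar with these helpers, here is a minimal, self-contained
sketch of the pattern described above. The EXAMPLE_* names are placeholders,
not the real NISLANDS/SISLANDS definitions:

#include <linux/build_bug.h>
#include <linux/container_of.h>
#include <linux/stddef.h>
#include <linux/types.h>

struct EXAMPLE_SWSTATE {
	/* all fixed members live in the tagged group... */
	struct_group_tagged(EXAMPLE_SWSTATE_HDR, __hdr,
		u8 flags;
		u8 levelCount;
		u8 padding[2];
	);
	/* ...so the flexible array is the only member after it */
	u32 levels[];
};

/* catch new members accidentally added outside the tagged group */
static_assert(offsetof(struct EXAMPLE_SWSTATE, levels) ==
	      sizeof(struct EXAMPLE_SWSTATE_HDR));

/* structs that embed the state in the middle use the header type */
struct EXAMPLE_STATETABLE {
	struct EXAMPLE_SWSTATE_HDR driverState;
};

/* recover the flexible view when the array is actually needed */
static inline struct EXAMPLE_SWSTATE *
example_driver_swstate(struct EXAMPLE_STATETABLE *table)
{
	return container_of(&table->driverState,
			    struct EXAMPLE_SWSTATE, __hdr);
}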

So, with these changes, fix the following warnings:

drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/sislands_smc.h:218:49: warning: 
structure containing a flexible array member is not at the end of another 
structure [-Wflex-array-member-not-at-end]
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/si_dpm.h:819:41: warning: structure 
containing a flexible array member is not at the end of another structure 
[-Wflex-array-member-not-at-end]
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/si_dpm.h:818:41: warning: structure 
containing a flexible array member is not at the end of another structure 
[-Wflex-array-member-not-at-end]
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/si_dpm.h:817:41: warning: structure 
containing a flexible array member is not at the end of another structure 
[-Wflex-array-member-not-at-end]
drivers/gpu/drm/amd/amdgpu/../pm/legacy-dpm/si_dpm.h:816:41: warning: structure 
containing a flexible array member is not at the end of another structure 
[-Wflex-array-member-not-at-end]

Signed-off-by: Gustavo A. R. Silva 
---
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c|  7 --
 drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.h| 23 +++
 .../gpu/drm/amd/pm/legacy-dpm/sislands_smc.h  | 15 
 3 files changed, 29 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c 
b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
index a87dcf0974bc..2c9d473d122f 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.c
@@ -5234,7 +5234,8 @@ static int si_init_smc_table(struct amdgpu_device *adev)
 
table->driverState.flags = table->initialState.flags;
table->driverState.levelCount = table->initialState.levelCount;
-   table->driverState.levels[0] = table->initialState.level;
+   container_of(&table->driverState, SISLANDS_SMC_SWSTATE, 
__hdr)->levels[0] =
+   
table->initialState.level;
 
ret = si_do_program_memory_timing_parameters(adev, amdgpu_boot_state,
 
SISLANDS_INITIAL_STATE_ARB_INDEX);
@@ -5755,7 +5756,9 @@ static int si_upload_sw_state(struct amdgpu_device *adev,
int ret;
u32 address = si_pi->state_table_start +
offsetof(SISLANDS_SMC_STATETABLE, driverState);
-   SISLANDS_SMC_SWSTATE *smc_state = &si_pi->smc_statetable.driverState;
+   SISLANDS_SMC_SWSTATE *smc_state =
+   container_of(&si_pi->smc_statetable.driverState,
+SISLANDS_SMC_SWSTATE, __hdr);
size_t state_size = struct_size(smc_state, levels,
new_state->performance_level_count);
memset(smc_state, 0, state_size);
diff --git a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.h 
b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.h
index 11cb7874a6bb..62530f89ebdf 100644
--- a/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.h
+++ b/drivers/gpu/drm/amd/pm/legacy-dpm/si_dpm.h
@@ -784,12 +784,17 @@ typedef struct NISLANDS_SMC_HW_PERFORMANCE_LEVEL 
NISLANDS_SMC_HW_PERFORMANCE_LEV
 
 struct NISLANDS_SMC_SWSTATE
 {
-uint8_t flags;
-uint8_t levelCount;
-uint8_t padding2;
-uint8_t   

Re: [PATCH] drm/radeon/ci_dpm: Remove needless NULL checks of dpm tables

2025-02-14 Thread Nikita Zhandarovich
Gentle ping :)

On 1/14/25 16:58, Nikita Zhandarovich wrote:
> This patch removes useless NULL pointer checks in functions like
> ci_set_private_data_variables_based_on_pptable() and
> ci_setup_default_dpm_tables().
> 
> The pointers in question are initialized as addresses to existing
> structures such as rdev->pm.dpm.dyn_state.vddc_dependency_on_sclk by
> utilizing & operator and therefore are not in danger of being NULL.
> 
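(Illustration, not part of the patch: the checks being removed guard pointers
of this form, which are addresses of members embedded in rdev and therefore
can never be NULL; only the ->count checks are meaningful.)

	struct radeon_clock_voltage_dependency_table *allowed_sclk_vddc_table =
		&rdev->pm.dpm.dyn_state.vddc_dependency_on_sclk;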
> Fix this by removing extra checks thus cleaning the code a tiny bit.
> 
> Found by Linux Verification Center (linuxtesting.org) with static
> analysis tool SVACE.
> 
> Fixes: cc8dbbb4f62a ("drm/radeon: add dpm support for CI dGPUs (v2)")
> Signed-off-by: Nikita Zhandarovich 
> ---
>  drivers/gpu/drm/radeon/ci_dpm.c | 34 ++
>  1 file changed, 10 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/radeon/ci_dpm.c b/drivers/gpu/drm/radeon/ci_dpm.c
> index abe9d65cc460..7c3a960f486a 100644
> --- a/drivers/gpu/drm/radeon/ci_dpm.c
> +++ b/drivers/gpu/drm/radeon/ci_dpm.c
> @@ -3405,12 +3405,8 @@ static int ci_setup_default_dpm_tables(struct 
> radeon_device *rdev)
>   &rdev->pm.dpm.dyn_state.cac_leakage_table;
>   u32 i;
>  
> - if (allowed_sclk_vddc_table == NULL)
> - return -EINVAL;
>   if (allowed_sclk_vddc_table->count < 1)
>   return -EINVAL;
> - if (allowed_mclk_table == NULL)
> - return -EINVAL;
>   if (allowed_mclk_table->count < 1)
>   return -EINVAL;
>  
> @@ -3468,24 +3464,20 @@ static int ci_setup_default_dpm_tables(struct 
> radeon_device *rdev)
>   pi->dpm_table.vddc_table.count = allowed_sclk_vddc_table->count;
>  
>   allowed_mclk_table = &rdev->pm.dpm.dyn_state.vddci_dependency_on_mclk;
> - if (allowed_mclk_table) {
> - for (i = 0; i < allowed_mclk_table->count; i++) {
> - pi->dpm_table.vddci_table.dpm_levels[i].value =
> - allowed_mclk_table->entries[i].v;
> - pi->dpm_table.vddci_table.dpm_levels[i].enabled = true;
> - }
> - pi->dpm_table.vddci_table.count = allowed_mclk_table->count;
> + for (i = 0; i < allowed_mclk_table->count; i++) {
> + pi->dpm_table.vddci_table.dpm_levels[i].value =
> + allowed_mclk_table->entries[i].v;
> + pi->dpm_table.vddci_table.dpm_levels[i].enabled = true;
>   }
> + pi->dpm_table.vddci_table.count = allowed_mclk_table->count;
>  
>   allowed_mclk_table = &rdev->pm.dpm.dyn_state.mvdd_dependency_on_mclk;
> - if (allowed_mclk_table) {
> - for (i = 0; i < allowed_mclk_table->count; i++) {
> - pi->dpm_table.mvdd_table.dpm_levels[i].value =
> - allowed_mclk_table->entries[i].v;
> - pi->dpm_table.mvdd_table.dpm_levels[i].enabled = true;
> - }
> - pi->dpm_table.mvdd_table.count = allowed_mclk_table->count;
> + for (i = 0; i < allowed_mclk_table->count; i++) {
> + pi->dpm_table.mvdd_table.dpm_levels[i].value =
> + allowed_mclk_table->entries[i].v;
> + pi->dpm_table.mvdd_table.dpm_levels[i].enabled = true;
>   }
> + pi->dpm_table.mvdd_table.count = allowed_mclk_table->count;
>  
>   ci_setup_default_pcie_tables(rdev);
>  
> @@ -4880,16 +4872,10 @@ static int 
> ci_set_private_data_variables_based_on_pptable(struct radeon_device *
>   struct radeon_clock_voltage_dependency_table *allowed_mclk_vddci_table =
>   &rdev->pm.dpm.dyn_state.vddci_dependency_on_mclk;
>  
> - if (allowed_sclk_vddc_table == NULL)
> - return -EINVAL;
>   if (allowed_sclk_vddc_table->count < 1)
>   return -EINVAL;
> - if (allowed_mclk_vddc_table == NULL)
> - return -EINVAL;
>   if (allowed_mclk_vddc_table->count < 1)
>   return -EINVAL;
> - if (allowed_mclk_vddci_table == NULL)
> - return -EINVAL;
>   if (allowed_mclk_vddci_table->count < 1)
>   return -EINVAL;
>  



Re: [PATCH] Documentation/gpu: Add acronyms for some firmware components

2025-02-14 Thread Alex Deucher
On Fri, Feb 14, 2025 at 6:00 PM Rodrigo Siqueira  wrote:
>
> Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
> to get information on the firmware loaded in the system. This file has
> multiple acronyms that are not documented in the glossary. This commit
> introduces some missing acronyms to the AMD glossary documentation. The
> meaning of each acronym in this commit was extracted from code
> documentation available in the following files:
>
> - drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> - drivers/gpu/drm/amd/include/amd_shared.h
>
> Cc: Mario Limonciello 
> Signed-off-by: Rodrigo Siqueira 
> ---
>  Documentation/gpu/amdgpu/amdgpu-glossary.rst | 21 
>  1 file changed, 21 insertions(+)
>
> diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst 
> b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> index 00a47ebb0b0f..3242db32b020 100644
> --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> @@ -12,6 +12,9 @@ we have a dedicated glossary for Display Core at
>The number of CUs that are active on the system.  The number of active
>CUs may be less than SE * SH * CU depending on the board configuration.
>
> +CE
> +  Constant Engine
> +
>  CP
>Command Processor
>
> @@ -80,6 +83,9 @@ we have a dedicated glossary for Display Core at
>  KIQ
>Kernel Interface Queue
>
> +ME
> +  Micro Engine

This is part of Graphics so maybe something like:

ME
MicroEngine (Graphics)

> +
>  MEC
>MicroEngine Compute
>
> @@ -92,6 +98,9 @@ we have a dedicated glossary for Display Core at
>  MQD
>Memory Queue Descriptor
>
> +PFP
> +  Pre-Fetch Parser

This is also part of GFX.

PFP
Pre-Fetch Parser (Graphics)

> +
>  PPLib
>PowerPlay Library - PowerPlay is the power management component.
>
> @@ -110,14 +119,26 @@ we have a dedicated glossary for Display Core at
>  SH
>SHader array
>
> +SMC
> +  System Management Controller
> +
>  SMU
>System Management Unit

These two are synonyms.

How about
SMU / SMC
System Management Unit / System Management Controller

Other than that, looks good.

Alex

>
>  SS
>Spread Spectrum
>
> +TA
> +  Trusted Application
> +
> +UVD
> +  Unified Video Decoder
> +
>  VCE
>Video Compression Engine
>
>  VCN
>Video Codec Next
> +
> +VPE
> +  Video Processing Engine
> --
> 2.48.1
>


Re: [PATCH] Documentation/gpu: Add acronyms for some firmware components

2025-02-14 Thread Rodrigo Siqueira
On 02/14, Alex Deucher wrote:
> On Fri, Feb 14, 2025 at 6:00 PM Rodrigo Siqueira  wrote:
> >
> > Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
> > to get information on the firmware loaded in the system. This file has
> > multiple acronyms that are not documented in the glossary. This commit
> > introduces some missing acronyms to the AMD glossary documentation. The
> > meaning of each acronym in this commit was extracted from code
> > documentation available in the following files:
> >
> > - drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > - drivers/gpu/drm/amd/include/amd_shared.h
> >
> > Cc: Mario Limonciello 
> > Signed-off-by: Rodrigo Siqueira 
> > ---
> >  Documentation/gpu/amdgpu/amdgpu-glossary.rst | 21 
> >  1 file changed, 21 insertions(+)
> >
> > diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst 
> > b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > index 00a47ebb0b0f..3242db32b020 100644
> > --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > @@ -12,6 +12,9 @@ we have a dedicated glossary for Display Core at
> >The number of CUs that are active on the system.  The number of 
> > active
> >CUs may be less than SE * SH * CU depending on the board 
> > configuration.
> >
> > +CE
> > +  Constant Engine
> > +
> >  CP
> >Command Processor
> >
> > @@ -80,6 +83,9 @@ we have a dedicated glossary for Display Core at
> >  KIQ
> >Kernel Interface Queue
> >
> > +ME
> > +  Micro Engine
> 
> This is part of Graphics so maybe something like:
> 
> ME
> MicroEngine (Graphics)
> 
> > +
> >  MEC
> >MicroEngine Compute
> >
> > @@ -92,6 +98,9 @@ we have a dedicated glossary for Display Core at
> >  MQD
> >Memory Queue Descriptor
> >
> > +PFP
> > +  Pre-Fetch Parser
> 
> This is also part of GFX.
> 
> PFP
> Pre-Fetch Parser (Graphics)
> 
> > +
> >  PPLib
> >PowerPlay Library - PowerPlay is the power management component.
> >
> > @@ -110,14 +119,26 @@ we have a dedicated glossary for Display Core at
> >  SH
> >SHader array
> >
> > +SMC
> > +  System Management Controller
> > +
> >  SMU
> >System Management Unit
> 
> These two are synonyms.
> 
> How about
> SMU / SMC
> System Management Unit / System Management Controller
> 
> Other than that, looks good.
>

Thanks a lot for all the suggestions; I'll make those changes for the
V2.

btw, from the amdgpu_firmware_info, I did not find the meaning of the
below acronyms, could you help me with that?

MC
SRL(C|G|S)
IMU
ASD
TOC

Thanks
Siqueira
 
> Alex
> 
> >
> >  SS
> >Spread Spectrum
> >
> > +TA
> > +  Trusted Application
> > +
> > +UVD
> > +  Unified Video Decoder
> > +
> >  VCE
> >Video Compression Engine
> >
> >  VCN
> >Video Codec Next
> > +
> > +VPE
> > +  Video Processing Engine
> > --
> > 2.48.1
> >


RE: [PATCH] drm/amdgpu: simplify xgmi peer info calls

2025-02-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Friday, February 14, 2025 12:58 AM
> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls
>
>
>
> On 2/13/2025 9:20 PM, Kim, Jonathan wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Lazar, Lijo 
> >> Sent: Thursday, February 13, 2025 1:35 AM
> >> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls
> >>
> >>
> >>
> >> On 2/12/2025 9:27 PM, Jonathan Kim wrote:
> >>> Deprecate KFD XGMI peer info calls in favour of calling directly from
> >>> simplified XGMI peer info functions.
> >>>
> >>> Signed-off-by: Jonathan Kim 
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 42 --
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  5 ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 51 +-
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  6 +--
> >>>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c  | 11 +++--
> >>>  5 files changed, 48 insertions(+), 67 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> index 0312231b703e..4cec3a873995 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> @@ -555,48 +555,6 @@ int amdgpu_amdkfd_get_dmabuf_info(struct
> >> amdgpu_device *adev, int dma_buf_fd,
> >>> return r;
> >>>  }
> >>>
> >>> -uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst,
> >>> - struct amdgpu_device *src)
> >>> -{
> >>> -   struct amdgpu_device *peer_adev = src;
> >>> -   struct amdgpu_device *adev = dst;
> >>> -   int ret = amdgpu_xgmi_get_hops_count(adev, peer_adev);
> >>> -
> >>> -   if (ret < 0) {
> >>> -   DRM_ERROR("amdgpu: failed to get  xgmi hops count between
> >> node %d and %d. ret = %d\n",
> >>> -   adev->gmc.xgmi.physical_node_id,
> >>> -   peer_adev->gmc.xgmi.physical_node_id, ret);
> >>> -   ret = 0;
> >>> -   }
> >>> -   return  (uint8_t)ret;
> >>> -}
> >>> -
> >>> -int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >>> -   struct amdgpu_device *src,
> >>> -   bool is_min)
> >>> -{
> >>> -   struct amdgpu_device *adev = dst, *peer_adev;
> >>> -   int num_links;
> >>> -
> >>> -   if (amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(9, 4, 2))
> >>> -   return 0;
> >>> -
> >>> -   if (src)
> >>> -   peer_adev = src;
> >>> -
> >>> -   /* num links returns 0 for indirect peers since indirect route is 
> >>> unknown. */
> >>> -   num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);
> >>> -   if (num_links < 0) {
> >>> -   DRM_ERROR("amdgpu: failed to get xgmi num links between
> >> node %d and %d. ret = %d\n",
> >>> -   adev->gmc.xgmi.physical_node_id,
> >>> -   peer_adev->gmc.xgmi.physical_node_id, num_links);
> >>> -   num_links = 0;
> >>> -   }
> >>> -
> >>> -   /* Aldebaran xGMI DPM is defeatured so assume x16 x 25Gbps for
> >> bandwidth. */
> >>> -   return (num_links * 16 * 25000)/BITS_PER_BYTE;
> >>> -}
> >>> -
> >>>  int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev,
> >> bool is_min)
> >>>  {
> >>> int num_lanes_shift = (is_min ? ffs(adev->pm.pcie_mlw_mask) :
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>> index 092dbd8bec97..28eb1cd0eb5a 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>> @@ -255,11 +255,6 @@ int amdgpu_amdkfd_get_dmabuf_info(struct
> >> amdgpu_device *adev, int dma_buf_fd,
> >>>   uint64_t *bo_size, void *metadata_buffer,
> >>>   size_t buffer_size, uint32_t *metadata_size,
> >>>   uint32_t *flags, int8_t *xcp_id);
> >>> -uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst,
> >>> - struct amdgpu_device *src);
> >>> -int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >>> -   struct amdgpu_device *src,
> >>> -   bool is_min);
> >>>  int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev,
> >> bool is_min);
> >>>  int amdgpu_amdkfd_send_close_event_drain_irq(struct amdgpu_device *adev,
> >>> uint32_t *payload);
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >>> index 74b4349e345a..d18d2a26cc91 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
> >>> +++ b/drivers/gpu/drm/amd/am

[PATCH 07/16] drm/amd/display: Add clear DCC and Tiling callback for DCE

2025-02-14 Thread Roman.Li
From: Rodrigo Siqueira 

Introduce the DCC and Tiling reset callback to all DCE versions that can
call it.

Reviewed-by: Alvin Lee 
Signed-off-by: Rodrigo Siqueira 
Signed-off-by: Roman Li 
---
 .../gpu/drm/amd/display/dc/core/dc_surface.c   | 18 ++
 .../amd/display/dc/dce60/dce60_hw_sequencer.c  |  1 +
 .../amd/display/dc/hwss/dce100/dce100_hwseq.c  |  1 +
 .../amd/display/dc/hwss/dce110/dce110_hwseq.c  |  2 ++
 .../amd/display/dc/hwss/dce120/dce120_hwseq.c  |  2 ++
 .../amd/display/dc/hwss/dce80/dce80_hwseq.c|  1 +
 6 files changed, 9 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
index 691b4a68d8ac..e6fcc21bb9bc 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
@@ -290,21 +290,7 @@ void dc_plane_force_dcc_and_tiling_disable(struct 
dc_plane_state *plane_state,
if (!pipe_ctx)
continue;
 
-   if (dc->ctx->dce_version >= DCE_VERSION_MAX) {
-   if (dc->hwss.clear_surface_dcc_and_tiling)
-   dc->hwss.clear_surface_dcc_and_tiling(pipe_ctx, 
plane_state, clear_tiling);
-   } else {
-   struct mem_input *mi = pipe_ctx->plane_res.mi;
-   if (!mi)
-   continue;
-   /* if framebuffer is tiled, disable tiling */
-   if (clear_tiling && mi->funcs->mem_input_clear_tiling)
-   mi->funcs->mem_input_clear_tiling(mi);
-
-   /* force page flip to see the new content of the 
framebuffer */
-   mi->funcs->mem_input_program_surface_flip_and_addr(mi,
-  
&plane_state->address,
-  
true);
-   }
+   if (dc->hwss.clear_surface_dcc_and_tiling)
+   dc->hwss.clear_surface_dcc_and_tiling(pipe_ctx, 
plane_state, clear_tiling);
}
 }
diff --git a/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c
index 1fdeef47e4dc..44b56490e152 100644
--- a/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/dce60/dce60_hw_sequencer.c
@@ -428,5 +428,6 @@ void dce60_hw_sequencer_construct(struct dc *dc)
dc->hwss.pipe_control_lock = dce60_pipe_control_lock;
dc->hwss.prepare_bandwidth = dce100_prepare_bandwidth;
dc->hwss.optimize_bandwidth = dce100_optimize_bandwidth;
+   dc->hwss.clear_surface_dcc_and_tiling = 
dce100_reset_surface_dcc_and_tiling;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c
index b76350a9cf5f..0d7e28260db1 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c
@@ -138,6 +138,7 @@ void dce100_hw_sequencer_construct(struct dc *dc)
dc->hwseq->funcs.enable_display_power_gating = 
dce100_enable_display_power_gating;
dc->hwss.prepare_bandwidth = dce100_prepare_bandwidth;
dc->hwss.optimize_bandwidth = dce100_optimize_bandwidth;
+   dc->hwss.clear_surface_dcc_and_tiling = 
dce100_reset_surface_dcc_and_tiling;
 }
 
 /**
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index 935d08d3a670..8280e3652171 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -33,6 +33,7 @@
 #include "dce110_hwseq.h"
 #include "dce110/dce110_timing_generator.h"
 #include "dce/dce_hwseq.h"
+#include "dce100/dce100_hwseq.h"
 #include "gpio_service_interface.h"
 
 #include "dce110/dce110_compressor.h"
@@ -3332,6 +3333,7 @@ static const struct hw_sequencer_funcs dce110_funcs = {
.post_unlock_program_front_end = dce110_post_unlock_program_front_end,
.update_plane_addr = update_plane_addr,
.update_pending_status = dce110_update_pending_status,
+   .clear_surface_dcc_and_tiling = dce100_reset_surface_dcc_and_tiling,
.enable_accelerated_mode = dce110_enable_accelerated_mode,
.enable_timing_synchronization = dce110_enable_timing_synchronization,
.enable_per_frame_crtc_position_reset = 
dce110_enable_per_frame_crtc_position_reset,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce120/dce120_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dce120/dce120_hwseq.c
index 22ee304ef9cf..2a62f63d0357 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce120/dce120_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce120/dce120_hwseq.c
@@ -29,6 +29,7 @@
 #include "dce120_hwseq.h"
 #i

[PATCH 02/16] drm/amd/display: Don't treat wb connector as physical in create_validate_stream_for_sink

2025-02-14 Thread Roman.Li
From: Harry Wentland 

Don't try to operate on a drm_wb_connector as an amdgpu_dm_connector.
While dereferencing aconnector->base will "work" it's wrong and
might lead to unknown bad things. Just... don't.

Reviewed-by: Alex Hung 
Signed-off-by: Harry Wentland 
Signed-off-by: Roman Li 
---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 26 ---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  6 ++---
 3 files changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b26ae1dd1fd7..b1b5f352b9aa 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -7448,12 +7448,12 @@ static enum dc_status 
dm_validate_stream_and_context(struct dc *dc,
 }
 
 struct dc_stream_state *
-create_validate_stream_for_sink(struct amdgpu_dm_connector *aconnector,
+create_validate_stream_for_sink(struct drm_connector *connector,
const struct drm_display_mode *drm_mode,
const struct dm_connector_state *dm_state,
const struct dc_stream_state *old_stream)
 {
-   struct drm_connector *connector = &aconnector->base;
+   struct amdgpu_dm_connector *aconnector = NULL;
struct amdgpu_device *adev = drm_to_adev(connector->dev);
struct dc_stream_state *stream;
const struct drm_connector_state *drm_state = dm_state ? 
&dm_state->base : NULL;
@@ -7464,8 +7464,12 @@ create_validate_stream_for_sink(struct 
amdgpu_dm_connector *aconnector,
if (!dm_state)
return NULL;
 
-   if (aconnector->dc_link->connector_signal == SIGNAL_TYPE_HDMI_TYPE_A ||
-   aconnector->dc_link->dpcd_caps.dongle_type == 
DISPLAY_DONGLE_DP_HDMI_CONVERTER)
+   if (connector->connector_type != DRM_MODE_CONNECTOR_WRITEBACK)
+   aconnector = to_amdgpu_dm_connector(connector);
+
+   if (aconnector &&
+   (aconnector->dc_link->connector_signal == SIGNAL_TYPE_HDMI_TYPE_A ||
+aconnector->dc_link->dpcd_caps.dongle_type == 
DISPLAY_DONGLE_DP_HDMI_CONVERTER))
bpc_limit = 8;
 
do {
@@ -7477,10 +7481,11 @@ create_validate_stream_for_sink(struct 
amdgpu_dm_connector *aconnector,
break;
}
 
-   if (aconnector->base.connector_type == 
DRM_MODE_CONNECTOR_WRITEBACK)
+   dc_result = dc_validate_stream(adev->dm.dc, stream);
+
+   if (!aconnector) /* writeback connector */
return stream;
 
-   dc_result = dc_validate_stream(adev->dm.dc, stream);
if (dc_result == DC_OK && stream->signal == 
SIGNAL_TYPE_DISPLAY_PORT_MST)
dc_result = dm_dp_mst_is_port_support_mode(aconnector, 
stream);
 
@@ -7510,7 +7515,7 @@ create_validate_stream_for_sink(struct 
amdgpu_dm_connector *aconnector,
 __func__, __LINE__);
 
aconnector->force_yuv420_output = true;
-   stream = create_validate_stream_for_sink(aconnector, drm_mode,
+   stream = create_validate_stream_for_sink(connector, drm_mode,
dm_state, old_stream);
aconnector->force_yuv420_output = false;
}
@@ -7525,6 +7530,9 @@ enum drm_mode_status 
amdgpu_dm_connector_mode_valid(struct drm_connector *connec
struct dc_sink *dc_sink;
/* TODO: Unhardcode stream count */
struct dc_stream_state *stream;
+   /* we always have an amdgpu_dm_connector here since we got
+* here via the amdgpu_dm_connector_helper_funcs
+*/
struct amdgpu_dm_connector *aconnector = 
to_amdgpu_dm_connector(connector);
 
if ((mode->flags & DRM_MODE_FLAG_INTERLACE) ||
@@ -7549,7 +7557,7 @@ enum drm_mode_status 
amdgpu_dm_connector_mode_valid(struct drm_connector *connec
 
drm_mode_set_crtcinfo(mode, 0);
 
-   stream = create_validate_stream_for_sink(aconnector, mode,
+   stream = create_validate_stream_for_sink(connector, mode,
 
to_dm_connector_state(connector->state),
 NULL);
if (stream) {
@@ -10600,7 +10608,7 @@ static int dm_update_crtc_state(struct 
amdgpu_display_manager *dm,
if (!drm_atomic_crtc_needs_modeset(new_crtc_state))
goto skip_modeset;
 
-   new_stream = create_validate_stream_for_sink(aconnector,
+   new_stream = create_validate_stream_for_sink(connector,
 
&new_crtc_state->mode,
 dm_new_conn_state,
 
dm_old

[PATCH 12/16] drm/amd/display: Support BT2020 YCbCr fullrange

2025-02-14 Thread Roman.Li
From: Ilya Bakoulin 

[Why/How]
Need to add support for full-range quantization for YCbCr in BT2020
color space.

Reviewed-by: Krunoslav Kovac 
Signed-off-by: Ilya Bakoulin 
Signed-off-by: Roman Li 
Tested-by: Robert Mader 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   | 6 +++---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c   | 2 +-
 drivers/gpu/drm/amd/display/dc/basics/dc_common.c   | 3 ++-
 drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c   | 5 +++--
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 4 ++--
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h| 4 +++-
 drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c | 3 ++-
 .../gpu/drm/amd/display/dc/dio/dcn10/dcn10_stream_encoder.c | 3 ++-
 .../amd/display/dc/dio/dcn401/dcn401_dio_stream_encoder.c   | 3 ++-
 .../amd/display/dc/hpo/dcn31/dcn31_hpo_dp_stream_encoder.c  | 3 ++-
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 6 +-
 .../gpu/drm/amd/display/modules/info_packet/info_packet.c   | 4 ++--
 12 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b1b5f352b9aa..4ae54b3573ba 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5616,9 +5616,9 @@ fill_plane_color_attributes(const struct drm_plane_state 
*plane_state,
 
case DRM_COLOR_YCBCR_BT2020:
if (full_range)
-   *color_space = COLOR_SPACE_2020_YCBCR;
+   *color_space = COLOR_SPACE_2020_YCBCR_FULL;
else
-   return -EINVAL;
+   *color_space = COLOR_SPACE_2020_YCBCR_LIMITED;
break;
 
default:
@@ -6114,7 +6114,7 @@ get_output_color_space(const struct dc_crtc_timing 
*dc_crtc_timing,
if (dc_crtc_timing->pixel_encoding == PIXEL_ENCODING_RGB)
color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
else
-   color_space = COLOR_SPACE_2020_YCBCR;
+   color_space = COLOR_SPACE_2020_YCBCR_LIMITED;
break;
case DRM_MODE_COLORIMETRY_DEFAULT: // ITU601
default:
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
index 049046c60462..c7d13e743e6c 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
@@ -1169,7 +1169,7 @@ static int amdgpu_current_colorspace_show(struct seq_file 
*m, void *data)
case COLOR_SPACE_2020_RGB_FULLRANGE:
seq_puts(m, "BT2020_RGB");
break;
-   case COLOR_SPACE_2020_YCBCR:
+   case COLOR_SPACE_2020_YCBCR_LIMITED:
seq_puts(m, "BT2020_YCC");
break;
default:
diff --git a/drivers/gpu/drm/amd/display/dc/basics/dc_common.c 
b/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
index b2fc4f8e6482..a51c2701da24 100644
--- a/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
+++ b/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
@@ -40,7 +40,8 @@ bool is_rgb_cspace(enum dc_color_space output_color_space)
case COLOR_SPACE_YCBCR709:
case COLOR_SPACE_YCBCR601_LIMITED:
case COLOR_SPACE_YCBCR709_LIMITED:
-   case COLOR_SPACE_2020_YCBCR:
+   case COLOR_SPACE_2020_YCBCR_LIMITED:
+   case COLOR_SPACE_2020_YCBCR_FULL:
return false;
default:
/* Add a case to switch */
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
index 6eb9bae3af91..6b514fd03f16 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c
@@ -176,7 +176,7 @@ static bool is_ycbcr2020_type(
 {
bool ret = false;
 
-   if (color_space == COLOR_SPACE_2020_YCBCR)
+   if (color_space == COLOR_SPACE_2020_YCBCR_LIMITED || color_space == 
COLOR_SPACE_2020_YCBCR_FULL)
ret = true;
return ret;
 }
@@ -247,7 +247,8 @@ void color_space_to_black_color(
case COLOR_SPACE_YCBCR709_BLACK:
case COLOR_SPACE_YCBCR601_LIMITED:
case COLOR_SPACE_YCBCR709_LIMITED:
-   case COLOR_SPACE_2020_YCBCR:
+   case COLOR_SPACE_2020_YCBCR_LIMITED:
+   case COLOR_SPACE_2020_YCBCR_FULL:
*black_color = black_color_format[BLACK_COLOR_FORMAT_YUV_CV];
break;
 
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index e6bc479497e8..7eb91612b60d 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -4488,7 +4488,7 @@ static void set_avi_info_frame(

[PATCH 08/16] drm/amd/display: Print seamless boot message in mark_seamless_boot_stream

2025-02-14 Thread Roman.Li
From: Alex Hung 

[WHAT & HOW]
Add a message so users know the stream will be used for seamless boot.

Reviewed-by: Mario Limonciello 
Reviewed-by: Rodrigo Siqueira 
Signed-off-by: Alex Hung 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/dc/core/dc_resource.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
index bf14fa1e3771..e6bc479497e8 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_resource.c
@@ -3764,6 +3764,8 @@ static void mark_seamless_boot_stream(const struct dc  
*dc,
 {
struct dc_bios *dcb = dc->ctx->dc_bios;
 
+   DC_LOGGER_INIT(dc->ctx->logger);
+
if (stream->apply_seamless_boot_optimization)
return;
if (!dc->config.allow_seamless_boot_optimization)
@@ -3772,7 +3774,7 @@ static void mark_seamless_boot_stream(const struct dc  
*dc,
return;
if (dc_validate_boot_timing(dc, stream->sink, &stream->timing)) {
stream->apply_seamless_boot_optimization = true;
-   DC_LOG_INFO("Marked stream for seamless boot optimization\n");
+   DC_LOG_DC("Marked stream for seamless boot optimization\n");
}
 }
 
-- 
2.34.1



[PATCH 14/16] drm/amd/display: dpia should avoid encoder used by dp2

2025-02-14 Thread Roman.Li
From: Peichen Huang 

[WHY]
In the current HPO DP2 implementation, the driver would enable/disable the
DIG encoder when configuring HPO DP2. Therefore, USB4 DP tunnelling should
not use the DIG encoder if the corresponding PHY is used by an HPO DP2
stream.

[HOW]
A DP2 stream is treated as a dig stream.

Reviewed-by: Meenakshikumar Somasundaram 
Signed-off-by: Peichen Huang 
Signed-off-by: Roman Li 
---
 .../drm/amd/display/dc/core/dc_link_enc_cfg.c| 16 ++--
 1 file changed, 2 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c
index 08b4258b0e2f..814f68d76257 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_link_enc_cfg.c
@@ -44,20 +44,8 @@ static bool is_dig_link_enc_stream(struct dc_stream_state 
*stream)
 * yet match.
 */
if (link_enc && 
((uint32_t)stream->link->connector_signal & link_enc->output_signals)) {
-   if (dc_is_dp_signal(stream->signal)) {
-   /* DIGs do not support DP2.0 streams 
with 128b/132b encoding. */
-   struct dc_link_settings link_settings = 
{0};
-
-   
stream->ctx->dc->link_srv->dp_decide_link_settings(stream, &link_settings);
-   if ((link_settings.link_rate >= 
LINK_RATE_LOW) &&
-   link_settings.link_rate 
<= LINK_RATE_HIGH3) {
-   is_dig_stream = true;
-   break;
-   }
-   } else {
-   is_dig_stream = true;
-   break;
-   }
+   is_dig_stream = true;
+   break;
}
}
}
-- 
2.34.1



[PATCH 15/16] drm/amd/display: Add support for disconnected eDP streams

2025-02-14 Thread Roman.Li
From: Harry VanZyllDeJong 

[Why]
eDP may not be connected to the GPU on driver start, causing
enumeration to fail.

[How]
Move the virtual signal type check before the eDP connector
signal check.

Reviewed-by: Wenjing Liu 
Signed-off-by: Harry VanZyllDeJong 
Signed-off-by: Roman Li 
---
 .../drm/amd/display/dc/link/protocols/link_dp_capability.c  | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c
index 80439224acca..e3e7fcb07f19 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c
@@ -945,6 +945,9 @@ bool link_decide_link_settings(struct dc_stream_state 
*stream,
 * TODO: add MST specific link training routine
 */
decide_mst_link_settings(link, link_setting);
+   } else if (stream->signal == SIGNAL_TYPE_VIRTUAL) {
+   link_setting->lane_count = LANE_COUNT_FOUR;
+   link_setting->link_rate = LINK_RATE_HIGH3;
} else if (link->connector_signal == SIGNAL_TYPE_EDP) {
/* enable edp link optimization for DSC eDP case */
if (stream->timing.flags.DSC) {
@@ -967,9 +970,6 @@ bool link_decide_link_settings(struct dc_stream_state 
*stream,
} else {
edp_decide_link_settings(link, link_setting, req_bw);
}
-   } else if (stream->signal == SIGNAL_TYPE_VIRTUAL) {
-   link_setting->lane_count = LANE_COUNT_FOUR;
-   link_setting->link_rate = LINK_RATE_HIGH3;
} else {
decide_dp_link_settings(link, link_setting, req_bw);
}
-- 
2.34.1



[PATCH 11/16] drm/amd/display: Add log for MALL entry on DCN32x

2025-02-14 Thread Roman.Li
From: Aurabindo Pillai 

[Why&How]
Add a dyndbg log entry to check whether the driver requested scanout
from MALL cache to PMFW via DMCUB

Reviewed-by: Zaeem Mohamed 
Reviewed-by: Roman Li 
Signed-off-by: Aurabindo Pillai 
---
 drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
index dd46db67d033..cd0adf72b223 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn32/dcn32_hwseq.c
@@ -316,10 +316,12 @@ bool dcn32_apply_idle_power_optimizations(struct dc *dc, 
bool enable)
cmd.cab.cab_alloc_ways = (uint8_t)ways;
 
dc_wake_and_execute_dmub_cmd(dc->ctx, &cmd, 
DM_DMUB_WAIT_TYPE_NO_WAIT);
+   DC_LOG_MALL("enable scanout from MALL");
 
return true;
}
 
+   DC_LOG_MALL("surface cannot fit in CAB, disabling scanout from 
MALL\n");
return false;
}
 
-- 
2.34.1



[PATCH 03/16] Revert "drm/amd/display: Request HW cursor on DCN3.2 with SubVP"

2025-02-14 Thread Roman.Li
From: Leo Zeng 

This reverts commit aaa44ed6cd8af2089d2bf6a2e66a0436fef9791f.

Reason to revert: idle power regression found in testing.

Reviewed-by: Dillon Varone 
Signed-off-by: Leo Zeng 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c 
b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
index 56dda686e299..6f490d8d7038 100644
--- a/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
+++ b/drivers/gpu/drm/amd/display/dc/dml/dcn32/dcn32_fpu.c
@@ -626,7 +626,6 @@ static bool dcn32_assign_subvp_pipe(struct dc *dc,
 * - Not TMZ surface
 */
if (pipe->plane_state && !pipe->top_pipe && 
!pipe->prev_odm_pipe && !dcn32_is_center_timing(pipe) &&
-   !pipe->stream->hw_cursor_req &&
!(pipe->stream->timing.pix_clk_100hz / 1 > 
DCN3_2_MAX_SUBVP_PIXEL_RATE_MHZ) &&
(!dcn32_is_psr_capable(pipe) || 
(context->stream_count == 1 && dc->caps.dmub_caps.subvp_psr)) &&
dc_state_get_pipe_subvp_type(context, pipe) == 
SUBVP_NONE &&
-- 
2.34.1



[PATCH 01/16] drm/amd/display: Exit idle optimizations before accessing PHY

2025-02-14 Thread Roman.Li
From: Ovidiu Bunea 

[why & how]
By default, DCN HW is in an idle optimized state which does not allow access
to PHY registers. If the BIOS powers up the DCN, this is fine because it will
power up everything. Only exit the idle optimized state when not taking
control from VBIOS.

Fixes: 53f82eb16293 ("Revert "drm/amd/display: Exit idle optimizations before 
attempt to access PHY"")

Reviewed-by: Charlene Liu 
Signed-off-by: Ovidiu Bunea 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
index 7572448e5b9f..935d08d3a670 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce110/dce110_hwseq.c
@@ -1891,6 +1891,7 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct 
dc_state *context)
bool can_apply_edp_fast_boot = false;
bool can_apply_seamless_boot = false;
bool keep_edp_vdd_on = false;
+   struct dc_bios *dcb = dc->ctx->dc_bios;
DC_LOGGER_INIT();
 
 
@@ -1967,6 +1968,8 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct 
dc_state *context)
hws->funcs.edp_backlight_control(edp_link_with_sink, 
false);
}
/*resume from S3, no vbios posting, no need to power down 
again*/
+   if (dcb && dcb->funcs && !dcb->funcs->is_accelerated_mode(dcb))
+   clk_mgr_exit_optimized_pwr_state(dc, dc->clk_mgr);
 
power_down_all_hw_blocks(dc);
 
@@ -1979,6 +1982,8 @@ void dce110_enable_accelerated_mode(struct dc *dc, struct 
dc_state *context)
disable_vga_and_power_gate_all_controllers(dc);
if (edp_link_with_sink && !keep_edp_vdd_on)
dc->hwss.edp_power_control(edp_link_with_sink, false);
+   if (dcb && dcb->funcs && !dcb->funcs->is_accelerated_mode(dcb))
+   clk_mgr_optimize_pwr_state(dc, dc->clk_mgr);
}
bios_set_scratch_acc_mode_change(dc->ctx->dc_bios, 1);
 }
-- 
2.34.1



[PATCH 00/16] DC Patches February 14, 2025

2025-02-14 Thread Roman.Li
From: Roman Li 

Summary:

* Add support for disconnected eDP streams
* Add log for MALL entry on DCN32x
* Add DCC/Tiling reset helper for DCN and DCE
* Guard against setting dispclk low when active
* Other minor fixes

Cc: Daniel Wheeler 

Alex Hung (1):
  drm/amd/display: Print seamless boot message in
mark_seamless_boot_stream

Aurabindo Pillai (1):
  drm/amd/display: Add log for MALL entry on DCN32x

George Shen (1):
  drm/amd/display: Read LTTPR ALPM caps during link cap retrieval

Harry VanZyllDeJong (1):
  drm/amd/display: Add support for disconnected eDP streams

Harry Wentland (1):
  drm/amd/display: Don't treat wb connector as physical in
create_validate_stream_for_sink

Ilya Bakoulin (1):
  drm/amd/display: Support BT2020 YCbCr fullrange

Leo Zeng (1):
  Revert "drm/amd/display: Request HW cursor on DCN3.2 with SubVP"

Nicholas Kazlauskas (1):
  drm/amd/display: Guard against setting dispclk low when active

Oleh Kuzhylnyi (1):
  drm/amd/display: Add total_num_dpps_required field to informative
structure

Ovidiu Bunea (1):
  drm/amd/display: Exit idle optimizations before accessing PHY

Peichen Huang (1):
  drm/amd/display: dpia should avoid encoder used by dp2

Rodrigo Siqueira (4):
  drm/amd/display: Add DCC/Tiling reset helper for DCN and DCE
  drm/amd/display: Rename panic function
  drm/amd/display: Add clear DCC and Tiling callback for DCN
  drm/amd/display: Add clear DCC and Tiling callback for DCE

Taimur Hassan (1):
  drm/amd/display: 3.2.321

 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 32 ---
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.h |  2 +-
 .../amd/display/amdgpu_dm/amdgpu_dm_debugfs.c |  2 +-
 .../display/amdgpu_dm/amdgpu_dm_mst_types.c   |  6 ++--
 .../amd/display/amdgpu_dm/amdgpu_dm_plane.c   |  2 +-
 .../gpu/drm/amd/display/dc/basics/dc_common.c |  3 +-
 .../display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c  | 13 +---
 .../drm/amd/display/dc/core/dc_hw_sequencer.c |  5 +--
 .../drm/amd/display/dc/core/dc_link_enc_cfg.c | 16 ++
 .../gpu/drm/amd/display/dc/core/dc_resource.c |  8 +++--
 .../gpu/drm/amd/display/dc/core/dc_surface.c  | 31 +++---
 drivers/gpu/drm/amd/display/dc/dc.h   |  2 +-
 drivers/gpu/drm/amd/display/dc/dc_dp_types.h  | 12 +++
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h  |  4 ++-
 drivers/gpu/drm/amd/display/dc/dc_plane.h |  4 +--
 .../amd/display/dc/dce/dce_stream_encoder.c   |  3 +-
 .../amd/display/dc/dce60/dce60_hw_sequencer.c |  1 +
 .../dc/dio/dcn10/dcn10_stream_encoder.c   |  3 +-
 .../dc/dio/dcn401/dcn401_dio_stream_encoder.c |  3 +-
 .../drm/amd/display/dc/dml/dcn32/dcn32_fpu.c  |  1 -
 .../display/dc/dml2/dml21/inc/dml_top_types.h |  4 +++
 .../src/dml2_core/dml2_core_dcn4_calcs.c  |  5 ++-
 .../hpo/dcn31/dcn31_hpo_dp_stream_encoder.c   |  3 +-
 .../amd/display/dc/hwss/dce100/dce100_hwseq.c | 30 +
 .../amd/display/dc/hwss/dce100/dce100_hwseq.h |  4 +++
 .../amd/display/dc/hwss/dce110/dce110_hwseq.c |  7 
 .../amd/display/dc/hwss/dce120/dce120_hwseq.c |  2 ++
 .../amd/display/dc/hwss/dce80/dce80_hwseq.c   |  1 +
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   | 29 +
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.h   |  4 +++
 .../amd/display/dc/hwss/dcn10/dcn10_init.c|  1 +
 .../amd/display/dc/hwss/dcn20/dcn20_init.c|  1 +
 .../amd/display/dc/hwss/dcn201/dcn201_init.c  |  1 +
 .../amd/display/dc/hwss/dcn21/dcn21_init.c|  1 +
 .../amd/display/dc/hwss/dcn30/dcn30_init.c|  1 +
 .../amd/display/dc/hwss/dcn301/dcn301_init.c  |  1 +
 .../amd/display/dc/hwss/dcn31/dcn31_init.c|  1 +
 .../amd/display/dc/hwss/dcn314/dcn314_init.c  |  1 +
 .../amd/display/dc/hwss/dcn32/dcn32_hwseq.c   |  2 ++
 .../amd/display/dc/hwss/dcn32/dcn32_init.c|  1 +
 .../amd/display/dc/hwss/dcn35/dcn35_init.c|  1 +
 .../amd/display/dc/hwss/dcn351/dcn351_init.c  |  1 +
 .../amd/display/dc/hwss/dcn401/dcn401_init.c  |  1 +
 .../drm/amd/display/dc/hwss/hw_sequencer.h|  1 +
 drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h   |  6 +++-
 .../dc/link/protocols/link_dp_capability.c| 12 ---
 .../display/modules/info_packet/info_packet.c |  4 +--
 47 files changed, 192 insertions(+), 87 deletions(-)

-- 
2.34.1



[PATCH 05/16] drm/amd/display: Rename panic function

2025-02-14 Thread Roman.Li
From: Rodrigo Siqueira 

Rename dc_plane_force_update_for_panic to
dc_plane_force_dcc_and_tiling_disable to describe the function operation
in the name. Also, this function might be used in other contexts, and a
more generic name can be helpful for this purpose.

Reviewed-by: Alvin Lee 
Signed-off-by: Rodrigo Siqueira 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 2 +-
 drivers/gpu/drm/amd/display/dc/core/dc_surface.c| 4 ++--
 drivers/gpu/drm/amd/display/dc/dc_plane.h   | 4 ++--
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 774cc3f4f3fd..dcf2b98566ea 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -1430,7 +1430,7 @@ static void amdgpu_dm_plane_panic_flush(struct drm_plane 
*plane)
 
dc_plane_state = dm_plane_state->dc_state;
 
-   dc_plane_force_update_for_panic(dc_plane_state, fb->modifier ? true : 
false);
+   dc_plane_force_dcc_and_tiling_disable(dc_plane_state, fb->modifier ? 
true : false);
 }
 
 static const struct drm_plane_helper_funcs dm_plane_helper_funcs = {
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
index f3471d45b312..aa4184dd0e53 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
@@ -270,8 +270,8 @@ void dc_3dlut_func_retain(struct dc_3dlut *lut)
kref_get(&lut->refcount);
 }
 
-void dc_plane_force_update_for_panic(struct dc_plane_state *plane_state,
-bool clear_tiling)
+void dc_plane_force_dcc_and_tiling_disable(struct dc_plane_state *plane_state,
+  bool clear_tiling)
 {
struct dc *dc;
int i;
diff --git a/drivers/gpu/drm/amd/display/dc/dc_plane.h 
b/drivers/gpu/drm/amd/display/dc/dc_plane.h
index fabcefeda288..e9413685ed4f 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_plane.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_plane.h
@@ -34,7 +34,7 @@ const struct dc_plane_status *dc_plane_get_status(
 void dc_plane_state_retain(struct dc_plane_state *plane_state);
 void dc_plane_state_release(struct dc_plane_state *plane_state);
 
-void dc_plane_force_update_for_panic(struct dc_plane_state *plane_state,
-bool clear_tiling);
+void dc_plane_force_dcc_and_tiling_disable(struct dc_plane_state *plane_state,
+  bool clear_tiling);
 
 #endif /* _DC_PLANE_H_ */
-- 
2.34.1



[PATCH 13/16] drm/amd/display: Guard against setting dispclk low when active

2025-02-14 Thread Roman.Li
From: Nicholas Kazlauskas 

[Why]
We should never apply a minimum dispclk value while in prepare_bandwidth
or while displays are active. This is always an optimization for when
all displays are disabled.

[How]
Defer dispclk optimization until safe_to_lower = true and display_count
reaches 0.

Since 0 has a special value in this logic (i.e. no dispclk required),
we also need to adjust the logic that clamps it for the actual request
to PMFW.

Reviewed-by: Gabe Teeger 
Reviewed-by: Leo Chen 
Reviewed-by: Syed Hassan 
Signed-off-by: Nicholas Kazlauskas 
Signed-off-by: Roman Li 
---
 .../amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c| 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
index 56800c573a71..df29d28d89c9 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_clk_mgr.c
@@ -467,14 +467,19 @@ void dcn35_update_clocks(struct clk_mgr *clk_mgr_base,
update_dppclk = true;
}
 
-   if (should_set_clock(safe_to_lower, new_clocks->dispclk_khz, 
clk_mgr_base->clks.dispclk_khz)) {
+   if (should_set_clock(safe_to_lower, new_clocks->dispclk_khz, 
clk_mgr_base->clks.dispclk_khz) &&
+   (new_clocks->dispclk_khz > 0 || (safe_to_lower && display_count == 
0))) {
+   int requested_dispclk_khz = new_clocks->dispclk_khz;
+
dcn35_disable_otg_wa(clk_mgr_base, context, safe_to_lower, 
true);
 
-   if (dc->debug.min_disp_clk_khz > 0 && new_clocks->dispclk_khz < 
dc->debug.min_disp_clk_khz)
-   new_clocks->dispclk_khz = dc->debug.min_disp_clk_khz;
+   /* Clamp the requested clock to PMFW based on their limit. */
+   if (dc->debug.min_disp_clk_khz > 0 && requested_dispclk_khz < 
dc->debug.min_disp_clk_khz)
+   requested_dispclk_khz = dc->debug.min_disp_clk_khz;
 
+   dcn35_smu_set_dispclk(clk_mgr, requested_dispclk_khz);
clk_mgr_base->clks.dispclk_khz = new_clocks->dispclk_khz;
-   dcn35_smu_set_dispclk(clk_mgr, clk_mgr_base->clks.dispclk_khz);
+
dcn35_disable_otg_wa(clk_mgr_base, context, safe_to_lower, 
false);
 
update_dispclk = true;
-- 
2.34.1



[PATCH 10/16] drm/amd/display: Add total_num_dpps_required field to informative structure

2025-02-14 Thread Roman.Li
From: Oleh Kuzhylnyi 

[Why]
The informative structure needs to be extended with the total number of DPPs
required across all active planes.
The new informative field is going to be used as a statistical indicator.

[How]
The dml2_core_calcs_get_informative() routine must count the total number of DPPs.

Reviewed-by: Austin Zheng 
Signed-off-by: Oleh Kuzhylnyi 
Signed-off-by: Roman Li 
---
 .../gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h| 4 
 .../dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c   | 5 -
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h 
b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h
index 19bce4084382..0dbf886d8926 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dml2/dml21/inc/dml_top_types.h
@@ -453,6 +453,10 @@ struct dml2_display_cfg_programming {
unsigned int meta_row_height_plane1;
} plane_info[DML2_MAX_PLANES];
 
+   struct {
+   unsigned int total_num_dpps_required;
+   } dpp;
+
struct {
unsigned long long total_surface_size_in_mall_bytes;
unsigned int 
subviewport_lines_needed_in_mall[DML2_MAX_PLANES];
diff --git 
a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c
 
b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c
index 87e53f59cb9f..78c93a502518 100644
--- 
a/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c
+++ 
b/drivers/gpu/drm/amd/display/dc/dml2/dml21/src/dml2_core/dml2_core_dcn4_calcs.c
@@ -13147,8 +13147,11 @@ void dml2_core_calcs_get_informative(const struct 
dml2_core_internal_display_mod
out->informative.watermarks.temp_read_or_ppt_watermark_us = 
dml_get_wm_temp_read_or_ppt(mode_lib);
 
out->informative.mall.total_surface_size_in_mall_bytes = 0;
-   for (k = 0; k < out->display_config.num_planes; ++k)
+   out->informative.dpp.total_num_dpps_required = 0;
+   for (k = 0; k < out->display_config.num_planes; ++k) {
out->informative.mall.total_surface_size_in_mall_bytes += 
mode_lib->mp.SurfaceSizeInTheMALL[k];
+   out->informative.dpp.total_num_dpps_required += 
mode_lib->mp.NoOfDPP[k];
+   }
 
out->informative.qos.min_return_latency_in_dcfclk = 
mode_lib->mp.min_return_latency_in_dcfclk;
out->informative.qos.urgent_latency_us = 
dml_get_urgent_latency(mode_lib);
-- 
2.34.1



[PATCH 04/16] drm/amd/display: Add DCC/Tiling reset helper for DCN and DCE

2025-02-14 Thread Roman.Li
From: Rodrigo Siqueira 

This commit introduces a function helper for resetting DCN/DCE DCC and
tiling. Those functions are generic for their respective DCN/DCE, so
they were added to the oldest version of each architecture.

Reviewed-by: Alvin Lee 
Signed-off-by: Rodrigo Siqueira 
Signed-off-by: Roman Li 
---
 .../amd/display/dc/hwss/dce100/dce100_hwseq.c | 29 +++
 .../amd/display/dc/hwss/dce100/dce100_hwseq.h |  4 +++
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.c   | 29 +++
 .../amd/display/dc/hwss/dcn10/dcn10_hwseq.h   |  4 +++
 4 files changed, 66 insertions(+)

diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c
index f1f14796a3da..b76350a9cf5f 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.c
@@ -140,3 +140,32 @@ void dce100_hw_sequencer_construct(struct dc *dc)
dc->hwss.optimize_bandwidth = dce100_optimize_bandwidth;
 }
 
+/**
+ * dce100_reset_surface_dcc_and_tiling - Set DCC and tiling in DCE to their 
disable mode.
+ *
+ * @pipe_ctx: Pointer to the pipe context structure.
+ * @plane_state: Surface state
+ * @clear_tiling: If true set tiling to Linear, otherwise does not change 
tiling
+ *
+ * This function is responsible for calling the MI (mem input) block to
+ * disable DCC and set tiling to the linear mode.
+ */
+void dce100_reset_surface_dcc_and_tiling(struct pipe_ctx *pipe_ctx,
+   struct dc_plane_state *plane_state,
+   bool clear_tiling)
+{
+   struct mem_input *mi = pipe_ctx->plane_res.mi;
+
+   if (!mi)
+   return;
+
+   /* if framebuffer is tiled, disable tiling */
+   if (clear_tiling && mi->funcs->mem_input_clear_tiling)
+   mi->funcs->mem_input_clear_tiling(mi);
+
+   /* force page flip to see the new content of the framebuffer */
+   mi->funcs->mem_input_program_surface_flip_and_addr(mi,
+  
&plane_state->address,
+  true);
+}
+
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.h 
b/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.h
index 34518da20009..fadfa794f96b 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dce100/dce100_hwseq.h
@@ -46,5 +46,9 @@ bool dce100_enable_display_power_gating(struct dc *dc, 
uint8_t controller_id,
struct dc_bios *dcb,
enum pipe_gating_control power_gating);
 
+void dce100_reset_surface_dcc_and_tiling(struct pipe_ctx *pipe_ctx,
+   struct dc_plane_state *plane_state,
+   bool clear_tiling);
+
 #endif /* __DC_HWSS_DCE100_H__ */
 
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 35c0d101d7c8..301ef36d3d05 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3920,3 +3920,32 @@ void dcn10_get_dcc_en_bits(struct dc *dc, int 
*dcc_en_bits)
dcc_en_bits[i] = s->dcc_en ? 1 : 0;
}
 }
+
+/**
+ * dcn10_reset_surface_dcc_and_tiling - Set DCC and tiling in DCN to their 
disable mode.
+ *
+ * @pipe_ctx: Pointer to the pipe context structure.
+ * @plane_state: Surface state
+ * @clear_tiling: If true set tiling to Linear, otherwise does not change 
tiling
+ *
+ * This function is responsible for call the HUBP block to disable DCC and set
+ * tiling to the linear mode.
+ */
+void dcn10_reset_surface_dcc_and_tiling(struct pipe_ctx *pipe_ctx,
+   struct dc_plane_state *plane_state,
+   bool clear_tiling)
+{
+   struct hubp *hubp = pipe_ctx->plane_res.hubp;
+
+   if (!hubp)
+   return;
+
+   /* if framebuffer is tiled, disable tiling */
+   if (clear_tiling && hubp->funcs->hubp_clear_tiling)
+   hubp->funcs->hubp_clear_tiling(hubp);
+
+   /* force page flip to see the new content of the framebuffer */
+   hubp->funcs->hubp_program_surface_flip_and_addr(hubp,
+   &plane_state->address,
+   true);
+}
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.h 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.h
index bc5dd68a2408..42ffd1e1299c 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.h
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.h
@@ -207,4 +207,8 @@ void dcn10_update_visual_confirm_color(
struct pipe_ctx *pipe_ctx,
int

[PATCH 16/16] drm/amd/display: 3.2.321

2025-02-14 Thread Roman.Li
From: Taimur Hassan 

Summary:

* Add support for disconnected eDP streams
* Add log for MALL entry on DCN32x
* Add DCC/Tiling reset helper for DCN and DCE
* Guard against setting dispclk low when active
* Other minor fixes

Reviewed-by: Aurabindo Pillai 
Signed-off-by: Taimur Hassan 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/dc/dc.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc.h 
b/drivers/gpu/drm/amd/display/dc/dc.h
index ab88ce02893e..5e96913bcab1 100644
--- a/drivers/gpu/drm/amd/display/dc/dc.h
+++ b/drivers/gpu/drm/amd/display/dc/dc.h
@@ -53,7 +53,7 @@ struct aux_payload;
 struct set_config_cmd_payload;
 struct dmub_notification;
 
-#define DC_VER "3.2.320"
+#define DC_VER "3.2.321"
 
 /**
  * MAX_SURFACES - representative of the upper bound of surfaces that can be 
piped to a single CRTC
-- 
2.34.1



[PATCH 09/16] drm/amd/display: Read LTTPR ALPM caps during link cap retrieval

2025-02-14 Thread Roman.Li
From: George Shen 

[Why]
The latest DP spec requires the DP TX to read DPCD F0000h through F0009h
when detecting LTTPR capabilities for the first time.

[How]
Update LTTPR cap retrieval to read up to F0009h (two more bytes than the
previous F0007h), and store the LTTPR ALPM capabilities.
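
For reference, the new buffer size follows directly from the register
offsets. A standalone sketch of the arithmetic (not part of the patch; it
assumes DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV is 0xF0000, as
defined in drm_dp.h):

#include <stdio.h>

#define DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV	0xF0000
#define DP_LTTPR_ALPM_CAPABILITIES				0xF0009

int main(void)
{
	unsigned int idx = DP_LTTPR_ALPM_CAPABILITIES -
			   DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV;

	/* Prints "index 9, buffer size 10", hence lttpr_dpcd_data[10]. */
	printf("index %u, buffer size %u\n", idx, idx + 1);

	return 0;
}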

Reviewed-by: Wenjing Liu 
Signed-off-by: George Shen 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/dc/dc_dp_types.h | 12 
 .../display/dc/link/protocols/link_dp_capability.c   |  6 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/display/dc/dc_dp_types.h 
b/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
index 94ce8fe74481..ae6e2d8552ac 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_dp_types.h
@@ -959,6 +959,14 @@ union dp_128b_132b_supported_lttpr_link_rates {
uint8_t raw;
 };
 
+union dp_alpm_lttpr_cap {
+   struct {
+   uint8_t AUX_LESS_ALPM_SUPPORTED :1;
+   uint8_t RESERVED:7;
+   } bits;
+   uint8_t raw;
+};
+
 union dp_sink_video_fallback_formats {
struct {
uint8_t dp_1024x768_60Hz_24bpp_support  :1;
@@ -1118,6 +1126,7 @@ struct dc_lttpr_caps {
uint8_t max_ext_timeout;
union dp_main_link_channel_coding_lttpr_cap main_link_channel_coding;
union dp_128b_132b_supported_lttpr_link_rates supported_128b_132b_rates;
+   union dp_alpm_lttpr_cap alpm;
uint8_t aux_rd_interval[MAX_REPEATER_CNT - 1];
 };
 
@@ -1370,6 +1379,9 @@ struct dp_trace {
 #ifndef DPCD_MAX_UNCOMPRESSED_PIXEL_RATE_CAP
 #define DPCD_MAX_UNCOMPRESSED_PIXEL_RATE_CAP0x221c
 #endif
+#ifndef DP_LTTPR_ALPM_CAPABILITIES
+#define DP_LTTPR_ALPM_CAPABILITIES  0xF0009
+#endif
 #ifndef DP_REPEATER_CONFIGURATION_AND_STATUS_SIZE
 #define DP_REPEATER_CONFIGURATION_AND_STATUS_SIZE  0x50
 #endif
diff --git a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c 
b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c
index 44c3023a7731..80439224acca 100644
--- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c
+++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_dp_capability.c
@@ -1502,7 +1502,7 @@ static bool dpcd_read_sink_ext_caps(struct dc_link *link)
 
 enum dc_status dp_retrieve_lttpr_cap(struct dc_link *link)
 {
-   uint8_t lttpr_dpcd_data[8] = {0};
+   uint8_t lttpr_dpcd_data[10] = {0};
enum dc_status status;
bool is_lttpr_present;
 
@@ -1552,6 +1552,10 @@ enum dc_status dp_retrieve_lttpr_cap(struct dc_link 
*link)
lttpr_dpcd_data[DP_PHY_REPEATER_128B132B_RATES -

DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV];
 
+   link->dpcd_caps.lttpr_caps.alpm.raw =
+   lttpr_dpcd_data[DP_LTTPR_ALPM_CAPABILITIES -
+   
DP_LT_TUNABLE_PHY_REPEATER_FIELD_DATA_STRUCTURE_REV];
+
/* If this chip cap is set, at least one retimer must exist in the chain
 * Override count to 1 if we receive a known bad count (0 or an invalid 
value) */
if (((link->chip_caps & AMD_EXT_DISPLAY_PATH_CAPS__EXT_CHIP_MASK) == 
AMD_EXT_DISPLAY_PATH_CAPS__DP_FIXED_VS_EN) &&
-- 
2.34.1



[PATCH 06/16] drm/amd/display: Add clear DCC and Tiling callback for DCN

2025-02-14 Thread Roman.Li
From: Rodrigo Siqueira 

Introduce the DCC and Tiling reset callback for all DCN versions that can
call it.

Reviewed-by: Alvin Lee 
Signed-off-by: Rodrigo Siqueira 
Signed-off-by: Roman Li 
---
 drivers/gpu/drm/amd/display/dc/core/dc_surface.c| 13 ++---
 .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_init.c  |  1 +
 .../gpu/drm/amd/display/dc/hwss/dcn20/dcn20_init.c  |  1 +
 .../drm/amd/display/dc/hwss/dcn201/dcn201_init.c|  1 +
 .../gpu/drm/amd/display/dc/hwss/dcn21/dcn21_init.c  |  1 +
 .../gpu/drm/amd/display/dc/hwss/dcn30/dcn30_init.c  |  1 +
 .../drm/amd/display/dc/hwss/dcn301/dcn301_init.c|  1 +
 .../gpu/drm/amd/display/dc/hwss/dcn31/dcn31_init.c  |  1 +
 .../drm/amd/display/dc/hwss/dcn314/dcn314_init.c|  1 +
 .../gpu/drm/amd/display/dc/hwss/dcn32/dcn32_init.c  |  1 +
 .../gpu/drm/amd/display/dc/hwss/dcn35/dcn35_init.c  |  1 +
 .../drm/amd/display/dc/hwss/dcn351/dcn351_init.c|  1 +
 .../drm/amd/display/dc/hwss/dcn401/dcn401_init.c|  1 +
 drivers/gpu/drm/amd/display/dc/hwss/hw_sequencer.h  |  1 +
 14 files changed, 15 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c 
b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
index aa4184dd0e53..691b4a68d8ac 100644
--- a/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
+++ b/drivers/gpu/drm/amd/display/dc/core/dc_surface.c
@@ -291,17 +291,8 @@ void dc_plane_force_dcc_and_tiling_disable(struct 
dc_plane_state *plane_state,
continue;
 
if (dc->ctx->dce_version >= DCE_VERSION_MAX) {
-   struct hubp *hubp = pipe_ctx->plane_res.hubp;
-   if (!hubp)
-   continue;
-   /* if framebuffer is tiled, disable tiling */
-   if (clear_tiling && hubp->funcs->hubp_clear_tiling)
-   hubp->funcs->hubp_clear_tiling(hubp);
-
-   /* force page flip to see the new content of the 
framebuffer */
-   hubp->funcs->hubp_program_surface_flip_and_addr(hubp,
-   
&plane_state->address,
-   true);
+   if (dc->hwss.clear_surface_dcc_and_tiling)
+   dc->hwss.clear_surface_dcc_and_tiling(pipe_ctx, 
plane_state, clear_tiling);
} else {
struct mem_input *mi = pipe_ctx->plane_res.mi;
if (!mi)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_init.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_init.c
index 5e51e1761707..079c226c1097 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_init.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_init.c
@@ -40,6 +40,7 @@ static const struct hw_sequencer_funcs dcn10_funcs = {
.update_plane_addr = dcn10_update_plane_addr,
.update_dchub = dcn10_update_dchub,
.update_pending_status = dcn10_update_pending_status,
+   .clear_surface_dcc_and_tiling = dcn10_reset_surface_dcc_and_tiling,
.program_output_csc = dcn10_program_output_csc,
.enable_accelerated_mode = dce110_enable_accelerated_mode,
.enable_timing_synchronization = dcn10_enable_timing_synchronization,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_init.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_init.c
index 32707b344f0b..ad253c586ea1 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_init.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn20/dcn20_init.c
@@ -36,6 +36,7 @@ static const struct hw_sequencer_funcs dcn20_funcs = {
.apply_ctx_to_hw = dce110_apply_ctx_to_hw,
.apply_ctx_for_surface = NULL,
.program_front_end_for_ctx = dcn20_program_front_end_for_ctx,
+   .clear_surface_dcc_and_tiling = dcn10_reset_surface_dcc_and_tiling,
.wait_for_pending_cleared = dcn10_wait_for_pending_cleared,
.post_unlock_program_front_end = dcn20_post_unlock_program_front_end,
.update_plane_addr = dcn20_update_plane_addr,
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn201/dcn201_init.c 
b/drivers/gpu/drm/amd/display/dc/hwss/dcn201/dcn201_init.c
index 78351408e864..dec57fb4c05c 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn201/dcn201_init.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn201/dcn201_init.c
@@ -36,6 +36,7 @@ static const struct hw_sequencer_funcs dcn201_funcs = {
.apply_ctx_to_hw = dce110_apply_ctx_to_hw,
.apply_ctx_for_surface = NULL,
.program_front_end_for_ctx = dcn20_program_front_end_for_ctx,
+   .clear_surface_dcc_and_tiling = dcn10_reset_surface_dcc_and_tiling,
.wait_for_pending_cleared = dcn10_wait_for_pending_cleared,
.post_unlock_program_front_end = dcn10_post_unlock_program_front_end,
.update_plane_addr = dcn201_update_plane_addr,
diff --git a/drivers/

[PATCH 4/5] drm/scheduler: Add basic priority tests

2025-02-14 Thread Tvrtko Ursulin
Add some basic tests for exercising entity priority handling.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 99 ++-
 1 file changed, 98 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c 
b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 93b3043fbcb7..25dba633a14a 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -1,5 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 
+#include 
+
 #include "sched_tests.h"
 
 /*
@@ -252,5 +254,100 @@ static struct kunit_suite drm_sched_timeout = {
.test_cases = drm_sched_timeout_tests,
 };
 
+static void drm_sched_priorities(struct kunit *test)
+{
+   struct drm_mock_sched_entity *entity[DRM_SCHED_PRIORITY_COUNT];
+   struct drm_mock_scheduler *sched = test->priv;
+   struct drm_mock_sched_job *job;
+   const unsigned int qd = 100;
+   unsigned int i, cur_ent = 0;
+   enum drm_sched_priority p;
+   bool done;
+
+   /*
+* Submit a bunch of jobs against entities configured with different
+* priorities.
+*/
+
+   BUILD_BUG_ON(DRM_SCHED_PRIORITY_KERNEL > DRM_SCHED_PRIORITY_LOW);
+   BUILD_BUG_ON(DRM_SCHED_PRIORITY_COUNT != ARRAY_SIZE(entity));
+
+   for (p = DRM_SCHED_PRIORITY_KERNEL; p <= DRM_SCHED_PRIORITY_LOW; p++)
+   entity[p] = drm_mock_new_sched_entity(test, p, sched);
+
+   for (i = 0; i < qd; i++) {
+   job = drm_mock_new_sched_job(test, entity[cur_ent++]);
+   cur_ent %= ARRAY_SIZE(entity);
+   drm_mock_sched_job_set_duration_us(job, 1000);
+   drm_mock_sched_job_submit(job);
+   }
+
+   done = drm_mock_sched_job_wait_finished(job, HZ);
+   KUNIT_ASSERT_EQ(test, done, true);
+
+   for (i = 0; i < ARRAY_SIZE(entity); i++)
+   drm_mock_sched_entity_free(entity[i]);
+}
+
+static void drm_sched_change_priority(struct kunit *test)
+{
+   struct drm_mock_sched_entity *entity[DRM_SCHED_PRIORITY_COUNT];
+   struct drm_mock_scheduler *sched = test->priv;
+   struct drm_mock_sched_job *job;
+   const unsigned int qd = 1000;
+   unsigned int i, cur_ent = 0;
+   enum drm_sched_priority p;
+   bool done;
+
+   /*
+* Submit a bunch of jobs against entities configured with different
+* priorities and while waiting for them to complete, periodically keep
+* changing their priorities.
+*
+* We set up the queue-depth (qd) and job duration so the priority
+* changing loop has some time to interact with submissions to the
+* backend and job completions as they progress.
+*/
+
+   for (p = DRM_SCHED_PRIORITY_KERNEL; p <= DRM_SCHED_PRIORITY_LOW; p++)
+   entity[p] = drm_mock_new_sched_entity(test, p, sched);
+
+   for (i = 0; i < qd; i++) {
+   job = drm_mock_new_sched_job(test, entity[cur_ent++]);
+   cur_ent %= ARRAY_SIZE(entity);
+   drm_mock_sched_job_set_duration_us(job, 1000);
+   drm_mock_sched_job_submit(job);
+   }
+
+   do {
+   drm_sched_entity_set_priority(&entity[cur_ent]->base,
+ (entity[cur_ent]->base.priority + 
1) %
+ DRM_SCHED_PRIORITY_COUNT);
+   cur_ent++;
+   cur_ent %= ARRAY_SIZE(entity);
+   usleep_range(200, 500);
+   } while (!drm_mock_sched_job_is_finished(job));
+
+   done = drm_mock_sched_job_wait_finished(job, HZ);
+   KUNIT_ASSERT_EQ(test, done, true);
+
+   for (i = 0; i < ARRAY_SIZE(entity); i++)
+   drm_mock_sched_entity_free(entity[i]);
+}
+
+static struct kunit_case drm_sched_priority_tests[] = {
+   KUNIT_CASE(drm_sched_priorities),
+   KUNIT_CASE(drm_sched_change_priority),
+   {}
+};
+
+static struct kunit_suite drm_sched_priority = {
+   .name = "drm_sched_basic_priority_tests",
+   .init = drm_sched_basic_init,
+   .exit = drm_sched_basic_exit,
+   .test_cases = drm_sched_priority_tests,
+};
+
 kunit_test_suites(&drm_sched_basic,
- &drm_sched_timeout);
+ &drm_sched_timeout,
+ &drm_sched_priority);
-- 
2.48.0



[PATCH 0/5] DRM scheduler kunit tests

2025-02-14 Thread Tvrtko Ursulin
There has repeatedly been quite a bit of apprehension when any change to the DRM
scheduler is proposed, for two main reasons: firstly, the code base is considered
fragile, not well understood and not very well documented; and secondly, there is
a lack of systematic testing outside the vendor specific test suites and/or test
farms.

This series is an attempt to dislodge this status quo by adding some unit tests
using the kunit framework.

The general approach is that there is a mock "hardware" backend which can be
controlled from the tests, which in turn allows exercising various scheduler
code paths.

Only some simple basic tests are added in this series and hopefully it is easy
to understand what the tests are doing.

An obligatory "screenshot" for reference:

[14:29:37]  drm_sched_basic_tests (3 subtests) 
[14:29:38] [PASSED] drm_sched_basic_submit
[14:29:38] == drm_sched_basic_test  ===
[14:29:38] [PASSED] A queue of jobs in a single entity
[14:29:38] [PASSED] A chain of dependent jobs across multiple entities
[14:29:38] [PASSED] Multiple independent job queues
[14:29:38] [PASSED] Multiple inter-dependent job queues
[14:29:38] == [PASSED] drm_sched_basic_test ===
[14:29:38] [PASSED] drm_sched_basic_entity_cleanup
[14:29:38] == [PASSED] drm_sched_basic_tests ==
[14:29:38]  drm_sched_basic_timeout_tests (1 subtest) =
[14:29:40] [PASSED] drm_sched_basic_timeout
[14:29:40] == [PASSED] drm_sched_basic_timeout_tests ==
[14:29:40] === drm_sched_basic_priority_tests (2 subtests) 
[14:29:42] [PASSED] drm_sched_priorities
[14:29:42] [PASSED] drm_sched_change_priority
[14:29:42] = [PASSED] drm_sched_basic_priority_tests ==
[14:29:42] == drm_sched_basic_modify_sched_tests (1 subtest) ==
[14:29:43] [PASSED] drm_sched_test_modify_sched
[14:29:43] === [PASSED] drm_sched_basic_modify_sched_tests 
[14:29:43] 
[14:29:43] Testing complete. Ran 10 tests: passed: 10
[14:29:43] Elapsed time: 13.330s total, 0.001s configuring, 4.005s building, 
9.276s running

v2:
 * Parameterize a bunch of similar tests.
 * Improve test commentary.
 * Rename TDR test to timeout. (Christian)
 * Improve quality and consistency of naming. (Philipp)

RFC v2 -> series v1:
 * Rebased for drm_sched_init changes.
 * Fixed modular build.
 * Added some comments.
 * Filename renames. (Philipp)

Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 

Tvrtko Ursulin (5):
  drm: Move some options to separate new Kconfig.debug
  drm/scheduler: Add scheduler unit testing infrastructure and some
basic tests
  drm/scheduler: Add a simple timeout test
  drm/scheduler: Add basic priority tests
  drm/scheduler: Add a basic test for modifying entities scheduler list

 drivers/gpu/drm/Kconfig   | 109 +
 drivers/gpu/drm/Kconfig.debug | 115 +
 drivers/gpu/drm/scheduler/.kunitconfig|  12 +
 drivers/gpu/drm/scheduler/Makefile|   2 +
 drivers/gpu/drm/scheduler/tests/Makefile  |   6 +
 .../gpu/drm/scheduler/tests/mock_scheduler.c  | 323 +
 drivers/gpu/drm/scheduler/tests/sched_tests.h | 222 +
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 424 ++
 8 files changed, 1109 insertions(+), 104 deletions(-)
 create mode 100644 drivers/gpu/drm/Kconfig.debug
 create mode 100644 drivers/gpu/drm/scheduler/.kunitconfig
 create mode 100644 drivers/gpu/drm/scheduler/tests/Makefile
 create mode 100644 drivers/gpu/drm/scheduler/tests/mock_scheduler.c
 create mode 100644 drivers/gpu/drm/scheduler/tests/sched_tests.h
 create mode 100644 drivers/gpu/drm/scheduler/tests/tests_basic.c

-- 
2.48.0



[PATCH 2/5] drm/scheduler: Add scheduler unit testing infrastructure and some basic tests

2025-02-14 Thread Tvrtko Ursulin
Implement a mock scheduler backend and add some basic tests to exercise the
core scheduler code paths.

The mock backend (in effect a very simple mock GPU) can either process jobs
with the test manually advancing the "timeline" one job at a time, or,
alternatively, jobs can be configured with a time duration, in which case they
complete asynchronously from the unit test code.
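
To make the two modes concrete, below is a rough sketch (illustrative only,
not part of this patch) of what a minimal test for each mode could look
like. It assumes the helpers added here, that the code sits next to the
other cases in tests_basic.c (so the suite init provides the scheduler via
test->priv), and that <linux/delay.h> is available for usleep_range():

/* Asynchronous mode: the job is given a duration and completes on its own. */
static void example_async_mode(struct kunit *test)
{
	struct drm_mock_scheduler *sched = test->priv;
	struct drm_mock_sched_entity *entity;
	struct drm_mock_sched_job *job;

	entity = drm_mock_new_sched_entity(test, DRM_SCHED_PRIORITY_NORMAL,
					   sched);
	job = drm_mock_new_sched_job(test, entity);

	/* The mock "hardware" completes the job ~1ms after it is run. */
	drm_mock_sched_job_set_duration_us(job, 1000);
	drm_mock_sched_job_submit(job);

	KUNIT_EXPECT_TRUE(test, drm_mock_sched_job_wait_finished(job, HZ));

	drm_mock_sched_entity_free(entity);
}

/* Manual mode: the test plays the "hardware" and advances the timeline. */
static void example_manual_mode(struct kunit *test)
{
	struct drm_mock_scheduler *sched = test->priv;
	struct drm_mock_sched_entity *entity;
	struct drm_mock_sched_job *job;

	entity = drm_mock_new_sched_entity(test, DRM_SCHED_PRIORITY_NORMAL,
					   sched);
	job = drm_mock_new_sched_job(test, entity);
	drm_mock_sched_job_submit(job);

	/* Complete jobs one at a time until the backend has finished ours. */
	do {
		drm_mock_sched_advance(sched, 1);
		usleep_range(200, 500);
	} while (!drm_mock_sched_job_is_finished(job));

	drm_mock_sched_entity_free(entity);
}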

Core scheduler classes are subclassed to support this mock implementation.

The tests added are just a few simple submission patterns.

Signed-off-by: Tvrtko Ursulin 
Suggested-by: Philipp Stanner 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/Kconfig.debug |  12 +
 drivers/gpu/drm/scheduler/.kunitconfig|  12 +
 drivers/gpu/drm/scheduler/Makefile|   2 +
 drivers/gpu/drm/scheduler/tests/Makefile  |   6 +
 .../gpu/drm/scheduler/tests/mock_scheduler.c  | 316 ++
 drivers/gpu/drm/scheduler/tests/sched_tests.h | 218 
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 196 +++
 7 files changed, 762 insertions(+)
 create mode 100644 drivers/gpu/drm/scheduler/.kunitconfig
 create mode 100644 drivers/gpu/drm/scheduler/tests/Makefile
 create mode 100644 drivers/gpu/drm/scheduler/tests/mock_scheduler.c
 create mode 100644 drivers/gpu/drm/scheduler/tests/sched_tests.h
 create mode 100644 drivers/gpu/drm/scheduler/tests/tests_basic.c

diff --git a/drivers/gpu/drm/Kconfig.debug b/drivers/gpu/drm/Kconfig.debug
index 601d7e07d421..6fd4c5669400 100644
--- a/drivers/gpu/drm/Kconfig.debug
+++ b/drivers/gpu/drm/Kconfig.debug
@@ -99,5 +99,17 @@ config DRM_TTM_KUNIT_TEST
 
  If in doubt, say "N".
 
+config DRM_SCHED_KUNIT_TEST
+   tristate "KUnit tests for the DRM scheduler" if !KUNIT_ALL_TESTS
+   select DRM_SCHED
+   depends on DRM && KUNIT
+   default KUNIT_ALL_TESTS
+   help
+ Choose this option to build unit tests for the DRM scheduler.
+
+ Recommended for driver developers only.
+
+ If in doubt, say "N".
+
 config DRM_EXPORT_FOR_TESTS
bool
diff --git a/drivers/gpu/drm/scheduler/.kunitconfig 
b/drivers/gpu/drm/scheduler/.kunitconfig
new file mode 100644
index ..cece53609fcf
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/.kunitconfig
@@ -0,0 +1,12 @@
+CONFIG_KUNIT=y
+CONFIG_DRM=y
+CONFIG_DRM_SCHED_KUNIT_TEST=y
+CONFIG_EXPERT=y
+CONFIG_DEBUG_SPINLOCK=y
+CONFIG_DEBUG_MUTEXES=y
+CONFIG_DEBUG_ATOMIC_SLEEP=y
+CONFIG_LOCK_DEBUGGING_SUPPORT=y
+CONFIG_PROVE_LOCKING=y
+CONFIG_LOCKDEP=y
+CONFIG_DEBUG_LOCKDEP=y
+CONFIG_DEBUG_LIST=y
diff --git a/drivers/gpu/drm/scheduler/Makefile 
b/drivers/gpu/drm/scheduler/Makefile
index 53863621829f..6e13e4c63e9d 100644
--- a/drivers/gpu/drm/scheduler/Makefile
+++ b/drivers/gpu/drm/scheduler/Makefile
@@ -23,3 +23,5 @@
 gpu-sched-y := sched_main.o sched_fence.o sched_entity.o
 
 obj-$(CONFIG_DRM_SCHED) += gpu-sched.o
+
+obj-$(CONFIG_DRM_SCHED_KUNIT_TEST) += tests/
diff --git a/drivers/gpu/drm/scheduler/tests/Makefile 
b/drivers/gpu/drm/scheduler/tests/Makefile
new file mode 100644
index ..51d275a18cf4
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/tests/Makefile
@@ -0,0 +1,6 @@
+
+drm-sched-tests-y := \
+mock_scheduler.o \
+tests_basic.o
+
+obj-$(CONFIG_DRM_SCHED_KUNIT_TEST) += drm-sched-tests.o
diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c 
b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
new file mode 100644
index ..7624795c43fd
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: GPL-2.0
+
+#include "sched_tests.h"
+
+/*
+ * Here we implement the mock "GPU" (or the scheduler backend) which is used by
+ * the DRM scheduler unit tests in order to exercise the core functionality.
+ *
+ * Test cases are implemented in a separate file.
+ */
+
+/**
+ * drm_mock_new_sched_entity - Create a new mock scheduler entity
+ *
+ * @test: KUnit test owning the entity
+ * @priority: Scheduling priority
+ * @sched: Mock scheduler on which the entity can be scheduled
+ *
+ * Returns: New mock scheduler entity with allocation managed by the test
+ */
+struct drm_mock_sched_entity *
+drm_mock_new_sched_entity(struct kunit *test,
+ enum drm_sched_priority priority,
+ struct drm_mock_scheduler *sched)
+{
+   struct drm_mock_sched_entity *entity;
+   struct drm_gpu_scheduler *drm_sched;
+   int ret;
+
+   entity = kunit_kzalloc(test, sizeof(*entity), GFP_KERNEL);
+   KUNIT_ASSERT_NOT_NULL(test, entity);
+
+   drm_sched = &sched->base;
+   ret = drm_sched_entity_init(&entity->base,
+   priority,
+   &drm_sched, 1,
+   NULL);
+   KUNIT_ASSERT_EQ(test, ret, 0);
+
+   entity->test = test;
+
+   return entity;
+}
+
+/**
+ * drm_mock_sched_entity_free - Destroys a moc

[PATCH 3/5] drm/scheduler: Add a simple timeout test

2025-02-14 Thread Tvrtko Ursulin
Add a very simple timeout test which submits a single job and verifies
that the timeout handling will run if the backend fails to complete the
job in time.
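
For reference, a rough sketch (illustrative only, the real test body is in
the diff below) of the shape such a check can take, assuming a scheduler
created with a one second timeout as in drm_sched_timeout_init() and the
DRM_MOCK_SCHED_JOB_TIMEDOUT flag set by the mock timedout handler:

static void example_timeout_check(struct kunit *test)
{
	struct drm_mock_scheduler *sched = test->priv;	/* timeout == HZ */
	struct drm_mock_sched_entity *entity;
	struct drm_mock_sched_job *job;

	entity = drm_mock_new_sched_entity(test, DRM_SCHED_PRIORITY_NORMAL,
					   sched);
	job = drm_mock_new_sched_job(test, entity);

	/*
	 * Submit the job but never give it a duration and never advance the
	 * mock timeline, so the backend cannot complete it and the timeout
	 * handler must eventually run.
	 */
	drm_mock_sched_job_submit(job);

	KUNIT_EXPECT_FALSE(test,
			   drm_mock_sched_job_wait_finished(job, 2 * HZ));
	KUNIT_EXPECT_TRUE(test, job->flags & DRM_MOCK_SCHED_JOB_TIMEDOUT);

	drm_mock_sched_entity_free(entity);
}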

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 .../gpu/drm/scheduler/tests/mock_scheduler.c  | 13 +++-
 drivers/gpu/drm/scheduler/tests/sched_tests.h |  6 +-
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 64 ++-
 3 files changed, 77 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c 
b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
index 7624795c43fd..b783d8f055f3 100644
--- a/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
+++ b/drivers/gpu/drm/scheduler/tests/mock_scheduler.c
@@ -192,7 +192,11 @@ static struct dma_fence *mock_sched_run_job(struct 
drm_sched_job *sched_job)
 static enum drm_gpu_sched_stat
 mock_sched_timedout_job(struct drm_sched_job *sched_job)
 {
-   return DRM_GPU_SCHED_STAT_ENODEV;
+   struct drm_mock_sched_job *job = drm_sched_job_to_mock_job(sched_job);
+
+   job->flags |= DRM_MOCK_SCHED_JOB_TIMEDOUT;
+
+   return DRM_GPU_SCHED_STAT_NOMINAL;
 }
 
 static void mock_sched_free_job(struct drm_sched_job *sched_job)
@@ -210,17 +214,20 @@ static const struct drm_sched_backend_ops 
drm_mock_scheduler_ops = {
  * drm_mock_new_scheduler - Create a new mock scheduler
  *
  * @test: KUnit test owning the job
+ * @timeout: Job timeout to set
  *
  * Returns: New mock scheduler with allocation managed by the test
  */
-struct drm_mock_scheduler *drm_mock_new_scheduler(struct kunit *test)
+struct drm_mock_scheduler *
+drm_mock_new_scheduler(struct kunit *test,
+  long timeout)
 {
struct drm_sched_init_args args = {
.ops= &drm_mock_scheduler_ops,
.num_rqs= DRM_SCHED_PRIORITY_COUNT,
.credit_limit   = U32_MAX,
.hang_limit = UINT_MAX,
-   .timeout= MAX_SCHEDULE_TIMEOUT,
+   .timeout= timeout,
.name   = "drm-mock-scheduler",
};
struct drm_mock_scheduler *sched;
diff --git a/drivers/gpu/drm/scheduler/tests/sched_tests.h 
b/drivers/gpu/drm/scheduler/tests/sched_tests.h
index eae79365ff67..b17bf0db9e9c 100644
--- a/drivers/gpu/drm/scheduler/tests/sched_tests.h
+++ b/drivers/gpu/drm/scheduler/tests/sched_tests.h
@@ -88,6 +88,9 @@ struct drm_mock_scheduler {
 struct drm_mock_sched_job {
struct drm_sched_jobbase;
 
+#define DRM_MOCK_SCHED_JOB_TIMEDOUT 0x1
+   unsigned long   flags;
+
struct list_headlink;
struct hrtimer  timer;
 
@@ -118,7 +121,8 @@ drm_sched_job_to_mock_job(struct drm_sched_job *sched_job)
return container_of(sched_job, struct drm_mock_sched_job, base);
 };
 
-struct drm_mock_scheduler *drm_mock_new_scheduler(struct kunit *test);
+struct drm_mock_scheduler *drm_mock_new_scheduler(struct kunit *test,
+ long timeout);
 void drm_mock_scheduler_fini(struct drm_mock_scheduler *sched);
 unsigned int drm_mock_sched_advance(struct drm_mock_scheduler *sched,
unsigned int num);
diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c 
b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 2a1ab04e12b7..93b3043fbcb7 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -11,7 +11,7 @@
 
 static int drm_sched_basic_init(struct kunit *test)
 {
-   test->priv = drm_mock_new_scheduler(test);
+   test->priv = drm_mock_new_scheduler(test, MAX_SCHEDULE_TIMEOUT);
 
return 0;
 }
@@ -23,6 +23,13 @@ static void drm_sched_basic_exit(struct kunit *test)
drm_mock_scheduler_fini(sched);
 }
 
+static int drm_sched_timeout_init(struct kunit *test)
+{
+   test->priv = drm_mock_new_scheduler(test, HZ);
+
+   return 0;
+}
+
 static void drm_sched_basic_submit(struct kunit *test)
 {
struct drm_mock_scheduler *sched = test->priv;
@@ -193,4 +200,57 @@ static struct kunit_suite drm_sched_basic = {
.test_cases = drm_sched_basic_tests,
 };
 
-kunit_test_suite(drm_sched_basic);
+static void drm_sched_basic_timeout(struct kunit *test)
+{
+   struct drm_mock_scheduler *sched = test->priv;
+   struct drm_mock_sched_entity *entity;
+   struct drm_mock_sched_job *job;
+   bool done;
+
+   /*
+* Submit a single job against a scheduler with the timeout configured
+* and verify that the timeout handling will run if the backend fails
+* to complete it in time.
+*/
+
+   entity = drm_mock_new_sched_entity(test,
+  DRM_SCHED_PRIORITY_NORMAL,
+  sched);
+   job = drm_mock_new_sched_job(test, entity);
+
+   drm_mock_sched_job_submit(job);
+
+   don

[PATCH 1/5] drm: Move some options to separate new Kconfig.debug

2025-02-14 Thread Tvrtko Ursulin
Move some options out into a new debug-specific Kconfig file in order to
make things a bit cleaner.

Signed-off-by: Tvrtko Ursulin 
---
 drivers/gpu/drm/Kconfig   | 109 ++
 drivers/gpu/drm/Kconfig.debug | 103 
 2 files changed, 108 insertions(+), 104 deletions(-)
 create mode 100644 drivers/gpu/drm/Kconfig.debug

diff --git a/drivers/gpu/drm/Kconfig b/drivers/gpu/drm/Kconfig
index d9986fd52194..46ba24592553 100644
--- a/drivers/gpu/drm/Kconfig
+++ b/drivers/gpu/drm/Kconfig
@@ -26,6 +26,11 @@ menuconfig DRM
  details.  You should also select and configure AGP
  (/dev/agpgart) support if it is available for your platform.
 
+menu "DRM debugging options"
+depends on DRM
+source "drivers/gpu/drm/Kconfig.debug"
+endmenu
+
 if DRM
 
 config DRM_MIPI_DBI
@@ -37,65 +42,6 @@ config DRM_MIPI_DSI
bool
depends on DRM
 
-config DRM_DEBUG_MM
-   bool "Insert extra checks and debug info into the DRM range managers"
-   default n
-   depends on DRM
-   depends on STACKTRACE_SUPPORT
-   select STACKDEPOT
-   help
- Enable allocation tracking of memory manager and leak detection on
- shutdown.
-
- Recommended for driver developers only.
-
- If in doubt, say "N".
-
-config DRM_USE_DYNAMIC_DEBUG
-   bool "use dynamic debug to implement drm.debug"
-   default n
-   depends on BROKEN
-   depends on DRM
-   depends on DYNAMIC_DEBUG || DYNAMIC_DEBUG_CORE
-   depends on JUMP_LABEL
-   help
- Use dynamic-debug to avoid drm_debug_enabled() runtime overheads.
- Due to callsite counts in DRM drivers (~4k in amdgpu) and 56
- bytes per callsite, the .data costs can be substantial, and
- are therefore configurable.
-
-config DRM_KUNIT_TEST_HELPERS
-   tristate
-   depends on DRM && KUNIT
-   select DRM_KMS_HELPER
-   help
- KUnit Helpers for KMS drivers.
-
-config DRM_KUNIT_TEST
-   tristate "KUnit tests for DRM" if !KUNIT_ALL_TESTS
-   depends on DRM && KUNIT && MMU
-   select DRM_BUDDY
-   select DRM_DISPLAY_DP_HELPER
-   select DRM_DISPLAY_HDMI_STATE_HELPER
-   select DRM_DISPLAY_HELPER
-   select DRM_EXEC
-   select DRM_EXPORT_FOR_TESTS if m
-   select DRM_GEM_SHMEM_HELPER
-   select DRM_KUNIT_TEST_HELPERS
-   select DRM_LIB_RANDOM
-   select PRIME_NUMBERS
-   default KUNIT_ALL_TESTS
-   help
- This builds unit tests for DRM. This option is not useful for
- distributions or general kernels, but only for kernel
- developers working on DRM and associated drivers.
-
- For more information on KUnit and unit tests in general,
- please refer to the KUnit documentation in
- Documentation/dev-tools/kunit/.
-
- If in doubt, say "N".
-
 config DRM_KMS_HELPER
tristate
depends on DRM
@@ -247,23 +193,6 @@ config DRM_TTM
  GPU memory types. Will be enabled automatically if a device driver
  uses it.
 
-config DRM_TTM_KUNIT_TEST
-tristate "KUnit tests for TTM" if !KUNIT_ALL_TESTS
-default n
-depends on DRM && KUNIT && MMU && (UML || COMPILE_TEST)
-select DRM_TTM
-select DRM_BUDDY
-select DRM_EXPORT_FOR_TESTS if m
-select DRM_KUNIT_TEST_HELPERS
-default KUNIT_ALL_TESTS
-help
-  Enables unit tests for TTM, a GPU memory manager subsystem used
-  to manage memory buffers. This option is mostly useful for kernel
-  developers. It depends on (UML || COMPILE_TEST) since no other driver
-  which uses TTM can be loaded while running the tests.
-
-  If in doubt, say "N".
-
 config DRM_EXEC
tristate
depends on DRM
@@ -463,9 +392,6 @@ config DRM_HYPERV
 
 If M is selected the module will be called hyperv_drm.
 
-config DRM_EXPORT_FOR_TESTS
-   bool
-
 # Separate option as not all DRM drivers use it
 config DRM_PANEL_BACKLIGHT_QUIRKS
tristate
@@ -478,31 +404,6 @@ config DRM_PRIVACY_SCREEN
bool
default n
 
-config DRM_WERROR
-   bool "Compile the drm subsystem with warnings as errors"
-   depends on DRM && EXPERT
-   depends on !WERROR
-   default n
-   help
- A kernel build should not cause any compiler warnings, and this
- enables the '-Werror' flag to enforce that rule in the drm subsystem.
-
- The drm subsystem enables more warnings than the kernel default, so
- this config option is disabled by default.
-
- If in doubt, say N.
-
-config DRM_HEADER_TEST
-   bool "Ensure DRM headers are self-contained and pass kernel-doc"
-   depends on DRM && EXPERT
-   default n
-   help
- Ensure the DRM subsystem headers both under drivers/gpu/drm and
- include/drm compile, are self-contained, have header guards, and have
- no kernel-doc

[PATCH v5 1/6] drm/sched: Add internal job peek/pop API

2025-02-14 Thread Tvrtko Ursulin
The idea is to add helpers for peeking at and popping jobs from entities,
with the goal of removing the hidden assumption in the code that queue_node
is the first element in struct drm_sched_job.

That assumption usually comes in the form of:

  while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue

Which breaks if the queue_node is re-positioned, due to to_drm_sched_job
being implemented with a container_of.

This also allows us to remove duplicate definitions of to_drm_sched_job.
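
To spell the failure mode out: container_of() subtracts the member offset
from the pointer it is given, so the NULL which spsc_queue_pop() returns
for an empty queue only stays NULL - and only then terminates the while
loop - as long as queue_node happens to sit at offset zero. A standalone
userspace sketch (not kernel code) of that offset-zero assumption:

#include <stddef.h>
#include <stdio.h>

struct spsc_node { struct spsc_node *next; };

struct job_node_first  { struct spsc_node queue_node; int credits; };
struct job_node_second { int credits; struct spsc_node queue_node; };

#define job_from_node(ptr, type) \
	((type *)((char *)(ptr) - offsetof(type, queue_node)))

int main(void)
{
	/* Mimic spsc_queue_pop() returning NULL for an empty queue. */
	struct spsc_node *popped = NULL;

	/* queue_node at offset 0: the converted pointer is still NULL. */
	printf("%p\n", (void *)job_from_node(popped, struct job_node_first));

	/* queue_node not first: a bogus non-NULL pointer comes back. */
	printf("%p\n", (void *)job_from_node(popped, struct job_node_second));

	return 0;
}

(Pointer arithmetic on NULL is of course itself undefined, which is one
more reason the new helpers check for NULL before doing container_of().)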

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c   | 11 +++---
 drivers/gpu/drm/scheduler/sched_internal.h | 46 ++
 drivers/gpu/drm/scheduler/sched_main.c |  7 ++--
 3 files changed, 54 insertions(+), 10 deletions(-)
 create mode 100644 drivers/gpu/drm/scheduler/sched_internal.h

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index 69bcf0e99d57..a171f05ad761 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -28,11 +28,10 @@
 #include 
 #include 
 
+#include "sched_internal.h"
+
 #include "gpu_scheduler_trace.h"
 
-#define to_drm_sched_job(sched_job)\
-   container_of((sched_job), struct drm_sched_job, queue_node)
-
 /**
  * drm_sched_entity_init - Init a context entity used by scheduler when
  * submit to HW ring.
@@ -255,7 +254,7 @@ static void drm_sched_entity_kill(struct drm_sched_entity 
*entity)
/* The entity is guaranteed to not be used by the scheduler */
prev = rcu_dereference_check(entity->last_scheduled, true);
dma_fence_get(prev);
-   while ((job = to_drm_sched_job(spsc_queue_pop(&entity->job_queue {
+   while ((job = drm_sched_entity_queue_pop(entity))) {
struct drm_sched_fence *s_fence = job->s_fence;
 
dma_fence_get(&s_fence->finished);
@@ -477,7 +476,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
 {
struct drm_sched_job *sched_job;
 
-   sched_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+   sched_job = drm_sched_entity_queue_peek(entity);
if (!sched_job)
return NULL;
 
@@ -513,7 +512,7 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity)
if (drm_sched_policy == DRM_SCHED_POLICY_FIFO) {
struct drm_sched_job *next;
 
-   next = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+   next = drm_sched_entity_queue_peek(entity);
if (next) {
struct drm_sched_rq *rq;
 
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h 
b/drivers/gpu/drm/scheduler/sched_internal.h
new file mode 100644
index ..815d384845a3
--- /dev/null
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -0,0 +1,46 @@
+#ifndef _DRM_GPU_SCHEDULER_INTERNAL_H_
+#define _DRM_GPU_SCHEDULER_INTERNAL_H_
+
+/**
+ * drm_sched_entity_queue_pop - Low level helper for popping queued jobs
+ *
+ * @entity: scheduler entity
+ *
+ * Low level helper for popping queued jobs.
+ *
+ * Returns: The job dequeued or NULL.
+ */
+static inline struct drm_sched_job *
+drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+   struct spsc_node *node;
+
+   node = spsc_queue_pop(&entity->job_queue);
+   if (!node)
+   return NULL;
+
+   return container_of(node, struct drm_sched_job, queue_node);
+}
+
+/**
+ * drm_sched_entity_queue_peek - Low level helper for peeking at the job queue
+ *
+ * @entity: scheduler entity
+ *
+ * Low level helper for peeking at the job queue
+ *
+ * Returns: The job at the head of the queue or NULL.
+ */
+static inline struct drm_sched_job *
+drm_sched_entity_queue_peek(struct drm_sched_entity *entity)
+{
+   struct spsc_node *node;
+
+   node = spsc_queue_peek(&entity->job_queue);
+   if (!node)
+   return NULL;
+
+   return container_of(node, struct drm_sched_job, queue_node);
+}
+
+#endif
diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
b/drivers/gpu/drm/scheduler/sched_main.c
index 8c36a59afb72..c634993f1346 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -78,6 +78,8 @@
 #include 
 #include 
 
+#include "sched_internal.h"
+
 #define CREATE_TRACE_POINTS
 #include "gpu_scheduler_trace.h"
 
@@ -87,9 +89,6 @@ static struct lockdep_map drm_sched_lockdep_map = {
 };
 #endif
 
-#define to_drm_sched_job(sched_job)\
-   container_of((sched_job), struct drm_sched_job, queue_node)
-
 int drm_sched_policy = DRM_SCHED_POLICY_FIFO;
 
 /**
@@ -123,7 +122,7 @@ static bool drm_sched_can_queue(struct drm_gpu_scheduler 
*sched,
 {
struct drm_sched_job *s_job;
 
-   s_job = to_drm_sched_job(spsc_queue_peek(&entity->job_queue));
+   s_job = drm_sched

[PATCH v5 0/6] drm/sched: Job queue peek/pop helpers and struct job re-order

2025-02-14 Thread Tvrtko Ursulin
Let's add some helpers for peeking and popping from the job queue, which allows
us to re-order the fields in struct drm_sched_job and remove one hole.

As in the process we have added a header file for scheduler-internal prototypes,
let's also use it more and clean up the "exported" header a bit.
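
For illustration only (the structs below are made up and not the real
struct drm_sched_job layout), this is the kind of padding hole that member
re-ordering removes, assuming a typical 64-bit ABI:

#include <stdint.h>
#include <stdio.h>

struct with_hole {
	void		*a;	/* 8 bytes */
	uint32_t	 b;	/* 4 bytes + 4 bytes of padding (the "hole") */
	void		*c;	/* 8 bytes */
	uint32_t	 d;	/* 4 bytes + 4 bytes of tail padding */
};				/* typically 32 bytes */

struct without_hole {
	void		*a;	/* 8 bytes */
	void		*c;	/* 8 bytes */
	uint32_t	 b;	/* 4 bytes */
	uint32_t	 d;	/* 4 bytes */
};				/* typically 24 bytes */

int main(void)
{
	printf("with hole: %zu, without: %zu\n",
	       sizeof(struct with_hole), sizeof(struct without_hole));
	return 0;
}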

v2:
 * Add header file for internal scheduler API.
 * Add helper for peeking too. (Danilo)
 * Add (temporary?) drm_sched_cancel_all_jobs() helper to replace amdgpu
   amdgpu_job_stop_all_jobs_on_sched().

v3:
 * Settle for a copy of __drm_sched_entity_queue_pop in amdgpu for now.

v4:
 * Expand the series with some more header file cleanup.

v5:
 * Rebase for drm_sched_init changes.
 * Tweak kerneldoc format.

Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 

Tvrtko Ursulin (6):
  drm/sched: Add internal job peek/pop API
  drm/amdgpu: Pop jobs from the queue more robustly
  drm/sched: Remove a hole from struct drm_sched_job
  drm/sched: Move drm_sched_entity_is_ready to internal header
  drm/sched: Move internal prototypes to internal header
  drm/sched: Group exported prototypes by object type

 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c|  22 +++-
 drivers/gpu/drm/scheduler/sched_entity.c   |  23 +---
 drivers/gpu/drm/scheduler/sched_fence.c|   2 +
 drivers/gpu/drm/scheduler/sched_internal.h |  89 +++
 drivers/gpu/drm/scheduler/sched_main.c |   7 +-
 include/drm/gpu_scheduler.h| 122 +
 6 files changed, 169 insertions(+), 96 deletions(-)
 create mode 100644 drivers/gpu/drm/scheduler/sched_internal.h

-- 
2.48.0



Re: [PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

2025-02-14 Thread Tvrtko Ursulin



On 14/02/2025 10:31, Christian König wrote:

Am 14.02.25 um 11:21 schrieb Tvrtko Ursulin:


Hi Christian,

On 11/02/2025 10:21, Christian König wrote:

Am 11.02.25 um 11:08 schrieb Philipp Stanner:

On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:

Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:

Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
newly
added __drm_sched_entity_queue_pop.

This allows breaking the hidden dependency that queue_node has to
be the
first element in struct drm_sched_job.

A comment is also added with a reference to the mailing list
discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Cc: "Zhang, Hawking" 

Reviewed-by: Christian König 

I think this v3 has been supplanted by a v4 by now.


I've seen the larger v4 series as well, but at least that patch here looks 
identical on first glance. So my rb still counts.


Is it okay for you to merge the whole series (including this single amdgpu 
patch) via drm-misc?


I can do that, but don't want the scheduler maintainer want to pick them up?


Sorry that was some bad and unclear English. :(

It is as you suggest - what I meant was, is it okay from your point of 
view that the whole series is merged via drm-misc? I assume Philipp 
would indeed be the one to merge it, once all patches get r-b-ed.


Regards,

Tvrtko


@Tvrtko: btw, do you create patches with
git format-patch -v4 ?

That way the v4 label will be included in all patch titles, too, not
just the cover letter. That makes searching etc. easier in large
inboxes

P.


---
    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++-
--
    1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..22cb48bab24d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
drm_sched_job *sched_job)
    return fence;
    }
-#define to_drm_sched_job(sched_job)    \
-    container_of((sched_job), struct drm_sched_job,
queue_node)
+/*
+ * This is a duplicate function from DRM scheduler
sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
removed, due
+ * latter being incorrect and racy.
+ *
+ * See
https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f...@amd.com/
+ */
+static struct drm_sched_job *
+__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+    struct spsc_node *node;
+
+    node = spsc_queue_pop(&entity->job_queue);
+    if (!node)
+    return NULL;
+
+    return container_of(node, struct drm_sched_job,
queue_node);
+}
    void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
*sched)
    {
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
drm_gpu_scheduler *sched)
    struct drm_sched_rq *rq = sched->sched_rq[i];
    spin_lock(&rq->lock);
    list_for_each_entry(s_entity, &rq->entities, list)
{
-    while ((s_job =
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue {
+    while ((s_job =
__drm_sched_entity_queue_pop(s_entity))) {
    struct drm_sched_fence *s_fence =
s_job->s_fence;
    dma_fence_signal(&s_fence-

scheduled);










[PATCH v5 6/6] drm/sched: Group exported prototypes by object type

2025-02-14 Thread Tvrtko Ursulin
Do a bit of housekeeping in gpu_scheduler.h by grouping the API by the type
of object it operates on.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 include/drm/gpu_scheduler.h | 60 -
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 8cb12f6231b8..50928a7ae98e 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -383,12 +383,6 @@ struct drm_sched_job {
struct xarray   dependencies;
 };
 
-static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
-   int threshold)
-{
-   return s_job && atomic_inc_return(&s_job->karma) > threshold;
-}
-
 enum drm_gpu_sched_stat {
DRM_GPU_SCHED_STAT_NONE, /* Reserve 0 */
DRM_GPU_SCHED_STAT_NOMINAL,
@@ -566,14 +560,36 @@ struct drm_sched_init_args {
struct device *dev;
 };
 
+/* Scheduler operations */
+
 int drm_sched_init(struct drm_gpu_scheduler *sched,
   const struct drm_sched_init_args *args);
 
 void drm_sched_fini(struct drm_gpu_scheduler *sched);
+
+unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched);
+void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
+ unsigned long remaining);
+void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
+bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
+void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
+void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
+void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job 
*bad);
+void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
+void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
+void drm_sched_fault(struct drm_gpu_scheduler *sched);
+
+struct drm_gpu_scheduler *
+drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
+   unsigned int num_sched_list);
+
+/* Jobs */
+
 int drm_sched_job_init(struct drm_sched_job *job,
   struct drm_sched_entity *entity,
   u32 credits, void *owner);
 void drm_sched_job_arm(struct drm_sched_job *job);
+void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
 int drm_sched_job_add_dependency(struct drm_sched_job *job,
 struct dma_fence *fence);
 int drm_sched_job_add_syncobj_dependency(struct drm_sched_job *job,
@@ -588,21 +604,16 @@ int drm_sched_job_add_implicit_dependencies(struct 
drm_sched_job *job,
bool write);
 bool drm_sched_job_has_dependency(struct drm_sched_job *job,
  struct dma_fence *fence);
-
-void drm_sched_entity_modify_sched(struct drm_sched_entity *entity,
-   struct drm_gpu_scheduler **sched_list,
-   unsigned int num_sched_list);
-
-void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
-bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
-void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
-void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
-void drm_sched_stop(struct drm_gpu_scheduler *sched, struct drm_sched_job 
*bad);
-void drm_sched_start(struct drm_gpu_scheduler *sched, int errno);
-void drm_sched_resubmit_jobs(struct drm_gpu_scheduler *sched);
 void drm_sched_increase_karma(struct drm_sched_job *bad);
-void drm_sched_fault(struct drm_gpu_scheduler *sched);
+
+static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
+   int threshold)
+{
+   return s_job && atomic_inc_return(&s_job->karma) > threshold;
+}
+
+/* Entities */
 
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
@@ -612,16 +623,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
 void drm_sched_entity_fini(struct drm_sched_entity *entity);
 void drm_sched_entity_destroy(struct drm_sched_entity *entity);
-void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority);
 int drm_sched_entity_error(struct drm_sched_entity *entity);
-
-unsigned long drm_sched_suspend_timeout(struct drm_gpu_scheduler *sched);
-void drm_sched_resume_timeout(struct drm_gpu_scheduler *sched,
-   unsigned long remaining);
-struct drm_gpu_scheduler *
-drm_sched_pick_best(struct drm_gpu_scheduler **sched_list,
-unsigned int num_sched_list);
+void drm_sched_entity_modify_sched(struct drm_sched_entity *e

[PATCH v5 2/6] drm/amdgpu: Pop jobs from the queue more robustly

2025-02-14 Thread Tvrtko Ursulin
Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a newly
added drm_sched_entity_queue_pop.

This allows breaking the hidden dependency that queue_node has to be the
first element in struct drm_sched_job.

A comment is also added with a reference to the mailing list discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Cc: "Zhang, Hawking" 
Reviewed-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..1899c601c95c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct 
drm_sched_job *sched_job)
return fence;
 }
 
-#define to_drm_sched_job(sched_job)\
-   container_of((sched_job), struct drm_sched_job, queue_node)
+/*
+ * This is a duplicate function from DRM scheduler sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is removed, due
+ * latter being incorrect and racy.
+ *
+ * See 
https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f...@amd.com/
+ */
+static struct drm_sched_job *
+drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+   struct spsc_node *node;
+
+   node = spsc_queue_pop(&entity->job_queue);
+   if (!node)
+   return NULL;
+
+   return container_of(node, struct drm_sched_job, queue_node);
+}
 
 void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
 {
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct 
drm_gpu_scheduler *sched)
struct drm_sched_rq *rq = sched->sched_rq[i];
spin_lock(&rq->lock);
list_for_each_entry(s_entity, &rq->entities, list) {
-   while ((s_job = 
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue {
+   while ((s_job = drm_sched_entity_queue_pop(s_entity))) {
struct drm_sched_fence *s_fence = 
s_job->s_fence;
 
dma_fence_signal(&s_fence->scheduled);
-- 
2.48.0



[PATCH v5 4/6] drm/sched: Move drm_sched_entity_is_ready to internal header

2025-02-14 Thread Tvrtko Ursulin
The helper is for scheduler-internal use, so let's hide it from DRM drivers
completely.

At the same time we change the method of checking whether there is
anything in the queue from peeking to looking at the node count.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_entity.c   | 12 
 drivers/gpu/drm/scheduler/sched_internal.h | 13 +
 include/drm/gpu_scheduler.h|  1 -
 3 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
b/drivers/gpu/drm/scheduler/sched_entity.c
index a171f05ad761..87f88259ddf6 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -151,18 +151,6 @@ static bool drm_sched_entity_is_idle(struct 
drm_sched_entity *entity)
return false;
 }
 
-/* Return true if entity could provide a job. */
-bool drm_sched_entity_is_ready(struct drm_sched_entity *entity)
-{
-   if (spsc_queue_peek(&entity->job_queue) == NULL)
-   return false;
-
-   if (READ_ONCE(entity->dependency))
-   return false;
-
-   return true;
-}
-
 /**
  * drm_sched_entity_error - return error of last scheduled job
  * @entity: scheduler entity to check
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h 
b/drivers/gpu/drm/scheduler/sched_internal.h
index 815d384845a3..2fdf9dbd632e 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -43,4 +43,17 @@ drm_sched_entity_queue_peek(struct drm_sched_entity *entity)
return container_of(node, struct drm_sched_job, queue_node);
 }
 
+/* Return true if entity could provide a job. */
+static inline bool
+drm_sched_entity_is_ready(struct drm_sched_entity *entity)
+{
+   if (!spsc_queue_count(&entity->job_queue))
+   return false;
+
+   if (READ_ONCE(entity->dependency))
+   return false;
+
+   return true;
+}
+
 #endif
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 1c1138308e66..6cd0f288f6ed 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -632,7 +632,6 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
drm_sched_entity *entity);
 void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
 void drm_sched_entity_set_priority(struct drm_sched_entity *entity,
   enum drm_sched_priority priority);
-bool drm_sched_entity_is_ready(struct drm_sched_entity *entity);
 int drm_sched_entity_error(struct drm_sched_entity *entity);
 
 struct drm_sched_fence *drm_sched_fence_alloc(
-- 
2.48.0



Re: [PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

2025-02-14 Thread Tvrtko Ursulin



Hi Christian,

On 11/02/2025 10:21, Christian König wrote:

Am 11.02.25 um 11:08 schrieb Philipp Stanner:

On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:

Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:

Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
newly
added __drm_sched_entity_queue_pop.

This allows breaking the hidden dependency that queue_node has to
be the
first element in struct drm_sched_job.

A comment is also added with a reference to the mailing list
discussion
explaining the copied helper will be removed when the whole broken
amdgpu_job_stop_all_jobs_on_sched is removed.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Cc: "Zhang, Hawking" 

Reviewed-by: Christian König 

I think this v3 has been supplanted by a v4 by now.


I've seen the larger v4 series as well, but at least that patch here 
looks identical on first glance. So my rb still counts.


Is it okay for you to merge the whole series (including this single 
amdgpu patch) via drm-misc?


Regards,

Tvrtko


@Tvrtko: btw, do you create patches with
git format-patch -v4 ?

That way the v4 label will be included in all patch titles, too, not
just the cover letter. That makes searching etc. easier in large
inboxes

P.


---
   drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++-
--
   1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
index 100f04475943..22cb48bab24d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
@@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
drm_sched_job *sched_job)
   return fence;
   }
-#define to_drm_sched_job(sched_job)    \
-    container_of((sched_job), struct drm_sched_job,
queue_node)
+/*
+ * This is a duplicate function from DRM scheduler
sched_internal.h.
+ * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
removed, due
+ * latter being incorrect and racy.
+ *
+ * See
https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f...@amd.com/

+ */
+static struct drm_sched_job *
+__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
+{
+    struct spsc_node *node;
+
+    node = spsc_queue_pop(&entity->job_queue);
+    if (!node)
+    return NULL;
+
+    return container_of(node, struct drm_sched_job,
queue_node);
+}
   void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
*sched)
   {
@@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
drm_gpu_scheduler *sched)
   struct drm_sched_rq *rq = sched->sched_rq[i];
   spin_lock(&rq->lock);
   list_for_each_entry(s_entity, &rq->entities, list)
{
-    while ((s_job =
to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue {
+    while ((s_job =
__drm_sched_entity_queue_pop(s_entity))) {
   struct drm_sched_fence *s_fence =
s_job->s_fence;
   dma_fence_signal(&s_fence-

scheduled);






[PATCH 5/5] drm/scheduler: Add a basic test for modifying entities scheduler list

2025-02-14 Thread Tvrtko Ursulin
Add a basic test exercising modification of an entity's scheduler list at
runtime.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/tests/tests_basic.c | 73 ++-
 1 file changed, 72 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/scheduler/tests/tests_basic.c 
b/drivers/gpu/drm/scheduler/tests/tests_basic.c
index 25dba633a14a..edf3e59ce63d 100644
--- a/drivers/gpu/drm/scheduler/tests/tests_basic.c
+++ b/drivers/gpu/drm/scheduler/tests/tests_basic.c
@@ -348,6 +348,77 @@ static struct kunit_suite drm_sched_priority = {
.test_cases = drm_sched_priority_tests,
 };
 
+static void drm_sched_test_modify_sched(struct kunit *test)
+{
+   unsigned int i, cur_ent = 0, cur_sched = 0;
+   struct drm_mock_sched_entity *entity[13];
+   struct drm_mock_scheduler *sched[3];
+   struct drm_mock_sched_job *job;
+   const unsigned int qd = 1000;
+   bool done;
+
+   /*
+* Submit a bunch of jobs against entities configured with different
+* schedulers and while waiting for them to complete, periodically keep
+* changing schedulers associated with each entity.
+*
+* We set up the queue-depth (qd) and job duration so the sched modify
+* loop has some time to interact with submissions to the backend and
+* job completions as they progress.
+*
+* For the number of schedulers and entities we use primes in order to
+* perturb the entity->sched assignments with less of a regular pattern.
+*/
+
+   for (i = 0; i < ARRAY_SIZE(sched); i++)
+   sched[i] = drm_mock_new_scheduler(test, MAX_SCHEDULE_TIMEOUT);
+
+   for (i = 0; i < ARRAY_SIZE(entity); i++)
+   entity[i] = drm_mock_new_sched_entity(test,
+ DRM_SCHED_PRIORITY_NORMAL,
+ sched[i % 
ARRAY_SIZE(sched)]);
+
+   for (i = 0; i < qd; i++) {
+   job = drm_mock_new_sched_job(test, entity[cur_ent++]);
+   cur_ent %= ARRAY_SIZE(entity);
+   drm_mock_sched_job_set_duration_us(job, 1000);
+   drm_mock_sched_job_submit(job);
+   }
+
+   do {
+   struct drm_gpu_scheduler *modify;
+
+   usleep_range(200, 500);
+   cur_ent++;
+   cur_ent %= ARRAY_SIZE(entity);
+   cur_sched++;
+   cur_sched %= ARRAY_SIZE(sched);
+   modify = &sched[cur_sched]->base;
+   drm_sched_entity_modify_sched(&entity[cur_ent]->base, &modify,
+ 1);
+   } while (!drm_mock_sched_job_is_finished(job));
+
+   done = drm_mock_sched_job_wait_finished(job, HZ);
+   KUNIT_ASSERT_EQ(test, done, true);
+
+   for (i = 0; i < ARRAY_SIZE(entity); i++)
+   drm_mock_sched_entity_free(entity[i]);
+
+   for (i = 0; i < ARRAY_SIZE(sched); i++)
+   drm_mock_scheduler_fini(sched[i]);
+}
+
+static struct kunit_case drm_sched_modify_sched_tests[] = {
+   KUNIT_CASE(drm_sched_test_modify_sched),
+   {}
+};
+
+static struct kunit_suite drm_sched_modify_sched = {
+   .name = "drm_sched_basic_modify_sched_tests",
+   .test_cases = drm_sched_modify_sched_tests,
+};
+
 kunit_test_suites(&drm_sched_basic,
  &drm_sched_timeout,
- &drm_sched_priority);
+ &drm_sched_priority,
+ &drm_sched_modify_sched);
-- 
2.48.0



[PATCH v5 5/6] drm/sched: Move internal prototypes to internal header

2025-02-14 Thread Tvrtko Ursulin
Now that we have a header file for internal scheduler interfaces we can
move some more prototypes into it. By doing that we eliminate the chance
of drivers trying to use something which was not intended to be used.

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
---
 drivers/gpu/drm/scheduler/sched_fence.c|  2 ++
 drivers/gpu/drm/scheduler/sched_internal.h | 30 ++
 include/drm/gpu_scheduler.h| 27 ---
 3 files changed, 32 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_fence.c 
b/drivers/gpu/drm/scheduler/sched_fence.c
index 0f35f009b9d3..e971528504a5 100644
--- a/drivers/gpu/drm/scheduler/sched_fence.c
+++ b/drivers/gpu/drm/scheduler/sched_fence.c
@@ -29,6 +29,8 @@
 
 #include 
 
+#include "sched_internal.h"
+
 static struct kmem_cache *sched_fence_slab;
 
 static int __init drm_sched_fence_slab_init(void)
diff --git a/drivers/gpu/drm/scheduler/sched_internal.h 
b/drivers/gpu/drm/scheduler/sched_internal.h
index 2fdf9dbd632e..53f587e9a8b4 100644
--- a/drivers/gpu/drm/scheduler/sched_internal.h
+++ b/drivers/gpu/drm/scheduler/sched_internal.h
@@ -1,6 +1,36 @@
 #ifndef _DRM_GPU_SCHEDULER_INTERNAL_H_
 #define _DRM_GPU_SCHEDULER_INTERNAL_H_
 
+
+/* Used to choose between FIFO and RR job-scheduling */
+extern int drm_sched_policy;
+
+#define DRM_SCHED_POLICY_RR0
+#define DRM_SCHED_POLICY_FIFO  1
+
+void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
+
+void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
+struct drm_sched_entity *entity);
+void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
+   struct drm_sched_entity *entity);
+
+void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
+struct drm_sched_rq *rq, ktime_t ts);
+
+void drm_sched_entity_select_rq(struct drm_sched_entity *entity);
+struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity 
*entity);
+
+struct drm_sched_fence *drm_sched_fence_alloc(struct drm_sched_entity 
*s_entity,
+ void *owner);
+void drm_sched_fence_init(struct drm_sched_fence *fence,
+ struct drm_sched_entity *entity);
+void drm_sched_fence_free(struct drm_sched_fence *fence);
+
+void drm_sched_fence_scheduled(struct drm_sched_fence *fence,
+  struct dma_fence *parent);
+void drm_sched_fence_finished(struct drm_sched_fence *fence, int result);
+
 /**
  * drm_sched_entity_queue_pop - Low level helper for popping queued jobs
  *
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 6cd0f288f6ed..8cb12f6231b8 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -71,12 +71,6 @@ enum drm_sched_priority {
DRM_SCHED_PRIORITY_COUNT
 };
 
-/* Used to choose between FIFO and RR job-scheduling */
-extern int drm_sched_policy;
-
-#define DRM_SCHED_POLICY_RR0
-#define DRM_SCHED_POLICY_FIFO  1
-
 /**
  * struct drm_sched_entity - A wrapper around a job queue (typically
  * attached to the DRM file_priv).
@@ -601,7 +595,6 @@ void drm_sched_entity_modify_sched(struct drm_sched_entity 
*entity,
 
 void drm_sched_tdr_queue_imm(struct drm_gpu_scheduler *sched);
 void drm_sched_job_cleanup(struct drm_sched_job *job);
-void drm_sched_wakeup(struct drm_gpu_scheduler *sched);
 bool drm_sched_wqueue_ready(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_stop(struct drm_gpu_scheduler *sched);
 void drm_sched_wqueue_start(struct drm_gpu_scheduler *sched);
@@ -611,14 +604,6 @@ void drm_sched_resubmit_jobs(struct drm_gpu_scheduler 
*sched);
 void drm_sched_increase_karma(struct drm_sched_job *bad);
 void drm_sched_fault(struct drm_gpu_scheduler *sched);
 
-void drm_sched_rq_add_entity(struct drm_sched_rq *rq,
-struct drm_sched_entity *entity);
-void drm_sched_rq_remove_entity(struct drm_sched_rq *rq,
-   struct drm_sched_entity *entity);
-
-void drm_sched_rq_update_fifo_locked(struct drm_sched_entity *entity,
-struct drm_sched_rq *rq, ktime_t ts);
-
 int drm_sched_entity_init(struct drm_sched_entity *entity,
  enum drm_sched_priority priority,
  struct drm_gpu_scheduler **sched_list,
@@ -627,23 +612,11 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 long drm_sched_entity_flush(struct drm_sched_entity *entity, long timeout);
 void drm_sched_entity_fini(struct drm_sched_entity *entity);
 void drm_sched_entity_destroy(struct drm_sched_entity *entity);
-void drm_sched_entity_select_rq(struct drm_sched_entity *entity);
-struct drm_sched_job *drm_sched_entity_pop_job(struct drm_sched_entity 
*entity);
 void drm_sched_entity_push_job(struct drm_sched_job *sched_job);
 void drm_sched_entity_set_priority(struct drm_sched_e

[PATCH v5 3/6] drm/sched: Remove a hole from struct drm_sched_job

2025-02-14 Thread Tvrtko Ursulin
We can re-order some struct members and take u32 credits outside of the
pointer sandwich and also for the last_dependency member we can get away
with an unsigned int since for dependency we use xa_limit_32b.

Pahole report before:
/* size: 160, cachelines: 3, members: 14 */
/* sum members: 156, holes: 1, sum holes: 4 */
/* last cacheline: 32 bytes */

And after:
/* size: 152, cachelines: 3, members: 14 */
/* last cacheline: 24 bytes */
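
For anyone who wants to reproduce the reasoning, here is a stand-alone C
sketch (hypothetical struct, not drm_sched_job itself) showing the same
effect under pahole on an LP64 build: a u32 sandwiched between pointers
costs a hole plus trailing padding, while grouping the 32-bit members
together packs them.

struct bad {				/* 32 bytes on LP64 */
	void		*sched;
	unsigned int	credits;	/* 4-byte hole follows */
	void		*entity;
	unsigned int	last_dep;	/* plus 4 bytes trailing padding */
};

struct good {				/* 24 bytes on LP64 */
	void		*sched;
	void		*entity;
	unsigned int	credits;	/* the two 32-bit members pack together */
	unsigned int	last_dep;
};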

Signed-off-by: Tvrtko Ursulin 
Cc: Christian König 
Cc: Danilo Krummrich 
Cc: Matthew Brost 
Cc: Philipp Stanner 
Acked-by: Danilo Krummrich 
Acked-by: Christian König 
---
 include/drm/gpu_scheduler.h | 38 +++--
 1 file changed, 20 insertions(+), 18 deletions(-)

diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 6bf458dbce84..1c1138308e66 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -338,8 +338,14 @@ struct drm_sched_fence *to_drm_sched_fence(struct 
dma_fence *f);
  * to schedule the job.
  */
 struct drm_sched_job {
-   struct spsc_nodequeue_node;
-   struct list_headlist;
+   u64 id;
+
+   /**
+* @submit_ts:
+*
+* When the job was pushed into the entity queue.
+*/
+   ktime_t submit_ts;
 
/**
 * @sched:
@@ -349,24 +355,30 @@ struct drm_sched_job {
 * has finished.
 */
struct drm_gpu_scheduler*sched;
+
struct drm_sched_fence  *s_fence;
+   struct drm_sched_entity *entity;
 
+   enum drm_sched_priority s_priority;
u32 credits;
+   /** @last_dependency: tracks @dependencies as they signal */
+   unsigned intlast_dependency;
+   atomic_tkarma;
+
+   struct spsc_nodequeue_node;
+   struct list_headlist;
 
/*
 * work is used only after finish_cb has been used and will not be
 * accessed anymore.
 */
union {
-   struct dma_fence_cb finish_cb;
-   struct work_struct  work;
+   struct dma_fence_cb finish_cb;
+   struct work_struct  work;
};
 
-   uint64_tid;
-   atomic_tkarma;
-   enum drm_sched_priority s_priority;
-   struct drm_sched_entity *entity;
struct dma_fence_cb cb;
+
/**
 * @dependencies:
 *
@@ -375,16 +387,6 @@ struct drm_sched_job {
 * drm_sched_job_add_implicit_dependencies().
 */
struct xarray   dependencies;
-
-   /** @last_dependency: tracks @dependencies as they signal */
-   unsigned long   last_dependency;
-
-   /**
-* @submit_ts:
-*
-* When the job was pushed into the entity queue.
-*/
-   ktime_t submit_ts;
 };
 
 static inline bool drm_sched_invalidate_job(struct drm_sched_job *s_job,
-- 
2.48.0



Re: [PATCH] drm/amd/display: Disable -Wenum-float-conversion for dml2_dpmm_dcn4.c

2025-02-14 Thread Alex Deucher
On Fri, Feb 14, 2025 at 11:28 AM Nathan Chancellor  wrote:
>
> On Thu, Dec 19, 2024 at 05:21:41PM -0500, Alex Deucher wrote:
> > On Thu, Dec 19, 2024 at 12:23 PM Nathan Chancellor  
> > wrote:
> > >
> > > Commit be4e3509314a ("drm/amd/display: DML21 Reintegration For Various
> > > Fixes") blew away commit fdedd77b0eb3 ("drm/amd/display: Reapply
> > > 2fde4fdddc1f"), which itself was a reapplication for the same reason,
> > > which results in that compiler warning returning:
> > >
> > >   
> > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c:215:58:
> > >  error: arithmetic between enumeration type 'enum dentist_divider_range' 
> > > and floating-point type 'double' [-Werror,-Wenum-float-conversion]
> > > 215 | divider = (unsigned int)(DFS_DIVIDER_RANGE_SCALE_FACTOR 
> > > * (vco_freq_khz / clock_khz));
> > > |  ~~ 
> > > ^ ~~
> > >
> > > Just disable the warning for the whole file via Makefile to avoid having
> > > to reapply the same fix every time the code syncs from wherever it is
> > > actually maintained.
> > >
> > > Fixes: be4e3509314a ("drm/amd/display: DML21 Reintegration For Various 
> > > Fixes")
> > > Signed-off-by: Nathan Chancellor 
> > > ---
> > > If you would prefer reapplying the local fix, feel free to do so, but I
> > > would like for it to be in the upstream source so it does not have to
> > > keep being applied.
> >
> > I've reapplied the original fix and I've confirmed that the fix will
> > be pushed to the DML tree as well this time.
>
> Did that actually end up happening? Commit 1b30456150e5
> ("drm/amd/display: DML21 Reintegration") in next-20250214 reintroduces
> this warning... I guess it may be a timing thing because the author date
> is three weeks ago or so. Should I send my "Reapply" patch or will you
> take care of it?

  I'll reapply.  Come on, guys.  Please make sure this makes its
way into the DML code this time if it hasn't already.
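
For anyone unfamiliar with the warning in question, a minimal C sketch
(hypothetical names, not the DML code) of the pattern clang flags and of an
explicit cast that silences it:

enum scale_factor { DIV_RANGE_SCALE = 4 };	/* hypothetical */

static unsigned int pick_divider(double vco_khz, double clock_khz)
{
	/* clang -Wenum-float-conversion: arithmetic between an enumeration
	 * type and a floating-point type:
	 *
	 *	return (unsigned int)(DIV_RANGE_SCALE * (vco_khz / clock_khz));
	 *
	 * casting the enumerator first keeps the arithmetic purely double: */
	return (unsigned int)((double)DIV_RANGE_SCALE * (vco_khz / clock_khz));
}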

Thanks,

Alex


[PATCH] Documentation/gpu: Add acronyms for some firmware components

2025-02-14 Thread Rodrigo Siqueira
Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
to get information on the firmware loaded in the system. This file has
multiple acronyms that are not documented in the glossary. This commit
introduces some missing acronyms to the AMD glossary documentation. The
meaning of each acronym in this commit was extracted from code
documentation available in the following files:

- drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
- drivers/gpu/drm/amd/include/amd_shared.h

Cc: Mario Limonciello 
Signed-off-by: Rodrigo Siqueira 
---
 Documentation/gpu/amdgpu/amdgpu-glossary.rst | 21 
 1 file changed, 21 insertions(+)

diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst 
b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
index 00a47ebb0b0f..3242db32b020 100644
--- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
+++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
@@ -12,6 +12,9 @@ we have a dedicated glossary for Display Core at
   The number of CUs that are active on the system.  The number of active
   CUs may be less than SE * SH * CU depending on the board configuration.
 
+CE
+  Constant Engine
+
 CP
   Command Processor
 
@@ -80,6 +83,9 @@ we have a dedicated glossary for Display Core at
 KIQ
   Kernel Interface Queue
 
+ME
+  Micro Engine
+
 MEC
   MicroEngine Compute
 
@@ -92,6 +98,9 @@ we have a dedicated glossary for Display Core at
 MQD
   Memory Queue Descriptor
 
+PFP
+  Pre-Fetch Parser
+
 PPLib
   PowerPlay Library - PowerPlay is the power management component.
 
@@ -110,14 +119,26 @@ we have a dedicated glossary for Display Core at
 SH
   SHader array
 
+SMC
+  System Management Controller
+
 SMU
   System Management Unit
 
 SS
   Spread Spectrum
 
+TA
+  Trusted Application
+
+UVD
+  Unified Video Decoder
+
 VCE
   Video Compression Engine
 
 VCN
   Video Codec Next
+
+VPE
+  Video Processing Engine
-- 
2.48.1



RE: [PATCH 12/16] drm/amd/display: Support BT2020 YCbCr fullrange

2025-02-14 Thread Li, Roman
[Public]

Hi Robert,  thank you for the feedback.
What about this version of commit message:

Fix BT2020 YCbCr limited/full range input

[Why]
BT2020 YCbCr input is not handled properly when full range
quantization is used and limited range is not supported at all.

[How]
- Add enums for BT2020 YCbCr limited/full range
- Add limited range CSC matrix


Thanks,
Roman

> -Original Message-
> From: Robert Mader 
> Sent: Friday, February 14, 2025 4:24 PM
> To: Li, Roman ; amd-gfx@lists.freedesktop.org
> Cc: Wentland, Harry ; Li, Sun peng (Leo)
> ; Pillai, Aurabindo ; Lin,
> Wayne ; Chung, ChiaHsuan (Tom)
> ; Zuo, Jerry ; Mohamed,
> Zaeem ; Chiu, Solomon
> ; Wheeler, Daniel ;
> Bakoulin, Ilya ; Kovac, Krunoslav
> ; Robert Mader 
> Subject: Re: [PATCH 12/16] drm/amd/display: Support BT2020 YCbCr fullrange
>
> Thanks a lot for the patch!
>
> Small commit title nit, sorry for spotting this earlier: this commit adds 
> BT2020
> *limited* range - full range was already supported, see the changes in 
> amdgpu_dm.c
> and dpp.h.
>
> On 14.02.25 16:00, roman...@amd.com wrote:
> > From: Ilya Bakoulin 
> >
> > [Why/How]
> > Need to add support for full-range quantization for YCbCr in BT2020
> > color space.
> >
> > Reviewed-by: Krunoslav Kovac 
> > Signed-off-by: Ilya Bakoulin 
> > Signed-off-by: Roman Li 
> > Tested-by: Robert Mader 
> > ---
> >   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   | 6 +++---
> >   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c   | 2 +-
> >   drivers/gpu/drm/amd/display/dc/basics/dc_common.c   | 3 ++-
> >   drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c   | 5 +++--
> >   drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 4 ++--
> >   drivers/gpu/drm/amd/display/dc/dc_hw_types.h| 4 +++-
> >   drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c | 3 ++-
> >   .../gpu/drm/amd/display/dc/dio/dcn10/dcn10_stream_encoder.c | 3 ++-
> >   .../amd/display/dc/dio/dcn401/dcn401_dio_stream_encoder.c   | 3 ++-
> >   .../amd/display/dc/hpo/dcn31/dcn31_hpo_dp_stream_encoder.c  | 3 ++-
> >   drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 6 +-
> >   .../gpu/drm/amd/display/modules/info_packet/info_packet.c   | 4 ++--
> >   12 files changed, 29 insertions(+), 17 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > index b1b5f352b9aa..4ae54b3573ba 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > @@ -5616,9 +5616,9 @@ fill_plane_color_attributes(const struct
> > drm_plane_state *plane_state,
> >
> > case DRM_COLOR_YCBCR_BT2020:
> > if (full_range)
> > -   *color_space = COLOR_SPACE_2020_YCBCR;
> > +   *color_space = COLOR_SPACE_2020_YCBCR_FULL;
> > else
> > -   return -EINVAL;
> > +   *color_space = COLOR_SPACE_2020_YCBCR_LIMITED;
> > break;
> >
> > default:
> > @@ -6114,7 +6114,7 @@ get_output_color_space(const struct dc_crtc_timing
> *dc_crtc_timing,
> > if (dc_crtc_timing->pixel_encoding == PIXEL_ENCODING_RGB)
> > color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
> > else
> > -   color_space = COLOR_SPACE_2020_YCBCR;
> > +   color_space = COLOR_SPACE_2020_YCBCR_LIMITED;
> > break;
> > case DRM_MODE_COLORIMETRY_DEFAULT: // ITU601
> > default:
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > index 049046c60462..c7d13e743e6c 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
> > @@ -1169,7 +1169,7 @@ static int amdgpu_current_colorspace_show(struct
> seq_file *m, void *data)
> > case COLOR_SPACE_2020_RGB_FULLRANGE:
> > seq_puts(m, "BT2020_RGB");
> > break;
> > -   case COLOR_SPACE_2020_YCBCR:
> > +   case COLOR_SPACE_2020_YCBCR_LIMITED:
> > seq_puts(m, "BT2020_YCC");
> > break;
> > default:
> > diff --git a/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
> > b/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
> > index b2fc4f8e6482..a51c2701da24 100644
> > --- a/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
> > +++ b/drivers/gpu/drm/amd/display/dc/basics/dc_common.c
> > @@ -40,7 +40,8 @@ bool is_rgb_cspace(enum dc_color_space
> output_color_space)
> > case COLOR_SPACE_YCBCR709:
> > case COLOR_SPACE_YCBCR601_LIMITED:
> > case COLOR_SPACE_YCBCR709_LIMITED:
> > -   case COLOR_SPACE_2020_YCBCR:
> > +   case COLOR_SPACE_2020_YCBCR_LIMITED:
> > +   case COLOR_SPACE_2020_YCBCR_FULL:
> > return false;
> > default:
> > /* Add a case to switch */
> > diff --git a/drivers/gp

RE: [PATCH 12/16] drm/amd/display: Support BT2020 YCbCr fullrange

2025-02-14 Thread Kovac, Krunoslav
[AMD Official Use Only - AMD Internal Distribution Only]

Hi Robert,

We only had one enum: COLOR_SPACE_2020_YCBCR.
On the output side this assumed limited range.
On the input side this apparently assumed full range, given the DPP matrix.
Now we split it into two enums to distinguish them and add a limited-range
YUV->RGB matrix.
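
For illustration only (the helper, parameter and table names below are made
up), the split lets the input path pick a CSC matrix per quantization range
instead of assuming full range for every BT.2020 YCbCr input:

static const unsigned short bt2020_limited_input_csc[12] = { 0 /* coefficients omitted */ };
static const unsigned short bt2020_full_input_csc[12]    = { 0 /* coefficients omitted */ };

static const unsigned short *input_csc_for_bt2020(bool full_range)
{
	/* before the split, one enum value covered both ranges, so only the
	 * full-range matrix could ever be selected on the input side */
	return full_range ? bt2020_full_input_csc : bt2020_limited_input_csc;
}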

Roman, maybe we can reword it a little bit further?

Thanks,
Kruno

-Original Message-
From: Robert Mader 
Sent: February 14, 2025 17:06
To: Li, Roman ; amd-gfx@lists.freedesktop.org
Cc: Wentland, Harry ; Li, Sun peng (Leo) 
; Pillai, Aurabindo ; Lin, Wayne 
; Chung, ChiaHsuan (Tom) ; Zuo, 
Jerry ; Mohamed, Zaeem ; Chiu, 
Solomon ; Wheeler, Daniel ; 
Bakoulin, Ilya ; Kovac, Krunoslav 

Subject: Re: [PATCH 12/16] drm/amd/display: Support BT2020 YCbCr fullrange

Hi Roman,

 > not handled properly when full range quantization is used

I wasn't aware that there was something wrong with the full range handling,
and it's also not clear from the [How] section what that was - can you briefly
elaborate on that? Apart from that it looks good to me, thanks!

On 14.02.25 22:55, Li, Roman wrote:
> [Public]
>
> Hi Robert,  thank you for the feedback.
> What about this version of commit message:
>
> Fix BT2020 YCbCr limited/full range input
>
> [Why]
> BT2020 YCbCr input is not handled properly when full range
> quantization is used and limited range is not supported at all.
>
> [How]
> - Add enums for BT2020 YCbCr limited/full range
> - Add limited range CSC matrix
>
>
> Thanks,
> Roman
>
>> -Original Message-
>> From: Robert Mader 
>> Sent: Friday, February 14, 2025 4:24 PM
>> To: Li, Roman ; amd-gfx@lists.freedesktop.org
>> Cc: Wentland, Harry ; Li, Sun peng (Leo)
>> ; Pillai, Aurabindo ;
>> Lin, Wayne ; Chung, ChiaHsuan (Tom)
>> ; Zuo, Jerry ; Mohamed,
>> Zaeem ; Chiu, Solomon ;
>> Wheeler, Daniel ; Bakoulin, Ilya
>> ; Kovac, Krunoslav ;
>> Robert Mader 
>> Subject: Re: [PATCH 12/16] drm/amd/display: Support BT2020 YCbCr
>> fullrange
>>
>> Thanks a lot for the patch!
>>
>> Small commit title nit, sorry for spotting this earlier: this commit
>> adds BT2020
>> *limited* range - full range was already supported, see the changes
>> in amdgpu_dm.c and dpp.h.
>>
>> On 14.02.25 16:00, roman...@amd.com wrote:
>>> From: Ilya Bakoulin 
>>>
>>> [Why/How]
>>> Need to add support for full-range quantization for YCbCr in BT2020
>>> color space.
>>>
>>> Reviewed-by: Krunoslav Kovac 
>>> Signed-off-by: Ilya Bakoulin 
>>> Signed-off-by: Roman Li 
>>> Tested-by: Robert Mader 
>>> ---
>>>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c   | 6 +++---
>>>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c   | 2 +-
>>>drivers/gpu/drm/amd/display/dc/basics/dc_common.c   | 3 ++-
>>>drivers/gpu/drm/amd/display/dc/core/dc_hw_sequencer.c   | 5 +++--
>>>drivers/gpu/drm/amd/display/dc/core/dc_resource.c   | 4 ++--
>>>drivers/gpu/drm/amd/display/dc/dc_hw_types.h| 4 +++-
>>>drivers/gpu/drm/amd/display/dc/dce/dce_stream_encoder.c | 3 ++-
>>>.../gpu/drm/amd/display/dc/dio/dcn10/dcn10_stream_encoder.c | 3 ++-
>>>.../amd/display/dc/dio/dcn401/dcn401_dio_stream_encoder.c   | 3 ++-
>>>.../amd/display/dc/hpo/dcn31/dcn31_hpo_dp_stream_encoder.c  | 3 ++-
>>>drivers/gpu/drm/amd/display/dc/inc/hw/dpp.h | 6 +-
>>>.../gpu/drm/amd/display/modules/info_packet/info_packet.c   | 4 ++--
>>>12 files changed, 29 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> index b1b5f352b9aa..4ae54b3573ba 100644
>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>> @@ -5616,9 +5616,9 @@ fill_plane_color_attributes(const struct
>>> drm_plane_state *plane_state,
>>>
>>>  case DRM_COLOR_YCBCR_BT2020:
>>>  if (full_range)
>>> -   *color_space = COLOR_SPACE_2020_YCBCR;
>>> +   *color_space = COLOR_SPACE_2020_YCBCR_FULL;
>>>  else
>>> -   return -EINVAL;
>>> +   *color_space = COLOR_SPACE_2020_YCBCR_LIMITED;
>>>  break;
>>>
>>>  default:
>>> @@ -6114,7 +6114,7 @@ get_output_color_space(const struct
>>> dc_crtc_timing
>> *dc_crtc_timing,
>>>  if (dc_crtc_timing->pixel_encoding == PIXEL_ENCODING_RGB)
>>>  color_space = COLOR_SPACE_2020_RGB_FULLRANGE;
>>>  else
>>> -   color_space = COLOR_SPACE_2020_YCBCR;
>>> +   color_space = COLOR_SPACE_2020_YCBCR_LIMITED;
>>>  break;
>>>  case DRM_MODE_COLORIMETRY_DEFAULT: // ITU601
>>>  default:
>>> diff --git
>>> a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
>>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_debugfs.c
>>> index 049046c60462..c7d13e743e6c 100644
>>> --- a/drivers/gpu/drm/am

Re: [PATCH] Documentation/gpu: Add acronyms for some firmware components

2025-02-14 Thread Alex Deucher
On Fri, Feb 14, 2025 at 6:38 PM Rodrigo Siqueira  wrote:
>
> On 02/14, Alex Deucher wrote:
> > On Fri, Feb 14, 2025 at 6:00 PM Rodrigo Siqueira  
> > wrote:
> > >
> > > Users can check the file "/sys/kernel/debug/dri/0/amdgpu_firmware_info"
> > > to get information on the firmware loaded in the system. This file has
> > > multiple acronyms that are not documented in the glossary. This commit
> > > introduces some missing acronyms to the AMD glossary documentation. The
> > > meaning of each acronym in this commit was extracted from code
> > > documentation available in the following files:
> > >
> > > - drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c
> > > - drivers/gpu/drm/amd/include/amd_shared.h
> > >
> > > Cc: Mario Limonciello 
> > > Signed-off-by: Rodrigo Siqueira 
> > > ---
> > >  Documentation/gpu/amdgpu/amdgpu-glossary.rst | 21 
> > >  1 file changed, 21 insertions(+)
> > >
> > > diff --git a/Documentation/gpu/amdgpu/amdgpu-glossary.rst 
> > > b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > > index 00a47ebb0b0f..3242db32b020 100644
> > > --- a/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > > +++ b/Documentation/gpu/amdgpu/amdgpu-glossary.rst
> > > @@ -12,6 +12,9 @@ we have a dedicated glossary for Display Core at
> > >The number of CUs that are active on the system.  The number of 
> > > active
> > >CUs may be less than SE * SH * CU depending on the board 
> > > configuration.
> > >
> > > +CE
> > > +  Constant Engine
> > > +
> > >  CP
> > >Command Processor
> > >
> > > @@ -80,6 +83,9 @@ we have a dedicated glossary for Display Core at
> > >  KIQ
> > >Kernel Interface Queue
> > >
> > > +ME
> > > +  Micro Engine
> >
> > This is part of Graphics so maybe something like:
> >
> > ME
> > MicroEngine (Graphics)
> >
> > > +
> > >  MEC
> > >MicroEngine Compute
> > >
> > > @@ -92,6 +98,9 @@ we have a dedicated glossary for Display Core at
> > >  MQD
> > >Memory Queue Descriptor
> > >
> > > +PFP
> > > +  Pre-Fetch Parser
> >
> > This is also part of GFX.
> >
> > PFP
> > Pre-Fetch Parser (Graphics)
> >
> > > +
> > >  PPLib
> > >PowerPlay Library - PowerPlay is the power management component.
> > >
> > > @@ -110,14 +119,26 @@ we have a dedicated glossary for Display Core at
> > >  SH
> > >SHader array
> > >
> > > +SMC
> > > +  System Management Controller
> > > +
> > >  SMU
> > >System Management Unit
> >
> > These two are synonyms.
> >
> > How about
> > SMU / SMC
> > System Management Unit / System Management Controller
> >
> > Other than that, looks good.
> >
>
> Thanks a lot for all the suggestions; I'll make those changes for the
> V2.
>
> btw, from the amdgpu_firmware_info, I did not find the meaning of the
> below acronyms, could you help me with that?
>
> MC

Memory Controller

> SRL(C|G|S)

RLC = RunList Controller
The name is a remnant of ages past and doesn't really have much
meaning today.  It's a group of general purpose helper engines for the
GFX block.  It's involved in GFX power management and SR-IOV among
other things.

SRLC = SAVE/RESTORE LIST CNTL
SRLG = SAVE/RESTORE LIST GPM_MEM
SRLS = SAVE/RESTORE LIST SRM_MEM


> IMU

Integrated Management Unit

Another engine which helps with power management tasks.

> ASD

I can't remember what this stands for off hand.  Will need to look it up.

> TOC

Table of Contents

>
> Thanks
> Siqueira
>
> > Alex
> >
> > >
> > >  SS
> > >Spread Spectrum
> > >
> > > +TA
> > > +  Trusted Application
> > > +
> > > +UVD
> > > +  Unified Video Decoder
> > > +
> > >  VCE
> > >Video Compression Engine
> > >
> > >  VCN
> > >Video Codec Next
> > > +
> > > +VPE
> > > +  Video Processing Engine
> > > --
> > > 2.48.1
> > >


Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only]

I think I should make it clearer. When MES is being used, no matter whether it is
pipe0 or pipe1, we expect both set_hw_resource and set_hw_resource_1 to be
called; that's a requirement for mes_v12 and later. For the non-unified MES config,
pipe1 will not use MES, so no MES API is required for pipe1, but for pipe0
the requirement is still the same.
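
Roughly, the expectation looks like the sketch below (illustrative only; the
pipe_runs_mes condition is a placeholder and the exact signatures are assumed
from the patch context, not copied from the driver):

	/* any pipe that actually runs MES must get both resource packets */
	if (pipe_runs_mes) {	/* pipe0 always; pipe1 only with unified MES */
		r = mes_v12_0_set_hw_resources(&adev->mes, pipe);
		if (!r)
			r = mes_v12_0_set_hw_resources_1(&adev->mes, pipe);
	}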

Regards
Shaoyun.liu


From: Liu, Shaoyun
Sent: Friday, February 14, 2025 12:46:27 PM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org 
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once


Oh, you're right.  It's only for unified MES; for non-unified, it will still
use the KIQ from CP directly on pipe1, so there is no MES API for it at all.
It's my fault, please ignore my previous comments.  Your current change for
this series is good enough.



Regards

Shaoyun.liu



From: Deucher, Alexander 
Sent: Friday, February 14, 2025 12:42 PM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]



Does it matter which pipe we use for these packets?



Alex





From: Liu, Shaoyun mailto:shaoyun@amd.com>>
Sent: Friday, February 14, 2025 12:36 PM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]



OK.  From the MES point of view, we expect both set_hw_resource and
set_hw_resource_1 to be called all the time.



Reviewed-by: Shaoyun.liu mailto:shaoyun@amd.com>>



From: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Sent: Friday, February 14, 2025 11:53 AM
To: Liu, Shaoyun mailto:shaoyun@amd.com>>; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]



I can add that as a follow up patch as I don't want to change the current 
behavior to avoid a potential regression.  Should we submit both the resource 
and resource_1 packets all the time?



Thanks,



Alex





From: Liu, Shaoyun mailto:shaoyun@amd.com>>
Sent: Friday, February 14, 2025 11:45 AM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]

I'd suggest removing the enable_uni_mes check; set_hw_resource_1 is always
required for gfx12 and up, especially after adding the cleaner_shader_fence_addr
there.

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,

 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)  {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;

	memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt));
@@ -689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gp

[PATCH] drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

2025-02-14 Thread Srinivasan Shanmugam
RLCG register access is a way for virtual functions to safely access GPU
registers in a virtualized environment, including for TLB flushes and
register reads. When multiple threads or VFs try to access the same
registers simultaneously, it can lead to race conditions. By using the
RLCG interface, the driver serializes access to the registers: only one
thread can access them at a time, preventing conflicts and ensuring that
operations are performed correctly. Additionally, if a thread that holds
a spinlock tries to acquire a mutex it may sleep in atomic context, and a
low-priority task holding the mutex can block a high-priority task
(priority inversion). Register access through amdgpu_virt_rlcg_reg_rw sits
on a fast code path, so this is critical.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The warning [ BUG: Invalid wait context ] indicates that a thread is
trying to acquire a mutex while it is in a context that does not allow
it to sleep (like holding a spinlock).
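
As a minimal sketch of the change (names and context simplified, not the
actual amdgpu_virt code; whether a plain spin_lock() is enough or the
irqsave variant is needed depends on the calling contexts):

#include <linux/spinlock.h>

struct virt_example {
	spinlock_t reg_lock;		/* was: struct mutex reg_lock; */
};

static void virt_example_init(struct virt_example *v)
{
	spin_lock_init(&v->reg_lock);	/* was: mutex_init() */
}

static void virt_example_reg_write(struct virt_example *v)
{
	unsigned long flags;

	/* spin_lock_irqsave() never sleeps, so taking it while another
	 * spinlock (e.g. gmc.invalidate_lock) is held is legal */
	spin_lock_irqsave(&v->reg_lock, flags);
	/* ... program the RLCG scratch registers and poll for completion ... */
	spin_unlock_irqrestore(&v->reg_lock, flags);
}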

Fixes the below:

[  253.013423] =
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[  253.013464] -
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] 9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: 
amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: 9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: 
process_one_work+0x3f5/0x680
[  253.013877]  #1: b789c008be40 
((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: 9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, 
at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE  
6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao 
Cc: Jingwen Chen 
Cc: Victor Skvortsov 
Cc: Zhigang Luo 
Cc: Christian König 
Cc: Alex Deucher 
Signed-off-by: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 9 +++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   | 3 ++-
 3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index eab530778fbd..14125cc3a937 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4251,7 +4251,6 @@ int

Re: [PATCH] drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

2025-02-14 Thread Christian König
Am 14.02.25 um 09:57 schrieb Srinivasan Shanmugam:
> RLCG Register Access is a way for virtual functions to safely access GPU
> registers in a virtualized environment., including TLB flushes and
> register reads. When multiple threads or VFs try to access the same
> registers simultaneously, it can lead to race conditions. By using the
> RLCG interface, the driver can serialize access to the registers. This
> means that only one thread can access the registers at a time,
> preventing conflicts and ensuring that operations are performed
> correctly. Additionally, when a low-priority task holds a mutex that a
> high-priority task needs, ie., If a thread holding a spinlock tries to
> acquire a mutex, it can lead to priority inversion. register access in
> amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.
>
> The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
> called, which attempts to acquire the mutex. This function is invoked
> from amdgpu_sriov_wreg, which in turn is called from
> gmc_v11_0_flush_gpu_tlb.
>
> The warning [ BUG: Invalid wait context ] indicates that a thread is
> trying to acquire a mutex while it is in a context that does not allow
> it to sleep (like holding a spinlock).
>
> Fixes the below:
>
> [  253.013423] =
> [  253.013434] [ BUG: Invalid wait context ]
> [  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U 
> OE
> [  253.013464] -
> [  253.013475] kworker/0:1/10 is trying to lock:
> [  253.013487] 9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: 
> amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
> [  253.013815] other info that might help us debug this:
> [  253.013827] context-{4:4}
> [  253.013835] 3 locks held by kworker/0:1/10:
> [  253.013847]  #0: 9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: 
> process_one_work+0x3f5/0x680
> [  253.013877]  #1: b789c008be40 
> ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
> [  253.013905]  #2: 9f3054281838 
> (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: 
> gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
> [  253.014154] stack backtrace:
> [  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U 
> OE  6.12.0-amdstaging-drm-next-lol-050225 #14
> [  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
> [  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
> Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
> [  253.014224] Workqueue: events work_for_cpu_fn
> [  253.014241] Call Trace:
> [  253.014250]  
> [  253.014260]  dump_stack_lvl+0x9b/0xf0
> [  253.014275]  dump_stack+0x10/0x20
> [  253.014287]  __lock_acquire+0xa47/0x2810
> [  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  253.014321]  lock_acquire+0xd1/0x300
> [  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
> [  253.014562]  ? __lock_acquire+0xa6b/0x2810
> [  253.014578]  __mutex_lock+0x85/0xe20
> [  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
> [  253.014782]  ? sched_clock_noinstr+0x9/0x10
> [  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  253.014808]  ? local_clock_noinstr+0xe/0xc0
> [  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
> [  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  253.015029]  mutex_lock_nested+0x1b/0x30
> [  253.015044]  ? mutex_lock_nested+0x1b/0x30
> [  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
> [  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
> [  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
> [  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
> [  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
> [  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
> [  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
> [  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
> [  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
> [  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
> [  253.017493]  local_pci_probe+0x4b/0xb0
> [  253.017746]  work_for_cpu_fn+0x1a/0x30
> [  253.017995]  process_one_work+0x21e/0x680
> [  253.018248]  worker_thread+0x190/0x330
> [  253.018500]  ? __pfx_worker_thread+0x10/0x10
> [  253.018746]  kthread+0xe7/0x120
> [  253.018988]  ? __pfx_kthread+0x10/0x10
> [  253.019231]  ret_from_fork+0x3c/0x60
> [  253.019468]  ? __pfx_kthread+0x10/0x10
> [  253.019701]  ret_from_fork_asm+0x1a/0x30
> [  253.019939]  
>
> Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
> Cc: lin cao 
> Cc: Jingwen Chen 
> Cc: Victor Skvortsov 
> Cc: Zhigang Luo 
> Cc: Christian König 
> Cc: Alex Deucher 
> Signed-off-by: Srinivasan Shanmugam 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 9 +++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   | 3 ++-
>  3 files changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a

Re: [PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

2025-02-14 Thread Christian König
Am 14.02.25 um 11:21 schrieb Tvrtko Ursulin:
>
> Hi Christian,
>
> On 11/02/2025 10:21, Christian König wrote:
>> Am 11.02.25 um 11:08 schrieb Philipp Stanner:
>>> On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
 Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
> Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
> newly
> added __drm_sched_entity_queue_pop.
>
> This allows breaking the hidden dependency that queue_node has to
> be the
> first element in struct drm_sched_job.
>
> A comment is also added with a reference to the mailing list
> discussion
> explaining the copied helper will be removed when the whole broken
> amdgpu_job_stop_all_jobs_on_sched is removed.
>
> Signed-off-by: Tvrtko Ursulin 
> Cc: Christian König 
> Cc: Danilo Krummrich 
> Cc: Matthew Brost 
> Cc: Philipp Stanner 
> Cc: "Zhang, Hawking" 
 Reviewed-by: Christian König 
>>> I think this v3 has been supplanted by a v4 by now.
>>
>> I've seen the larger v4 series as well, but at least that patch here looks 
>> identical on first glance. So my rb still counts.
>
> Is it okay for you to merge the whole series (including this single amdgpu 
> patch) via drm-misc?

I can do that, but don't want the scheduler maintainer want to pick them up?

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
>>> @Tvrtko: btw, do you create patches with
>>> git format-patch -v4 ?
>>>
>>> That way the v4 label will be included in all patch titles, too, not
>>> just the cover letter. That makes searching etc. easier in large
>>> inboxes
>>>
>>> P.
>>>
> ---
>    drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++---
>    1 file changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> index 100f04475943..22cb48bab24d 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
> @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct drm_sched_job *sched_job)
>    return fence;
>    }
> -#define to_drm_sched_job(sched_job)    \
> -    container_of((sched_job), struct drm_sched_job, queue_node)
> +/*
> + * This is a duplicate function from DRM scheduler sched_internal.h.
> + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is removed, due
> + * latter being incorrect and racy.
> + *
> + * See https://lore.kernel.org/amd-gfx/44edde63-7181-44fb-a4f7-94e50514f...@amd.com/
> + */
> +static struct drm_sched_job *
> +__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
> +{
> +    struct spsc_node *node;
> +
> +    node = spsc_queue_pop(&entity->job_queue);
> +    if (!node)
> +    return NULL;
> +
> +    return container_of(node, struct drm_sched_job, queue_node);
> +}
>    void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
>    {
> @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler *sched)
>    struct drm_sched_rq *rq = sched->sched_rq[i];
>    spin_lock(&rq->lock);
>    list_for_each_entry(s_entity, &rq->entities, list) {
> -    while ((s_job = to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue)))) {
> +    while ((s_job = __drm_sched_entity_queue_pop(s_entity))) {
>    struct drm_sched_fence *s_fence = s_job->s_fence;
>    dma_fence_signal(&s_fence->scheduled);
>



Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

2025-02-14 Thread Yadav, Arvind



On 2/14/2025 6:09 PM, Christian König wrote:

Yeah, completely agree.

But not checking the syncobj handle before doing the update is actually even 
more problematic than leaking the memory.

This could be used by userspace to put the kernel into a broken situation it
cannot get out of any more.

Arvind, can you take care of the complete fix?

Sure, I will do that.


Thanks,

~Arvind


Thanks,
Christian.

Am 14.02.25 um 13:14 schrieb YuanShang Mao (River):

[AMD Official Use Only - AMD Internal Distribution Only]

Better to put the fence outside amdgpu_gem_va_update_vm. Since it is passed to 
the caller, and the caller must keep one reference at least until this fence is 
no longer needed.

Thanks
River

-Original Message-
From: amd-gfx  On Behalf Of Yadav, Arvind
Sent: Friday, February 14, 2025 7:42 PM
To: Koenig, Christian ; Ma, Le ; 
amd-gfx@lists.freedesktop.org; Yadav, Arvind 
Cc: Zhang, Hawking ; Lazar, Lijo 
Subject: Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released


On 2/14/2025 4:08 PM, Christian König wrote:

Adding Arvind, please make sure to keep him in the loop.

Am 14.02.25 um 11:07 schrieb Le Ma:

On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
will show up explicitly during driver unloading if created bo without
drm_timeline object before.

  BUG drm_sched_fence (Tainted: G   OE ): Objects remaining in 
drm_sched_fence on __kmem_cache_shutdown()
  
-
  Call Trace:
  
  dump_stack_lvl+0x4c/0x70
  dump_stack+0x14/0x20
  slab_err+0xb0/0xf0
  ? srso_alias_return_thunk+0x5/0xfbef5
  ? flush_work+0x12/0x20
  ? srso_alias_return_thunk+0x5/0xfbef5
  __kmem_cache_shutdown+0x163/0x2e0
  kmem_cache_destroy+0x61/0x170
  drm_sched_fence_slab_fini+0x19/0x900

Thus call dma_fence_put properly to avoid the memleak.

v2: call dma_fence_put in amdgpu_gem_va_update_vm

Signed-off-by: Le Ma 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++--
   1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8b67aae6c2fe..00f1f34705c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -759,7 +759,8 @@ static struct dma_fence *
   amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
  struct amdgpu_vm *vm,
  struct amdgpu_bo_va *bo_va,
-uint32_t operation)
+uint32_t operation,
+uint32_t syncobj_handle)
   {
  struct dma_fence *fence = dma_fence_get_stub();
  int r;
@@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
  if (r)
  goto error;

+if (!syncobj_handle)
+dma_fence_put(fence);
+

Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. 
Here it doesn't make much sense.

Agreed,

Regards,
~Arvind


  if (operation == AMDGPU_VA_OP_MAP ||
  operation == AMDGPU_VA_OP_REPLACE) {
  r = amdgpu_vm_bo_update(adev, bo_va, false); @@ -965,7 +969,8 @@
int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
  &timeline_chain);

Right before this here is a call to amdgpu_gem_update_timeline_node() which is 
incorrectly placed.

That needs to come much earlier, above the switch (args->operation)

Regards,
Christian.


  fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
-args->operation);
+args->operation,
+args->vm_timeline_syncobj_out);

  if (!r)
  amdgpu_gem_update_bo_mapping(filp, bo_va,


Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

2025-02-14 Thread Yadav, Arvind



On 2/14/2025 4:08 PM, Christian König wrote:

Adding Arvind, please make sure to keep him in the loop.

Am 14.02.25 um 11:07 schrieb Le Ma:

On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
will show up explicitly during driver unloading if created bo without
drm_timeline object before.

 BUG drm_sched_fence (Tainted: G   OE ): Objects remaining in 
drm_sched_fence on __kmem_cache_shutdown()
 
-
 Call Trace:
 
 dump_stack_lvl+0x4c/0x70
 dump_stack+0x14/0x20
 slab_err+0xb0/0xf0
 ? srso_alias_return_thunk+0x5/0xfbef5
 ? flush_work+0x12/0x20
 ? srso_alias_return_thunk+0x5/0xfbef5
 __kmem_cache_shutdown+0x163/0x2e0
 kmem_cache_destroy+0x61/0x170
 drm_sched_fence_slab_fini+0x19/0x900

Thus call dma_fence_put properly to avoid the memleak.

v2: call dma_fence_put in amdgpu_gem_va_update_vm

Signed-off-by: Le Ma 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++--
  1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8b67aae6c2fe..00f1f34705c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -759,7 +759,8 @@ static struct dma_fence *
  amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
struct amdgpu_vm *vm,
struct amdgpu_bo_va *bo_va,
-   uint32_t operation)
+   uint32_t operation,
+   uint32_t syncobj_handle)
  {
struct dma_fence *fence = dma_fence_get_stub();
int r;
@@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
if (r)
goto error;
  
+	if (!syncobj_handle)

+   dma_fence_put(fence);
+

Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. 
Here it doesn't make much sense.


Agreed,

Regards,
~Arvind




if (operation == AMDGPU_VA_OP_MAP ||
operation == AMDGPU_VA_OP_REPLACE) {
r = amdgpu_vm_bo_update(adev, bo_va, false);
@@ -965,7 +969,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
&timeline_chain);

Right before this here is a call to amdgpu_gem_update_timeline_node() which is 
incorrectly placed.

That needs to come much earlier, above the switch (args->operation)

Regards,
Christian.

  
  		fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,

-   args->operation);
+   args->operation,
+   args->vm_timeline_syncobj_out);
  
  		if (!r)

amdgpu_gem_update_bo_mapping(filp, bo_va,


Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

2025-02-14 Thread Christian König
Yeah, completely agree.

But not checking the syncobj handle before doing the update is actually even 
more problematic than leaking the memory.

This could be used by userspace to put the kernel into a broken situation it
cannot get out of any more.
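
One way to make that robust, as a sketch only (local variable naming is
assumed here; the args field comes from the patch under discussion):

	/* look the timeline syncobj up before any VM state is touched, so a
	 * bad handle from userspace fails the ioctl early and cleanly */
	if (args->vm_timeline_syncobj_out) {
		timeline_syncobj = drm_syncobj_find(filp,
						    args->vm_timeline_syncobj_out);
		if (!timeline_syncobj)
			return -ENOENT;
	}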

Arvind, can you take care of the complete fix?

Thanks,
Christian.

Am 14.02.25 um 13:14 schrieb YuanShang Mao (River):
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Better to put the fence outside amdgpu_gem_va_update_vm. Since it is passed 
> to the caller, and the caller must keep one reference at least until this 
> fence is no longer needed.
>
> Thanks
> River
>
> -Original Message-
> From: amd-gfx  On Behalf Of Yadav, 
> Arvind
> Sent: Friday, February 14, 2025 7:42 PM
> To: Koenig, Christian ; Ma, Le ; 
> amd-gfx@lists.freedesktop.org; Yadav, Arvind 
> Cc: Zhang, Hawking ; Lazar, Lijo 
> Subject: Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not 
> released
>
>
> On 2/14/2025 4:08 PM, Christian König wrote:
>> Adding Arvind, please make sure to keep him in the loop.
>>
>> Am 14.02.25 um 11:07 schrieb Le Ma:
>>> On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
>>> will show up explicitly during driver unloading if created bo without
>>> drm_timeline object before.
>>>
>>>  BUG drm_sched_fence (Tainted: G   OE ): Objects remaining 
>>> in drm_sched_fence on __kmem_cache_shutdown()
>>>  
>>> -
>>>  Call Trace:
>>>  
>>>  dump_stack_lvl+0x4c/0x70
>>>  dump_stack+0x14/0x20
>>>  slab_err+0xb0/0xf0
>>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>>  ? flush_work+0x12/0x20
>>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>>  __kmem_cache_shutdown+0x163/0x2e0
>>>  kmem_cache_destroy+0x61/0x170
>>>  drm_sched_fence_slab_fini+0x19/0x900
>>>
>>> Thus call dma_fence_put properly to avoid the memleak.
>>>
>>> v2: call dma_fence_put in amdgpu_gem_va_update_vm
>>>
>>> Signed-off-by: Le Ma 
>>> ---
>>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++--
>>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> index 8b67aae6c2fe..00f1f34705c0 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>>> @@ -759,7 +759,8 @@ static struct dma_fence *
>>>   amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>>  struct amdgpu_vm *vm,
>>>  struct amdgpu_bo_va *bo_va,
>>> -uint32_t operation)
>>> +uint32_t operation,
>>> +uint32_t syncobj_handle)
>>>   {
>>>  struct dma_fence *fence = dma_fence_get_stub();
>>>  int r;
>>> @@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>>  if (r)
>>>  goto error;
>>>
>>> +if (!syncobj_handle)
>>> +dma_fence_put(fence);
>>> +
>> Having that check inside amdgpu_gem_update_bo_mapping() was actually 
>> correct. Here it doesn't make much sense.
> Agreed,
>
> Regards,
> ~Arvind
>
>>>  if (operation == AMDGPU_VA_OP_MAP ||
>>>  operation == AMDGPU_VA_OP_REPLACE) {
>>>  r = amdgpu_vm_bo_update(adev, bo_va, false); @@ -965,7 +969,8 
>>> @@
>>> int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>>  &timeline_chain);
>> Right before this here is a call to amdgpu_gem_update_timeline_node() which 
>> is incorrectly placed.
>>
>> That needs to come much earlier, above the switch (args->operation)
>>
>> Regards,
>> Christian.
>>
>>>  fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
>>> -args->operation);
>>> +args->operation,
>>> +args->vm_timeline_syncobj_out);
>>>
>>>  if (!r)
>>>  amdgpu_gem_update_bo_mapping(filp, bo_va,



Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

2025-02-14 Thread Christian König
Adding Arvind, please make sure to keep him in the loop.

Am 14.02.25 um 11:07 schrieb Le Ma:
> On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
> will show up explicitly during driver unloading if created bo without
> drm_timeline object before.
>
> BUG drm_sched_fence (Tainted: G   OE ): Objects remaining in 
> drm_sched_fence on __kmem_cache_shutdown()
> 
> -
> Call Trace:
> 
> dump_stack_lvl+0x4c/0x70
> dump_stack+0x14/0x20
> slab_err+0xb0/0xf0
> ? srso_alias_return_thunk+0x5/0xfbef5
> ? flush_work+0x12/0x20
> ? srso_alias_return_thunk+0x5/0xfbef5
> __kmem_cache_shutdown+0x163/0x2e0
> kmem_cache_destroy+0x61/0x170
> drm_sched_fence_slab_fini+0x19/0x900
>
> Thus call dma_fence_put properly to avoid the memleak.
>
> v2: call dma_fence_put in amdgpu_gem_va_update_vm
>
> Signed-off-by: Le Ma 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> index 8b67aae6c2fe..00f1f34705c0 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
> @@ -759,7 +759,8 @@ static struct dma_fence *
>  amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>   struct amdgpu_vm *vm,
>   struct amdgpu_bo_va *bo_va,
> - uint32_t operation)
> + uint32_t operation,
> + uint32_t syncobj_handle)
>  {
>   struct dma_fence *fence = dma_fence_get_stub();
>   int r;
> @@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>   if (r)
>   goto error;
>  
> + if (!syncobj_handle)
> + dma_fence_put(fence);
> +

Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. 
Here it doesn't make much sense.

>   if (operation == AMDGPU_VA_OP_MAP ||
>   operation == AMDGPU_VA_OP_REPLACE) {
>   r = amdgpu_vm_bo_update(adev, bo_va, false);
> @@ -965,7 +969,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void 
> *data,
>   &timeline_chain);

Right before this there is a call to amdgpu_gem_update_timeline_node() which is
incorrectly placed.

That needs to come much earlier, above the switch (args->operation)

Regards,
Christian.

>  
>   fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
> - args->operation);
> + args->operation,
> + args->vm_timeline_syncobj_out);
>  
>   if (!r)
>   amdgpu_gem_update_bo_mapping(filp, bo_va,



[PATCH] drm/amd/pm: extend the gfxoff delay for compute workload

2025-02-14 Thread Kenneth Feng
extend the gfxoff delay for compute workload on smu 14.0.2/3
to fix the kfd test issue.

Signed-off-by: Kenneth Feng 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |  3 +++
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 14 ++
 drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  1 +
 drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 15 +++
 drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  2 ++
 5 files changed, 35 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index b9bd6654f317..4ae6fde6c69c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -776,6 +776,9 @@ static void amdgpu_gfx_do_off_ctrl(struct amdgpu_device 
*adev, bool enable,
 {
unsigned long delay = GFX_OFF_DELAY_ENABLE;
 
+   if (amdgpu_dpm_need_extra_gfxoff_delay(adev))
+   delay *= 5;
+
if (!(adev->pm.pp_feature & PP_GFXOFF_MASK))
return;
 
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index 7a22aef6e59c..87de50b73a0e 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -873,6 +873,20 @@ int amdgpu_dpm_get_status_gfxoff(struct amdgpu_device 
*adev, uint32_t *value)
return ret;
 }
 
+bool amdgpu_dpm_need_extra_gfxoff_delay(struct amdgpu_device *adev)
+{
+   struct smu_context *smu = adev->powerplay.pp_handle;
+   bool ret = false;
+
+   if (is_support_sw_smu(adev)) {
+   mutex_lock(&adev->pm.mutex);
+   ret = smu_need_extra_gfxoff_delay(smu);
+   mutex_unlock(&adev->pm.mutex);
+   }
+
+   return ret;
+}
+
 uint64_t amdgpu_dpm_get_thermal_throttling_counter(struct amdgpu_device *adev)
 {
struct smu_context *smu = adev->powerplay.pp_handle;
diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h 
b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
index 1f5ac7e0230d..312ad348ce82 100644
--- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
+++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
@@ -470,6 +470,7 @@ int amdgpu_dpm_get_residency_gfxoff(struct amdgpu_device 
*adev, u32 *value);
 int amdgpu_dpm_set_residency_gfxoff(struct amdgpu_device *adev, bool value);
 int amdgpu_dpm_get_entrycount_gfxoff(struct amdgpu_device *adev, u64 *value);
 int amdgpu_dpm_get_status_gfxoff(struct amdgpu_device *adev, uint32_t *value);
+bool amdgpu_dpm_need_extra_gfxoff_delay(struct amdgpu_device *adev);
 uint64_t amdgpu_dpm_get_thermal_throttling_counter(struct amdgpu_device *adev);
 void amdgpu_dpm_gfx_state_change(struct amdgpu_device *adev,
 enum gfx_change_state state);
diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
index a1164912f674..61cd170ec30a 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
+++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
@@ -133,6 +133,21 @@ int smu_get_status_gfxoff(struct smu_context *smu, 
uint32_t *value)
return 0;
 }
 
+bool smu_need_extra_gfxoff_delay(struct smu_context *smu)
+{
+   bool ret = false;
+
+   if (!smu->pm_enabled)
+   return false;
+
+   if (((amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(14, 0, 
2)) ||
+   (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(14, 0, 
3))) &&
+smu->workload_mask & (1 << PP_SMC_POWER_PROFILE_COMPUTE))
+   return true;
+
+   return ret;
+}
+
 int smu_set_soft_freq_range(struct smu_context *smu,
enum smu_clk_type clk_type,
uint32_t min,
diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
index 3630593bce61..82f06c2a752d 100644
--- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
+++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
@@ -1626,6 +1626,8 @@ int smu_set_residency_gfxoff(struct smu_context *smu, 
bool value);
 
 int smu_get_status_gfxoff(struct smu_context *smu, uint32_t *value);
 
+bool smu_need_extra_gfxoff_delay(struct smu_context *smu);
+
 int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
 
 int smu_wait_for_event(struct smu_context *smu, enum smu_event_type event,
-- 
2.34.1
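
As a rough worked example of the "delay *= 5" hunk above: assuming
GFX_OFF_DELAY_ENABLE is still msecs_to_jiffies(100), gfxoff entry would be
deferred by roughly 500 ms instead of 100 ms while a compute workload is
active on SMU 14.0.2/3.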



RE: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

2025-02-14 Thread YuanShang Mao (River)
[AMD Official Use Only - AMD Internal Distribution Only]

Better to put the fence outside amdgpu_gem_va_update_vm, since it is passed to
the caller, and the caller must keep at least one reference until this fence is
no longer needed.

Thanks
River
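
For illustration, the reference flow described above would look roughly like
this (a sketch, not code from the patch):

	fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va, args->operation);
	/* the callee returns one reference that now belongs to the caller */

	/* ... hand the fence to whoever needs it; a syncobj/timeline chain
	 * takes its own reference ... */

	dma_fence_put(fence);	/* the caller drops its reference when done */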

-Original Message-
From: amd-gfx  On Behalf Of Yadav, Arvind
Sent: Friday, February 14, 2025 7:42 PM
To: Koenig, Christian ; Ma, Le ; 
amd-gfx@lists.freedesktop.org; Yadav, Arvind 
Cc: Zhang, Hawking ; Lazar, Lijo 
Subject: Re: [PATCH v2] drm/amdgpu: fix the memleak caused by fence not released


On 2/14/2025 4:08 PM, Christian König wrote:
> Adding Arvind, please make sure to keep him in the loop.
>
> Am 14.02.25 um 11:07 schrieb Le Ma:
>> On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
>> will show up explicitly during driver unloading if created bo without
>> drm_timeline object before.
>>
>>  BUG drm_sched_fence (Tainted: G   OE ): Objects remaining 
>> in drm_sched_fence on __kmem_cache_shutdown()
>>  
>> -
>>  Call Trace:
>>  
>>  dump_stack_lvl+0x4c/0x70
>>  dump_stack+0x14/0x20
>>  slab_err+0xb0/0xf0
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  ? flush_work+0x12/0x20
>>  ? srso_alias_return_thunk+0x5/0xfbef5
>>  __kmem_cache_shutdown+0x163/0x2e0
>>  kmem_cache_destroy+0x61/0x170
>>  drm_sched_fence_slab_fini+0x19/0x900
>>
>> Thus call dma_fence_put properly to avoid the memleak.
>>
>> v2: call dma_fence_put in amdgpu_gem_va_update_vm
>>
>> Signed-off-by: Le Ma 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++--
>>   1 file changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> index 8b67aae6c2fe..00f1f34705c0 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
>> @@ -759,7 +759,8 @@ static struct dma_fence *
>>   amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>  struct amdgpu_vm *vm,
>>  struct amdgpu_bo_va *bo_va,
>> -uint32_t operation)
>> +uint32_t operation,
>> +uint32_t syncobj_handle)
>>   {
>>  struct dma_fence *fence = dma_fence_get_stub();
>>  int r;
>> @@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
>>  if (r)
>>  goto error;
>>
>> +if (!syncobj_handle)
>> +dma_fence_put(fence);
>> +
> Having that check inside amdgpu_gem_update_bo_mapping() was actually correct. 
> Here it doesn't make much sense.

Agreed,

Regards,
~Arvind

>
>>  if (operation == AMDGPU_VA_OP_MAP ||
>>  operation == AMDGPU_VA_OP_REPLACE) {
>>  r = amdgpu_vm_bo_update(adev, bo_va, false); @@ -965,7 +969,8 @@
>> int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
>>  &timeline_chain);
> Right before this here is a call to amdgpu_gem_update_timeline_node() which 
> is incorrectly placed.
>
> That needs to come much earlier, above the switch (args->operation)
>
> Regards,
> Christian.
>
>>
>>  fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
>> -args->operation);
>> +args->operation,
>> +args->vm_timeline_syncobj_out);
>>
>>  if (!r)
>>  amdgpu_gem_update_bo_mapping(filp, bo_va,


[PATCH v2 06/12] drm/amdgpu: add RAS CPER ring buffer

2025-02-14 Thread Xiang Liu
From: Tao Zhou 

And initialize it; this is a pure software ring to store RAS CPER data.

v2: update the initialization of count_dw of the cper ring; it's a dword
variable.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c   | 39 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   | 29 ++--
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |  1 +
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c |  3 +-
 5 files changed, 57 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index f82aa12a88f4..cef7c1ec0d7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -365,6 +365,39 @@ int amdgpu_cper_generate_ce_records(struct amdgpu_device 
*adev,
return 0;
 }
 
+static u64 amdgpu_cper_ring_get_rptr(struct amdgpu_ring *ring)
+{
+   return *(ring->rptr_cpu_addr);
+}
+
+static u64 amdgpu_cper_ring_get_wptr(struct amdgpu_ring *ring)
+{
+   return ring->wptr;
+}
+
+static const struct amdgpu_ring_funcs cper_ring_funcs = {
+   .type = AMDGPU_RING_TYPE_CPER,
+   .align_mask = 0xff,
+   .support_64bit_ptrs = false,
+   .get_rptr = amdgpu_cper_ring_get_rptr,
+   .get_wptr = amdgpu_cper_ring_get_wptr,
+};
+
+static int amdgpu_cper_ring_init(struct amdgpu_device *adev)
+{
+   struct amdgpu_ring *ring = &(adev->cper.ring_buf);
+
+   ring->adev = NULL;
+   ring->ring_obj = NULL;
+   ring->use_doorbell = false;
+   ring->no_scheduler = true;
+   ring->funcs = &cper_ring_funcs;
+
+   sprintf(ring->name, "cper");
+   return amdgpu_ring_init(adev, ring, PAGE_SIZE, NULL, 0,
+   AMDGPU_RING_PRIO_DEFAULT, NULL);
+}
+
 int amdgpu_cper_init(struct amdgpu_device *adev)
 {
mutex_init(&adev->cper.cper_lock);
@@ -372,16 +405,14 @@ int amdgpu_cper_init(struct amdgpu_device *adev)
adev->cper.enabled = true;
adev->cper.max_count = CPER_MAX_ALLOWED_COUNT;
 
-   /*TODO: initialize cper ring*/
-
-   return 0;
+   return amdgpu_cper_ring_init(adev);
 }
 
 int amdgpu_cper_fini(struct amdgpu_device *adev)
 {
adev->cper.enabled = false;
 
-   /*TODO: free cper ring */
+   amdgpu_ring_fini(&(adev->cper.ring_buf));
adev->cper.count = 0;
adev->cper.wptr = 0;
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
index 6860a809f2f5..80c8571cff9d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
@@ -62,6 +62,7 @@ struct amdgpu_cper {
uint32_t wptr;
 
void *ring[CPER_MAX_ALLOWED_COUNT];
+   struct amdgpu_ring ring_buf;
 };
 
 void amdgpu_cper_entry_fill_hdr(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index cfbc18c12113..005cdaee9987 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -324,20 +324,27 @@ int amdgpu_ring_init(struct amdgpu_device *adev, struct 
amdgpu_ring *ring,
/* always set cond_exec_polling to CONTINUE */
*ring->cond_exe_cpu_addr = 1;
 
-   r = amdgpu_fence_driver_start_ring(ring, irq_src, irq_type);
-   if (r) {
-   dev_err(adev->dev, "failed initializing fences (%d).\n", r);
-   return r;
-   }
+   if (ring->funcs->type != AMDGPU_RING_TYPE_CPER) {
+   r = amdgpu_fence_driver_start_ring(ring, irq_src, irq_type);
+   if (r) {
+   dev_err(adev->dev, "failed initializing fences 
(%d).\n", r);
+   return r;
+   }
 
-   max_ibs_dw = ring->funcs->emit_frame_size +
-amdgpu_ring_max_ibs(ring->funcs->type) * 
ring->funcs->emit_ib_size;
-   max_ibs_dw = (max_ibs_dw + ring->funcs->align_mask) & 
~ring->funcs->align_mask;
+   max_ibs_dw = ring->funcs->emit_frame_size +
+amdgpu_ring_max_ibs(ring->funcs->type) * 
ring->funcs->emit_ib_size;
+   max_ibs_dw = (max_ibs_dw + ring->funcs->align_mask) & 
~ring->funcs->align_mask;
 
-   if (WARN_ON(max_ibs_dw > max_dw))
-   max_dw = max_ibs_dw;
+   if (WARN_ON(max_ibs_dw > max_dw))
+   max_dw = max_ibs_dw;
 
-   ring->ring_size = roundup_pow_of_two(max_dw * 4 * sched_hw_submission);
+   ring->ring_size = roundup_pow_of_two(max_dw * 4 * 
sched_hw_submission);
+   } else {
+   ring->ring_size = roundup_pow_of_two(max_dw * 4);
+   ring->count_dw = (ring->ring_size - 4) >> 2;
+   /* ring buffer is empty now */
+   ring->wptr = *ring->rptr_cpu_addr = 0;
+   }
 
ring->buf_mask = (ring->ring_size / 4) - 1;
ring->ptr_mask = ring->fu

[PATCH v2 02/12] drm/amdgpu: Introduce funcs for populating CPER

2025-02-14 Thread Xiang Liu
From: Hawking Zhang 

Introduce utility functions designed to assist
in populating CPER records.

v2: call cper_init/fini in device_ip_init/fini.

Signed-off-by: Hawking Zhang 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/Makefile|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c   | 281 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h   |  91 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   4 +
 5 files changed, 381 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h

diff --git a/drivers/gpu/drm/amd/amdgpu/Makefile 
b/drivers/gpu/drm/amd/amdgpu/Makefile
index 04247303b3cf..84bb3dfa39a9 100644
--- a/drivers/gpu/drm/amd/amdgpu/Makefile
+++ b/drivers/gpu/drm/amd/amdgpu/Makefile
@@ -66,7 +66,7 @@ amdgpu-y += amdgpu_device.o amdgpu_doorbell_mgr.o 
amdgpu_kms.o \
amdgpu_fw_attestation.o amdgpu_securedisplay.o \
amdgpu_eeprom.o amdgpu_mca.o amdgpu_psp_ta.o amdgpu_lsdma.o \
amdgpu_ring_mux.o amdgpu_xcp.o amdgpu_seq64.o amdgpu_aca.o 
amdgpu_dev_coredump.o \
-   amdgpu_userq_fence.o amdgpu_eviction_fence.o
+   amdgpu_userq_fence.o amdgpu_eviction_fence.o amdgpu_cper.o
 
 amdgpu-$(CONFIG_PROC_FS) += amdgpu_fdinfo.o
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index dc1f8d6fd0c4..db0a26800927 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -110,6 +110,7 @@
 #include "amdgpu_mca.h"
 #include "amdgpu_aca.h"
 #include "amdgpu_ras.h"
+#include "amdgpu_cper.h"
 #include "amdgpu_xcp.h"
 #include "amdgpu_seq64.h"
 #include "amdgpu_reg_state.h"
@@ -1128,6 +1129,9 @@ struct amdgpu_device {
/* ACA */
struct amdgpu_aca   aca;
 
+   /* CPER */
+   struct amdgpu_cper  cper;
+
struct amdgpu_ip_block  ip_blocks[AMDGPU_MAX_IP_NUM];
uint32_tharvest_ip_mask;
int num_ip_blocks;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
new file mode 100644
index ..8ce5dc6efcf9
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -0,0 +1,281 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright 2025 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#include "amdgpu.h"
+
+static const guid_t MCE= CPER_NOTIFY_MCE;
+static const guid_t CMC= CPER_NOTIFY_CMC;
+static const guid_t BOOT   = BOOT_TYPE;
+
+static const guid_t CRASHDUMP  = AMD_CRASHDUMP;
+static const guid_t RUNTIME= AMD_GPU_NONSTANDARD_ERROR;
+
+static void __inc_entry_length(struct cper_hdr *hdr, uint32_t size)
+{
+   hdr->record_length += size;
+}
+
+void amdgpu_cper_entry_fill_hdr(struct amdgpu_device *adev,
+   struct cper_hdr *hdr,
+   enum amdgpu_cper_type type,
+   enum cper_error_severity sev)
+{
+   hdr->signature[0]   = 'C';
+   hdr->signature[1]   = 'P';
+   hdr->signature[2]   = 'E';
+   hdr->signature[3]   = 'R';
+   hdr->revision   = CPER_HDR_REV_1;
+   hdr->signature_end  = 0x;
+   hdr->error_severity = sev;
+
+   hdr->valid_bits.platform_id = 1;
+   hdr->valid_bits.partition_id= 1;
+   hdr->valid_bits.timestamp   = 1;
+   /*TODO need to initialize hdr->timestamp */
+
+   snprintf(hdr->record_id, 8, "%d", 
atomic_inc_return(&adev->cper.unique_id));
+   snprintf(hdr->platform_id, 16, "0x%04X:0x%04X",
+adev->pdev->vendor, adev->pdev->device);
+   /* pmfw v

[PATCH v2 00/12] Generate CPER records for RAS and commit to CPER ring

2025-02-14 Thread Xiang Liu
This patch series generates RAS CPER records for UE/DE/CE and BP (bad page)
threshold exceeded events. SMU_TYPE_CE banks are combined into one CPER entry;
they could be CEs or DEs or both. UEs and BPs are encoded into separate CPER
entries.

RAS CPER records for CEs are generated only after the CE count has been queried.

All records are committed to a pure software ring with a limited size; when the
ring overflows, new records overwrite the oldest ones. Users can access the
records by reading a read-only debugfs node.
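
As a usage illustration (not part of the series): assuming the ring shows up
under the usual per-device debugfs directory, e.g.
/sys/kernel/debug/dri/0/amdgpu_ring_cper (the exact node name is an
assumption), the raw records can be dumped with a small reader that skips the
12-byte rptr/wptr/driver-wptr header the ring debugfs read prepends:

	/* sketch only; node path is an assumption, 12-byte header per the
	 * debugfs read patch in this series */
	#include <stdio.h>
	#include <stdint.h>

	int main(void)
	{
		uint32_t hdr[3];	/* rptr, wptr, driver wptr */
		uint8_t buf[4096];
		size_t n;
		FILE *f = fopen("/sys/kernel/debug/dri/0/amdgpu_ring_cper", "rb");

		if (!f || fread(hdr, sizeof(hdr), 1, f) != 1)
			return 1;
		/* everything after the header is raw CPER data, rptr..wptr */
		while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
			fwrite(buf, 1, n, stdout);
		fclose(f);
		return 0;
	}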

Hawking Zhang (5):
  drm/amd/include: Add amd cper header
  drm/amdgpu: Introduce funcs for populating CPER
  drm/amdgpu: Include ACA error type in aca bank
  drm/amdgpu: Introduce funcs for generating cper record
  drm/amdgpu: Generate cper records

Tao Zhou (4):
  drm/amdgpu: add RAS CPER ring buffer
  drm/amdgpu: read CPER ring via debugfs
  drm/amdgpu: add data write function for CPER ring
  drm/amdgpu: add mutex lock for cper ring

Xiang Liu (3):
  drm/amdgpu: Get timestamp from system time
  drm/amdgpu: Commit CPER entry
  drm/amdgpu: Generate bad page threshold cper records

 drivers/gpu/drm/amd/amdgpu/Makefile|   2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu.h|   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c|  46 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h|  16 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c   | 559 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h   | 104 
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |   4 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c   |  91 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h   |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/aqua_vanjaram.c |   3 +-
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c|   2 +
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c|   2 +
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c   |   2 +
 drivers/gpu/drm/amd/amdgpu/umc_v12_0.c |   1 +
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c|   2 +
 drivers/gpu/drm/amd/include/amd_cper.h | 269 ++
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c|   3 +
 19 files changed, 1075 insertions(+), 40 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
 create mode 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
 create mode 100644 drivers/gpu/drm/amd/include/amd_cper.h

-- 
2.34.1



[PATCH v2 01/12] drm/amd/include: Add amd cper header

2025-02-14 Thread Xiang Liu
From: Hawking Zhang 

AMD is using the Common Platform Error Record (CPER) format
to report all GPU hardware errors.

v2: add program attribute

Signed-off-by: Hawking Zhang 
Signed-off-by: Xiang Liu 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/include/amd_cper.h | 269 +
 1 file changed, 269 insertions(+)
 create mode 100644 drivers/gpu/drm/amd/include/amd_cper.h

diff --git a/drivers/gpu/drm/amd/include/amd_cper.h 
b/drivers/gpu/drm/amd/include/amd_cper.h
new file mode 100644
index ..086869264425
--- /dev/null
+++ b/drivers/gpu/drm/amd/include/amd_cper.h
@@ -0,0 +1,269 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright 2025 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE COPYRIGHT HOLDER(S) OR AUTHOR(S) BE LIABLE FOR ANY CLAIM, DAMAGES OR
+ * OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE,
+ * ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR
+ * OTHER DEALINGS IN THE SOFTWARE.
+ *
+ */
+#ifndef __AMD_CPER_H__
+#define __AMD_CPER_H__
+
+#include 
+
+#define CPER_HDR_REV_1  (0x100)
+#define CPER_SEC_MINOR_REV_1(0x01)
+#define CPER_SEC_MAJOR_REV_22   (0x22)
+#define CPER_MAX_OAM_COUNT  (8)
+
+#define CPER_CTX_TYPE_CRASH (1)
+#define CPER_CTX_TYPE_BOOT  (9)
+
+#define CPER_CREATOR_ID_AMDGPU "amdgpu"
+
+#define CPER_NOTIFY_MCE   \
+   GUID_INIT(0xE8F56FFE, 0x919C, 0x4cc5, 0xBA, 0x88, 0x65, 0xAB, \
+ 0xE1, 0x49, 0x13, 0xBB)
+#define CPER_NOTIFY_CMC   \
+   GUID_INIT(0x2DCE8BB1, 0xBDD7, 0x450e, 0xB9, 0xAD, 0x9C, 0xF4, \
+ 0xEB, 0xD4, 0xF8, 0x90)
+#define BOOT_TYPE \
+   GUID_INIT(0x3D61A466, 0xAB40, 0x409a, 0xA6, 0x98, 0xF3, 0x62, \
+ 0xD4, 0x64, 0xB3, 0x8F)
+
+#define AMD_CRASHDUMP \
+   GUID_INIT(0x32AC0C78, 0x2623, 0x48F6, 0xB0, 0xD0, 0x73, 0x65, \
+ 0x72, 0x5F, 0xD6, 0xAE)
+#define AMD_GPU_NONSTANDARD_ERROR \
+   GUID_INIT(0x32AC0C78, 0x2623, 0x48F6, 0x81, 0xA2, 0xAC, 0x69, \
+ 0x17, 0x80, 0x55, 0x1D)
+#define PROC_ERR_SECTION_TYPE \
+   GUID_INIT(0xDC3EA0B0, 0xA144, 0x4797, 0xB9, 0x5B, 0x53, 0xFA, \
+ 0x24, 0x2B, 0x6E, 0x1D)
+
+enum cper_error_severity {
+   CPER_SEV_NON_FATAL_UNCORRECTED = 0,
+   CPER_SEV_FATAL = 1,
+   CPER_SEV_NON_FATAL_CORRECTED   = 2,
+   CPER_SEV_NUM   = 3,
+
+   CPER_SEV_UNUSED = 10,
+};
+
+enum cper_aca_reg {
+   CPER_ACA_REG_CTL_LO= 0,
+   CPER_ACA_REG_CTL_HI= 1,
+   CPER_ACA_REG_STATUS_LO = 2,
+   CPER_ACA_REG_STATUS_HI = 3,
+   CPER_ACA_REG_ADDR_LO   = 4,
+   CPER_ACA_REG_ADDR_HI   = 5,
+   CPER_ACA_REG_MISC0_LO  = 6,
+   CPER_ACA_REG_MISC0_HI  = 7,
+   CPER_ACA_REG_CONFIG_LO = 8,
+   CPER_ACA_REG_CONFIG_HI = 9,
+   CPER_ACA_REG_IPID_LO   = 10,
+   CPER_ACA_REG_IPID_HI   = 11,
+   CPER_ACA_REG_SYND_LO   = 12,
+   CPER_ACA_REG_SYND_HI   = 13,
+
+   CPER_ACA_REG_COUNT = 32,
+};
+
+#pragma pack(push, 1)
+
+struct cper_timestamp {
+   uint8_t seconds;
+   uint8_t minutes;
+   uint8_t hours;
+   uint8_t flag;
+   uint8_t day;
+   uint8_t month;
+   uint8_t year;
+   uint8_t century;
+};
+
+struct cper_hdr {
+   char signature[4];  /* "CPER"  */
+   uint16_t revision;
+   uint32_t signature_end; /* 0x */
+   uint16_t sec_cnt;
+   enum cper_error_severity error_severity;
+   union {
+   struct {
+   uint32_t platform_id: 1;
+   uint32_t timestamp  : 1;
+   uint32_t partition_id   : 1;
+   uint32_t reserved   : 29;
+   } valid_bits;
+   uint32_t valid_mask;
+   };
+   uint32_trecord_length;

[PATCH v2 09/12] drm/amdgpu: add mutex lock for cper ring

2025-02-14 Thread Xiang Liu
From: Tao Zhou 

Avoid conflicts between reads and writes of the ring buffer.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 21 -
 3 files changed, 21 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index 64624b8b0cbc..c14742eb4d67 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -423,6 +423,7 @@ void amdgpu_cper_ring_write(struct amdgpu_ring *ring,
 
wptr_old = ring->wptr;
 
+   mutex_lock(&ring->adev->cper.ring_lock);
while (count) {
ent_sz = amdgpu_cper_ring_get_ent_sz(ring, ring->wptr);
chunk = min(ent_sz, count);
@@ -451,6 +452,7 @@ void amdgpu_cper_ring_write(struct amdgpu_ring *ring,
pos = rptr;
} while (!amdgpu_cper_is_hdr(ring, rptr));
}
+   mutex_unlock(&ring->adev->cper.ring_lock);
 
if (ring->count_dw >= (count >> 2))
ring->count_dw -= (count >> 2);
@@ -480,6 +482,8 @@ static int amdgpu_cper_ring_init(struct amdgpu_device *adev)
 {
struct amdgpu_ring *ring = &(adev->cper.ring_buf);
 
+   mutex_init(&adev->cper.ring_lock);
+
ring->adev = NULL;
ring->ring_obj = NULL;
ring->use_doorbell = false;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
index 1fa41858f22e..527835cbf0d3 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
@@ -63,6 +63,7 @@ struct amdgpu_cper {
 
void *ring[CPER_MAX_ALLOWED_COUNT];
struct amdgpu_ring ring_buf;
+   struct mutex ring_lock;
 };
 
 void amdgpu_cper_entry_fill_hdr(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 510fe1ad0628..5293eef4f0dd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -510,13 +510,18 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
result = 0;
 
if (*pos < 12) {
+   if (ring->funcs->type == AMDGPU_RING_TYPE_CPER)
+   mutex_lock(&ring->adev->cper.ring_lock);
+
early[0] = amdgpu_ring_get_rptr(ring) & ring->buf_mask;
early[1] = amdgpu_ring_get_wptr(ring) & ring->buf_mask;
early[2] = ring->wptr & ring->buf_mask;
for (i = *pos / 4; i < 3 && size; i++) {
r = put_user(early[i], (uint32_t *)buf);
-   if (r)
-   return r;
+   if (r) {
+   result = r;
+   goto out;
+   }
buf += 4;
result += 4;
size -= 4;
@@ -547,12 +552,14 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
 
while (size) {
if (p == early[1])
-   return result;
+   goto out;
 
value = ring->ring[p];
r = put_user(value, (uint32_t *)buf);
-   if (r)
-   return r;
+   if (r) {
+   result = r;
+   goto out;
+   }
 
buf += 4;
result += 4;
@@ -562,6 +569,10 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
}
}
 
+out:
+   if (ring->funcs->type == AMDGPU_RING_TYPE_CPER)
+   mutex_unlock(&ring->adev->cper.ring_lock);
+
return result;
 }
 
-- 
2.34.1



[PATCH v2 08/12] drm/amdgpu: add data write function for CPER ring

2025-02-14 Thread Xiang Liu
From: Tao Zhou 

Old CPER data will be overwritten if the ring buffer is full, and the read
pointer always points to a CPER header.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 93 
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h |  2 +
 2 files changed, 95 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index cef7c1ec0d7c..64624b8b0cbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -365,6 +365,99 @@ int amdgpu_cper_generate_ce_records(struct amdgpu_device 
*adev,
return 0;
 }
 
+static bool amdgpu_cper_is_hdr(struct amdgpu_ring *ring, u64 pos)
+{
+   struct cper_hdr *chdr;
+
+   chdr = (struct cper_hdr *)&(ring->ring[pos]);
+   return strcmp(chdr->signature, "CPER") ? false : true;
+}
+
+static u32 amdgpu_cper_ring_get_ent_sz(struct amdgpu_ring *ring, u64 pos)
+{
+   struct cper_hdr *chdr;
+   u64 p;
+   u32 chunk, rec_len = 0;
+
+   chdr = (struct cper_hdr *)&(ring->ring[pos]);
+   chunk = ring->ring_size - (pos << 2);
+
+   if (!strcmp(chdr->signature, "CPER")) {
+   rec_len = chdr->record_length;
+   goto calc;
+   }
+
+   /* ring buffer is not full, no cper data after ring->wptr */
+   if (ring->count_dw)
+   goto calc;
+
+   for (p = pos + 1; p <= ring->buf_mask; p++) {
+   chdr = (struct cper_hdr *)&(ring->ring[p]);
+   if (!strcmp(chdr->signature, "CPER")) {
+   rec_len = (p - pos) << 2;
+   goto calc;
+   }
+   }
+
+calc:
+   if (!rec_len)
+   return chunk;
+   else
+   return min(rec_len, chunk);
+}
+
+void amdgpu_cper_ring_write(struct amdgpu_ring *ring,
+ void *src, int count)
+{
+   u64 pos, wptr_old, rptr = *ring->rptr_cpu_addr & ring->ptr_mask;
+   u32 chunk, ent_sz;
+   u8 *s = (u8 *)src;
+
+   if (count >= ring->ring_size - 4) {
+   dev_err(ring->adev->dev,
+   "CPER data size(%d) is larger than ring size(%d)\n",
+   count, ring->ring_size - 4);
+
+   return;
+   }
+
+   wptr_old = ring->wptr;
+
+   while (count) {
+   ent_sz = amdgpu_cper_ring_get_ent_sz(ring, ring->wptr);
+   chunk = min(ent_sz, count);
+
+   memcpy(&ring->ring[ring->wptr], s, chunk);
+
+   ring->wptr += (chunk >> 2);
+   ring->wptr &= ring->ptr_mask;
+   count -= chunk;
+   s += chunk;
+   }
+
+   /* the buffer is overflow, adjust rptr */
+   if (((wptr_old < rptr) && (rptr <= ring->wptr)) ||
+   ((ring->wptr < wptr_old) && (wptr_old < rptr)) ||
+   ((rptr <= ring->wptr) && (ring->wptr < wptr_old))) {
+   pos = (ring->wptr + 1) & ring->ptr_mask;
+
+   do {
+   ent_sz = amdgpu_cper_ring_get_ent_sz(ring, pos);
+
+   rptr += (ent_sz >> 2);
+   rptr &= ring->ptr_mask;
+   *ring->rptr_cpu_addr = rptr;
+
+   pos = rptr;
+   } while (!amdgpu_cper_is_hdr(ring, rptr));
+   }
+
+   if (ring->count_dw >= (count >> 2))
+   ring->count_dw -= (count >> 2);
+   else
+   ring->count_dw = 0;
+}
+
 static u64 amdgpu_cper_ring_get_rptr(struct amdgpu_ring *ring)
 {
return *(ring->rptr_cpu_addr);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
index 80c8571cff9d..1fa41858f22e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
@@ -93,6 +93,8 @@ int amdgpu_cper_generate_ue_record(struct amdgpu_device *adev,
 int amdgpu_cper_generate_ce_records(struct amdgpu_device *adev,
struct aca_banks *banks,
uint16_t bank_count);
+void amdgpu_cper_ring_write(struct amdgpu_ring *ring,
+   void *src, int count);
 int amdgpu_cper_init(struct amdgpu_device *adev);
 int amdgpu_cper_fini(struct amdgpu_device *adev);
 
-- 
2.34.1
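
A short worked example of the overflow handling above, assuming a 256-dword
ring (ptr_mask = 255): with rptr = 252 and wptr_old = 250, writing an 8-dword
record wraps wptr around to 2 and runs over the read pointer, so the second
disjunct of the overflow check fires ((ring->wptr < wptr_old) && (wptr_old <
rptr)); rptr is then walked forward entry by entry until it lands on the next
intact "CPER" header. With rptr = 10 instead, the same write wraps without
reaching the read pointer, none of the three disjuncts fire, and rptr is left
untouched.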



[PATCH v2 11/12] drm/amdgpu: Commit CPER entry

2025-02-14 Thread Xiang Liu
Commit the CPER entry to the ring buffer.

Signed-off-by: Xiang Liu 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index 0bdc08fba3b1..00f953ed6740 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -280,6 +280,7 @@ int amdgpu_cper_generate_ue_record(struct amdgpu_device 
*adev,
 {
struct cper_hdr *fatal = NULL;
struct cper_sec_crashdump_reg_data reg_data = { 0 };
+   struct amdgpu_ring *ring = &adev->cper.ring_buf;
int ret;
 
fatal = amdgpu_cper_alloc_entry(adev, AMDGPU_CPER_TYPE_FATAL, 1);
@@ -302,7 +303,7 @@ int amdgpu_cper_generate_ue_record(struct amdgpu_device 
*adev,
if (ret)
return ret;
 
-   /*TODO: commit the cper entry to cper ring */
+   amdgpu_cper_ring_write(ring, fatal, fatal->record_length);
 
return 0;
 }
@@ -329,6 +330,7 @@ int amdgpu_cper_generate_ce_records(struct amdgpu_device 
*adev,
 {
struct cper_hdr *corrected = NULL;
enum cper_error_severity sev = CPER_SEV_NON_FATAL_CORRECTED;
+   struct amdgpu_ring *ring = &adev->cper.ring_buf;
uint32_t reg_data[CPER_ACA_REG_COUNT] = { 0 };
struct aca_bank_node *node;
struct aca_bank *bank;
@@ -377,7 +379,7 @@ int amdgpu_cper_generate_ce_records(struct amdgpu_device 
*adev,
return ret;
}
 
-   /*TODO: commit the cper entry to cper ring */
+   amdgpu_cper_ring_write(ring, corrected, corrected->record_length);
 
return 0;
 }
-- 
2.34.1



[PATCH v2 04/12] drm/amdgpu: Introduce funcs for generating cper record

2025-02-14 Thread Xiang Liu
From: Hawking Zhang 

Introduce new functions that are used to generate
cper ue or ce records.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Hawking Zhang 
Signed-off-by: Xiang Liu 
Reviewed-by: Yang Wang 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c  |  12 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h  |  12 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 108 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h |   9 +-
 4 files changed, 128 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index 1a26b8ad14cb..ed1c20bd8114 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -30,16 +30,6 @@
 
 typedef int bank_handler_t(struct aca_handle *handle, struct aca_bank *bank, 
enum aca_smu_type type, void *data);
 
-struct aca_banks {
-   int nr_banks;
-   struct list_head list;
-};
-
-struct aca_hwip {
-   int hwid;
-   int mcatype;
-};
-
 static struct aca_hwip aca_hwid_mcatypes[ACA_HWIP_TYPE_COUNT] = {
ACA_BANK_HWID(SMU,  0x01,   0x01),
ACA_BANK_HWID(PCS_XGMI, 0x50,   0x00),
@@ -111,7 +101,7 @@ static struct aca_regs_dump {
{"STATUS",  ACA_REG_IDX_STATUS},
{"ADDR",ACA_REG_IDX_ADDR},
{"MISC",ACA_REG_IDX_MISC0},
-   {"CONFIG",  ACA_REG_IDX_CONFG},
+   {"CONFIG",  ACA_REG_IDX_CONFIG},
{"IPID",ACA_REG_IDX_IPID},
{"SYND",ACA_REG_IDX_SYND},
{"DESTAT",  ACA_REG_IDX_DESTAT},
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h
index 3cd0115b0244..b84a3489b116 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h
@@ -81,7 +81,7 @@ enum aca_reg_idx {
ACA_REG_IDX_STATUS  = 1,
ACA_REG_IDX_ADDR= 2,
ACA_REG_IDX_MISC0   = 3,
-   ACA_REG_IDX_CONFG   = 4,
+   ACA_REG_IDX_CONFIG  = 4,
ACA_REG_IDX_IPID= 5,
ACA_REG_IDX_SYND= 6,
ACA_REG_IDX_DESTAT  = 8,
@@ -114,6 +114,11 @@ enum aca_smu_type {
ACA_SMU_TYPE_COUNT,
 };
 
+struct aca_hwip {
+   int hwid;
+   int mcatype;
+};
+
 struct aca_bank {
enum aca_error_type aca_err_type;
enum aca_smu_type smu_err_type;
@@ -125,6 +130,11 @@ struct aca_bank_node {
struct list_head node;
 };
 
+struct aca_banks {
+   int nr_banks;
+   struct list_head list;
+};
+
 struct aca_bank_info {
int die_id;
int socket_id;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index 8ce5dc6efcf9..f82aa12a88f4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -21,6 +21,7 @@
  * OTHER DEALINGS IN THE SOFTWARE.
  *
  */
+#include 
 #include "amdgpu.h"
 
 static const guid_t MCE= CPER_NOTIFY_MCE;
@@ -257,6 +258,113 @@ struct cper_hdr *amdgpu_cper_alloc_entry(struct 
amdgpu_device *adev,
return hdr;
 }
 
+int amdgpu_cper_generate_ue_record(struct amdgpu_device *adev,
+  struct aca_bank *bank)
+{
+   struct cper_hdr *fatal = NULL;
+   struct cper_sec_crashdump_reg_data reg_data = { 0 };
+   int ret;
+
+   fatal = amdgpu_cper_alloc_entry(adev, AMDGPU_CPER_TYPE_FATAL, 1);
+   if (!fatal) {
+   dev_err(adev->dev, "fail to alloc cper entry for ue record\n");
+   return -ENOMEM;
+   }
+
+   reg_data.status_lo = lower_32_bits(bank->regs[ACA_REG_IDX_STATUS]);
+   reg_data.status_hi = upper_32_bits(bank->regs[ACA_REG_IDX_STATUS]);
+   reg_data.addr_lo   = lower_32_bits(bank->regs[ACA_REG_IDX_ADDR]);
+   reg_data.addr_hi   = upper_32_bits(bank->regs[ACA_REG_IDX_ADDR]);
+   reg_data.ipid_lo   = lower_32_bits(bank->regs[ACA_REG_IDX_IPID]);
+   reg_data.ipid_hi   = upper_32_bits(bank->regs[ACA_REG_IDX_IPID]);
+   reg_data.synd_lo   = lower_32_bits(bank->regs[ACA_REG_IDX_SYND]);
+   reg_data.synd_hi   = upper_32_bits(bank->regs[ACA_REG_IDX_SYND]);
+
+   amdgpu_cper_entry_fill_hdr(adev, fatal, AMDGPU_CPER_TYPE_FATAL, 
CPER_SEV_FATAL);
+   ret = amdgpu_cper_entry_fill_fatal_section(adev, fatal, 0, reg_data);
+   if (ret)
+   return ret;
+
+   /*TODO: commit the cper entry to cper ring */
+
+   return 0;
+}
+
+static enum cper_error_severity amdgpu_aca_err_type_to_cper_sev(struct 
amdgpu_device *adev,
+   enum 
aca_error_type aca_err_type)
+{
+   switch (aca_err_type) {
+   case ACA_ERROR_TYPE_UE:
+   return CPER_SEV_FATAL;
+   case ACA_ERROR_TYPE_CE:
+

[PATCH v2 03/12] drm/amdgpu: Include ACA error type in aca bank

2025-02-14 Thread Xiang Liu
From: Hawking Zhang 

ACA error types managed by the driver do not have a direct 1:1
correspondence with those managed by firmware.

To address this, for each ACA bank, include
both the ACA error type and the ACA SMU type.

This addition is useful for creating CPER records.
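
As an editorial illustration of why the mapping is not 1:1 (based on the
cover letter, not code from this patch): firmware only distinguishes UE and
CE banks, while the driver additionally tracks deferred errors, so a
CE-reported bank may be logged as either type:

	/* assumed relationship, illustration only */
	ACA_SMU_TYPE_UE -> ACA_ERROR_TYPE_UE
	ACA_SMU_TYPE_CE -> ACA_ERROR_TYPE_CE or ACA_ERROR_TYPE_DEFERRED
	                   (decided per bank by the IP-specific parser)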

Signed-off-by: Hawking Zhang 
Reviewed-by: Yang Wang 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c  | 2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h  | 4 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c  | 2 ++
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c  | 2 ++
 drivers/gpu/drm/amd/amdgpu/sdma_v4_4_2.c | 2 ++
 drivers/gpu/drm/amd/amdgpu/umc_v12_0.c   | 1 +
 drivers/gpu/drm/amd/amdgpu/vcn_v4_0_3.c  | 2 ++
 9 files changed, 17 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index 9d6345146495..1a26b8ad14cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -168,7 +168,7 @@ static int aca_smu_get_valid_aca_banks(struct amdgpu_device 
*adev, enum aca_smu_
if (ret)
return ret;
 
-   bank.type = type;
+   bank.smu_err_type = type;
 
aca_smu_bank_dump(adev, i, count, &bank, qctx);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h
index f3289d289913..3cd0115b0244 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.h
@@ -108,13 +108,15 @@ enum aca_error_type {
 };
 
 enum aca_smu_type {
+   ACA_SMU_TYPE_INVALID = -1,
ACA_SMU_TYPE_UE = 0,
ACA_SMU_TYPE_CE,
ACA_SMU_TYPE_COUNT,
 };
 
 struct aca_bank {
-   enum aca_smu_type type;
+   enum aca_error_type aca_err_type;
+   enum aca_smu_type smu_err_type;
u64 regs[ACA_MAX_REGS_COUNT];
 };
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index aecbe52a4f5c..94f306c0b706 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -1129,10 +1129,12 @@ static int xgmi_v6_4_0_aca_bank_parser(struct 
aca_handle *handle, struct aca_ban
if (ext_error_code != 0 && ext_error_code != 9)
count = 0ULL;
 
+   bank->aca_err_type = ACA_ERROR_TYPE_UE;
ret = aca_error_cache_log_bank_error(handle, &info, 
ACA_ERROR_TYPE_UE, count);
break;
case ACA_SMU_TYPE_CE:
count = ext_error_code == 6 ? count : 0ULL;
+   bank->aca_err_type = ACA_ERROR_TYPE_CE;
ret = aca_error_cache_log_bank_error(handle, &info, 
ACA_ERROR_TYPE_CE, count);
break;
default:
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
index aed05f3daeeb..d54b2261305b 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_4_3.c
@@ -893,10 +893,12 @@ static int gfx_v9_4_3_aca_bank_parser(struct aca_handle 
*handle,
 
switch (type) {
case ACA_SMU_TYPE_UE:
+   bank->aca_err_type = ACA_ERROR_TYPE_UE;
ret = aca_error_cache_log_bank_error(handle, &info,
 ACA_ERROR_TYPE_UE, 1ULL);
break;
case ACA_SMU_TYPE_CE:
+   bank->aca_err_type = ACA_ERROR_TYPE_CE;
ret = aca_error_cache_log_bank_error(handle, &info,
 ACA_ERROR_TYPE_CE, 
ACA_REG__MISC0__ERRCNT(misc0));
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
index 9459e8cc7413..99bd68f705b0 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_3.c
@@ -1249,10 +1249,12 @@ static int jpeg_v4_0_3_aca_bank_parser(struct 
aca_handle *handle, struct aca_ban
misc0 = bank->regs[ACA_REG_IDX_MISC0];
switch (type) {
case ACA_SMU_TYPE_UE:
+   bank->aca_err_type = ACA_ERROR_TYPE_UE;
ret = aca_error_cache_log_bank_error(handle, &info, 
ACA_ERROR_TYPE_UE,
 1ULL);
break;
case ACA_SMU_TYPE_CE:
+   bank->aca_err_type = ACA_ERROR_TYPE_CE;
ret = aca_error_cache_log_bank_error(handle, &info, 
ACA_ERROR_TYPE_CE,
 
ACA_REG__MISC0__ERRCNT(misc0));
break;
diff --git a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c 
b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
index e646e5cef0a2..17d27b12ccce 100644
--- a/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
+++ b/drivers/gpu/drm/amd/amdgpu/mmhub_v1_8.c
@@ -719,10 +719,12 @@ static int mmhub_v1_8_aca_bank_parser(struct aca_handl

[PATCH v2 05/12] drm/amdgpu: Generate cper records

2025-02-14 Thread Xiang Liu
From: Hawking Zhang 

Encode the error information in CPER format and commit it
to the CPER ring.

Signed-off-by: Hawking Zhang 
Reviewed-by: Yang Wang 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c | 32 +
 1 file changed, 32 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
index ed1c20bd8114..c0da9096a7fa 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_aca.c
@@ -384,6 +384,36 @@ static bool aca_bank_should_update(struct amdgpu_device 
*adev, enum aca_smu_type
return ret;
 }
 
+static void aca_banks_generate_cper(struct amdgpu_device *adev,
+   enum aca_smu_type type,
+   struct aca_banks *banks,
+   int count)
+{
+   struct aca_bank_node *node;
+   struct aca_bank *bank;
+
+   if (!adev || !banks || !count) {
+   dev_warn(adev->dev, "fail to generate cper records\n");
+   return;
+   }
+
+   /* UEs must be encoded into separate CPER entries */
+   if (type == ACA_SMU_TYPE_UE) {
+   list_for_each_entry(node, &banks->list, node) {
+   bank = &node->bank;
+   if (amdgpu_cper_generate_ue_record(adev, bank))
+   dev_warn(adev->dev, "fail to generate ue cper 
records\n");
+   }
+   } else {
+   /*
+* SMU_TYPE_CE banks are combined into 1 CPER entries,
+* they could be CEs or DEs or both
+*/
+   if (amdgpu_cper_generate_ce_records(adev, banks, count))
+   dev_warn(adev->dev, "fail to generate ce cper 
records\n");
+   }
+}
+
 static int aca_banks_update(struct amdgpu_device *adev, enum aca_smu_type type,
bank_handler_t handler, struct ras_query_context 
*qctx, void *data)
 {
@@ -421,6 +451,8 @@ static int aca_banks_update(struct amdgpu_device *adev, 
enum aca_smu_type type,
if (ret)
goto err_release_banks;
 
+   aca_banks_generate_cper(adev, type, &banks, count);
+
 err_release_banks:
aca_banks_release(&banks);
 
-- 
2.34.1



[PATCH v2 07/12] drm/amdgpu: read CPER ring via debugfs

2025-02-14 Thread Xiang Liu
From: Tao Zhou 

We read CPER data from the read pointer to the write pointer without
changing the pointers.

Signed-off-by: Tao Zhou 
Reviewed-by: Hawking Zhang 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 47 ++--
 1 file changed, 36 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
index 005cdaee9987..510fe1ad0628 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c
@@ -500,6 +500,7 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
 {
struct amdgpu_ring *ring = file_inode(f)->i_private;
uint32_t value, result, early[3];
+   uint64_t p;
loff_t i;
int r;
 
@@ -523,18 +524,42 @@ static ssize_t amdgpu_debugfs_ring_read(struct file *f, 
char __user *buf,
}
}
 
-   while (size) {
-   if (*pos >= (ring->ring_size + 12))
-   return result;
+   if (ring->funcs->type != AMDGPU_RING_TYPE_CPER) {
+   while (size) {
+   if (*pos >= (ring->ring_size + 12))
+   return result;
 
-   value = ring->ring[(*pos - 12)/4];
-   r = put_user(value, (uint32_t *)buf);
-   if (r)
-   return r;
-   buf += 4;
-   result += 4;
-   size -= 4;
-   *pos += 4;
+   value = ring->ring[(*pos - 12)/4];
+   r = put_user(value, (uint32_t *)buf);
+   if (r)
+   return r;
+   buf += 4;
+   result += 4;
+   size -= 4;
+   *pos += 4;
+   }
+   } else {
+   p = early[0];
+   if (early[0] <= early[1])
+   size = (early[1] - early[0]);
+   else
+   size = ring->ring_size - (early[0] - early[1]);
+
+   while (size) {
+   if (p == early[1])
+   return result;
+
+   value = ring->ring[p];
+   r = put_user(value, (uint32_t *)buf);
+   if (r)
+   return r;
+
+   buf += 4;
+   result += 4;
+   size--;
+   p++;
+   p &= ring->ptr_mask;
+   }
}
 
return result;
-- 
2.34.1



[PATCH v2 10/12] drm/amdgpu: Get timestamp from system time

2025-02-14 Thread Xiang Liu
Get the system local time and encode it into the timestamp for CPER.

Signed-off-by: Xiang Liu 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 19 ++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index c14742eb4d67..0bdc08fba3b1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -36,6 +36,22 @@ static void __inc_entry_length(struct cper_hdr *hdr, 
uint32_t size)
hdr->record_length += size;
 }
 
+static void amdgpu_cper_get_timestamp(struct cper_timestamp *timestamp)
+{
+   struct tm tm;
+   time64_t now = ktime_get_real_seconds();
+
+   time64_to_tm(now, 0, &tm);
+   timestamp->seconds = tm.tm_sec;
+   timestamp->minutes = tm.tm_min;
+   timestamp->hours = tm.tm_hour;
+   timestamp->flag = 0;
+   timestamp->day = tm.tm_mday;
+   timestamp->month = 1 + tm.tm_mon;
+   timestamp->year = (1900 + tm.tm_year) % 100;
+   timestamp->century = (1900 + tm.tm_year) / 100;
+}
+
 void amdgpu_cper_entry_fill_hdr(struct amdgpu_device *adev,
struct cper_hdr *hdr,
enum amdgpu_cper_type type,
@@ -52,7 +68,8 @@ void amdgpu_cper_entry_fill_hdr(struct amdgpu_device *adev,
hdr->valid_bits.platform_id = 1;
hdr->valid_bits.partition_id= 1;
hdr->valid_bits.timestamp   = 1;
-   /*TODO need to initialize hdr->timestamp */
+
+   amdgpu_cper_get_timestamp(&hdr->timestamp);
 
snprintf(hdr->record_id, 8, "%d", 
atomic_inc_return(&adev->cper.unique_id));
snprintf(hdr->platform_id, 16, "0x%04X:0x%04X",
-- 
2.34.1
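
As a quick worked example of the encoding above: for 2025-02-14 09:30:05 UTC,
time64_to_tm() yields tm_year = 125 and tm_mon = 1, so the stored fields become
seconds = 5, minutes = 30, hours = 9, day = 14, month = 2,
year = (1900 + 125) % 100 = 25 and century = (1900 + 125) / 100 = 20.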



Re: [PATCH 3/3] drm/amdgpu: Do not set power brake sequence for Aldebaran SRIOV

2025-02-14 Thread Lazar, Lijo



On 2/14/2025 5:43 AM, Victor Lu wrote:
> Aldebaran SRIOV VF cannot access the power brake feature regs.
> The accesses can be skipped to avoid a dmesg warning.
> 
> Signed-off-by: Victor Lu 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c 
> b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> index 569a76835918..31b378eb5318 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
> @@ -4044,7 +4044,8 @@ static int gfx_v9_0_hw_init(struct amdgpu_ip_block 
> *ip_block)
>   if (r)
>   return r;
>  
> - if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2))
> + if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2) &&
> + !(amdgpu_sriov_vf(adev) && (adev->asic_type == CHIP_ALDEBARAN)))

Asic type check is not required here -
https://github.com/torvalds/linux/blob/master/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c#L2639

Thanks,
Lijo

>   gfx_v9_4_2_set_power_brake_sequence(adev);
>  
>   return r;
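
In other words, the check could presumably be reduced to something like the
following (a sketch based on Lijo's comment, not a tested change):

	if (amdgpu_ip_version(adev, GC_HWIP, 0) == IP_VERSION(9, 4, 2) &&
	    !amdgpu_sriov_vf(adev))
		gfx_v9_4_2_set_power_brake_sequence(adev);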



[PATCH v2 12/12] drm/amdgpu: Generate bad page threshold cper records

2025-02-14 Thread Xiang Liu
Generate a CPER record when the bad page threshold is exceeded and
commit it to the CPER ring.

v2: return -ENOMEM instead of false
v2: check return value of fill section function

Signed-off-by: Xiang Liu 
Reviewed-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c | 23 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h |  2 ++
 drivers/gpu/drm/amd/pm/amdgpu_dpm.c  |  3 +++
 3 files changed, 28 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
index 00f953ed6740..67ad26c5e6df 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.c
@@ -308,6 +308,29 @@ int amdgpu_cper_generate_ue_record(struct amdgpu_device 
*adev,
return 0;
 }
 
+int amdgpu_cper_generate_bp_threshold_record(struct amdgpu_device *adev)
+{
+   struct cper_hdr *bp_threshold = NULL;
+   struct amdgpu_ring *ring = &adev->cper.ring_buf;
+   int ret;
+
+   bp_threshold = amdgpu_cper_alloc_entry(adev, AMDGPU_CPER_TYPE_FATAL, 1);
+   if (!bp_threshold) {
+   dev_err(adev->dev, "fail to alloc cper entry for bad page 
threshold record\n");
+   return -ENOMEM;
+   }
+
+   amdgpu_cper_entry_fill_hdr(adev, bp_threshold, 
AMDGPU_CPER_TYPE_BP_THRESHOLD,
+   CPER_SEV_FATAL);
+   ret = amdgpu_cper_entry_fill_bad_page_threshold_section(adev, 
bp_threshold, 0);
+   if (ret)
+   return ret;
+
+   amdgpu_cper_ring_write(ring, bp_threshold, bp_threshold->record_length);
+
+   return 0;
+}
+
 static enum cper_error_severity amdgpu_aca_err_type_to_cper_sev(struct 
amdgpu_device *adev,
enum 
aca_error_type aca_err_type)
 {
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
index 527835cbf0d3..561e0a43b4b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cper.h
@@ -94,6 +94,8 @@ int amdgpu_cper_generate_ue_record(struct amdgpu_device *adev,
 int amdgpu_cper_generate_ce_records(struct amdgpu_device *adev,
struct aca_banks *banks,
uint16_t bank_count);
+/* Bad page threshold is encoded into separated cper entry */
+int amdgpu_cper_generate_bp_threshold_record(struct amdgpu_device *adev);
 void amdgpu_cper_ring_write(struct amdgpu_ring *ring,
void *src, int count);
 int amdgpu_cper_init(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
index ba6e44951e57..c7abc0c4e87c 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
@@ -716,6 +716,9 @@ int amdgpu_dpm_send_rma_reason(struct amdgpu_device *adev)
ret = smu_send_rma_reason(smu);
mutex_unlock(&adev->pm.mutex);
 
+   if (amdgpu_cper_generate_bp_threshold_record(adev))
+   dev_warn(adev->dev, "fail to generate bad page threshold cper 
records\n");
+
return ret;
 }
 
-- 
2.34.1



[PATCH v2] drm/amdgpu: fix the memleak caused by fence not released

2025-02-14 Thread Le Ma
On systems with CONFIG_SLUB_DEBUG enabled, the memleak like below
will show up explicitly during driver unloading if created bo without
drm_timeline object before.

BUG drm_sched_fence (Tainted: G   OE ): Objects remaining in 
drm_sched_fence on __kmem_cache_shutdown()

-
Call Trace:

dump_stack_lvl+0x4c/0x70
dump_stack+0x14/0x20
slab_err+0xb0/0xf0
? srso_alias_return_thunk+0x5/0xfbef5
? flush_work+0x12/0x20
? srso_alias_return_thunk+0x5/0xfbef5
__kmem_cache_shutdown+0x163/0x2e0
kmem_cache_destroy+0x61/0x170
drm_sched_fence_slab_fini+0x19/0x900

Thus call dma_fence_put properly to avoid the memleak.

v2: call dma_fence_put in amdgpu_gem_va_update_vm

Signed-off-by: Le Ma 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
index 8b67aae6c2fe..00f1f34705c0 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c
@@ -759,7 +759,8 @@ static struct dma_fence *
 amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
struct amdgpu_vm *vm,
struct amdgpu_bo_va *bo_va,
-   uint32_t operation)
+   uint32_t operation,
+   uint32_t syncobj_handle)
 {
struct dma_fence *fence = dma_fence_get_stub();
int r;
@@ -771,6 +772,9 @@ amdgpu_gem_va_update_vm(struct amdgpu_device *adev,
if (r)
goto error;
 
+   if (!syncobj_handle)
+   dma_fence_put(fence);
+
if (operation == AMDGPU_VA_OP_MAP ||
operation == AMDGPU_VA_OP_REPLACE) {
r = amdgpu_vm_bo_update(adev, bo_va, false);
@@ -965,7 +969,8 @@ int amdgpu_gem_va_ioctl(struct drm_device *dev, void *data,
&timeline_chain);
 
fence = amdgpu_gem_va_update_vm(adev, &fpriv->vm, bo_va,
-   args->operation);
+   args->operation,
+   args->vm_timeline_syncobj_out);
 
if (!r)
amdgpu_gem_update_bo_mapping(filp, bo_va,
-- 
2.43.2



Re: [PATCH 2/3] drm/amdgpu: Pop jobs from the queue more robustly

2025-02-14 Thread Christian König
Am 14.02.25 um 11:34 schrieb Tvrtko Ursulin:
>
> On 14/02/2025 10:31, Christian König wrote:
>> Am 14.02.25 um 11:21 schrieb Tvrtko Ursulin:
>>>
>>> Hi Christian,
>>>
>>> On 11/02/2025 10:21, Christian König wrote:
 Am 11.02.25 um 11:08 schrieb Philipp Stanner:
> On Tue, 2025-02-11 at 09:22 +0100, Christian König wrote:
>> Am 06.02.25 um 17:40 schrieb Tvrtko Ursulin:
>>> Replace a copy of DRM scheduler's to_drm_sched_job with a copy of a
>>> newly
>>> added __drm_sched_entity_queue_pop.
>>>
>>> This allows breaking the hidden dependency that queue_node has to
>>> be the
>>> first element in struct drm_sched_job.
>>>
>>> A comment is also added with a reference to the mailing list
>>> discussion
>>> explaining the copied helper will be removed when the whole broken
>>> amdgpu_job_stop_all_jobs_on_sched is removed.
>>>
>>> Signed-off-by: Tvrtko Ursulin 
>>> Cc: Christian König 
>>> Cc: Danilo Krummrich 
>>> Cc: Matthew Brost 
>>> Cc: Philipp Stanner 
>>> Cc: "Zhang, Hawking" 
>> Reviewed-by: Christian König 
> I think this v3 has been supplanted by a v4 by now.

 I've seen the larger v4 series as well, but at least that patch here looks 
 identical on first glance. So my rb still counts.
>>>
>>> Is it okay for you to merge the whole series (including this single amdgpu 
>>> patch) via drm-misc?
>>
>> I can do that, but don't want the scheduler maintainer want to pick them up?
>
> Sorry that was some bad and unclear English. :(

Don't worry, I'm not a native speaker either and had only very minimal formal 
education on it :)

>
> It is as you suggest - what I meant was, is it okay from your point of view 
> that the whole series is merged via drm-misc? I assume Philipp would indeed 
> be the one to merge it, once all patches get r-b-ed.

Ah! Yes of course it. Feel free to go ahead.

Could only be that Alex runs into merge issues, but that is extremely unlikely 
I think.

Regards,
Christian.

>
> Regards,
>
> Tvrtko
>
> @Tvrtko: btw, do you create patches with
> git format-patch -v4 ?
>
> That way the v4 label will be included in all patch titles, too, not
> just the cover letter. That makes searching etc. easier in large
> inboxes
>
> P.
>
>>> ---
>>>     drivers/gpu/drm/amd/amdgpu/amdgpu_job.c | 22 +++-
>>> -- 
>>>     1 file changed, 19 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> index 100f04475943..22cb48bab24d 100644
>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_job.c
>>> @@ -411,8 +411,24 @@ static struct dma_fence *amdgpu_job_run(struct
>>> drm_sched_job *sched_job)
>>>     return fence;
>>>     }
>>> -#define to_drm_sched_job(sched_job)    \
>>> -    container_of((sched_job), struct drm_sched_job,
>>> queue_node)
>>> +/*
>>> + * This is a duplicate function from DRM scheduler
>>> sched_internal.h.
>>> + * Plan is to remove it when amdgpu_job_stop_all_jobs_on_sched is
>>> removed, due
>>> + * latter being incorrect and racy.
>>> + *
>>> + * See
>>> https://lore.kernel.org/amd-gfx/44edde63-7181-44fb- 
>>> a4f7-94e50514f...@amd.com/
>>> + */
>>> +static struct drm_sched_job *
>>> +__drm_sched_entity_queue_pop(struct drm_sched_entity *entity)
>>> +{
>>> +    struct spsc_node *node;
>>> +
>>> +    node = spsc_queue_pop(&entity->job_queue);
>>> +    if (!node)
>>> +    return NULL;
>>> +
>>> +    return container_of(node, struct drm_sched_job,
>>> queue_node);
>>> +}
>>>     void amdgpu_job_stop_all_jobs_on_sched(struct drm_gpu_scheduler
>>> *sched)
>>>     {
>>> @@ -425,7 +441,7 @@ void amdgpu_job_stop_all_jobs_on_sched(struct
>>> drm_gpu_scheduler *sched)
>>>     struct drm_sched_rq *rq = sched->sched_rq[i];
>>>     spin_lock(&rq->lock);
>>>     list_for_each_entry(s_entity, &rq->entities, list)
>>> {
>>> -    while ((s_job =
>>> to_drm_sched_job(spsc_queue_pop(&s_entity->job_queue {
>>> +    while ((s_job =
>>> __drm_sched_entity_queue_pop(s_entity))) {
>>>     struct drm_sched_fence *s_fence =
>>> s_job->s_fence;
>>>     dma_fence_signal(&s_fence-
 scheduled);

>>>
>>
>



Re: [PATCH] drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

2025-02-14 Thread SRINIVASAN SHANMUGAM


On 2/14/2025 2:39 PM, Christian König wrote:

Am 14.02.25 um 09:57 schrieb Srinivasan Shanmugam:

RLCG register access is the mechanism virtual functions use to safely access
GPU registers in a virtualized environment, including for TLB flushes and
register reads. When multiple threads or VFs try to access the same registers
simultaneously, it can lead to race conditions. By using the RLCG interface,
the driver serializes access to the registers, so only one thread accesses
them at a time, preventing conflicts and ensuring that operations are
performed correctly. However, if a thread that holds a spinlock tries to
acquire a mutex, it may sleep in atomic context, and when a low-priority task
holds a mutex that a high-priority task needs, it can lead to priority
inversion. Register access through amdgpu_virt_rlcg_reg_rw is therefore
critical, especially on fast code paths.

The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
called, which attempts to acquire the mutex. This function is invoked
from amdgpu_sriov_wreg, which in turn is called from
gmc_v11_0_flush_gpu_tlb.

The warning [ BUG: Invalid wait context ] indicates that a thread is
trying to acquire a mutex while it is in a context that does not allow
it to sleep (like holding a spinlock).

Fixes the below:

[  253.013423] =
[  253.013434] [ BUG: Invalid wait context ]
[  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U OE
[  253.013464] -
[  253.013475] kworker/0:1/10 is trying to lock:
[  253.013487] 9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, at: 
amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.013815] other info that might help us debug this:
[  253.013827] context-{4:4}
[  253.013835] 3 locks held by kworker/0:1/10:
[  253.013847]  #0: 9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, at: 
process_one_work+0x3f5/0x680
[  253.013877]  #1: b789c008be40 
((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
[  253.013905]  #2: 9f3054281838 (&adev->gmc.invalidate_lock){+.+.}-{2:2}, 
at: gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
[  253.014154] stack backtrace:
[  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U OE  
6.12.0-amdstaging-drm-next-lol-050225 #14
[  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
[  253.014224] Workqueue: events work_for_cpu_fn
[  253.014241] Call Trace:
[  253.014250]  
[  253.014260]  dump_stack_lvl+0x9b/0xf0
[  253.014275]  dump_stack+0x10/0x20
[  253.014287]  __lock_acquire+0xa47/0x2810
[  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014321]  lock_acquire+0xd1/0x300
[  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014562]  ? __lock_acquire+0xa6b/0x2810
[  253.014578]  __mutex_lock+0x85/0xe20
[  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.014782]  ? sched_clock_noinstr+0x9/0x10
[  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.014808]  ? local_clock_noinstr+0xe/0xc0
[  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.015029]  mutex_lock_nested+0x1b/0x30
[  253.015044]  ? mutex_lock_nested+0x1b/0x30
[  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
[  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
[  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
[  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
[  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
[  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
[  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
[  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
[  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
[  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
[  253.017493]  local_pci_probe+0x4b/0xb0
[  253.017746]  work_for_cpu_fn+0x1a/0x30
[  253.017995]  process_one_work+0x21e/0x680
[  253.018248]  worker_thread+0x190/0x330
[  253.018500]  ? __pfx_worker_thread+0x10/0x10
[  253.018746]  kthread+0xe7/0x120
[  253.018988]  ? __pfx_kthread+0x10/0x10
[  253.019231]  ret_from_fork+0x3c/0x60
[  253.019468]  ? __pfx_kthread+0x10/0x10
[  253.019701]  ret_from_fork_asm+0x1a/0x30
[  253.019939]  

Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
Cc: lin cao
Cc: Jingwen Chen
Cc: Victor Skvortsov
Cc: Zhigang Luo
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Srinivasan Shanmugam
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c   | 9 +++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_virt.h   | 3 ++-
  3 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index eab530778fbd..14125cc3a937 100644
--- a/drivers/gpu/drm/amd/
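
The diff is truncated in the archive. Purely as an illustrative sketch (not
the posted patch), the mutex-to-spinlock conversion described in the commit
message and diffstat above might look along these lines, taking the lock
irqsave so the helper stays safe under the GMC invalidate spinlock:

/* amdgpu_virt.h: make the RLCG register lock usable from atomic context */
-	struct mutex rlcg_reg_lock;
+	spinlock_t rlcg_reg_lock;

/* wherever the lock is initialized (amdgpu_device.c per the diffstat) */
-	mutex_init(&adev->virt.rlcg_reg_lock);
+	spin_lock_init(&adev->virt.rlcg_reg_lock);

/* amdgpu_virt.c, amdgpu_virt_rlcg_reg_rw(), with a local "unsigned long flags" */
-	mutex_lock(&adev->virt.rlcg_reg_lock);
+	spin_lock_irqsave(&adev->virt.rlcg_reg_lock, flags);
	/* ... program the RLCG scratch registers and poll for the ack ... */
-	mutex_unlock(&adev->virt.rlcg_reg_lock);
+	spin_unlock_irqrestore(&adev->virt.rlcg_reg_lock, flags);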

RE: [PATCH 4/4] drm/amdgpu/gfx12: Implement the GFX12 KCQ pipe reset

2025-02-14 Thread Liang, Prike
[Public]

The implementation of the gfx11/gfx12 pipe reset is derived from the gfx9 pipe
reset sequence, so the driver sequence should not change significantly beyond
adding gfx11/gfx12 firmware support for the pipe reset. To reduce the effort
needed to resolve merge conflicts, could this series be advanced upstream?

Regards,
  Prike

> -Original Message-
> From: Liang, Prike 
> Sent: Sunday, January 26, 2025 4:38 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander ; Koenig, Christian
> ; Lazar, Lijo ; Liang, Prike
> 
> Subject: [PATCH 4/4] drm/amdgpu/gfx12: Implement the GFX12 KCQ pipe reset
>
> Implement the GFX12 KCQ pipe reset, disable the GFX12 kernel compute queue
> until the CPFW fully supports it.
>
> Signed-off-by: Prike Liang 
> ---
>  drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c | 89 +-
>  1 file changed, 87 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> index 14ea7c1e827e..c5d07d5aa495 100644
> --- a/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/gfx_v12_0.c
> @@ -53,6 +53,7 @@
>
>  #define RLCG_UCODE_LOADING_START_ADDRESS 0x2000L
>  static uint32_t me_fw_start_pc;
> +static uint32_t mec_fw_start_pc;
>
>  MODULE_FIRMWARE("amdgpu/gc_12_0_0_pfp.bin");
>  MODULE_FIRMWARE("amdgpu/gc_12_0_0_me.bin");
> @@ -2127,6 +2128,7 @@ static void gfx_v12_0_config_gfx_rs64(struct
> amdgpu_device *adev)
>   tmp = REG_SET_FIELD(tmp, CP_MEC_RS64_CNTL,
> MEC_PIPE2_RESET, 0);
>   tmp = REG_SET_FIELD(tmp, CP_MEC_RS64_CNTL,
> MEC_PIPE3_RESET, 0);
>   WREG32_SOC15(GC, 0, regCP_MEC_RS64_CNTL, tmp);
> + mec_fw_start_pc = RREG32(SOC15_REG_OFFSET(GC, 0,
> +regCP_MEC_RS64_INSTR_PNTR));
>  }
>
>  static void gfx_v12_0_set_pfp_ucode_start_addr(struct amdgpu_device *adev)
> @@ -5356,6 +5358,87 @@ static int gfx_v12_0_reset_kgq(struct amdgpu_ring
> *ring, unsigned int vmid)
>   return amdgpu_ring_test_ring(ring);
>  }
>
> +static int gfx_v12_0_reset_compute_pipe(struct amdgpu_ring *ring) {
> +
> + struct amdgpu_device *adev = ring->adev;
> + uint32_t reset_pipe = 0, clean_pipe = 0;
> + int r;
> +
> + if (!gfx_v12_pipe_reset_support(adev))
> + return -EOPNOTSUPP;
> +
> + gfx_v12_0_set_safe_mode(adev, 0);
> + mutex_lock(&adev->srbm_mutex);
> + soc24_grbm_select(adev, ring->me, ring->pipe, ring->queue, 0);
> +
> + reset_pipe = RREG32_SOC15(GC, 0, regCP_MEC_RS64_CNTL);
> + clean_pipe = reset_pipe;
> +
> + if (adev->gfx.rs64_enable) {
> +
> + switch (ring->pipe) {
> + case 0:
> + reset_pipe = REG_SET_FIELD(reset_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE0_RESET, 1);
> + clean_pipe = REG_SET_FIELD(clean_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE0_RESET, 0);
> + break;
> + case 1:
> + reset_pipe = REG_SET_FIELD(reset_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE1_RESET, 1);
> + clean_pipe = REG_SET_FIELD(clean_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE1_RESET, 0);
> + break;
> + case 2:
> + reset_pipe = REG_SET_FIELD(reset_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE2_RESET, 1);
> + clean_pipe = REG_SET_FIELD(clean_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE2_RESET, 0);
> + break;
> + case 3:
> + reset_pipe = REG_SET_FIELD(reset_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE3_RESET, 1);
> + clean_pipe = REG_SET_FIELD(clean_pipe,
> CP_MEC_RS64_CNTL,
> +MEC_PIPE3_RESET, 0);
> + break;
> + default:
> + break;
> + }
> + WREG32_SOC15(GC, 0, regCP_MEC_RS64_CNTL, reset_pipe);
> + WREG32_SOC15(GC, 0, regCP_MEC_RS64_CNTL, clean_pipe);
> + r = RREG32_SOC15(GC, 0, regCP_MEC_RS64_INSTR_PNTR) -
> mec_fw_start_pc;
> + } else {
> + switch (ring->pipe) {
> + case 0:
> + reset_pipe = REG_SET_FIELD(reset_pipe, CP_MEC_CNTL,
> +MEC_ME1_PIPE0_RESET,
> 1);
> + clean_pipe = REG_SET_FIELD(clean_pipe,
> CP_MEC_CNTL,
> +MEC_ME1_PIPE0_RESET,
> 0);
> + break;
> + case 1:
> + reset_pipe = REG_SET_FIELD(reset_pipe, CP_MEC_CNTL,
> +  

RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only]

OK. From the MES point of view, we expect both set_hw_resource and
set_hw_resource_1 to be called all the time.

Reviewed-by: Shaoyun.liu 

From: Deucher, Alexander 
Sent: Friday, February 14, 2025 11:53 AM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once


[AMD Official Use Only - AMD Internal Distribution Only]

I can add that as a follow up patch as I don't want to change the current 
behavior to avoid a potential regression.  Should we submit both the resource 
and resource_1 packets all the time?

Thanks,

Alex


From: Liu, Shaoyun mailto:shaoyun@amd.com>>
Sent: Friday, February 14, 2025 11:45 AM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

[AMD Official Use Only - AMD Internal Distribution Only]

I'd suggest remove the  enable_uni_mes check, set_hw_resource_1 is always 
required for gfx12 and  up. Especially after add the  cleaner_shader_fence_addr 
there.

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,

 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)  {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;

memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt)); @@ 
-689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes 
*mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_1_pkt.cleaner_shader_fence_mc_addr =
mes->resource_1_gpu_addr;

@@ -1550,6 +1536,20 @@ static int mes_v12_0_sw_init(struct amdgpu_ip_block 
*ip_block)
return r;
}

+   if (adev->enable_uni_mes) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE, 
PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }

@@ -1558,6 +1558,11 @@ static int mes_v12_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;

+   if (adev->enable_uni_mes)
+   amdgpu_bo_free_kernel(&adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
kfree(adev->mes.mqd_backup[pipe]);

@@ -1786,12 +1791,6 @@ static int mes_v12_0_hw_init(struct amdgpu_ip_block 
*ip_block)

 static int mes_v12_0_hw_fini(struct amdgpu_ip_block *ip_block)  {
-   struct amdgpu_device *adev = ip_block->adev;
-
-   if (adev->enable_u

[PATCH] drm/amdgpu/mes: keep enforce isolation up to date

2025-02-14 Thread Alex Deucher
Re-send the mes message on resume to make sure the
mes state is up to date.

Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES")
Signed-off-by: Alex Deucher 
Cc: Shaoyun Liu 
Cc: Srinivasan Shanmugam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 13 -
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 20 +++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  2 +-
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  |  4 
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c  |  4 
 5 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index b9bd6654f3172..a194bf3347cbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1665,24 +1665,19 @@ static ssize_t amdgpu_gfx_set_enforce_isolation(struct 
device *dev,
}
 
mutex_lock(&adev->enforce_isolation_mutex);
-
for (i = 0; i < num_partitions; i++) {
-   if (adev->enforce_isolation[i] && !partition_values[i]) {
+   if (adev->enforce_isolation[i] && !partition_values[i])
/* Going from enabled to disabled */
amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(i));
-   if (adev->enable_mes && adev->gfx.enable_cleaner_shader)
-   amdgpu_mes_set_enforce_isolation(adev, i, 
false);
-   } else if (!adev->enforce_isolation[i] && partition_values[i]) {
+   else if (!adev->enforce_isolation[i] && partition_values[i])
/* Going from disabled to enabled */
amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
-   if (adev->enable_mes && adev->gfx.enable_cleaner_shader)
-   amdgpu_mes_set_enforce_isolation(adev, i, true);
-   }
adev->enforce_isolation[i] = partition_values[i];
}
-
mutex_unlock(&adev->enforce_isolation_mutex);
 
+   amdgpu_mes_update_enforce_isolation(adev);
+
return count;
 }
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
index cee38bb6cfaf2..ca076306adba4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1508,7 +1508,8 @@ bool amdgpu_mes_suspend_resume_all_supported(struct 
amdgpu_device *adev)
 }
 
 /* Fix me -- node_id is used to identify the correct MES instances in the 
future */
-int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t 
node_id, bool enable)
+static int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev,
+   uint32_t node_id, bool enable)
 {
struct mes_misc_op_input op_input = {0};
int r;
@@ -1530,6 +1531,23 @@ int amdgpu_mes_set_enforce_isolation(struct 
amdgpu_device *adev, uint32_t node_i
return r;
 }
 
+int amdgpu_mes_update_enforce_isolation(struct amdgpu_device *adev)
+{
+   int i, r = 0;
+
+   if (adev->enable_mes && adev->gfx.enable_cleaner_shader) {
+   mutex_lock(&adev->enforce_isolation_mutex);
+   for (i = 0; i < (adev->xcp_mgr ? adev->xcp_mgr->num_xcps : 1); 
i++) {
+   if (adev->enforce_isolation[i])
+   r |= amdgpu_mes_set_enforce_isolation(adev, i, 
true);
+   else
+   r |= amdgpu_mes_set_enforce_isolation(adev, i, 
false);
+   }
+   mutex_unlock(&adev->enforce_isolation_mutex);
+   }
+   return r;
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 6a792ffc81e33..3a65c3788956d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -532,6 +532,6 @@ static inline void amdgpu_mes_unlock(struct amdgpu_mes *mes)
 
 bool amdgpu_mes_suspend_resume_all_supported(struct amdgpu_device *adev);
 
-int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t 
node_id, bool enable);
+int amdgpu_mes_update_enforce_isolation(struct amdgpu_device *adev);
 
 #endif /* __AMDGPU_MES_H__ */
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 530371e6a7aee..fc7b17463cb4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1660,6 +1660,10 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block 
*ip_block)
goto failure;
}
 
+   r = amdgpu_mes_update_enforce_isolation(adev);
+   if (r)
+   goto failure;
+
 out:
/*
 * Disable KIQ ring usage from the driver once MES is enabled.
diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/

Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Deucher, Alexander
[AMD Official Use Only - AMD Internal Distribution Only]

Does it matter which pipe we use for these packets?

Alex


From: Liu, Shaoyun 
Sent: Friday, February 14, 2025 12:36 PM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org 
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once


[AMD Official Use Only - AMD Internal Distribution Only]


Ok .  From MES point of view , we expecting  both set_hw_resource and 
set_hw_resource_1 been called all the time.



Reviewed-by: Shaoyun.liu 



From: Deucher, Alexander 
Sent: Friday, February 14, 2025 11:53 AM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]



I can add that as a follow up patch as I don't want to change the current 
behavior to avoid a potential regression.  Should we submit both the resource 
and resource_1 packets all the time?



Thanks,



Alex





From: Liu, Shaoyun mailto:shaoyun@amd.com>>
Sent: Friday, February 14, 2025 11:45 AM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]

I'd suggest remove the  enable_uni_mes check, set_hw_resource_1 is always 
required for gfx12 and  up. Especially after add the  cleaner_shader_fence_addr 
there.

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,

 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)  {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;

memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt)); @@ 
-689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes 
*mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_1_pkt.cleaner_shader_fence_mc_addr =
mes->resource_1_gpu_addr;

@@ -1550,6 +1536,20 @@ static int mes_v12_0_sw_init(struct amdgpu_ip_block 
*ip_block)
return r;
}

+   if (adev->enable_uni_mes) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE, 
PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }

@@ -1558,6 +1558,11 @@ static int mes_v12_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;

+   if (adev->enable_uni_mes)
+   amdgpu_bo_free_kernel(&adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resou

RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only]

Oh, you're right. It's only for unified MES; for non-unified, it will still
use the KIQ from CP directly on pipe 1, so there is no MES API for it at all.
It's my fault, please ignore my previous comments. Your current change for
this series is good enough.

Regards
Shaoyun.liu

From: Deucher, Alexander 
Sent: Friday, February 14, 2025 12:42 PM
To: Liu, Shaoyun ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once


[AMD Official Use Only - AMD Internal Distribution Only]

Does it matter which pipe we use for these packets?

Alex


From: Liu, Shaoyun mailto:shaoyun@amd.com>>
Sent: Friday, February 14, 2025 12:36 PM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once


[AMD Official Use Only - AMD Internal Distribution Only]


Ok .  From MES point of view , we expecting  both set_hw_resource and 
set_hw_resource_1 been called all the time.



Reviewed-by: Shaoyun.liu mailto:shaoyun@amd.com>>



From: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Sent: Friday, February 14, 2025 11:53 AM
To: Liu, Shaoyun mailto:shaoyun@amd.com>>; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]



I can add that as a follow up patch as I don't want to change the current 
behavior to avoid a potential regression.  Should we submit both the resource 
and resource_1 packets all the time?



Thanks,



Alex





From: Liu, Shaoyun mailto:shaoyun@amd.com>>
Sent: Friday, February 14, 2025 11:45 AM
To: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once



[AMD Official Use Only - AMD Internal Distribution Only]

I'd suggest remove the  enable_uni_mes check, set_hw_resource_1 is always 
required for gfx12 and  up. Especially after add the  cleaner_shader_fence_addr 
there.

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx 
mailto:amd-gfx-boun...@lists.freedesktop.org>>
 On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
mailto:alexander.deuc...@amd.com>>
Subject: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
mailto:alexander.deuc...@amd.com>>
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,

 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)  {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;

memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt)); @@ 
-689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes 
*mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_1_pkt.cleaner_shader_fence_mc_addr =
mes->resource_1_gpu_addr;

@@ -1550,6 +1536,20 @@ static int mes_v12_0_sw_init(struct amdgpu_ip_block 
*ip_block)
return r;
}

+   if (adev->enable_uni_mes) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE, 
PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM

Re: [PATCH] drm/amd/pm: extend the gfxoff delay for compute workload

2025-02-14 Thread Alex Deucher
On Fri, Feb 14, 2025 at 7:32 AM Kenneth Feng  wrote:
>
> extend the gfxoff delay for compute workload on smu 14.0.2/3
> to fix the kfd test issue.

This doesn't make sense.  We explicitly disallow gfxoff in
amdgpu_amdkfd_set_compute_idle() already so it should already be
disallowed.

Alex
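
For context, GFXOFF control in amdgpu is reference-counted:
amdgpu_gfx_off_ctrl(adev, false) takes a disallow reference and keeps GFX
powered, while amdgpu_gfx_off_ctrl(adev, true) drops it, after which GFXOFF is
re-armed by the delayed work whose GFX_OFF_DELAY_ENABLE timeout the patch
quoted below scales. A minimal sketch of how a compute-busy section is
typically bracketed (illustrative only, not the actual KFD code path):

/* Illustrative sketch: keep GFX powered for the duration of compute work.
 * amdgpu_gfx_off_ctrl() is reference-counted, so nested callers are fine.
 */
static void example_run_compute_work(struct amdgpu_device *adev)
{
	amdgpu_gfx_off_ctrl(adev, false);	/* disallow GFXOFF while busy */

	/* ... submit compute work and wait for completion ... */

	amdgpu_gfx_off_ctrl(adev, true);	/* allow GFXOFF again */
}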


>
> Signed-off-by: Kenneth Feng 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c   |  3 +++
>  drivers/gpu/drm/amd/pm/amdgpu_dpm.c   | 14 ++
>  drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h   |  1 +
>  drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c | 15 +++
>  drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h |  2 ++
>  5 files changed, 35 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index b9bd6654f317..4ae6fde6c69c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -776,6 +776,9 @@ static void amdgpu_gfx_do_off_ctrl(struct amdgpu_device 
> *adev, bool enable,
>  {
> unsigned long delay = GFX_OFF_DELAY_ENABLE;
>
> +   if (amdgpu_dpm_need_extra_gfxoff_delay(adev))
> +   delay *= 5;
> +
> if (!(adev->pm.pp_feature & PP_GFXOFF_MASK))
> return;
>
> diff --git a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c 
> b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> index 7a22aef6e59c..87de50b73a0e 100644
> --- a/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> +++ b/drivers/gpu/drm/amd/pm/amdgpu_dpm.c
> @@ -873,6 +873,20 @@ int amdgpu_dpm_get_status_gfxoff(struct amdgpu_device 
> *adev, uint32_t *value)
> return ret;
>  }
>
> +bool amdgpu_dpm_need_extra_gfxoff_delay(struct amdgpu_device *adev)
> +{
> +   struct smu_context *smu = adev->powerplay.pp_handle;
> +   bool ret = false;
> +
> +   if (is_support_sw_smu(adev)) {
> +   mutex_lock(&adev->pm.mutex);
> +   ret = smu_need_extra_gfxoff_delay(smu);
> +   mutex_unlock(&adev->pm.mutex);
> +   }
> +
> +   return ret;
> +}
> +
>  uint64_t amdgpu_dpm_get_thermal_throttling_counter(struct amdgpu_device 
> *adev)
>  {
> struct smu_context *smu = adev->powerplay.pp_handle;
> diff --git a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h 
> b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> index 1f5ac7e0230d..312ad348ce82 100644
> --- a/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> +++ b/drivers/gpu/drm/amd/pm/inc/amdgpu_dpm.h
> @@ -470,6 +470,7 @@ int amdgpu_dpm_get_residency_gfxoff(struct amdgpu_device 
> *adev, u32 *value);
>  int amdgpu_dpm_set_residency_gfxoff(struct amdgpu_device *adev, bool value);
>  int amdgpu_dpm_get_entrycount_gfxoff(struct amdgpu_device *adev, u64 *value);
>  int amdgpu_dpm_get_status_gfxoff(struct amdgpu_device *adev, uint32_t 
> *value);
> +bool amdgpu_dpm_need_extra_gfxoff_delay(struct amdgpu_device *adev);
>  uint64_t amdgpu_dpm_get_thermal_throttling_counter(struct amdgpu_device 
> *adev);
>  void amdgpu_dpm_gfx_state_change(struct amdgpu_device *adev,
>  enum gfx_change_state state);
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c 
> b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> index a1164912f674..61cd170ec30a 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/amdgpu_smu.c
> @@ -133,6 +133,21 @@ int smu_get_status_gfxoff(struct smu_context *smu, 
> uint32_t *value)
> return 0;
>  }
>
> +bool smu_need_extra_gfxoff_delay(struct smu_context *smu)
> +{
> +   bool ret = false;
> +
> +   if (!smu->pm_enabled)
> +   return false;
> +
> +   if (((amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(14, 0, 
> 2)) ||
> +   (amdgpu_ip_version(smu->adev, MP1_HWIP, 0) == IP_VERSION(14, 0, 
> 3))) &&
> +smu->workload_mask & (1 << PP_SMC_POWER_PROFILE_COMPUTE))
> +   return true;
> +
> +   return ret;
> +}
> +
>  int smu_set_soft_freq_range(struct smu_context *smu,
> enum smu_clk_type clk_type,
> uint32_t min,
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h 
> b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> index 3630593bce61..82f06c2a752d 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> +++ b/drivers/gpu/drm/amd/pm/swsmu/inc/amdgpu_smu.h
> @@ -1626,6 +1626,8 @@ int smu_set_residency_gfxoff(struct smu_context *smu, 
> bool value);
>
>  int smu_get_status_gfxoff(struct smu_context *smu, uint32_t *value);
>
> +bool smu_need_extra_gfxoff_delay(struct smu_context *smu);
> +
>  int smu_handle_passthrough_sbr(struct smu_context *smu, bool enable);
>
>  int smu_wait_for_event(struct smu_context *smu, enum smu_event_type event,
> --
> 2.34.1
>


RE: [PATCH 1/2] drm/amdgpu/mes11: allocate hw_resource_1 buffer once

2025-02-14 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only]

Looks good to me .
Reviewed-by: Shaoyun.liu < shaouyun@amd.com>

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 1/2] drm/amdgpu/mes11: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 52 +-
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 26af0af718b5e..530371e6a7aee 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -64,6 +64,7 @@ static int mes_v11_0_kiq_hw_fini(struct amdgpu_device *adev);

 #define MES_EOP_SIZE   2048
 #define GFX_MES_DRAM_SIZE  0x8
+#define MES11_HW_RESOURCE_1_SIZE (128 * AMDGPU_GPU_PAGE_SIZE)

 static void mes_v11_0_ring_set_wptr(struct amdgpu_ring *ring)  { @@ -743,11 
+744,6 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)

 static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes *mes)  {
-   unsigned int hw_rsrc_size = 128 * AMDGPU_GPU_PAGE_SIZE;
-   /* add a page for the cleaner shader fence */
-   unsigned int alloc_size = hw_rsrc_size + AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_pkt;
memset(&mes_set_hw_res_pkt, 0, sizeof(mes_set_hw_res_pkt));

@@ -755,21 +751,10 @@ static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_pkt.enable_mes_info_ctx = 1;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_pkt.mes_info_ctx_mc_addr = mes->resource_1_gpu_addr;
-   mes_set_hw_res_pkt.mes_info_ctx_size = hw_rsrc_size;
+   mes_set_hw_res_pkt.mes_info_ctx_size = MES11_HW_RESOURCE_1_SIZE;
mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr =
-   mes->resource_1_gpu_addr + hw_rsrc_size;
+   mes->resource_1_gpu_addr + MES11_HW_RESOURCE_1_SIZE;

return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt), @@ 
-1442,6 +1427,23 @@ static int mes_v11_0_sw_init(struct amdgpu_ip_block 
*ip_block)
if (r)
return r;

+   if (amdgpu_sriov_is_mes_info_enable(adev) ||
+   adev->gfx.enable_cleaner_shader) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev,
+ MES11_HW_RESOURCE_1_SIZE + 
AMDGPU_GPU_PAGE_SIZE,
+ PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }

@@ -1450,6 +1452,12 @@ static int mes_v11_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;

+   if (amdgpu_sriov_is_mes_info_enable(adev) ||
+   adev->gfx.enable_cleaner_shader) {
+   amdgpu_bo_free_kernel(&adev->mes.resource_1, 
&adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   }
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
kfree(adev->mes.mqd_backup[pipe]);

@@ -1670,14 +1678,6 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block 
*ip_block)

 static int mes_v11_0_hw_fini(struct amdgpu_ip_block *ip_block)  {
-   struct amdgpu_device *adev = ip_block->adev;
-
-   if (amdgpu_sriov_is_mes_info_enable(adev) ||
-   adev->gfx.enable_cleaner_shader) {
-   amdgpu_bo_free_kernel(&adev->mes.resource_1, 
&adev->mes.resource_1_gpu_addr,
- &adev->mes.resource_1_addr);
-   }
-
return 0;
 }

--
2.48.1



RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Liu, Shaoyun
[AMD Official Use Only - AMD Internal Distribution Only]

I'd suggest removing the enable_uni_mes check; set_hw_resource_1 is always
required for gfx12 and up, especially after adding the
cleaner_shader_fence_addr there.

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,

 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)  {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;

memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt)); @@ 
-689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes 
*mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_1_pkt.cleaner_shader_fence_mc_addr =
mes->resource_1_gpu_addr;

@@ -1550,6 +1536,20 @@ static int mes_v12_0_sw_init(struct amdgpu_ip_block 
*ip_block)
return r;
}

+   if (adev->enable_uni_mes) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE, 
PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }

@@ -1558,6 +1558,11 @@ static int mes_v12_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;

+   if (adev->enable_uni_mes)
+   amdgpu_bo_free_kernel(&adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
kfree(adev->mes.mqd_backup[pipe]);

@@ -1786,12 +1791,6 @@ static int mes_v12_0_hw_init(struct amdgpu_ip_block 
*ip_block)

 static int mes_v12_0_hw_fini(struct amdgpu_ip_block *ip_block)  {
-   struct amdgpu_device *adev = ip_block->adev;
-
-   if (adev->enable_uni_mes)
-   amdgpu_bo_free_kernel(&adev->mes.resource_1,
- &adev->mes.resource_1_gpu_addr,
- &adev->mes.resource_1_addr);
return 0;
 }

--
2.48.1



Re: [PATCH 1/2] drm/amdgpu/mes11: allocate hw_resource_1 buffer once

2025-02-14 Thread Alex Deucher
On Fri, Feb 14, 2025 at 11:42 AM Liu, Shaoyun  wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> Looks good to me .
> Reviewed-by: Shaoyun.liu < shaouyun@amd.com>

Thanks, is this for the whole series or just this patch?

Alex

>
> -Original Message-
> From: amd-gfx  On Behalf Of Alex 
> Deucher
> Sent: Friday, February 14, 2025 10:19 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 1/2] drm/amdgpu/mes11: allocate hw_resource_1 buffer once
>
> Allocate the buffer at sw init time so we don't alloc and free it for every 
> suspend/resume or reset cycle.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 52 +-
>  1 file changed, 26 insertions(+), 26 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
> b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> index 26af0af718b5e..530371e6a7aee 100644
> --- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> +++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
> @@ -64,6 +64,7 @@ static int mes_v11_0_kiq_hw_fini(struct amdgpu_device 
> *adev);
>
>  #define MES_EOP_SIZE   2048
>  #define GFX_MES_DRAM_SIZE  0x8
> +#define MES11_HW_RESOURCE_1_SIZE (128 * AMDGPU_GPU_PAGE_SIZE)
>
>  static void mes_v11_0_ring_set_wptr(struct amdgpu_ring *ring)  { @@ -743,11 
> +744,6 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes *mes)
>
>  static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes *mes)  {
> -   unsigned int hw_rsrc_size = 128 * AMDGPU_GPU_PAGE_SIZE;
> -   /* add a page for the cleaner shader fence */
> -   unsigned int alloc_size = hw_rsrc_size + AMDGPU_GPU_PAGE_SIZE;
> -   int ret = 0;
> -   struct amdgpu_device *adev = mes->adev;
> union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_pkt;
> memset(&mes_set_hw_res_pkt, 0, sizeof(mes_set_hw_res_pkt));
>
> @@ -755,21 +751,10 @@ static int mes_v11_0_set_hw_resources_1(struct 
> amdgpu_mes *mes)
> mes_set_hw_res_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
> mes_set_hw_res_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
> mes_set_hw_res_pkt.enable_mes_info_ctx = 1;
> -
> -   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
> -   AMDGPU_GEM_DOMAIN_VRAM,
> -   &mes->resource_1,
> -   &mes->resource_1_gpu_addr,
> -   &mes->resource_1_addr);
> -   if (ret) {
> -   dev_err(adev->dev, "(%d) failed to create mes resource_1 
> bo\n", ret);
> -   return ret;
> -   }
> -
> mes_set_hw_res_pkt.mes_info_ctx_mc_addr = mes->resource_1_gpu_addr;
> -   mes_set_hw_res_pkt.mes_info_ctx_size = hw_rsrc_size;
> +   mes_set_hw_res_pkt.mes_info_ctx_size = MES11_HW_RESOURCE_1_SIZE;
> mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr =
> -   mes->resource_1_gpu_addr + hw_rsrc_size;
> +   mes->resource_1_gpu_addr + MES11_HW_RESOURCE_1_SIZE;
>
> return mes_v11_0_submit_pkt_and_poll_completion(mes,
> &mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt), @@ 
> -1442,6 +1427,23 @@ static int mes_v11_0_sw_init(struct amdgpu_ip_block 
> *ip_block)
> if (r)
> return r;
>
> +   if (amdgpu_sriov_is_mes_info_enable(adev) ||
> +   adev->gfx.enable_cleaner_shader) {
> +   int ret;
> +
> +   ret = amdgpu_bo_create_kernel(adev,
> + MES11_HW_RESOURCE_1_SIZE + 
> AMDGPU_GPU_PAGE_SIZE,
> + PAGE_SIZE,
> + AMDGPU_GEM_DOMAIN_VRAM,
> + &adev->mes.resource_1,
> + &adev->mes.resource_1_gpu_addr,
> + &adev->mes.resource_1_addr);
> +   if (ret) {
> +   dev_err(adev->dev, "(%d) failed to create mes 
> resource_1 bo\n", ret);
> +   return ret;
> +   }
> +   }
> +
> return 0;
>  }
>
> @@ -1450,6 +1452,12 @@ static int mes_v11_0_sw_fini(struct amdgpu_ip_block 
> *ip_block)
> struct amdgpu_device *adev = ip_block->adev;
> int pipe;
>
> +   if (amdgpu_sriov_is_mes_info_enable(adev) ||
> +   adev->gfx.enable_cleaner_shader) {
> +   amdgpu_bo_free_kernel(&adev->mes.resource_1, 
> &adev->mes.resource_1_gpu_addr,
> + &adev->mes.resource_1_addr);
> +   }
> +
> for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
> kfree(adev->mes.mqd_backup[pipe]);
>
> @@ -1670,14 +1678,6 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block 
> *ip_block)
>
>  static int mes_v11_0_hw_fini(struct amdgpu_ip_block *ip_block)  {
> -   struct amdgpu_device *adev = ip_block->adev;
> -
> -  

Re: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Deucher, Alexander
[AMD Official Use Only - AMD Internal Distribution Only]

I can add that as a follow up patch as I don't want to change the current 
behavior to avoid a potential regression.  Should we submit both the resource 
and resource_1 packets all the time?

Thanks,

Alex
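
A hypothetical follow-up along the lines suggested (illustrative only, not a
posted patch, and assuming the SET_HW_RESOURCES_1 call is currently guarded by
adev->enable_uni_mes as the discussion below implies) would simply drop the
guard so both packets are always submitted:

/* mes_v12_0 hw init, hypothetical sketch only */
-	if (adev->enable_uni_mes)
-		r = mes_v12_0_set_hw_resources_1(&adev->mes, AMDGPU_MES_SCHED_PIPE);
+	r = mes_v12_0_set_hw_resources_1(&adev->mes, AMDGPU_MES_SCHED_PIPE);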


From: Liu, Shaoyun 
Sent: Friday, February 14, 2025 11:45 AM
To: Deucher, Alexander ; 
amd-gfx@lists.freedesktop.org 
Cc: Deucher, Alexander 
Subject: RE: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

[AMD Official Use Only - AMD Internal Distribution Only]

I'd suggest remove the  enable_uni_mes check, set_hw_resource_1 is always 
required for gfx12 and  up. Especially after add the  cleaner_shader_fence_addr 
there.

Regards
Shaoyun.liu

-Original Message-
From: amd-gfx  On Behalf Of Alex Deucher
Sent: Friday, February 14, 2025 10:19 AM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander 
Subject: [PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

Allocate the buffer at sw init time so we don't alloc and free it for every 
suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,

 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)  {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;

memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt)); @@ 
-689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes 
*mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_1_pkt.cleaner_shader_fence_mc_addr =
mes->resource_1_gpu_addr;

@@ -1550,6 +1536,20 @@ static int mes_v12_0_sw_init(struct amdgpu_ip_block 
*ip_block)
return r;
}

+   if (adev->enable_uni_mes) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE, 
PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }

@@ -1558,6 +1558,11 @@ static int mes_v12_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;

+   if (adev->enable_uni_mes)
+   amdgpu_bo_free_kernel(&adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
kfree(adev->mes.mqd_backup[pipe]);

@@ -1786,12 +1791,6 @@ static int mes_v12_0_hw_init(struct amdgpu_ip_block 
*ip_block)

 static int mes_v12_0_hw_fini(struct amdgpu_ip_block *ip_block)  {
-   struct amdgpu_device *adev = ip_block->adev;
-
-   if (adev->enable_uni_mes)
-   amdgpu_bo_free_kernel(&adev->mes.resource_1,
- &adev->mes.resource_1_gpu_addr,
- &adev->mes.resource_1_addr);
return 0;
 }

--
2.48.1



RE: [PATCH] drm/amdgpu/display: Allow DCC for video formats on GFX12

2025-02-14 Thread Dong, Ruijing
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Ruijing Dong 

-Original Message-
From: amd-gfx  On Behalf Of David Rosca
Sent: Thursday, February 13, 2025 12:07 PM
To: amd-gfx@lists.freedesktop.org
Cc: Rosca, David 
Subject: [PATCH] drm/amdgpu/display: Allow DCC for video formats on GFX12

We advertise DCC as supported for NV12/P010 formats on GFX12, but it would fail 
on this check on commit.

Signed-off-by: David Rosca 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
index 774cc3f4f3fd..92472109f84a 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm_plane.c
@@ -277,8 +277,11 @@ static int amdgpu_dm_plane_validate_dcc(struct 
amdgpu_device *adev,
if (!dcc->enable)
return 0;

-   if (format >= SURFACE_PIXEL_FORMAT_VIDEO_BEGIN ||
-   !dc->cap_funcs.get_dcc_compression_cap)
+   if (adev->family < AMDGPU_FAMILY_GC_12_0_0 &&
+   format >= SURFACE_PIXEL_FORMAT_VIDEO_BEGIN)
+   return -EINVAL;
+
+   if (!dc->cap_funcs.get_dcc_compression_cap)
return -EINVAL;

input.format = format;
--
2.43.0



Re: [PATCH] drm/amdgpu: Replace Mutex with Spinlock for RLCG register access to avoid Priority Inversion in SRIOV

2025-02-14 Thread Skvortsov, Victor





>> On 2/14/2025 2:39 PM, Christian König wrote:
>>> Am 14.02.25 um 09:57 schrieb Srinivasan Shanmugam:
>>> RLCG Register Access is a way for virtual functions to safely access GPU
>>> registers in a virtualized environment., including TLB flushes and
>>> register reads. When multiple threads or VFs try to access the same
>>> registers simultaneously, it can lead to race conditions. By using the
>>> RLCG interface, the driver can serialize access to the registers. This
>>> means that only one thread can access the registers at a time,
>>> preventing conflicts and ensuring that operations are performed
>>> correctly. Additionally, when a low-priority task holds a mutex that a
>>> high-priority task needs, ie., If a thread holding a spinlock tries to
>>> acquire a mutex, it can lead to priority inversion. register access in
>>> amdgpu_virt_rlcg_reg_rw especially in a fast code path is critical.
>>>
>>> The call stack shows that the function amdgpu_virt_rlcg_reg_rw is being
>>> called, which attempts to acquire the mutex. This function is invoked
>>> from amdgpu_sriov_wreg, which in turn is called from
>>> gmc_v11_0_flush_gpu_tlb.
>>>
>>> The warning [ BUG: Invalid wait context ] indicates that a thread is
>>> trying to acquire a mutex while it is in a context that does not allow
>>> it to sleep (like holding a spinlock).
>>>
>>> Fixes the below:
>>>
>>> [  253.013423] =
>>> [  253.013434] [ BUG: Invalid wait context ]
>>> [  253.013446] 6.12.0-amdstaging-drm-next-lol-050225 #14 Tainted: G U   
>>>   OE
>>> [  253.013464] -
>>> [  253.013475] kworker/0:1/10 is trying to lock:
>>> [  253.013487] 9f30542e3cf8 (&adev->virt.rlcg_reg_lock){+.+.}-{3:3}, 
>>> at: amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
>>> [  253.013815] other info that might help us debug this:
>>> [  253.013827] context-{4:4}
>>> [  253.013835] 3 locks held by kworker/0:1/10:
>>> [  253.013847]  #0: 9f3040050f58 ((wq_completion)events){+.+.}-{0:0}, 
>>> at: process_one_work+0x3f5/0x680
>>> [  253.013877]  #1: b789c008be40 
>>> ((work_completion)(&wfc.work)){+.+.}-{0:0}, at: process_one_work+0x1d6/0x680
>>> [  253.013905]  #2: 9f3054281838 
>>> (&adev->gmc.invalidate_lock){+.+.}-{2:2}, at: 
>>> gmc_v11_0_flush_gpu_tlb+0x198/0x4f0 [amdgpu]
>>> [  253.014154] stack backtrace:
>>> [  253.014164] CPU: 0 UID: 0 PID: 10 Comm: kworker/0:1 Tainted: G U 
>>> OE  6.12.0-amdstaging-drm-next-lol-050225 #14
>>> [  253.014189] Tainted: [U]=USER, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
>>> [  253.014203] Hardware name: Microsoft Corporation Virtual Machine/Virtual 
>>> Machine, BIOS Hyper-V UEFI Release v4.1 11/18/2024
>>> [  253.014224] Workqueue: events work_for_cpu_fn
>>> [  253.014241] Call Trace:
>>> [  253.014250]  
>>> [  253.014260]  dump_stack_lvl+0x9b/0xf0
>>> [  253.014275]  dump_stack+0x10/0x20
>>> [  253.014287]  __lock_acquire+0xa47/0x2810
>>> [  253.014303]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  253.014321]  lock_acquire+0xd1/0x300
>>> [  253.014333]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
>>> [  253.014562]  ? __lock_acquire+0xa6b/0x2810
>>> [  253.014578]  __mutex_lock+0x85/0xe20
>>> [  253.014591]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
>>> [  253.014782]  ? sched_clock_noinstr+0x9/0x10
>>> [  253.014795]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  253.014808]  ? local_clock_noinstr+0xe/0xc0
>>> [  253.014822]  ? amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
>>> [  253.015012]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  253.015029]  mutex_lock_nested+0x1b/0x30
>>> [  253.015044]  ? mutex_lock_nested+0x1b/0x30
>>> [  253.015057]  amdgpu_virt_rlcg_reg_rw+0xf6/0x330 [amdgpu]
>>> [  253.015249]  amdgpu_sriov_wreg+0xc5/0xd0 [amdgpu]
>>> [  253.015435]  gmc_v11_0_flush_gpu_tlb+0x44b/0x4f0 [amdgpu]
>>> [  253.015667]  gfx_v11_0_hw_init+0x499/0x29c0 [amdgpu]
>>> [  253.015901]  ? __pfx_smu_v13_0_update_pcie_parameters+0x10/0x10 [amdgpu]
>>> [  253.016159]  ? srso_alias_return_thunk+0x5/0xfbef5
>>> [  253.016173]  ? smu_hw_init+0x18d/0x300 [amdgpu]
>>> [  253.016403]  amdgpu_device_init+0x29ad/0x36a0 [amdgpu]
>>> [  253.016614]  amdgpu_driver_load_kms+0x1a/0xc0 [amdgpu]
>>> [  253.017057]  amdgpu_pci_probe+0x1c2/0x660 [amdgpu]
>>> [  253.017493]  local_pci_probe+0x4b/0xb0
>>> [  253.017746]  work_for_cpu_fn+0x1a/0x30
>>> [  253.017995]  process_one_work+0x21e/0x680
>>> [  253.018248]  worker_thread+0x190/0x330
>>> [  253.018500]  ? __pfx_worker_thread+0x10/0x10
>>> [  253.018746]  kthread+0xe7/0x120
>>> [  253.018988]  ? __pfx_kthread+0x10/0x10
>>> [  253.019231]  ret_from_fork+0x3c/0x60
>>> [  253.019468]  ? __pfx_kthread+0x10/0x10
>>> [  253.019701]  ret_from_fork_asm+0x1a/0x30
>>> [  253.019939]  
>>>
>>> Fixes: e864180ee49b ("drm/amdgpu: Add lock around VF RLCG interface")
>>> Cc: lin cao mailto:lin@amd.com
>>> Cc: Jingwen Chen mailto:jingwen.ch...@amd.com
>>> Cc: Victor Skvortsov mailto:victor.skvort...@amd.com
>>> Cc: Zhigang Luo m

[PATCH 1/2] drm/amdgpu/mes11: allocate hw_resource_1 buffer once

2025-02-14 Thread Alex Deucher
Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 52 +-
 1 file changed, 26 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 26af0af718b5e..530371e6a7aee 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -64,6 +64,7 @@ static int mes_v11_0_kiq_hw_fini(struct amdgpu_device *adev);
 
 #define MES_EOP_SIZE   2048
 #define GFX_MES_DRAM_SIZE  0x8
+#define MES11_HW_RESOURCE_1_SIZE (128 * AMDGPU_GPU_PAGE_SIZE)
 
 static void mes_v11_0_ring_set_wptr(struct amdgpu_ring *ring)
 {
@@ -743,11 +744,6 @@ static int mes_v11_0_set_hw_resources(struct amdgpu_mes 
*mes)
 
 static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes *mes)
 {
-   unsigned int hw_rsrc_size = 128 * AMDGPU_GPU_PAGE_SIZE;
-   /* add a page for the cleaner shader fence */
-   unsigned int alloc_size = hw_rsrc_size + AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_pkt;
memset(&mes_set_hw_res_pkt, 0, sizeof(mes_set_hw_res_pkt));
 
@@ -755,21 +751,10 @@ static int mes_v11_0_set_hw_resources_1(struct amdgpu_mes 
*mes)
mes_set_hw_res_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_pkt.enable_mes_info_ctx = 1;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_pkt.mes_info_ctx_mc_addr = mes->resource_1_gpu_addr;
-   mes_set_hw_res_pkt.mes_info_ctx_size = hw_rsrc_size;
+   mes_set_hw_res_pkt.mes_info_ctx_size = MES11_HW_RESOURCE_1_SIZE;
mes_set_hw_res_pkt.cleaner_shader_fence_mc_addr =
-   mes->resource_1_gpu_addr + hw_rsrc_size;
+   mes->resource_1_gpu_addr + MES11_HW_RESOURCE_1_SIZE;
 
return mes_v11_0_submit_pkt_and_poll_completion(mes,
&mes_set_hw_res_pkt, sizeof(mes_set_hw_res_pkt),
@@ -1442,6 +1427,23 @@ static int mes_v11_0_sw_init(struct amdgpu_ip_block 
*ip_block)
if (r)
return r;
 
+   if (amdgpu_sriov_is_mes_info_enable(adev) ||
+   adev->gfx.enable_cleaner_shader) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev,
+ MES11_HW_RESOURCE_1_SIZE + 
AMDGPU_GPU_PAGE_SIZE,
+ PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }
 
@@ -1450,6 +1452,12 @@ static int mes_v11_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;
 
+   if (amdgpu_sriov_is_mes_info_enable(adev) ||
+   adev->gfx.enable_cleaner_shader) {
+   amdgpu_bo_free_kernel(&adev->mes.resource_1, 
&adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   }
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
kfree(adev->mes.mqd_backup[pipe]);
 
@@ -1670,14 +1678,6 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block 
*ip_block)
 
 static int mes_v11_0_hw_fini(struct amdgpu_ip_block *ip_block)
 {
-   struct amdgpu_device *adev = ip_block->adev;
-
-   if (amdgpu_sriov_is_mes_info_enable(adev) ||
-   adev->gfx.enable_cleaner_shader) {
-   amdgpu_bo_free_kernel(&adev->mes.resource_1, 
&adev->mes.resource_1_gpu_addr,
- &adev->mes.resource_1_addr);
-   }
-
return 0;
 }
 
-- 
2.48.1



[PATCH 2/2] drm/amdgpu/mes12: allocate hw_resource_1 buffer once

2025-02-14 Thread Alex Deucher
Allocate the buffer at sw init time so we don't alloc
and free it for every suspend/resume or reset cycle.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 39 +-
 1 file changed, 19 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
index 8dbab3834d82d..6db88584dd529 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v12_0.c
@@ -678,9 +678,6 @@ static int mes_v12_0_misc_op(struct amdgpu_mes *mes,
 
 static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes *mes, int pipe)
 {
-   unsigned int alloc_size = AMDGPU_GPU_PAGE_SIZE;
-   int ret = 0;
-   struct amdgpu_device *adev = mes->adev;
union MESAPI_SET_HW_RESOURCES_1 mes_set_hw_res_1_pkt;
 
memset(&mes_set_hw_res_1_pkt, 0, sizeof(mes_set_hw_res_1_pkt));
@@ -689,17 +686,6 @@ static int mes_v12_0_set_hw_resources_1(struct amdgpu_mes 
*mes, int pipe)
mes_set_hw_res_1_pkt.header.opcode = MES_SCH_API_SET_HW_RSRC_1;
mes_set_hw_res_1_pkt.header.dwsize = API_FRAME_SIZE_IN_DWORDS;
mes_set_hw_res_1_pkt.mes_kiq_unmap_timeout = 0xa;
-
-   ret = amdgpu_bo_create_kernel(adev, alloc_size, PAGE_SIZE,
-   AMDGPU_GEM_DOMAIN_VRAM,
-   &mes->resource_1,
-   &mes->resource_1_gpu_addr,
-   &mes->resource_1_addr);
-   if (ret) {
-   dev_err(adev->dev, "(%d) failed to create mes resource_1 bo\n", 
ret);
-   return ret;
-   }
-
mes_set_hw_res_1_pkt.cleaner_shader_fence_mc_addr =
mes->resource_1_gpu_addr;
 
@@ -1550,6 +1536,20 @@ static int mes_v12_0_sw_init(struct amdgpu_ip_block 
*ip_block)
return r;
}
 
+   if (adev->enable_uni_mes) {
+   int ret;
+
+   ret = amdgpu_bo_create_kernel(adev, AMDGPU_GPU_PAGE_SIZE, 
PAGE_SIZE,
+ AMDGPU_GEM_DOMAIN_VRAM,
+ &adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+   if (ret) {
+   dev_err(adev->dev, "(%d) failed to create mes 
resource_1 bo\n", ret);
+   return ret;
+   }
+   }
+
return 0;
 }
 
@@ -1558,6 +1558,11 @@ static int mes_v12_0_sw_fini(struct amdgpu_ip_block 
*ip_block)
struct amdgpu_device *adev = ip_block->adev;
int pipe;
 
+   if (adev->enable_uni_mes)
+   amdgpu_bo_free_kernel(&adev->mes.resource_1,
+ &adev->mes.resource_1_gpu_addr,
+ &adev->mes.resource_1_addr);
+
for (pipe = 0; pipe < AMDGPU_MAX_MES_PIPES; pipe++) {
kfree(adev->mes.mqd_backup[pipe]);
 
@@ -1786,12 +1791,6 @@ static int mes_v12_0_hw_init(struct amdgpu_ip_block 
*ip_block)
 
 static int mes_v12_0_hw_fini(struct amdgpu_ip_block *ip_block)
 {
-   struct amdgpu_device *adev = ip_block->adev;
-
-   if (adev->enable_uni_mes)
-   amdgpu_bo_free_kernel(&adev->mes.resource_1,
- &adev->mes.resource_1_gpu_addr,
- &adev->mes.resource_1_addr);
return 0;
 }
 
-- 
2.48.1



Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls

2025-02-14 Thread Lazar, Lijo
[Public]

For minimum bandwidth, we should keep the possibility of going to the FW to get
the data when XGMI DPM is in place. So it is all wrapped inside the API when the
devices passed in are connected; the caller doesn't need to know.

BTW, what is the real requirement for bandwidth data without any peer device? In
what way is that useful?

Thanks,
Lijo

From: Kim, Jonathan 
Sent: Friday, February 14, 2025 8:27:28 PM
To: Lazar, Lijo ; amd-gfx@lists.freedesktop.org 

Subject: RE: [PATCH] drm/amdgpu: simplify xgmi peer info calls

[Public]

> -Original Message-
> From: Lazar, Lijo 
> Sent: Friday, February 14, 2025 12:58 AM
> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls
>
>
>
> On 2/13/2025 9:20 PM, Kim, Jonathan wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Lazar, Lijo 
> >> Sent: Thursday, February 13, 2025 1:35 AM
> >> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls
> >>
> >>
> >>
> >> On 2/12/2025 9:27 PM, Jonathan Kim wrote:
> >>> Deprecate KFD XGMI peer info calls in favour of calling directly from
> >>> simplified XGMI peer info functions.
> >>>
> >>> Signed-off-by: Jonathan Kim 
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 42 --
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  5 ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 51 +-
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  6 +--
> >>>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c  | 11 +++--
> >>>  5 files changed, 48 insertions(+), 67 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> index 0312231b703e..4cec3a873995 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> @@ -555,48 +555,6 @@ int amdgpu_amdkfd_get_dmabuf_info(struct
> >> amdgpu_device *adev, int dma_buf_fd,
> >>> return r;
> >>>  }
> >>>
> >>> -uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst,
> >>> - struct amdgpu_device *src)
> >>> -{
> >>> -   struct amdgpu_device *peer_adev = src;
> >>> -   struct amdgpu_device *adev = dst;
> >>> -   int ret = amdgpu_xgmi_get_hops_count(adev, peer_adev);
> >>> -
> >>> -   if (ret < 0) {
> >>> -   DRM_ERROR("amdgpu: failed to get  xgmi hops count between
> >> node %d and %d. ret = %d\n",
> >>> -   adev->gmc.xgmi.physical_node_id,
> >>> -   peer_adev->gmc.xgmi.physical_node_id, ret);
> >>> -   ret = 0;
> >>> -   }
> >>> -   return  (uint8_t)ret;
> >>> -}
> >>> -
> >>> -int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >>> -   struct amdgpu_device *src,
> >>> -   bool is_min)
> >>> -{
> >>> -   struct amdgpu_device *adev = dst, *peer_adev;
> >>> -   int num_links;
> >>> -
> >>> -   if (amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(9, 4, 2))
> >>> -   return 0;
> >>> -
> >>> -   if (src)
> >>> -   peer_adev = src;
> >>> -
> >>> -   /* num links returns 0 for indirect peers since indirect route is 
> >>> unknown. */
> >>> -   num_links = is_min ? 1 : amdgpu_xgmi_get_num_links(adev, peer_adev);
> >>> -   if (num_links < 0) {
> >>> -   DRM_ERROR("amdgpu: failed to get xgmi num links between
> >> node %d and %d. ret = %d\n",
> >>> -   adev->gmc.xgmi.physical_node_id,
> >>> -   peer_adev->gmc.xgmi.physical_node_id, num_links);
> >>> -   num_links = 0;
> >>> -   }
> >>> -
> >>> -   /* Aldebaran xGMI DPM is defeatured so assume x16 x 25Gbps for
> >> bandwidth. */
> >>> -   return (num_links * 16 * 25000)/BITS_PER_BYTE;
> >>> -}
> >>> -
> >>>  int amdgpu_amdkfd_get_pcie_bandwidth_mbytes(struct amdgpu_device *adev,
> >> bool is_min)
> >>>  {
> >>> int num_lanes_shift = (is_min ? ffs(adev->pm.pcie_mlw_mask) :
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>> index 092dbd8bec97..28eb1cd0eb5a 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
> >>> @@ -255,11 +255,6 @@ int amdgpu_amdkfd_get_dmabuf_info(struct
> >> amdgpu_device *adev, int dma_buf_fd,
> >>>   uint64_t *bo_size, void *metadata_buffer,
> >>>   size_t buffer_size, uint32_t *metadata_size,
> >>>   uint32_t *flags, int8_t *xcp_id);
> >>> -uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst,
> >>> - struct amdgpu_device *src);
> >>> -int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >>> -   stru

RE: [PATCH] drm/amdgpu: simplify xgmi peer info calls

2025-02-14 Thread Kim, Jonathan
[Public]

We could be talking about 2 types of bandwidth here.

  1.  Bandwidth per link.
  2.  Bandwidth per peer, i.e. the multiple xgmi links used for SDMA gang
submissions, giving an effective copy speed of max per-link bandwidth *
num_links.  This is what the runtime currently uses, i.e. max divided by min.
The number of links per peer can be variable.

The peerless request is asking for #1, because link speed should not vary
based on the peer, i.e. it requests the max bandwidth of a single link.

The interface could then look like amdgpu_xgmi_get_bandwidth(adev, peer, enum
unit_type, int *min, int *max).
Unit_type could be defined, for illustration, as:
#define AMDGPU_XGMI_BW_MBYTES_MIN_MAX_PER_LINK 0
#define AMDGPU_XGMI_BW_MBYTES_MIN_MAX_PER_PEER 1

If unit_type == AMDGPU_XGMI_BW_*_MIN_MAX_PER_LINK, the call would ignore peer
and populate *min/*max with the per-link min/max (which keeps it open for a
powerplay range per link).
If unit_type == AMDGPU_XGMI_BW_*_MIN_MAX_PER_PEER, the call would populate
*min/*max with the per-peer values, where the min/max range is max_bw *
num_links.
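
Something like the below as a rough, compilable sketch of that shape (the
link-count helper and the x16 at 25 Gbps per-link figure are placeholder
assumptions borrowed from the Aldebaran comment in the code above, not
existing amdgpu code):

#include <stdio.h>

#define AMDGPU_XGMI_BW_MBYTES_MIN_MAX_PER_LINK 0
#define AMDGPU_XGMI_BW_MBYTES_MIN_MAX_PER_PEER 1

#define BITS_PER_BYTE 8

/* Placeholder for however the driver would count direct links to the peer. */
static int xgmi_num_links(const void *peer)
{
	return peer ? 4 : 1;
}

static int xgmi_get_bandwidth(const void *adev, const void *peer,
			      int unit_type, int *min, int *max)
{
	/* assumed per-link speed: x16 lanes at 25 Gbps -> 50000 MB/s */
	int per_link_mbytes = (16 * 25000) / BITS_PER_BYTE;
	int num_links = 1;

	(void)adev;
	if (unit_type == AMDGPU_XGMI_BW_MBYTES_MIN_MAX_PER_PEER)
		num_links = xgmi_num_links(peer);

	*min = per_link_mbytes;             /* floor: a single link */
	*max = per_link_mbytes * num_links; /* scaled by the link count */
	return 0;
}

int main(void)
{
	int min, max, peer = 1;             /* dummy peer handle */

	xgmi_get_bandwidth(NULL, &peer, AMDGPU_XGMI_BW_MBYTES_MIN_MAX_PER_PEER,
			   &min, &max);
	printf("per-peer: min %d MB/s, max %d MB/s\n", min, max);
	return 0;
}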

Jon

From: Lazar, Lijo 
Sent: Friday, February 14, 2025 10:39 AM
To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls


[Public]

For minimum bandwidth, we should keep the possibility of going to the FW to get
the data when XGMI DPM is in place. So it is all wrapped inside the API when the
devices passed in are connected; the caller doesn't need to know.

BTW, what is the real requirement for bandwidth data without any peer device? In
what way is that useful?

Thanks,
Lijo

From: Kim, Jonathan mailto:jonathan@amd.com>>
Sent: Friday, February 14, 2025 8:27:28 PM
To: Lazar, Lijo mailto:lijo.la...@amd.com>>; 
amd-gfx@lists.freedesktop.org 
mailto:amd-gfx@lists.freedesktop.org>>
Subject: RE: [PATCH] drm/amdgpu: simplify xgmi peer info calls

[Public]

> -Original Message-
> From: Lazar, Lijo mailto:lijo.la...@amd.com>>
> Sent: Friday, February 14, 2025 12:58 AM
> To: Kim, Jonathan mailto:jonathan@amd.com>>; 
> amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls
>
>
>
> On 2/13/2025 9:20 PM, Kim, Jonathan wrote:
> > [Public]
> >
> >> -Original Message-
> >> From: Lazar, Lijo mailto:lijo.la...@amd.com>>
> >> Sent: Thursday, February 13, 2025 1:35 AM
> >> To: Kim, Jonathan mailto:jonathan@amd.com>>; 
> >> amd-gfx@lists.freedesktop.org
> >> Subject: Re: [PATCH] drm/amdgpu: simplify xgmi peer info calls
> >>
> >>
> >>
> >> On 2/12/2025 9:27 PM, Jonathan Kim wrote:
> >>> Deprecate KFD XGMI peer info calls in favour of calling directly from
> >>> simplified XGMI peer info functions.
> >>>
> >>> Signed-off-by: Jonathan Kim 
> >>> mailto:jonathan@amd.com>>
> >>> ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c | 42 --
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h |  5 ---
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c   | 51 +-
> >>>  drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.h   |  6 +--
> >>>  drivers/gpu/drm/amd/amdkfd/kfd_crat.c  | 11 +++--
> >>>  5 files changed, 48 insertions(+), 67 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >> b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> index 0312231b703e..4cec3a873995 100644
> >>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.c
> >>> @@ -555,48 +555,6 @@ int amdgpu_amdkfd_get_dmabuf_info(struct
> >> amdgpu_device *adev, int dma_buf_fd,
> >>> return r;
> >>>  }
> >>>
> >>> -uint8_t amdgpu_amdkfd_get_xgmi_hops_count(struct amdgpu_device *dst,
> >>> - struct amdgpu_device *src)
> >>> -{
> >>> -   struct amdgpu_device *peer_adev = src;
> >>> -   struct amdgpu_device *adev = dst;
> >>> -   int ret = amdgpu_xgmi_get_hops_count(adev, peer_adev);
> >>> -
> >>> -   if (ret < 0) {
> >>> -   DRM_ERROR("amdgpu: failed to get  xgmi hops count between
> >> node %d and %d. ret = %d\n",
> >>> -   adev->gmc.xgmi.physical_node_id,
> >>> -   peer_adev->gmc.xgmi.physical_node_id, ret);
> >>> -   ret = 0;
> >>> -   }
> >>> -   return  (uint8_t)ret;
> >>> -}
> >>> -
> >>> -int amdgpu_amdkfd_get_xgmi_bandwidth_mbytes(struct amdgpu_device *dst,
> >>> -   struct amdgpu_device *src,
> >>> -   bool is_min)
> >>> -{
> >>> -   struct amdgpu_device *adev = dst, *peer_adev;
> >>> -   int num_links;
> >>> -
> >>> -   if (amdgpu_ip_version(adev, GC_HWIP, 0) < IP_VERSION(9, 4, 2))
> >>> -   return 0;
> >>> -
> >>> -   if (src)
> >>> -   peer_adev = src;
> >>> -
> >>> -   /* num links returns 0 for indirect peers since indirect route is 
> >>> unknown. */
> >>> -   num_l

Re: [PATCH] drm/amd/display: Disable -Wenum-float-conversion for dml2_dpmm_dcn4.c

2025-02-14 Thread Nathan Chancellor
On Thu, Dec 19, 2024 at 05:21:41PM -0500, Alex Deucher wrote:
> On Thu, Dec 19, 2024 at 12:23 PM Nathan Chancellor  wrote:
> >
> > Commit be4e3509314a ("drm/amd/display: DML21 Reintegration For Various
> > Fixes") blew away commit fdedd77b0eb3 ("drm/amd/display: Reapply
> > 2fde4fdddc1f"), which itself was a reapplication for the same reason,
> > which results in that compiler warning returning:
> >
> >   
> > drivers/gpu/drm/amd/amdgpu/../display/dc/dml2/dml21/src/dml2_dpmm/dml2_dpmm_dcn4.c:215:58:
> >  error: arithmetic between enumeration type 'enum dentist_divider_range' 
> > and floating-point type 'double' [-Werror,-Wenum-float-conversion]
> > 215 | divider = (unsigned int)(DFS_DIVIDER_RANGE_SCALE_FACTOR * 
> > (vco_freq_khz / clock_khz));
> > |  ~~ ^ 
> > ~~
> >
> > Just disable the warning for the whole file via Makefile to avoid having
> > to reapply the same fix every time the code syncs from wherever it is
> > actually maintained.
> >
> > Fixes: be4e3509314a ("drm/amd/display: DML21 Reintegration For Various 
> > Fixes")
> > Signed-off-by: Nathan Chancellor 
> > ---
> > If you would prefer reapplying the local fix, feel free to do so, but I
> > would like for it to be in the upstream source so it does not have to
> > keep being applied.
> 
> I've reapplied the original fix and I've confirmed that the fix will
> be pushed to the DML tree as well this time.

Did that actually end up happening? Commit 1b30456150e5
("drm/amd/display: DML21 Reintegration") in next-20250214 reintroduces
this warning... I guess it may be a timing thing because the author date
is three weeks ago or so. Should I send my "Reapply" patch or will you
take care of it?

Cheers,
Nathan


Re: [PATCH 1/3] drm/amdgpu: Do not program AGP BAR regs under SRIOV

2025-02-14 Thread Deucher, Alexander
[Public]

Are there any cases where the asic_type check would cause this register to fail 
to get programmed?

Alex


From: amd-gfx  on behalf of Victor Lu 

Sent: Thursday, February 13, 2025 7:13 PM
To: amd-gfx@lists.freedesktop.org 
Cc: Lu, Victor Cheng Chi (Victor) ; 
hoarce.c...@amd.com 
Subject: [PATCH 1/3] drm/amdgpu: Do not program AGP BAR regs under SRIOV

SRIOV VF does not have write access to AGP BAR regs.
Skip the writes to avoid a dmesg warning.

Signed-off-by: Victor Lu 
---
 drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
index 0e3ddea7b8e0..a7bfc9f41d0e 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c
@@ -92,12 +92,12 @@ static void gfxhub_v1_0_init_system_aperture_regs(struct 
amdgpu_device *adev)
 {
 uint64_t value;

-   /* Program the AGP BAR */
-   WREG32_SOC15_RLC(GC, 0, mmMC_VM_AGP_BASE, 0);
-   WREG32_SOC15_RLC(GC, 0, mmMC_VM_AGP_BOT, adev->gmc.agp_start >> 24);
-   WREG32_SOC15_RLC(GC, 0, mmMC_VM_AGP_TOP, adev->gmc.agp_end >> 24);
-
 if (!amdgpu_sriov_vf(adev) || adev->asic_type <= CHIP_VEGA10) {
+   /* Program the AGP BAR */
+   WREG32_SOC15_RLC(GC, 0, mmMC_VM_AGP_BASE, 0);
+   WREG32_SOC15_RLC(GC, 0, mmMC_VM_AGP_BOT, adev->gmc.agp_start >> 
24);
+   WREG32_SOC15_RLC(GC, 0, mmMC_VM_AGP_TOP, adev->gmc.agp_end >> 
24);
+
 /* Program the system aperture low logical page number. */
 WREG32_SOC15_RLC(GC, 0, mmMC_VM_SYSTEM_APERTURE_LOW_ADDR,
 min(adev->gmc.fb_start, adev->gmc.agp_start) >> 18);
--
2.34.1



Re: [PATCH] drm/amdgpu/mes: keep enforce isolation up to date

2025-02-14 Thread SRINIVASAN SHANMUGAM


On 2/14/2025 11:05 PM, Alex Deucher wrote:

Re-send the mes message on resume to make sure the
mes state is up to date.

Fixes: 8521e3c5f058 ("drm/amd/amdgpu: limit single process inside MES")
Signed-off-by: Alex Deucher
Cc: Shaoyun Liu
Cc: Srinivasan Shanmugam
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 13 -
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c | 20 +++-
  drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h |  2 +-
  drivers/gpu/drm/amd/amdgpu/mes_v11_0.c  |  4 
  drivers/gpu/drm/amd/amdgpu/mes_v12_0.c  |  4 
  5 files changed, 32 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index b9bd6654f3172..a194bf3347cbc 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -1665,24 +1665,19 @@ static ssize_t amdgpu_gfx_set_enforce_isolation(struct 
device *dev,
}
  
  	mutex_lock(&adev->enforce_isolation_mutex);

-
for (i = 0; i < num_partitions; i++) {
-   if (adev->enforce_isolation[i] && !partition_values[i]) {
+   if (adev->enforce_isolation[i] && !partition_values[i])
/* Going from enabled to disabled */
amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(i));
-   if (adev->enable_mes && adev->gfx.enable_cleaner_shader)
-   amdgpu_mes_set_enforce_isolation(adev, i, 
false);
-   } else if (!adev->enforce_isolation[i] && partition_values[i]) {
+   else if (!adev->enforce_isolation[i] && partition_values[i])
/* Going from disabled to enabled */
amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
-   if (adev->enable_mes && adev->gfx.enable_cleaner_shader)
-   amdgpu_mes_set_enforce_isolation(adev, i, true);
-   }
adev->enforce_isolation[i] = partition_values[i];
}
-
mutex_unlock(&adev->enforce_isolation_mutex);
  
+	amdgpu_mes_update_enforce_isolation(adev);

+
return count;
  }
  
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c

index cee38bb6cfaf2..ca076306adba4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c
@@ -1508,7 +1508,8 @@ bool amdgpu_mes_suspend_resume_all_supported(struct 
amdgpu_device *adev)
  }
  
  /* Fix me -- node_id is used to identify the correct MES instances in the future */

-int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t 
node_id, bool enable)
+static int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev,
+   uint32_t node_id, bool enable)
  {
struct mes_misc_op_input op_input = {0};
int r;
@@ -1530,6 +1531,23 @@ int amdgpu_mes_set_enforce_isolation(struct 
amdgpu_device *adev, uint32_t node_i
return r;
  }
  
+int amdgpu_mes_update_enforce_isolation(struct amdgpu_device *adev)

+{
+   int i, r = 0;
+
+   if (adev->enable_mes && adev->gfx.enable_cleaner_shader) {
+   mutex_lock(&adev->enforce_isolation_mutex);
+   for (i = 0; i < (adev->xcp_mgr ? adev->xcp_mgr->num_xcps : 1); 
i++) {
+   if (adev->enforce_isolation[i])
+   r |= amdgpu_mes_set_enforce_isolation(adev, i, 
true);
+   else
+   r |= amdgpu_mes_set_enforce_isolation(adev, i, 
false);
+   }
+   mutex_unlock(&adev->enforce_isolation_mutex);
+   }
+   return r;
+}
+
  #if defined(CONFIG_DEBUG_FS)
  
  static int amdgpu_debugfs_mes_event_log_show(struct seq_file *m, void *unused)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
index 6a792ffc81e33..3a65c3788956d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mes.h
@@ -532,6 +532,6 @@ static inline void amdgpu_mes_unlock(struct amdgpu_mes *mes)
  
  bool amdgpu_mes_suspend_resume_all_supported(struct amdgpu_device *adev);
  
-int amdgpu_mes_set_enforce_isolation(struct amdgpu_device *adev, uint32_t node_id, bool enable);

+int amdgpu_mes_update_enforce_isolation(struct amdgpu_device *adev);
  
  #endif /* __AMDGPU_MES_H__ */

diff --git a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
index 530371e6a7aee..fc7b17463cb4d 100644
--- a/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/mes_v11_0.c
@@ -1660,6 +1660,10 @@ static int mes_v11_0_hw_init(struct amdgpu_ip_block 
*ip_block)
goto failure;
}
  
+	r = amdgpu_mes_update_enforce_isolation(adev);

+   if (r)
+   goto failure;
+


Hi Alex,

Should this also be moved to mes_v11_0_hw_init? Please let me know your
thoughts.



  out: