RE: [PATCH v3] drm/amdkfd: Fix bug in config_dequeue_wait_counts

2025-03-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Harish
> Kasiviswanathan
> Sent: Friday, March 14, 2025 8:44 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: [PATCH v3] drm/amdkfd: Fix bug in config_dequeue_wait_counts
>
> For certain ASICs where the dequeue wait counts don't need to be initialized,
> pm_config_dequeue_wait_counts_v9 returns without filling in the packet
> information. However, the calling function interprets this as success
> and sends the uninitialized packet to the firmware, causing a hang.
>
> Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for
> ASICs that don't need the value to be initialized.
>
> v2: Removed redundant code.
> Tidy up code based on review comments
> v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs
>
> Fixes: 98a5af8103f ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
>
> Signed-off-by: Harish Kasiviswanathan 

Nit-picks below.
With those addressed and as long as this has been tested on optimized and 
unoptimized HW:
Reviewed-by: Jonathan Kim 

> ---
>  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 16 ++
>  .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 30 +++
>  2 files changed, 27 insertions(+), 19 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 3f574d82b5fc..8a47b7259a10 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -418,6 +418,10 @@ int pm_config_dequeue_wait_counts(struct
> packet_manager *pm,
>   !pm->pmf->config_dequeue_wait_counts_size)
>   return 0;
>
> + if (cmd == KFD_DEQUEUE_WAIT_INIT && (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
> +KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(10, 0, 0)))
> + return 0;
> +
>   size = pm->pmf->config_dequeue_wait_counts_size;
>
>   mutex_lock(&pm->lock);
> @@ -436,16 +440,16 @@ int pm_config_dequeue_wait_counts(struct
> packet_manager *pm,
>
>   retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
>cmd, value);
> - if (!retval)
> + if (!retval) {
>   retval = kq_submit_packet(pm->priv_queue);
> +
> + /* If default value is modified, cache that in dqm->wait_times */
> + if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> + update_dqm_wait_times(pm->dqm);
> + }
>   else
>   kq_rollback_packet(pm->priv_queue);

Put the else on the same line as the closing brace above, and add braces to
the else branch to balance the braces on the if branch.

>   }
> -
> - /* If default value is modified, cache that value in dqm->wait_times */
> - if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> - update_dqm_wait_times(pm->dqm);
> -
>  out:
>   mutex_unlock(&pm->lock);
>   return retval;
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> index d440df602393..f059041902bc 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> @@ -310,6 +310,13 @@ static inline void
> pm_build_dequeue_wait_counts_packet_info(struct packet_manage
>   reg_data);
>  }
>
> +/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
> + *register/value for configuring dequeue wait counts
> + *
> + * @return: -ve for failure and 0 for success and buffer is
> + *  filled in with packet
> + *
> + **/
>  static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
>   uint32_t *buffer,
>   enum kfd_config_dequeue_wait_counts_cmd cmd,
> @@ -321,24 +328,21 @@ static int pm_config_dequeue_wait_counts_v9(struct
> packet_manager *pm,
>
>   switch (cmd) {
>   case KFD_DEQUEUE_WAIT_INIT: {
> - uint32_t sch_wave = 0, que_sleep = 0;
> - /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> 0x40.
> + uint32_t sch_wave = 0, que_sleep = 1;

For safety, do the following here (at the start of the WAIT_INIT case) to
explicitly state that certain devices are not permitted to do WAIT_INIT,
since que_sleep is now unconditionally set to 1:
if (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(10, 0, 0))
return -EPERM;

Jon

> +
> + /* For all gfx9 ASICs > gfx941,
> +  * Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> 0x40.
>* On a 1GHz machine this is roughly 1 microsecond, which is
>* about how long it takes to load data out of memory during
>* queue connect
>* QUE_SLEEP: Wait Count for Dequeue Retry.
> +

Re: [PATCH V2 1/2] drm/amdgpu/gfx: fix ref counting for ring based profile handling

2025-03-14 Thread Alex Deucher
On Fri, Mar 14, 2025 at 6:31 AM Lazar, Lijo  wrote:
>
>
>
> On 3/12/2025 7:49 PM, Alex Deucher wrote:
> > We need to make sure the workload profile ref counts are
> > balanced.  This isn't currently the case because we can
> > increment the count on submissions, but the decrement may
> > be delayed as work comes in.  Track when we enable the
> > workload profile so the references are balanced.
> >
> > v2: switch to a mutex and active flag
> >
> > Fixes: 8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload 
> > profile")
> > Cc: Yang Wang 
> > Cc: Kenneth Feng 
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 30 -
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
> >  2 files changed, 22 insertions(+), 10 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index 984e6ff6e4632..099329d15b9ff 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -2160,11 +2160,16 @@ void amdgpu_gfx_profile_idle_work_handler(struct 
> > work_struct *work)
> >   for (i = 0; i < (AMDGPU_MAX_COMPUTE_RINGS * AMDGPU_MAX_GC_INSTANCES); 
> > ++i)
> >   fences += 
> > amdgpu_fence_count_emitted(&adev->gfx.compute_ring[i]);
> >   if (!fences && !atomic_read(&adev->gfx.total_submission_cnt)) {
> > - r = amdgpu_dpm_switch_power_profile(adev, profile, false);
> > - if (r)
> > - dev_warn(adev->dev, "(%d) failed to disable %s power 
> > profile mode\n", r,
> > -  profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D 
> > ?
> > -  "fullscreen 3D" : "compute");
> > + mutex_lock(&adev->gfx.workload_profile_mutex);
> > + if (adev->gfx.workload_profile_active) {
> > + r = amdgpu_dpm_switch_power_profile(adev, profile, 
> > false);
> > + if (r)
> > + dev_warn(adev->dev, "(%d) failed to disable 
> > %s power profile mode\n", r,
> > +  profile == 
> > PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > +  "fullscreen 3D" : "compute");
> > + adev->gfx.workload_profile_active = false;
> > + }
> > + mutex_unlock(&adev->gfx.workload_profile_mutex);
> >   } else {
> >   schedule_delayed_work(&adev->gfx.idle_work, 
> > GFX_PROFILE_IDLE_TIMEOUT);
> >   }
> > @@ -2184,11 +2189,16 @@ void amdgpu_gfx_profile_ring_begin_use(struct 
> > amdgpu_ring *ring)
> >   atomic_inc(&adev->gfx.total_submission_cnt);
> >
> >   if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
>
> I guess this has the same problem as mentioned in the earlier patch.
> AFAIU, this will switch profile only if no idle work is scheduled. If a
> begin_use call comes when idle_work is being executed, there is a chance
> that idle_work completes amdgpu_dpm_switch_power_profile(adev, profile,
> false). Then this would skip changing the profile.

I think that problem exists already independent of the ref counting.
I suppose there isn't actually a need to make the workload change
dependent on the cancelling of the delayed work.  I'll send out some
patches to address this.

Alex

>
> Thanks,
> Lijo
>
> > - r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> > - if (r)
> > - dev_warn(adev->dev, "(%d) failed to disable %s power 
> > profile mode\n", r,
> > -  profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D 
> > ?
> > -  "fullscreen 3D" : "compute");
> > + mutex_lock(&adev->gfx.workload_profile_mutex);
> > + if (!adev->gfx.workload_profile_active) {
> > + r = amdgpu_dpm_switch_power_profile(adev, profile, 
> > true);
> > + if (r)
> > + dev_warn(adev->dev, "(%d) failed to disable 
> > %s power profile mode\n", r,
> > +  profile == 
> > PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > +  "fullscreen 3D" : "compute");
> > + adev->gfx.workload_profile_active = true;
> > + }
> > + mutex_unlock(&adev->gfx.workload_profile_mutex);
> >   }
> >  }
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > index ddf4533614bac..75af4f25a133b 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> > @@ -483,6 +483,8 @@ struct amdgpu_gfx {
> >
> >   atomic_ttotal_submission_cnt;
> >   struct delayed_work idle_work;
> > + boolworkload_profile_active;
> > + struct mutexworkload_profile_mutex;

Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers

2025-03-14 Thread Alex Deucher
On Fri, Mar 14, 2025 at 8:55 AM Dieter Faulbaum
 wrote:
>
>
> Hello Philip,
>
> Philip Yang  writes:
>
> > On 2025-02-12 17:42, Uwe Kleine-König wrote:
> >
> >  #regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
> > #regzbot monitor: https://bugs.debian.org/1093124
> >
> > Hello,
> >
> > On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:
> >
> >  Find user queue rptr, ring buf, eop buffer and cwsr area BOs,
> >  and
> > check BOs are mapped on the GPU with correct size and take the
> > BO
> > reference.
> >
> > Signed-off-by: Philip Yang 
> > Reviewed-by: Felix Kuehling 
> >
> >
> > This change made it into v6.12-rc1 as 68e599db7a54 ("drm/amdkfd:
> > Validate user queue buffers"). A Debian user (Dieter Faulbaum,
> > on Cc)
> > reported that this change introduced a regression using a gfx803
> > device
> > resulting in a HSA exception when e.g. darktable is used. I
> > didn't even
> > try to understand the problem, but maybe one of you have an idea
> > about
> > the issue?!
> >
> > Try this patch
> >
> > https://lore.kernel.org/all/2025013412.29812-1-philip.y...@amd.com/T/
>
> It seems to me that your patch hasn't been applied to the mainline
> kernel yet.
> Do you think it will make it in at some point?
> Or is it falling through the cracks?

It's in drm-next:
https://gitlab.freedesktop.org/drm/kernel/-/commit/e7a477735f1771b9a9346a5fbd09d7ff0641723a

I'll cherry-pick it for stable next week.

Alex

>
>
>
> With regards
> Dieter


[PATCH] drm/amdkfd: set precise mem ops caps to disabled for gfx 11 and 12

2025-03-14 Thread Jonathan Kim
Clause instructions with precise memory enabled currently hang the
shader, so set the capability flag to disabled since it's unsafe to use
for debugging.

Signed-off-by: Jonathan Kim 
---
 drivers/gpu/drm/amd/amdkfd/kfd_topology.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
index 27e7356eed6f..e477d7509646 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_topology.c
@@ -2006,10 +2006,6 @@ static void kfd_topology_set_capabilities(struct 
kfd_topology_device *dev)
dev->node_props.debug_prop |= 
HSA_DBG_WATCH_ADDR_MASK_LO_BIT_GFX10 |
HSA_DBG_WATCH_ADDR_MASK_HI_BIT;
 
-   if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(11, 0, 0))
-   dev->node_props.capability |=
-   
HSA_CAP_TRAP_DEBUG_PRECISE_MEMORY_OPERATIONS_SUPPORTED;
-
if (KFD_GC_VERSION(dev->gpu) >= IP_VERSION(12, 0, 0))
dev->node_props.capability |=

HSA_CAP_TRAP_DEBUG_PRECISE_ALU_OPERATIONS_SUPPORTED;
-- 
2.34.1



RE: [PATCH v2] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Kasiviswanathan, Harish
[Public]

-Original Message-
From: Kim, Jonathan 
Sent: Friday, March 14, 2025 4:41 PM
To: Kasiviswanathan, Harish ; 
amd-gfx@lists.freedesktop.org
Cc: Kasiviswanathan, Harish 
Subject: RE: [PATCH v2] drm/amdkfd: Update return value of 
config_dequeue_wait_counts

[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Harish
> Kasiviswanathan
> Sent: Friday, March 14, 2025 4:22 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: [PATCH v2] drm/amdkfd: Update return value of
> config_dequeue_wait_counts
>
> .config_dequeue_wait_counts can return in a no-op case. Modify the return
> value to reflect that, since the caller also needs to ignore this condition.
>
> v2: Removed redundant code.
> Tidy up code based on review comments
>
> Fixes: 98a5af8103f ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
>
> Signed-off-by: Harish Kasiviswanathan 
> ---
>  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 14 
>  .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 36 +++
>  2 files changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 3f574d82b5fc..502b89639a9f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -436,19 +436,19 @@ int pm_config_dequeue_wait_counts(struct
> packet_manager *pm,
>
>   retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
>cmd, value);
> - if (!retval)
> + if (retval > 0) {
>   retval = kq_submit_packet(pm->priv_queue);
> +
> + /* If default value is modified, cache that in dqm->wait_times */
> + if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> + update_dqm_wait_times(pm->dqm);
> + }
>   else
>   kq_rollback_packet(pm->priv_queue);
>   }
> -
> - /* If default value is modified, cache that value in dqm->wait_times */
> - if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> - update_dqm_wait_times(pm->dqm);
> -
>  out:
>   mutex_unlock(&pm->lock);
> - return retval;
> + return retval < 0 ? retval : 0;
>  }
>
>  int pm_send_unmap_queue(struct packet_manager *pm,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> index d440df602393..3c6134d61b2b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> @@ -310,6 +310,13 @@ static inline void
> pm_build_dequeue_wait_counts_packet_info(struct packet_manage
>   reg_data);
>  }
>
> +/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
> + *register/value for configuring dequeue wait counts
> + *
> + * @return: -ve for failure, 0 for nop and +ve for success and buffer is
> + *  filled in with packet
> + *
> + **/
>  static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
>   uint32_t *buffer,
>   enum kfd_config_dequeue_wait_counts_cmd cmd,
> @@ -321,24 +328,25 @@ static int pm_config_dequeue_wait_counts_v9(struct
> packet_manager *pm,
>
>   switch (cmd) {
>   case KFD_DEQUEUE_WAIT_INIT: {
> - uint32_t sch_wave = 0, que_sleep = 0;
> - /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> 0x40.
> + uint32_t sch_wave = 0, que_sleep = 1;
> +
> + if (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
> + KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0))
> + return 0;

From my last comment, I suggested putting this at the beginning of the non-v9
pm_config_dequeue_wait_counts call that calls this function. Then you don't
have to make the return value more complicated than it currently is.

[HK]: Ah, ok. I didn't really want to put ASIC-specific code in there, but in
this case it is fine since, as you mentioned, we are already overloading
these functions.

Also, KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0) is incorrect and
should be >= because we want to also exclude anything with a major version of 10.
[HK]: good catch

Jon

> +
> + /* For all other gfx9 ASICs,
> +  * Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> 0x40.
>* On a 1GHz machine this is roughly 1 microsecond, which is
>* about how long it takes to load data out of memory during
>* queue connect
>* QUE_SLEEP: Wait Count for Dequeue Retry.
> +  *
> +  * Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU
>*/
> - if (KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(9, 4, 1) &&
> - KFD_GC_VERSION(pm->dqm->dev) <

[PATCH v2] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Harish Kasiviswanathan
.config_dequeue_wait_counts can return in a no-op case. Modify the return
value to reflect that, since the caller also needs to ignore this condition.

v2: Removed redundant code.
Tidy up code based on review comments

Fixes: 98a5af8103f ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")

Signed-off-by: Harish Kasiviswanathan 
---
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 14 
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 36 +++
 2 files changed, 29 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 3f574d82b5fc..502b89639a9f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -436,19 +436,19 @@ int pm_config_dequeue_wait_counts(struct packet_manager 
*pm,
 
retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
 cmd, value);
-   if (!retval)
+   if (retval > 0) {
retval = kq_submit_packet(pm->priv_queue);
+
+   /* If default value is modified, cache that in dqm->wait_times */
+   if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
+   update_dqm_wait_times(pm->dqm);
+   }
else
kq_rollback_packet(pm->priv_queue);
}
-
-   /* If default value is modified, cache that value in dqm->wait_times */
-   if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
-   update_dqm_wait_times(pm->dqm);
-
 out:
mutex_unlock(&pm->lock);
-   return retval;
+   return retval < 0 ? retval : 0;
 }
 
 int pm_send_unmap_queue(struct packet_manager *pm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index d440df602393..3c6134d61b2b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -310,6 +310,13 @@ static inline void 
pm_build_dequeue_wait_counts_packet_info(struct packet_manage
reg_data);
 }
 
+/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
+ *register/value for configuring dequeue wait counts
+ *
+ * @return: -ve for failure, 0 for nop and +ve for success and buffer is
+ *  filled in with packet
+ *
+ **/
 static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
uint32_t *buffer,
enum kfd_config_dequeue_wait_counts_cmd cmd,
@@ -321,24 +328,25 @@ static int pm_config_dequeue_wait_counts_v9(struct 
packet_manager *pm,
 
switch (cmd) {
case KFD_DEQUEUE_WAIT_INIT: {
-   uint32_t sch_wave = 0, que_sleep = 0;
-   /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default 0x40.
+   uint32_t sch_wave = 0, que_sleep = 1;
+
+   if (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
+   KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0))
+   return 0;
+
+   /* For all other gfx9 ASICs,
+* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default 0x40.
 * On a 1GHz machine this is roughly 1 microsecond, which is
 * about how long it takes to load data out of memory during
 * queue connect
 * QUE_SLEEP: Wait Count for Dequeue Retry.
+*
+* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU
 */
-   if (KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(9, 4, 1) &&
-   KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(10, 0, 0)) {
-   que_sleep = 1;
-
-   /* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 
APU */
-   if (amdgpu_emu_mode == 0 && 
pm->dqm->dev->adev->gmc.is_app_apu &&
-   (KFD_GC_VERSION(pm->dqm->dev) == IP_VERSION(9, 4, 3)))
-   sch_wave = 1;
-   } else {
-   return 0;
-   }
+   if (amdgpu_emu_mode == 0 && pm->dqm->dev->adev->gmc.is_app_apu 
&&
+   (KFD_GC_VERSION(pm->dqm->dev) == IP_VERSION(9, 4, 3)))
+   sch_wave = 1;
+
pm_build_dequeue_wait_counts_packet_info(pm, sch_wave, 
que_sleep,
®_offset, ®_data);
 
@@ -377,7 +385,7 @@ static int pm_config_dequeue_wait_counts_v9(struct 
packet_manager *pm,
 
packet->data = reg_data;
 
-   return 0;
+   return sizeof(struct pm4_mec_write_data_mmio);
 }
 
 static int pm_unmap_queues_v9(struct packet_manager *pm, uint32_t *buffer,
-- 
2.34.1



RE: [PATCH v2] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Kasiviswanathan, Harish 
> Sent: Friday, March 14, 2025 5:04 PM
> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH v2] drm/amdkfd: Update return value of
> config_dequeue_wait_counts
>
> [Public]
>
> -Original Message-
> From: Kim, Jonathan 
> Sent: Friday, March 14, 2025 4:41 PM
> To: Kasiviswanathan, Harish ; amd-
> g...@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: RE: [PATCH v2] drm/amdkfd: Update return value of
> config_dequeue_wait_counts
>
> [Public]
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of Harish
> > Kasiviswanathan
> > Sent: Friday, March 14, 2025 4:22 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Kasiviswanathan, Harish 
> > Subject: [PATCH v2] drm/amdkfd: Update return value of
> > config_dequeue_wait_counts
> >
> > .config_dequeue_wait_counts can return in a no-op case. Modify the return
> > value to reflect that, since the caller also needs to ignore this condition.
> >
> > v2: Removed redundant code.
> > Tidy up code based on review comments
> >
> > Fixes: 98a5af8103f ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
> >
> > Signed-off-by: Harish Kasiviswanathan 
> > ---
> >  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 14 
> >  .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 36 +++
> >  2 files changed, 29 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > index 3f574d82b5fc..502b89639a9f 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > @@ -436,19 +436,19 @@ int pm_config_dequeue_wait_counts(struct
> > packet_manager *pm,
> >
> >   retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
> >cmd, value);
> > - if (!retval)
> > + if (retval > 0) {
> >   retval = kq_submit_packet(pm->priv_queue);
> > +
> > + /* If default value is modified, cache that in dqm->wait_times */
> > + if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> > + update_dqm_wait_times(pm->dqm);
> > + }
> >   else
> >   kq_rollback_packet(pm->priv_queue);
> >   }
> > -
> > - /* If default value is modified, cache that value in dqm->wait_times 
> > */
> > - if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> > - update_dqm_wait_times(pm->dqm);
> > -
> >  out:
> >   mutex_unlock(&pm->lock);
> > - return retval;
> > + return retval < 0 ? retval : 0;
> >  }
> >
> >  int pm_send_unmap_queue(struct packet_manager *pm,
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > index d440df602393..3c6134d61b2b 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > @@ -310,6 +310,13 @@ static inline void
> > pm_build_dequeue_wait_counts_packet_info(struct packet_manage
> >   reg_data);
> >  }
> >
> > +/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
> > + *register/value for configuring dequeue wait counts
> > + *
> > + * @return: -ve for failure, 0 for nop and +ve for success and buffer is
> > + *  filled in with packet
> > + *
> > + **/
> >  static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
> >   uint32_t *buffer,
> >   enum kfd_config_dequeue_wait_counts_cmd cmd,
> > @@ -321,24 +328,25 @@ static int pm_config_dequeue_wait_counts_v9(struct
> > packet_manager *pm,
> >
> >   switch (cmd) {
> >   case KFD_DEQUEUE_WAIT_INIT: {
> > - uint32_t sch_wave = 0, que_sleep = 0;
> > - /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> > 0x40.
> > + uint32_t sch_wave = 0, que_sleep = 1;
> > +
> > + if (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
> > + KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0))
> > + return 0;
>
> From my last comment, I suggested putting this at the beginning of the non-v9
> pm_config_dequeue_wait_counts call that calls this function.
> Then you don't have to make the return value more complicated than it
> currently is.
>
> [HK]: Ah, ok. I didn't really want to put ASIC-specific code in there, but
> in this case it is fine since, as you mentioned, we are already overloading
> these functions.

Right.  Which is why I also suggested that you could create a front loaded flag 
or mask if you didn't like this idea.

e.g. of a mask:
declare dqm->wait_times_override_mask in the kfd_device_queue_manager struct.

Do some defines in a header somewhere:
#define KFD_DEQUEUE_WAIT_SCH_WAVE_OVERRIDE_FLAG 0x1
#define K

Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by removing dynamic callbacks

2025-03-14 Thread Alex Deucher
On Fri, Mar 14, 2025 at 10:43 AM Kim, Jonathan  wrote:
>
> [Public]
>
> > -Original Message-
> > From: Alex Deucher 
> > Sent: Thursday, March 13, 2025 10:38 AM
> > To: Zhang, Jesse(Jie) 
> > Cc: Koenig, Christian ; amd-
> > g...@lists.freedesktop.org; Deucher, Alexander ;
> > Kim, Jonathan ; Zhu, Jiadong
> > 
> > Subject: Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by
> > removing dynamic callbacks
> >
> > I think as long as the locking is correct, the src shouldn't matter.
> > You just need to stop the kernel queues (and save state) and evict the
> > user queues (since HWS is responsible for saving the MQDs of the
> > non-guilty user queues).  If KFD detected the hang (e.g., queue
> > eviction fails when called for memory pressure, etc.), we just need to
> > make sure that it's ok for the sdma reset routine to call evict queues
> > again even if it was already called (presumably it should exit early
> > since the queues are already evicted).  If KGD initiates the reset, it
> > will call into KFD to evict queues.  We just need to make sure the
> > evict queues function we call just evicts the queues and doesn't also
> > try and reset.
>
> If we're removing the src parameter, we need to be careful we don't end up in 
> a double lock scenario in the KFD.
> i.e. kgd inits reset -> kfd detects hang on kgd reset trigger and calls back 
> to kgd -> amdgpu_amdkfd_suspend gets called again but is blocked on previous 
> suspend call from original kgd reset (which is holding a bunch of KFD locks) 
> while KFD is trying to clean up immediately.
>

How would this work even with the src parameter?  I think you could
still get into that case.  E.g., KGD detects a hang and initiates an
engine reset.  In parallel, KFD tries to unmap all queues for some
reason and detects a hang, so it tries to reset SDMA.  Assuming there
is a lock that protects SDMA reset, that should work.  However, it
requires that the prerequisites on each side don't attempt to reset
anything.

sdma_engine_reset()
{

KFD pre reset requirements
1. unmap all SDMA queues (CP firmware will save non-guilty state in MQDs)
2. detect guilty queue

KGD pre reset requirements:
1. stop relevant drm schedulers
2. detect guilty queue
3. save non-guilty queue state

Do engine reset

KGD post reset requirements:
1. restore non-guilty queue state
2. start relevant drm schedulers

KFD post reset requirements
1. map all non-guilty SDMA queues

}

I think what we need on the KFD side, if we don't have it already, is
a function to just umap queues and update guilty state, but not to
attempt to reset anything.  Then on the KFD side, in your normal
flows, you could call this function to unmap queues and update queue
state, and then after calling that walk through the queue state and
trigger any resets based on queues flagged as bad.  On the KFD side,
in a normal flow you will have called this unmap and update state
function and now you have a list of bad queues.  You can then initiate
an engine reset for the engine the bad queue is on.  This is safe
because you've already unmapped the queues, so if the unmap queues
function gets called again from the sdma reset function, it will
return early as the queues are already unmapped and marked guilty if
they are.  The engine will reset, the sdma reset function will
clean up the KGD side and then call the KFD map_queues().  Once it
returns you are done.  If KGD detects the hang, it will call the sdma
reset function and that will call the KFD unmap queues and update
state function.  This will update the KFD side state, but not initiate
any resets.  The engine will then be reset and then the KGD state will
be restored and finally it will call the KFD map_queues() function to
remap the non-guilty queues.  At completion both sides should be
functional again.  I'm not too familiar with the KFD unmap and reset
flows, but I think they will need to be decoupled if they are
currently intermixed.

Alex



> The safest way to remove the parameter seems to be to move the KFD
> suspend/restore outside of the common call and have KGD call suspend/resume
> when it's calling the common call itself.
>



> Jon
>
> >
> > Alex
> >
> > On Wed, Mar 12, 2025 at 5:24 AM Zhang, Jesse(Jie) 
> > wrote:
> > >
> > > [AMD Official Use Only - AMD Internal Distribution Only]
> > >
> > >
> > >
> > >
> > >
> > >
> > > From: Koenig, Christian 
> > > Sent: Wednesday, March 12, 2025 4:39 PM
> > > To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org
> > > Cc: Deucher, Alexander ; Kim, Jonathan
> > ; Zhu, Jiadong 
> > > Subject: Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by
> > removing dynamic callbacks
> > >
> > >
> > >
> > > Am 12.03.25 um 09:15 schrieb Zhang, Jesse(Jie):
> > >
> > > [SNIP]
> > >
> > > -
> > >
> > > + gfx_ring->funcs->stop_queue(adev, instance_id);
> > >
> > >
> > >
> > > Yeah that starts to look good. Question here is who is calling
> > amdgpu_sdma_reset_engine()?
> > >
> > >
> > >
> > > If thi

Re: commit 7ffb791423c7 breaks steam game

2025-03-14 Thread Bert Karwatzki
On Friday, 2025-03-14 at 08:54 +1100, Balbir Singh wrote:
> On 3/14/25 05:12, Bert Karwatzki wrote:
> > On Thursday, 2025-03-13 at 22:47 +1100, Balbir Singh wrote:
> > >
> > >
> > > Anyway, I think the nokaslr result is interesting, it seems like with 
> > > nokaslr
> > > even the older kernels have problems with the game
> > >
> > > Could you confirm if with nokaslr
> > >
> > Now I've tested kernel 6.8.12 with nokaslr
> >
> > > 1. Only one single game stellaris is not working?
> > > 2. The entire laptop does not work?
> > > 3. Laptop works and other games work? Just one game is not working as
> > expected?
> >
> >
> > Stellaris is showing the input lag and the entire graphical user interface 
> > shows
> > the same input lag as long as stellaris is running.
> > Civilization 6 shows the same input lag as stellaris, probably even worse.
> > Magic the Gathering: Arena (with wine) works normally.
> > Valheim also works normally.
> > Crusader Kings 2 works normally
> > Rogue Heroes: Ruins of Tasos (a Zelda lookalike) works normally.
> > Baldur's Gate I & II and Icewind Dale work normally.
> >
> > Also the input lag is only in the GUI, if I switch to a text console (ctrl 
> > + alt
> > + Fn), input works normally even while the affected games are running.
> >
> > Games aside everything else (e.g. compiling kernels) seems to work with 
> > nokaslr.
> >
>
> Would it be fair to assume that anything Xorg/Wayland is working fine
> when the game is not running, even with nokaslr?
>
Yes, Xorg (I normally use xfce4 as my desktop) works fine. I also tested with
GNOME using Xwayland; the buggy behaviour exists there too, with the addition
that the mouse position is off, i.e. to click a button in the game you have to
click somewhat above it.

> +amd-gfx@lists.freedesktop.org to see if there are known issues with
> nokaslr and the games you mentioned.
>
>
> Balbir Singh
>
> PS: I came across an interesting link
> https://www.alex-ionescu.com/behind-windows-x64s-44-bit-memory-addressing-limit/
>
> I think SLIST_HEADER is used by wine as well for user space and I am not sure
> if in this situation the game is hitting this scenario, but surprisingly the 
> other
> games are not. This is assuming the game uses wine. I am not sure it's 
> related,
> but the 44 bits caught my attention.

Stellaris is a native Linux game (x86_64); the one game (MTGA) I tested with
wine worked fine.

By the way, the warning
[ T8562] WARNING: CPU: 14 PID: 8562 at mm/slub.c:5028
__kvmalloc_node_noprof+0x2fd/0x360
that appeared in the dmesg I sent you is caused by the upgrade of mesa from
25.0.0 to 25.0.1. (I'm still bisecting ...)

Bert Karwatzki


Re: [PATCH V2 1/2] drm/amdgpu/gfx: fix ref counting for ring based profile handling

2025-03-14 Thread Lazar, Lijo



On 3/12/2025 7:49 PM, Alex Deucher wrote:
> We need to make sure the workload profile ref counts are
> balanced.  This isn't currently the case because we can
> increment the count on submissions, but the decrement may
> be delayed as work comes in.  Track when we enable the
> workload profile so the references are balanced.
> 
> v2: switch to a mutex and active flag
> 
> Fixes: 8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload 
> profile")
> Cc: Yang Wang 
> Cc: Kenneth Feng 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 30 -
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
>  2 files changed, 22 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 984e6ff6e4632..099329d15b9ff 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -2160,11 +2160,16 @@ void amdgpu_gfx_profile_idle_work_handler(struct 
> work_struct *work)
>   for (i = 0; i < (AMDGPU_MAX_COMPUTE_RINGS * AMDGPU_MAX_GC_INSTANCES); 
> ++i)
>   fences += 
> amdgpu_fence_count_emitted(&adev->gfx.compute_ring[i]);
>   if (!fences && !atomic_read(&adev->gfx.total_submission_cnt)) {
> - r = amdgpu_dpm_switch_power_profile(adev, profile, false);
> - if (r)
> - dev_warn(adev->dev, "(%d) failed to disable %s power 
> profile mode\n", r,
> -  profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> -  "fullscreen 3D" : "compute");
> + mutex_lock(&adev->gfx.workload_profile_mutex);
> + if (adev->gfx.workload_profile_active) {
> + r = amdgpu_dpm_switch_power_profile(adev, profile, 
> false);
> + if (r)
> + dev_warn(adev->dev, "(%d) failed to disable %s 
> power profile mode\n", r,
> +  profile == 
> PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> +  "fullscreen 3D" : "compute");
> + adev->gfx.workload_profile_active = false;
> + }
> + mutex_unlock(&adev->gfx.workload_profile_mutex);
>   } else {
>   schedule_delayed_work(&adev->gfx.idle_work, 
> GFX_PROFILE_IDLE_TIMEOUT);
>   }
> @@ -2184,11 +2189,16 @@ void amdgpu_gfx_profile_ring_begin_use(struct 
> amdgpu_ring *ring)
>   atomic_inc(&adev->gfx.total_submission_cnt);
>  
>   if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {

I guess this has the same problem as mentioned in the earlier patch.
AFAIU, this will switch profile only if no idle work is scheduled. If a
begin_use call comes when idle_work is being executed, there is a chance
that idle_work completes amdgpu_dpm_switch_power_profile(adev, profile,
false). Then this would skip changing the profile.

Thanks,
Lijo

> - r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> - if (r)
> - dev_warn(adev->dev, "(%d) failed to disable %s power 
> profile mode\n", r,
> -  profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> -  "fullscreen 3D" : "compute");
> + mutex_lock(&adev->gfx.workload_profile_mutex);
> + if (!adev->gfx.workload_profile_active) {
> + r = amdgpu_dpm_switch_power_profile(adev, profile, 
> true);
> + if (r)
> + dev_warn(adev->dev, "(%d) failed to disable %s 
> power profile mode\n", r,
> +  profile == 
> PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> +  "fullscreen 3D" : "compute");
> + adev->gfx.workload_profile_active = true;
> + }
> + mutex_unlock(&adev->gfx.workload_profile_mutex);
>   }
>  }
>  
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> index ddf4533614bac..75af4f25a133b 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
> @@ -483,6 +483,8 @@ struct amdgpu_gfx {
>  
>   atomic_ttotal_submission_cnt;
>   struct delayed_work idle_work;
> + boolworkload_profile_active;
> + struct mutexworkload_profile_mutex;
>  };
>  
>  struct amdgpu_gfx_ras_reg_entry {



[PATCH 1/2] drm/amdgpu/gfx: adjust workload profile handling

2025-03-14 Thread Alex Deucher
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two.  As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission.  It should not
affect the reference counting.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 22 +++---
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 099329d15b9ff..20424f8c4925f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -2188,18 +2188,18 @@ void amdgpu_gfx_profile_ring_begin_use(struct 
amdgpu_ring *ring)
 
atomic_inc(&adev->gfx.total_submission_cnt);
 
-   if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
-   mutex_lock(&adev->gfx.workload_profile_mutex);
-   if (!adev->gfx.workload_profile_active) {
-   r = amdgpu_dpm_switch_power_profile(adev, profile, 
true);
-   if (r)
-   dev_warn(adev->dev, "(%d) failed to disable %s 
power profile mode\n", r,
-profile == 
PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
-"fullscreen 3D" : "compute");
-   adev->gfx.workload_profile_active = true;
-   }
-   mutex_unlock(&adev->gfx.workload_profile_mutex);
+   cancel_delayed_work_sync(&adev->gfx.idle_work);
+
+   mutex_lock(&adev->gfx.workload_profile_mutex);
+   if (!adev->gfx.workload_profile_active) {
+   r = amdgpu_dpm_switch_power_profile(adev, profile, true);
+   if (r)
+   dev_warn(adev->dev, "(%d) failed to disable %s power 
profile mode\n", r,
+profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
+"fullscreen 3D" : "compute");
+   adev->gfx.workload_profile_active = true;
}
+   mutex_unlock(&adev->gfx.workload_profile_mutex);
 }
 
 void amdgpu_gfx_profile_ring_end_use(struct amdgpu_ring *ring)
-- 
2.48.1



Re: commit 7ffb791423c7 breaks steam game

2025-03-14 Thread Balbir Singh
On 3/15/25 01:18, Bert Karwatzki wrote:
> Am Samstag, dem 15.03.2025 um 00:34 +1100 schrieb Balbir Singh:
>> On 3/14/25 17:14, Balbir Singh wrote:
>>> On 3/14/25 09:22, Bert Karwatzki wrote:
 Am Freitag, dem 14.03.2025 um 08:54 +1100 schrieb Balbir Singh:
> On 3/14/25 05:12, Bert Karwatzki wrote:
>> Am Donnerstag, dem 13.03.2025 um 22:47 +1100 schrieb Balbir Singh:
>>>
>>>
>>> Anyway, I think the nokaslr result is interesting, it seems like with 
>>> nokaslr
>>> even the older kernels have problems with the game
>>>
>>> Could you confirm if with nokaslr
>>>
>> Now I've tested kernel 6.8.12 with nokaslr
>>
>>> 1. Only one single game stellaris is not working?
>>> 2. The entire laptop does not work?
>>> 3. Laptop works and other games work? Just one game is not working as
>> expected?
>>
>>
>> Stellaris is showing the input lag and the entire graphical user 
>> interface shows
>> the same input lag as long as stellaris is running.
>> Civilization 6 shows the same input lag as stellaris, probably even 
>> worse.
>> Magic the Gathering: Arena (with wine) works normally.
>> Valheim also works normally.
>> Crusader Kings 2 works normally
>> Rogue Heroes: Ruins of Tasos (a Zelda lookalike) works normally.
>> Baldur's Gate I & II and Icewind Dale work normally.
>>
>> Also the input lag is only in the GUI, if I switch to a text console 
>> (ctrl + alt
>> + Fn), input works normally even while the affected games are running.
>>
>> Games aside everything else (e.g. compiling kernels) seems to work with 
>> nokaslr.
>>
>
> Would it be fair to assume that anything Xorg/Wayland is working fine
> when the game is not running, even with nokaslr?
>
 Yes, Xorg (I'm normally using xfce4 as desktop) works fine. I also tested 
 with
 gnome using Xwayland, here the buggy behaviour also exists, with the 
 addition
 that mouse position is off, i.e. to click a button in the game you have to 
 click
 somewhat above it.
>>>
>>> So the issue is narrowed down to just the games you've mentioned with 
>>> nokaslr/patch?
>>>

> +amd-gfx@lists.freedesktop.org to see if there are known issues with
> nokaslr and the games you mentioned.
>
>
> Balbir Singh
>
> PS: I came across an interesting link
> https://www.alex-ionescu.com/behind-windows-x64s-44-bit-memory-addressing-limit/
>
> I think SLIST_HEADER is used by wine as well for user space and I am not 
> sure
> if in this situation the game is hitting this scenario, but surprisingly 
> the other
> games are not. This is assuming the game uses wine. I am not sure it's 
> related,
> but the 44 bits caught my attention.

 Stellaris is a native linux game (x86_64), the one game (MTGA) I tested 
 with
 wine worked fine.

>>>
>>> Thanks for the update! I wonder if there are any game logs. If you can look 
>>> for any
>>> warnings/errors it might be interesting to see where the difference is 
>>> coming from?
>>>
>>
>> In addition to the above, does radeontop give you any info about what might 
>> be
>> going on? I am also curious if
> 
> This got me to test stellaris and Civ6 using the discrete Graphics card:
> 03:00.0 Display controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 23
> [Radeon RX 6600/6600 XT/6600M] (rev c3)
>  (with DRI_PRIME=1) and here the problems do not occur.
> 
> This is the CPU-builtin GPU, which I normally use to play stellaris as 
> graphics
> aren't very demanding, here the problems occur when using nokaslr
> 08:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI]
> Cezanne [Radeon Vega Series / Radeon Vega Mobile Series] (rev c5)
> 

Aaah.. this is strange, because the only movement in iomem was for the discrete 
GPU

From your earlier message

 fee0-fee00fff : pnp 00:04
 ff00- : pnp 00:04
 1-fee2f : System RAM
-  3a7e0-3a89d2f56 : Kernel code
-  3a8a0-3a8e31fff : Kernel rodata
-  3a900-3a912a5ff : Kernel data
-  3a969c000-3a97f : Kernel bss
+  d3220-d32dd0f56 : Kernel code
+  d32e0-d33231fff : Kernel rodata
+  d3340-d3352a5ff : Kernel data
+  d33a9c000-d33bf : Kernel bss
 fee30-100fff : Reserved
 101000-ff : PCI Bus :00
   fc-fe0fff : PCI Bus :01
@@ -104,4 +104,4 @@
   fe3030-fe303f : :04:00.0
 fe3040-fe30403fff : :04:00.0
 fe30404000-fe30404fff : :04:00.0
-3ffe-3fff : :03:00.0
+afe-aff : :03:00.0

I am hoping someone from amd-gfx can chime in with known issues of the driver
and nokaslr or help debug further.

Balbir Singh




[pull] amdgpu, amdkfd, radeon drm-next-6.15

2025-03-14 Thread Alex Deucher
Hi Dave, Simona,

Updates for 6.15.

The following changes since commit 236f475d29f8e585a72fb6fac7f8bb4dc4b162b7:

  Merge tag 'amd-drm-next-6.15-2025-03-07' of 
https://gitlab.freedesktop.org/agd5f/linux into drm-next (2025-03-10 09:04:52 
+1000)

are available in the Git repository at:

  https://gitlab.freedesktop.org/agd5f/linux.git 
tags/amd-drm-next-6.15-2025-03-14

for you to fetch changes up to eb6cdfb807d038d9b9986b5c87188f28a4071eae:

  drm/amdgpu: Restore uncached behaviour on GFX12 (2025-03-13 23:18:02 -0400)


amd-drm-next-6.15-2025-03-14:

amdgpu:
- GC 12.x DCC fixes
- VCN 2.5 fix
- Replay/PSR fixes
- HPD fixes
- DMUB fixes
- Backlight fixes
- DM suspend/resume cleanup
- Misc DC fixes
- HDCP UAF fix
- Misc code cleanups
- VCE 2.x fix
- Wedged event support
- GC 12.x PTE fixes
- Misc multimedia cap fixes
- Enable unique id support for GC 12.x
- XGMI code cleanup
- GC 11.x and 12.x MQD cleanups
- SMU 13.x updates
- SMU 14.x fan speed reporting
- Enable VCN activity reporting for additional chips
- SR-IOV fixes
- RAS fixes
- MES fixes

amdkfd:
- Dequeue wait count API cleanups
- Queue eviction cleanup fixes
- Retry fault fixes
- Dequeue retry timeout adjustments
- GC 12.x trap handler fixes
- GC 9.5.x updates

radeon:
- VCE command parser fix


Alex Deucher (11):
  drm/amdgpu/vcn: fix idle work handler for VCN 2.5
  drm/amdgpu/vce2: fix ip block reference
  drm/amdgpu/gfx11: don't read registers in mqd init
  drm/amdgpu/gfx12: don't read registers in mqd init
  drm/amdgpu/pm: wire up hwmon fan speed for smu 14.0.2
  drm/amdgpu/pm: add VCN activity for renoir
  drm/amdgpu/pm: add VCN activity for SMU 13.0.0/7
  drm/amdgpu/pm: add VCN activity for SMU 14.0.2
  drm/amdgpu/pm: enable vcn busy sysfs for additional GC 11.x
  drm/amdgpu/pm: enable vcn busy sysfs for GC 12.x
  drm/amdgpu/pm: enable vcn busy sysfs for GC 9.3.0

Alex Hung (2):
  drm/amd/display: Assign normalized_pix_clk when color depth = 14
  drm/amd/display: Remove incorrect macro guard

Alexandre Demers (3):
  drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10
  drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in 
GFX6
  drm/amdgpu: finish wiring up sid.h in DCE6

Amber Lin (1):
  drm/amdkfd: Correct F8_MODE for gfx950

André Almeida (1):
  drm/amdgpu: Trigger a wedged event for ring reset

Asad Kamal (1):
  drm/amd/pm: Update feature list for smu_v13_0_12

Charlene Liu (2):
  drm/amd/display: assume VBIOS supports DSC as default
  drm/amd/display: remove minimum Dispclk and apply oem panel timing.

Dan Carpenter (2):
  drm/amdgpu/gfx: delete stray tabs
  drm/amdkfd: delete stray tab in kfd_dbg_set_mes_debug_mode()

Danny Wang (1):
  drm/amd/display: Do not enable replay when vtotal update is pending.

David Belanger (1):
  drm/amdgpu: Restore uncached behaviour on GFX12

David Rosca (4):
  drm/amdgpu: Fix MPEG2, MPEG4 and VC1 video caps max size
  drm/amdgpu: Fix JPEG video caps max size for navi1x and raven
  drm/amdgpu: Remove JPEG from vega and carrizo video caps
  drm/amdgpu: Update SRIOV video codec caps

Dillon Varone (1):
  drm/amd/display: Add Support for reg inbox0 for host->DMUB CMDs

Emily Deng (3):
  drm/amdgpu: Fix the race condition for draining retry fault
  drm/amdgpu: Add amdgpu_sriov_multi_vf_mode function
  drm/amdgpu: set CP_HQD_PQ_DOORBELL_CONTROL.DOORBELL_MODE to 1 for sriov 
multiple vf.

Ethan Carter Edwards (4):
  drm/amd/display: change kzalloc to kcalloc in dcn30_validate_bandwidth()
  drm/amd/display: change kzalloc to kcalloc in dcn31_validate_bandwidth()
  drm/amd/display: change kzalloc to kcalloc in dcn314_validate_bandwidth()
  drm/amd/display: change kzalloc to kcalloc in dml1_validate()

George Shen (1):
  drm/amd/display: Implement PCON regulated autonomous mode handling

Harish Kasiviswanathan (3):
  drm/amdkfd: Add pm_config_dequeue_wait_counts API
  drm/amd/pm: add unique_id for gfx12
  drm/amdgpu: Reduce dequeue retry timeout for gfx9 family

Jay Cornwall (1):
  drm/amdkfd: Fix instruction hazard in gfx12 trap handler

Joshua Aberback (1):
  drm/amd/display: Add more debug data to dmub_srv

Leo Li (1):
  drm/amd/display: Disable unneeded hpd interrupts during dm_init

Leo Zeng (1):
  drm/amd/display: Fix visual confirm color not updating

Leon Huang (1):
  drm/amd/display: Fix incorrect DPCD configs while Replay/PSR switch

Lijo Lazar (2):
  drm/amdgpu: Remove unsupported xgmi versions
  drm/amdgpu: Calculate IP specific xgmi bandwidth

Marek Olšák (1):
  drm/amd/display: allow 256B DCC max compressed block sizes on gfx12

Mario Limonciello (6):
  drm/amd/display: fix default brightness
  drm/amd/display: Restore correct backlight brightness

RE: [PATCH] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Harish
> Kasiviswanathan
> Sent: Friday, March 14, 2025 12:17 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: [PATCH] drm/amdkfd: Update return value of config_dequeue_wait_counts
>
> .config_dequeue_wait_counts can return a nop case. Modify the return
> parameter to reflect that, since the caller also needs to ignore this
> condition.
>
> Fixes: <98a5af8103f> ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
>
> Signed-off-by: Harish Kasiviswanathan 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c| 11 ---
>  drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c |  9 -
>  2 files changed, 16 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 3f574d82b5fc..47de572741e7 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -436,19 +436,24 @@ int pm_config_dequeue_wait_counts(struct
> packet_manager *pm,
>
>   retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
>cmd, value);
> - if (!retval)
> + if (retval > 0) {
>   retval = kq_submit_packet(pm->priv_queue);
> +
> + /* If default value is modified, cache that in 
> dqm->wait_times
> */
> + if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> + update_dqm_wait_times(pm->dqm);
> + }
>   else
>   kq_rollback_packet(pm->priv_queue);
>   }
>
>   /* If default value is modified, cache that value in dqm->wait_times */
> - if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> + if (retval > 0 && cmd == KFD_DEQUEUE_WAIT_INIT)
>   update_dqm_wait_times(pm->dqm);

Why are we caching this twice?

>
>  out:
>   mutex_unlock(&pm->lock);
> - return retval;
> + return retval < 0 ? retval : 0;
>  }
>
>  int pm_send_unmap_queue(struct packet_manager *pm,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> index d440df602393..af3a18d81900 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> @@ -310,6 +310,13 @@ static inline void
> pm_build_dequeue_wait_counts_packet_info(struct packet_manage
>   reg_data);
>  }
>
> +/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
> + *register/value for configuring dequeue wait counts
> + *
> + * @return: -ve for failure, 0 for nop and +ve for success and buffer is
> + *  filled in with packet
> + *
> + **/
>  static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
>   uint32_t *buffer,
>   enum kfd_config_dequeue_wait_counts_cmd cmd,
> @@ -377,7 +384,7 @@ static int pm_config_dequeue_wait_counts_v9(struct
> packet_manager *pm,
>
>   packet->data = reg_data;
>
> - return 0;
> + return sizeof(struct pm4_mec_write_data_mmio);

The return value handling is getting really complicated here.
Either return a unique error to handle the NOP case or pull some of the IP
checking higher to early return sooner.
We're already overloading v9 with other ASICs so there's no harm in -> if
(WAIT_INIT && ) return 0 in the abstracted
wait_counts function that calls this.
You can always set a new dev->has_optimized_wait_count flag in an earlier init 
if you don't like defining IP checking there.

Jon

>  }
>
>  static int pm_unmap_queues_v9(struct packet_manager *pm, uint32_t *buffer,
> --
> 2.34.1



Re: [PATCH 3/3] drm/amdgpu/discovery: optionally use fw based ip discovery

2025-03-14 Thread Alex Deucher
On Fri, Mar 14, 2025 at 6:28 AM Lazar, Lijo  wrote:
>
>
>
> On 3/14/2025 3:24 PM, Flora Cui wrote:
> > From: Alex Deucher 
> >
> > On chips without native IP discovery support, use the fw binary
> > if available, otherwise we can continue without it.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 38 +++
> >  1 file changed, 30 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> > index fff438baf64b..cf286fde18d5 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> > @@ -2536,6 +2536,36 @@ int amdgpu_discovery_set_ip_blocks(struct 
> > amdgpu_device *adev)
> >  {
> >   int r;
> >
> > + switch (adev->asic_type) {
> > + case CHIP_VEGA10:
> > + case CHIP_VEGA12:
> > + case CHIP_RAVEN:
> > + case CHIP_VEGA20:
> > + case CHIP_ARCTURUS:
> > + case CHIP_ALDEBARAN:
> > + /* this is not fatal.  We have a fallback below
> > +  * if the new firmwares are not present.
> > +  */
> > + r = amdgpu_discovery_reg_base_init(adev);
> > + if (!r) {
> > + amdgpu_discovery_harvest_ip(adev);
> > + amdgpu_discovery_get_gfx_info(adev);
> > + amdgpu_discovery_get_mall_info(adev);
> > + amdgpu_discovery_get_vcn_info(adev);
> > + }
> > + break;
> > + default:
> > + r = amdgpu_discovery_reg_base_init(adev);
> > + if (r)
> > + return -EINVAL;
> > +
> > + amdgpu_discovery_harvest_ip(adev);
> > + amdgpu_discovery_get_gfx_info(adev);
> > + amdgpu_discovery_get_mall_info(adev);
> > + amdgpu_discovery_get_vcn_info(adev);
> > + break;
> > + }
> > +
> >   switch (adev->asic_type) {
>
> Looks like this fallback gets executed regardless of the
> presence/absence of new firmware.

That's by design.  The hardcoded settings we have today are not quite
the same as what the ip discovery table provides so we want them to
override what comes from the firmware either way.  We really just want
the ip discovery table so that we can set up the sysfs ip discovery
files because the ROCm profiler uses them for getting the GC register
offsets and wants to have them available for all gfx9+ parts.

Alex

>
> Thanks,
> Lijo
>
> >   case CHIP_VEGA10:
> >   vega10_reg_base_init(adev);
> > @@ -2700,14 +2730,6 @@ int amdgpu_discovery_set_ip_blocks(struct 
> > amdgpu_device *adev)
> >   adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 1, 0);
> >   break;
> >   default:
> > - r = amdgpu_discovery_reg_base_init(adev);
> > - if (r)
> > - return -EINVAL;
> > -
> > - amdgpu_discovery_harvest_ip(adev);
> > - amdgpu_discovery_get_gfx_info(adev);
> > - amdgpu_discovery_get_mall_info(adev);
> > - amdgpu_discovery_get_vcn_info(adev);
> >   break;
> >   }
> >
>


Re: [PATCH 5/8] drm/amdgpu: rework how the cleaner shader is emitted v3

2025-03-14 Thread Christian König
Am 14.03.25 um 05:24 schrieb SRINIVASAN SHANMUGAM:
> On 3/7/2025 7:18 PM, Christian König wrote:
>> Instead of emitting the cleaner shader for every job which has the
>> enforce_isolation flag set only emit it for the first submission from
>> every client.
>>
>> v2: add missing NULL check
>> v3: fix another NULL pointer deref
>>
>> Signed-off-by: Christian König 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 27 --
>>  1 file changed, 21 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> index ef4fe2df8398..dc10bea836db 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c
>> @@ -643,6 +643,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
>> amdgpu_job *job,
>>  bool need_pipe_sync)
>>  {
>>  struct amdgpu_device *adev = ring->adev;
>> +struct amdgpu_isolation *isolation = &adev->isolation[ring->xcp_id];
>>  unsigned vmhub = ring->vm_hub;
>>  struct amdgpu_vmid_mgr *id_mgr = &adev->vm_manager.id_mgr[vmhub];
>>  struct amdgpu_vmid *id = &id_mgr->ids[job->vmid];
>> @@ -650,8 +651,9 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
>> amdgpu_job *job,
>>  bool gds_switch_needed = ring->funcs->emit_gds_switch &&
>>  job->gds_switch_needed;
>>  bool vm_flush_needed = job->vm_needs_flush;
>> -struct dma_fence *fence = NULL;
>> +bool cleaner_shader_needed = false;
>>  bool pasid_mapping_needed = false;
>> +struct dma_fence *fence = NULL;
>>  unsigned int patch;
>>  int r;
>>  
>> @@ -674,8 +676,12 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
>> amdgpu_job *job,
>>  pasid_mapping_needed &= adev->gmc.gmc_funcs->emit_pasid_mapping &&
>>  ring->funcs->emit_wreg;
>>  
>> +cleaner_shader_needed = adev->gfx.enable_cleaner_shader &&
>> +ring->funcs->emit_cleaner_shader && job->base.s_fence &&
>> +&job->base.s_fence->scheduled == isolation->spearhead;

*here*

>> +
>>  if (!vm_flush_needed && !gds_switch_needed && !need_pipe_sync &&
>> -!(job->enforce_isolation && !job->vmid))
>> +!cleaner_shader_needed)
>>  return 0;
>>  
>>  amdgpu_ring_ib_begin(ring);
>> @@ -686,9 +692,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
>> amdgpu_job *job,
>>  if (need_pipe_sync)
>>  amdgpu_ring_emit_pipeline_sync(ring);
>>  
>> -if (adev->gfx.enable_cleaner_shader &&
>> -ring->funcs->emit_cleaner_shader &&
>> -job->enforce_isolation)
>> +if (cleaner_shader_needed)
>
> Here, should we also check for ring->funcs->emit_cleaner_shader?
>

I moved that up to where cleaner_shader_needed is set. See the *here* above.

That makes it easier to decide if we need a fence after the preamble or not.

Regards,
Christian.

> if (cleaner_shader_needed && ring->funcs->emit_cleaner_shader)
>
>>  ring->funcs->emit_cleaner_shader(ring);
>>  
>>  if (vm_flush_needed) {
>> @@ -710,7 +714,7 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
>> amdgpu_job *job,
>>  job->oa_size);
>>  }
>>  
>> -if (vm_flush_needed || pasid_mapping_needed) {
>> +if (vm_flush_needed || pasid_mapping_needed || cleaner_shader_needed) {
>>  r = amdgpu_fence_emit(ring, &fence, NULL, 0);
>>  if (r)
>>  return r;
>> @@ -732,6 +736,17 @@ int amdgpu_vm_flush(struct amdgpu_ring *ring, struct 
>> amdgpu_job *job,
>>  id->pasid_mapping = dma_fence_get(fence);
>>  mutex_unlock(&id_mgr->lock);
>>  }
>> +
>> +/*
>> + * Make sure that all other submissions wait for the cleaner shader to
>> + * finish before we push them to the HW.
>> + */
>> +if (cleaner_shader_needed) {
>> +mutex_lock(&adev->enforce_isolation_mutex);
>> +dma_fence_put(isolation->spearhead);
>> +isolation->spearhead = dma_fence_get(fence);
>> +mutex_unlock(&adev->enforce_isolation_mutex);
>> +}
>>  dma_fence_put(fence);
>>  
>>  amdgpu_ring_patch_cond_exec(ring, patch);


RE: [PATCH v2] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: amd-gfx  On Behalf Of Harish
> Kasiviswanathan
> Sent: Friday, March 14, 2025 4:22 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: [PATCH v2] drm/amdkfd: Update return value of
> config_dequeue_wait_counts
>
> .config_dequeue_wait_counts can return a nop case. Modify the return
> parameter to reflect that, since the caller also needs to ignore this
> condition.
>
> v2: Removed redundant code.
> Tidy up code based on review comments
>
> Fixes: <98a5af8103f> ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
>
> Signed-off-by: Harish Kasiviswanathan 
> ---
>  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 14 
>  .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 36 +++
>  2 files changed, 29 insertions(+), 21 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> index 3f574d82b5fc..502b89639a9f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> @@ -436,19 +436,19 @@ int pm_config_dequeue_wait_counts(struct
> packet_manager *pm,
>
>   retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
>cmd, value);
> - if (!retval)
> + if (retval > 0) {
>   retval = kq_submit_packet(pm->priv_queue);
> +
> + /* If default value is modified, cache that in 
> dqm->wait_times
> */
> + if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> + update_dqm_wait_times(pm->dqm);
> + }
>   else
>   kq_rollback_packet(pm->priv_queue);
>   }
> -
> - /* If default value is modified, cache that value in dqm->wait_times */
> - if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> - update_dqm_wait_times(pm->dqm);
> -
>  out:
>   mutex_unlock(&pm->lock);
> - return retval;
> + return retval < 0 ? retval : 0;
>  }
>
>  int pm_send_unmap_queue(struct packet_manager *pm,
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> index d440df602393..3c6134d61b2b 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> @@ -310,6 +310,13 @@ static inline void
> pm_build_dequeue_wait_counts_packet_info(struct packet_manage
>   reg_data);
>  }
>
> +/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
> + *register/value for configuring dequeue wait counts
> + *
> + * @return: -ve for failure, 0 for nop and +ve for success and buffer is
> + *  filled in with packet
> + *
> + **/
>  static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
>   uint32_t *buffer,
>   enum kfd_config_dequeue_wait_counts_cmd cmd,
> @@ -321,24 +328,25 @@ static int pm_config_dequeue_wait_counts_v9(struct
> packet_manager *pm,
>
>   switch (cmd) {
>   case KFD_DEQUEUE_WAIT_INIT: {
> - uint32_t sch_wave = 0, que_sleep = 0;
> - /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> 0x40.
> + uint32_t sch_wave = 0, que_sleep = 1;
> +
> + if (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
> + KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0))
> + return 0;

From my last comment, I suggested putting this at the beginning of the non-v9
pm_config_dequeue_wait_counts call that calls this function.
Then you don't have to make the return value more complicated than it currently
is.

Also KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0) is incorrect and
should be >= because we also want to exclude anything with a major version of 10.

Jon

> +
> + /* For all other gfx9 ASICs,
> +  * Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> 0x40.
>* On a 1GHz machine this is roughly 1 microsecond, which is
>* about how long it takes to load data out of memory during
>* queue connect
>* QUE_SLEEP: Wait Count for Dequeue Retry.
> +  *
> +  * Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU
>*/
> - if (KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(9, 4, 1) &&
> - KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(10, 0, 0)) {
> - que_sleep = 1;
> -
> - /* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 
> APU
> */
> - if (amdgpu_emu_mode == 0 && pm->dqm->dev->adev-
> >gmc.is_app_apu &&
> - (KFD_GC_VERSION(pm->dqm->dev) == IP_VERSION(9, 4,
> 3)))
> - sch_wave = 1;
> - } else {
> - return 0;
> - }
> 

[PATCH v3] drm/amdkfd: Fix bug in config_dequeue_wait_counts

2025-03-14 Thread Harish Kasiviswanathan
For certain ASICs where dequeue_wait_count doesn't need to be initialized,
pm_config_dequeue_wait_counts_v9 returns without filling in the packet
information. However, the calling function interprets this as a success
and sends the uninitialized packet to firmware, causing a hang.

Fix the above bug by not calling pm_config_dequeue_wait_counts_v9 for
ASICs that don't need the value to be initialized.

v2: Removed redundant code.
Tidy up code based on review comments
v3: Don't call pm_config_dequeue_wait_counts_v9 for certain ASICs

Fixes: <98a5af8103f> ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")

Signed-off-by: Harish Kasiviswanathan 
---
 .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 16 ++
 .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 30 +++
 2 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 3f574d82b5fc..8a47b7259a10 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -418,6 +418,10 @@ int pm_config_dequeue_wait_counts(struct packet_manager 
*pm,
!pm->pmf->config_dequeue_wait_counts_size)
return 0;
 
+   if (cmd == KFD_DEQUEUE_WAIT_INIT && (KFD_GC_VERSION(pm->dqm->dev) < 
IP_VERSION(9, 4, 1) ||
+  KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(10, 0, 0)))
+   return 0;
+
size = pm->pmf->config_dequeue_wait_counts_size;
 
mutex_lock(&pm->lock);
@@ -436,16 +440,16 @@ int pm_config_dequeue_wait_counts(struct packet_manager 
*pm,
 
retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
 cmd, value);
-   if (!retval)
+   if (!retval) {
retval = kq_submit_packet(pm->priv_queue);
+
+		/* If default value is modified, cache that in dqm->wait_times */
+   if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
+   update_dqm_wait_times(pm->dqm);
+   }
else
kq_rollback_packet(pm->priv_queue);
}
-
-   /* If default value is modified, cache that value in dqm->wait_times */
-   if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
-   update_dqm_wait_times(pm->dqm);
-
 out:
mutex_unlock(&pm->lock);
return retval;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index d440df602393..f059041902bc 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -310,6 +310,13 @@ static inline void 
pm_build_dequeue_wait_counts_packet_info(struct packet_manage
reg_data);
 }
 
+/* pm_config_dequeue_wait_counts_v9: Builds a WRITE_DATA packet with
+ * register/value pairs for configuring dequeue wait counts
+ *
+ * @return: negative value on failure, 0 on success with the buffer
+ *  filled in with the packet
+ */
 static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
uint32_t *buffer,
enum kfd_config_dequeue_wait_counts_cmd cmd,
@@ -321,24 +328,21 @@ static int pm_config_dequeue_wait_counts_v9(struct 
packet_manager *pm,
 
switch (cmd) {
case KFD_DEQUEUE_WAIT_INIT: {
-   uint32_t sch_wave = 0, que_sleep = 0;
-   /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default 0x40.
+   uint32_t sch_wave = 0, que_sleep = 1;
+
+* For all gfx9 ASICs >= gfx 9.4.1,
+* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default 0x40.
 * On a 1GHz machine this is roughly 1 microsecond, which is
 * about how long it takes to load data out of memory during
 * queue connect
 * QUE_SLEEP: Wait Count for Dequeue Retry.
+*
+* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU
 */
-	if (KFD_GC_VERSION(pm->dqm->dev) >= IP_VERSION(9, 4, 1) &&
-	    KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(10, 0, 0)) {
-		que_sleep = 1;
-
-		/* Set CWSR grace period to 1x1000 cycle for GFX9.4.3 APU */
-		if (amdgpu_emu_mode == 0 && pm->dqm->dev->adev->gmc.is_app_apu &&
-		    (KFD_GC_VERSION(pm->dqm->dev) == IP_VERSION(9, 4, 3)))
-			sch_wave = 1;
-	} else {
-		return 0;
-	}
+	if (amdgpu_emu_mode == 0 && pm->dqm->dev->adev->gmc.is_app_apu &&
+	    (KFD_GC_VERSION(pm->dqm->dev) == IP_VERSION(9, 4, 3)))
+		sch_wave = 1;
+
 	pm_build_dequeue_wait_counts_packet_info(pm, sch_wave, que_sleep,
   

Re: [PATCH] add mes type in mes ring dump.

2025-03-14 Thread StDenis, Tom
[AMD Official Use Only - AMD Internal Distribution Only]

Thanks this was pushed out to main just now.

Cheers,
Tom


From: Zhang, Yifan 
Sent: Wednesday, March 12, 2025 21:40
To: amd-gfx@lists.freedesktop.org
Cc: StDenis, Tom; Zhang, Yifan
Subject: [PATCH] add mes type in mes ring dump.

According to the MES API spec, type 0 means invalid. Pass the MES type
to the ring dump function to avoid confusing output like the line below:

[0x0@0x + 0x7200]   [0x000400e1]	Opcode 0xe [MES_SCH_API_MISC] (64 words, type: 0, hdr: 0x400e1)

Signed-off-by: Yifan Zhang 
---
 src/lib/read_mes_stream.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/lib/read_mes_stream.c b/src/lib/read_mes_stream.c
index 6146fd3b..b92ad9fc 100644
--- a/src/lib/read_mes_stream.c
+++ b/src/lib/read_mes_stream.c
@@ -295,7 +295,7 @@ struct umr_mes_stream *umr_mes_decode_stream_opcodes(struct 
umr_asic *asic, stru
ui->start_ib(ui, ib_addr, ib_vmid, 0, 0, 0, 0);
while (stream && opcodes-- && stream->nwords) {
		opcode_name = STR_LOOKUP(mes_v10_opcodes, stream->opcode, "MES_UNK");
-		ui->start_opcode(ui, ib_addr, ib_vmid, 0, stream->opcode, 0, stream->nwords, opcode_name, stream->header, stream->words);
+		ui->start_opcode(ui, ib_addr, ib_vmid, stream->type, stream->opcode, 0, stream->nwords, opcode_name, stream->header, stream->words);

i = 0;
ib_addr += 4; // skip over header
--
2.43.0



[PATCH v1 2/3] drm/ci: uprev IGT

2025-03-14 Thread Vignesh Raman
Uprev IGT to the latest version and update expectation files.

Signed-off-by: Vignesh Raman 
---
 drivers/gpu/drm/ci/gitlab-ci.yml  |   2 +-
 .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |   8 +-
 .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |   1 +
 drivers/gpu/drm/ci/xfails/i915-amly-fails.txt |  23 +-
 drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   1 +
 drivers/gpu/drm/ci/xfails/i915-apl-fails.txt  |   8 +-
 drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-cml-fails.txt  |  20 +-
 drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   2 +-
 drivers/gpu/drm/ci/xfails/i915-glk-fails.txt  |  32 +-
 drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-jsl-fails.txt  |  13 +-
 drivers/gpu/drm/ci/xfails/i915-jsl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-kbl-fails.txt  |   5 -
 drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   9 +-
 drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |  22 +-
 drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   1 +
 .../drm/ci/xfails/mediatek-mt8173-fails.txt   |  20 ++
 .../drm/ci/xfails/mediatek-mt8173-flakes.txt  |   7 +
 .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   1 +
 .../drm/ci/xfails/mediatek-mt8183-fails.txt   |  28 +-
 .../drm/ci/xfails/mediatek-mt8183-flakes.txt  |  21 ++
 .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   1 +
 .../gpu/drm/ci/xfails/meson-g12b-skips.txt|   1 +
 .../gpu/drm/ci/xfails/msm-apq8016-fails.txt   |   4 -
 .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   1 +
 .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   1 +
 .../msm-sc7180-trogdor-kingoftown-flakes.txt  |   7 +
 .../msm-sc7180-trogdor-kingoftown-skips.txt   |   4 +
 ...m-sc7180-trogdor-lazor-limozeen-flakes.txt |   7 +
 ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   1 +
 .../gpu/drm/ci/xfails/msm-sdm845-flakes.txt   |   7 +
 .../gpu/drm/ci/xfails/msm-sdm845-skips.txt| 313 ++
 .../drm/ci/xfails/msm-sm8350-hdk-skips.txt|   1 +
 .../gpu/drm/ci/xfails/panfrost-g12b-skips.txt |   1 +
 .../drm/ci/xfails/panfrost-mt8183-skips.txt   |   1 +
 .../drm/ci/xfails/panfrost-rk3288-skips.txt   |   1 +
 .../drm/ci/xfails/panfrost-rk3399-skips.txt   |   1 +
 .../drm/ci/xfails/rockchip-rk3288-fails.txt   |   1 -
 .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   1 +
 .../drm/ci/xfails/rockchip-rk3399-fails.txt   |   2 +-
 .../drm/ci/xfails/rockchip-rk3399-flakes.txt  |  30 +-
 .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   1 +
 .../drm/ci/xfails/virtio_gpu-none-fails.txt   |   1 +
 .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   1 +
 .../gpu/drm/ci/xfails/vkms-none-flakes.txt|  28 ++
 drivers/gpu/drm/ci/xfails/vkms-none-skips.txt |   2 +
 49 files changed, 554 insertions(+), 94 deletions(-)

diff --git a/drivers/gpu/drm/ci/gitlab-ci.yml b/drivers/gpu/drm/ci/gitlab-ci.yml
index 55b540c4cf92..65adcd97e06b 100644
--- a/drivers/gpu/drm/ci/gitlab-ci.yml
+++ b/drivers/gpu/drm/ci/gitlab-ci.yml
@@ -5,7 +5,7 @@ variables:
   UPSTREAM_REPO: https://gitlab.freedesktop.org/drm/kernel.git
   TARGET_BRANCH: drm-next
 
-  IGT_VERSION: 33adea9ebafd059ac88a5ccfec60536394f36c7c
+  IGT_VERSION: 04bedb9238586b81d4d4ca62b02e584f6cfc77af
 
   DEQP_RUNNER_GIT_URL: https://gitlab.freedesktop.org/mesa/deqp-runner.git
   DEQP_RUNNER_GIT_TAG: v0.20.0
diff --git a/drivers/gpu/drm/ci/xfails/amdgpu-stoney-fails.txt 
b/drivers/gpu/drm/ci/xfails/amdgpu-stoney-fails.txt
index 75374085f40f..f44dbce3151a 100644
--- a/drivers/gpu/drm/ci/xfails/amdgpu-stoney-fails.txt
+++ b/drivers/gpu/drm/ci/xfails/amdgpu-stoney-fails.txt
@@ -14,16 +14,10 @@ amdgpu/amd_plane@mpo-scale-nv12,Fail
 amdgpu/amd_plane@mpo-scale-p010,Fail
 amdgpu/amd_plane@mpo-scale-rgb,Crash
 amdgpu/amd_plane@mpo-swizzle-toggle,Fail
-amdgpu/amd_uvd_dec@amdgpu_uvd_decode,Crash
+amdgpu/amd_uvd_dec@amdgpu_uvd_decode,Fail
 kms_addfb_basic@bad-pitch-65536,Fail
 kms_addfb_basic@bo-too-small,Fail
 kms_addfb_basic@too-high,Fail
-kms_async_flips@alternate-sync-async-flip,Fail
-kms_async_flips@alternate-sync-async-flip-atomic,Fail
-kms_async_flips@test-cursor,Fail
-kms_async_flips@test-cursor-atomic,Fail
-kms_async_flips@test-time-stamp,Fail
-kms_async_flips@test-time-stamp-atomic,Fail
 kms_atomic_transition@plane-all-modeset-transition-internal-panels,Fail
 kms_atomic_transition@plane-all-transition,Fail
 kms_atomic_transition@plane-all-transition-nonblocking,Fail
diff --git a/drivers/gpu/drm/ci/xfails/amdgpu-stoney-skips.txt 
b/drivers/gpu/drm/ci/xfails/amdgpu-stoney-skips.txt
index 3879c4812a22..902d54027506 100644
--- a/drivers/gpu/drm/ci/xfails/amdgpu-stoney-skips.txt
+++ b/drivers/gpu/drm/ci/xfails/amdgpu-stoney-skips.txt
@@ -14,6 +14,7 @@ gem_.*
 i915_.*
 xe_.*
 tools_test.*
+kms_dp_link_training.*
 
 # Currently fails and causes coverage loss for other tests
 # since core_getversion also fails.
diff --git a/drivers/gpu/drm/ci/xfails/i915-amly-fai

[PATCH v1 1/3] drm/ci: uprev mesa

2025-03-14 Thread Vignesh Raman
LAVA was recently patched [1] with a fix to how parameters are parsed in
`lava-test-case`, so quotes no longer need to be repeated to pass
arguments to it properly. Uprev mesa to pick up this fix.

[1] https://gitlab.com/lava/lava/-/commit/18c9cf79

Signed-off-by: Vignesh Raman 
---
 drivers/gpu/drm/ci/build.sh   | 16 
 drivers/gpu/drm/ci/build.yml  |  8 
 drivers/gpu/drm/ci/container.yml  | 24 +++
 drivers/gpu/drm/ci/gitlab-ci.yml  | 32 ++-
 drivers/gpu/drm/ci/image-tags.yml |  4 +++-
 drivers/gpu/drm/ci/lava-submit.sh |  3 ++-
 drivers/gpu/drm/ci/test.yml   |  2 +-
 7 files changed, 77 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/ci/build.sh b/drivers/gpu/drm/ci/build.sh
index 19fe01257ab9..284873e94d8d 100644
--- a/drivers/gpu/drm/ci/build.sh
+++ b/drivers/gpu/drm/ci/build.sh
@@ -98,14 +98,14 @@ done
 
 make ${KERNEL_IMAGE_NAME}
 
-mkdir -p /lava-files/
+mkdir -p /kernel/
 for image in ${KERNEL_IMAGE_NAME}; do
-cp arch/${KERNEL_ARCH}/boot/${image} /lava-files/.
+cp arch/${KERNEL_ARCH}/boot/${image} /kernel/.
 done
 
 if [[ -n ${DEVICE_TREES} ]]; then
 make dtbs
-cp ${DEVICE_TREES} /lava-files/.
+cp ${DEVICE_TREES} /kernel/.
 fi
 
 make modules
@@ -121,11 +121,11 @@ if [[ ${DEBIAN_ARCH} = "arm64" ]]; then
 -d arch/arm64/boot/Image.lzma \
 -C lzma\
 -b arch/arm64/boot/dts/qcom/sdm845-cheza-r3.dtb \
-/lava-files/cheza-kernel
+/kernel/cheza-kernel
 KERNEL_IMAGE_NAME+=" cheza-kernel"
 
 # Make a gzipped copy of the Image for db410c.
-gzip -k /lava-files/Image
+gzip -k /kernel/Image
 KERNEL_IMAGE_NAME+=" Image.gz"
 fi
 
@@ -139,7 +139,7 @@ cp -rfv drivers/gpu/drm/ci/* install/.
 . .gitlab-ci/container/container_post_build.sh
 
 if [[ "$UPLOAD_TO_MINIO" = "1" ]]; then
-xz -7 -c -T${FDO_CI_CONCURRENT:-4} vmlinux > /lava-files/vmlinux.xz
+xz -7 -c -T${FDO_CI_CONCURRENT:-4} vmlinux > /kernel/vmlinux.xz
 FILES_TO_UPLOAD="$KERNEL_IMAGE_NAME vmlinux.xz"
 
 if [[ -n $DEVICE_TREES ]]; then
@@ -148,7 +148,7 @@ if [[ "$UPLOAD_TO_MINIO" = "1" ]]; then
 
 ls -l "${S3_JWT_FILE}"
 for f in $FILES_TO_UPLOAD; do
-ci-fairy s3cp --token-file "${S3_JWT_FILE}" /lava-files/$f \
+ci-fairy s3cp --token-file "${S3_JWT_FILE}" /kernel/$f \
 https://${PIPELINE_ARTIFACTS_BASE}/${DEBIAN_ARCH}/$f
 done
 
@@ -165,7 +165,7 @@ ln -s common artifacts/install/ci-common
 cp .config artifacts/${CI_JOB_NAME}_config
 
 for image in ${KERNEL_IMAGE_NAME}; do
-cp /lava-files/$image artifacts/install/.
+cp /kernel/$image artifacts/install/.
 done
 
 tar -C artifacts -cf artifacts/install.tar install
diff --git a/drivers/gpu/drm/ci/build.yml b/drivers/gpu/drm/ci/build.yml
index 6c0dc10b547c..8eb56ebcf4aa 100644
--- a/drivers/gpu/drm/ci/build.yml
+++ b/drivers/gpu/drm/ci/build.yml
@@ -143,6 +143,10 @@ debian-arm64-release:
   rules:
 - when: never
 
+debian-arm64-ubsan:
+  rules:
+- when: never
+
 debian-build-testing:
   rules:
 - when: never
@@ -183,6 +187,10 @@ debian-testing-msan:
   rules:
 - when: never
 
+debian-testing-ubsan:
+  rules:
+- when: never
+
 debian-vulkan:
   rules:
 - when: never
diff --git a/drivers/gpu/drm/ci/container.yml b/drivers/gpu/drm/ci/container.yml
index 07dc13ff865d..56c95c2f91ae 100644
--- a/drivers/gpu/drm/ci/container.yml
+++ b/drivers/gpu/drm/ci/container.yml
@@ -24,6 +24,18 @@ alpine/x86_64_build:
   rules:
 - when: never
 
+debian/arm32_test-base:
+  rules:
+- when: never
+
+debian/arm32_test-gl:
+  rules:
+- when: never
+
+debian/arm32_test-vk:
+  rules:
+- when: never
+
 debian/arm64_test-gl:
   rules:
 - when: never
@@ -32,6 +44,10 @@ debian/arm64_test-vk:
   rules:
 - when: never
 
+debian/baremetal_arm32_test:
+  rules:
+- when: never
+
 debian/ppc64el_build:
   rules:
 - when: never
@@ -40,6 +56,14 @@ debian/s390x_build:
   rules:
 - when: never
 
+debian/x86_32_build:
+  rules:
+- when: never
+
+debian/x86_64_test-android:
+  rules:
+- when: never
+
 debian/x86_64_test-vk:
   rules:
 - when: never
diff --git a/drivers/gpu/drm/ci/gitlab-ci.yml b/drivers/gpu/drm/ci/gitlab-ci.yml
index b06b9e7d3d09..55b540c4cf92 100644
--- a/drivers/gpu/drm/ci/gitlab-ci.yml
+++ b/drivers/gpu/drm/ci/gitlab-ci.yml
@@ -1,6 +1,6 @@
 variables:
   DRM_CI_PROJECT_PATH: &drm-ci-project-path mesa/mesa
-  DRM_CI_COMMIT_SHA: &drm-ci-commit-sha 
7d3062470f3ccc6cb40540e772e902c7e2248024
+  DRM_CI_COMMIT_SHA: &drm-ci-commit-sha 
82ab58f6c6f94fa80ca7e1615146f08356e3ba69
 
   UPSTREAM_REPO: https://gitlab.freedesktop.org/drm/kernel.git
   TARGET_BRANCH: drm-next
@@ -187,6 +187,36 @@ stages:
 - when: manual
 
 
+# Repeat of the above but with `when: on_success` replaced with
+# `when: delayed` + `start_in:`, for build-only jobs.
+# Note: make sure the branches in this list are the same as in
+# `.container+build-rules` above.
+.bu

[PATCH v1 0/3] drm/ci: uprev mesa, IGT

2025-03-14 Thread Vignesh Raman
Uprev mesa to fix lava-test-case argument parsing. LAVA recently fixed
parameter parsing in `lava-test-case` [1], eliminating the need for
repeated quotes in arguments.

Uprev IGT to the latest version. Also update expectation files.

The mediatek display driver fails to probe on mt8173-elm-hana and
mt8183-kukui-jacuzzi-juniper-sku16 in v6.14-rc4 due to missing PHY
configurations, so enable the PHY drivers for MediaTek platforms.

[1] https://gitlab.com/lava/lava/-/commit/18c9cf79  

Testing:
MR pipeline for drm/ci changes: 
https://gitlab.freedesktop.org/vigneshraman/linux/-/merge_requests/13
MR pipeline for kernel changes: 
https://gitlab.freedesktop.org/vigneshraman/linux/-/merge_requests/14
Scheduled pipeline: 
https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1384221
Branch pipeline:
https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1384158 (with 
lockdep patches after rebase with latest drm-misc-next)
https://gitlab.freedesktop.org/vigneshraman/linux/-/pipelines/1383492 (without 
lockdep patches)

Vignesh Raman (3):
  drm/ci: uprev mesa
  drm/ci: uprev IGT
  drm/ci: arm64.config: mediatek: enable PHY drivers

 drivers/gpu/drm/ci/arm64.config   |   2 +
 drivers/gpu/drm/ci/build.sh   |  16 +-
 drivers/gpu/drm/ci/build.yml  |   8 +
 drivers/gpu/drm/ci/container.yml  |  24 ++
 drivers/gpu/drm/ci/gitlab-ci.yml  |  34 +-
 drivers/gpu/drm/ci/image-tags.yml |   4 +-
 drivers/gpu/drm/ci/lava-submit.sh |   3 +-
 drivers/gpu/drm/ci/test.yml   |   2 +-
 .../gpu/drm/ci/xfails/amdgpu-stoney-fails.txt |   8 +-
 .../gpu/drm/ci/xfails/amdgpu-stoney-skips.txt |   1 +
 drivers/gpu/drm/ci/xfails/i915-amly-fails.txt |  23 +-
 drivers/gpu/drm/ci/xfails/i915-amly-skips.txt |   1 +
 drivers/gpu/drm/ci/xfails/i915-apl-fails.txt  |   8 +-
 drivers/gpu/drm/ci/xfails/i915-apl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-cml-fails.txt  |  20 +-
 drivers/gpu/drm/ci/xfails/i915-cml-skips.txt  |   2 +-
 drivers/gpu/drm/ci/xfails/i915-glk-fails.txt  |  32 +-
 drivers/gpu/drm/ci/xfails/i915-glk-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-jsl-fails.txt  |  13 +-
 drivers/gpu/drm/ci/xfails/i915-jsl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-kbl-fails.txt  |   5 -
 drivers/gpu/drm/ci/xfails/i915-kbl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-tgl-fails.txt  |   9 +-
 drivers/gpu/drm/ci/xfails/i915-tgl-skips.txt  |   1 +
 drivers/gpu/drm/ci/xfails/i915-whl-fails.txt  |  22 +-
 drivers/gpu/drm/ci/xfails/i915-whl-skips.txt  |   1 +
 .../drm/ci/xfails/mediatek-mt8173-fails.txt   |  20 ++
 .../drm/ci/xfails/mediatek-mt8173-flakes.txt  |   7 +
 .../drm/ci/xfails/mediatek-mt8173-skips.txt   |   1 +
 .../drm/ci/xfails/mediatek-mt8183-fails.txt   |  28 +-
 .../drm/ci/xfails/mediatek-mt8183-flakes.txt  |  21 ++
 .../drm/ci/xfails/mediatek-mt8183-skips.txt   |   1 +
 .../gpu/drm/ci/xfails/meson-g12b-skips.txt|   1 +
 .../gpu/drm/ci/xfails/msm-apq8016-fails.txt   |   4 -
 .../gpu/drm/ci/xfails/msm-apq8016-skips.txt   |   1 +
 .../gpu/drm/ci/xfails/msm-apq8096-skips.txt   |   1 +
 .../msm-sc7180-trogdor-kingoftown-flakes.txt  |   7 +
 .../msm-sc7180-trogdor-kingoftown-skips.txt   |   4 +
 ...m-sc7180-trogdor-lazor-limozeen-flakes.txt |   7 +
 ...sm-sc7180-trogdor-lazor-limozeen-skips.txt |   1 +
 .../gpu/drm/ci/xfails/msm-sdm845-flakes.txt   |   7 +
 .../gpu/drm/ci/xfails/msm-sdm845-skips.txt| 313 ++
 .../drm/ci/xfails/msm-sm8350-hdk-skips.txt|   1 +
 .../gpu/drm/ci/xfails/panfrost-g12b-skips.txt |   1 +
 .../drm/ci/xfails/panfrost-mt8183-skips.txt   |   1 +
 .../drm/ci/xfails/panfrost-rk3288-skips.txt   |   1 +
 .../drm/ci/xfails/panfrost-rk3399-skips.txt   |   1 +
 .../drm/ci/xfails/rockchip-rk3288-fails.txt   |   1 -
 .../drm/ci/xfails/rockchip-rk3288-skips.txt   |   1 +
 .../drm/ci/xfails/rockchip-rk3399-fails.txt   |   2 +-
 .../drm/ci/xfails/rockchip-rk3399-flakes.txt  |  30 +-
 .../drm/ci/xfails/rockchip-rk3399-skips.txt   |   1 +
 .../drm/ci/xfails/virtio_gpu-none-fails.txt   |   1 +
 .../drm/ci/xfails/virtio_gpu-none-skips.txt   |   1 +
 .../gpu/drm/ci/xfails/vkms-none-flakes.txt|  28 ++
 drivers/gpu/drm/ci/xfails/vkms-none-skips.txt |   2 +
 56 files changed, 633 insertions(+), 106 deletions(-)

-- 
2.47.2



Re: [PATCH 03/11] drm/amdgpu/gfx: add generic handling for disable_kq

2025-03-14 Thread Khatri, Sunil

Reviewed-by: Sunil Khatri 

On 3/13/2025 8:11 PM, Alex Deucher wrote:

Add proper checks for disable_kq functionality in
gfx helper functions.  Add special logic for families
that require the clear state setup.

v2: use ring count as per Felix suggestion
v3: fix num_gfx_rings handling in amdgpu_gfx_graphics_queue_acquire()

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 8 ++--
  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h | 2 ++
  2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 984e6ff6e4632..a08243dd0798e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -258,8 +258,9 @@ void amdgpu_gfx_graphics_queue_acquire(struct amdgpu_device 
*adev)
}
  
  	/* update the number of active graphics rings */

-   adev->gfx.num_gfx_rings =
-   bitmap_weight(adev->gfx.me.queue_bitmap, AMDGPU_MAX_GFX_QUEUES);
+   if (adev->gfx.num_gfx_rings)
+   adev->gfx.num_gfx_rings =
+   bitmap_weight(adev->gfx.me.queue_bitmap, 
AMDGPU_MAX_GFX_QUEUES);
  }
  
  static int amdgpu_gfx_kiq_acquire(struct amdgpu_device *adev,

@@ -1544,6 +1545,9 @@ static ssize_t amdgpu_gfx_set_run_cleaner_shader(struct 
device *dev,
if (adev->in_suspend && !adev->in_runpm)
return -EPERM;
  
+	if (adev->gfx.disable_kq)

+   return -ENOTSUPP;
+
ret = kstrtol(buf, 0, &value);
  
  	if (ret)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index ddf4533614bac..8fa68a4ac34f1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -483,6 +483,8 @@ struct amdgpu_gfx {
  
  	atomic_t			total_submission_cnt;

struct delayed_work idle_work;
+
+   booldisable_kq;
  };
  
  struct amdgpu_gfx_ras_reg_entry {


[PATCH 2/3] drm/amdgpu: use specific ip_discovery.bin for legacy asics

2025-03-14 Thread Flora Cui
Use a dedicated ip_discovery.bin for each legacy ASIC:
vega10/vega12/vega20/raven/raven2/picasso/arcturus/aldebaran

Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 28 ++-
 1 file changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 2b4854e03821..fff438baf64b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -114,6 +114,12 @@
 #endif
 
 MODULE_FIRMWARE("amdgpu/ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/vega10_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/vega12_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/vega20_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/raven_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/raven2_ip_discovery.bin");
+MODULE_FIRMWARE("amdgpu/picasso_ip_discovery.bin");
 
 #define mmIP_DISCOVERY_VERSION  0x16A00
 #define mmRCC_CONFIG_MEMSIZE   0xde3
@@ -400,7 +406,27 @@ static const char *amdgpu_discovery_get_fw_name(struct 
amdgpu_device *adev)
if (amdgpu_discovery == 2)
return "amdgpu/ip_discovery.bin";
 
-   return NULL;
+   switch (adev->asic_type) {
+   case CHIP_VEGA10:
+   return "amdgpu/vega10_ip_discovery.bin";
+   case CHIP_VEGA12:
+   return "amdgpu/vega12_ip_discovery.bin";
+   case CHIP_RAVEN:
+   if (adev->apu_flags & AMD_APU_IS_RAVEN2)
+   return "amdgpu/raven2_ip_discovery.bin";
+   else if (adev->apu_flags & AMD_APU_IS_PICASSO)
+   return "amdgpu/picasso_ip_discovery.bin";
+   else
+   return "amdgpu/raven_ip_discovery.bin";
+   case CHIP_VEGA20:
+   return "amdgpu/vega20_ip_discovery.bin";
+   case CHIP_ARCTURUS:
+   return "amdgpu/arcturus_ip_discovery.bin";
+   case CHIP_ALDEBARAN:
+   return "amdgpu/aldebaran_ip_discovery.bin";
+   default:
+   return NULL;
+   }
 }
 
 static int amdgpu_discovery_init(struct amdgpu_device *adev)
-- 
2.43.0



Re: [PATCH] drm/amdgpu/gfx12: correct cleanup of 'me' field with gfx_v12_0_me_fini()

2025-03-14 Thread Markus Elfring
…
> can only release 'pfp' field of 'gfx'. The release function of 'me' field
> should be gfx_v12_0_me_fini().

Do you care for an imperative wording in such a change description?
https://web.git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/process/submitting-patches.rst?h=v6.14-rc6#n94

Regards,
Markus


[PATCH RFC v4 0/6] drm/display: dp: add new DPCD access functions

2025-03-14 Thread Dmitry Baryshkov
Existing DPCD access functions return an error code or the number of
bytes read / written in the case of partial access. However, a lot of
drivers either (incorrectly) ignore partial access or mishandle error
codes. In other cases this results in boilerplate code that compares the
returned value with the size.

As suggested by Jani, implement a new set of DPCD access helpers, which
treat partial access as an error and always return 0 or an error code.
Implement the new helpers using the existing functions to ensure backwards
compatibility and to assess the necessity of handling incomplete reads on
a global scale.
Currently only one possible place has been identified, dp-aux-dev, which
needs to handle possible holes in DPCD.

This series targets only the DRM helpers code. If the approach is found
to be acceptable, each of the drivers should be converted on its own.

Signed-off-by: Dmitry Baryshkov 
---
Changes in v4:
- Actually dropped the dp-aux-dev patch (Lyude).
- Added two missing full stops in linuxdoc (Lyude).
- Link to v3: 
https://lore.kernel.org/r/20250307-drm-rework-dpcd-access-v3-0-9044a3a86...@linaro.org

Changes in v3:
- Fixed cover letter (Jani)
- Added intel-gfx and intel-xe to get the series CI-tested (Jani)
- Link to v2: 
https://lore.kernel.org/r/20250301-drm-rework-dpcd-access-v2-0-4d92602fc...@linaro.org

Changes in v2:
- Reimplemented new helpers using old ones (Lyude)
- Reworked the drm_dp_dpcd_read_link_status() patch (Lyude)
- Dropped the dp-aux-dev patch (Jani)
- Link to v1: 
https://lore.kernel.org/r/20250117-drm-rework-dpcd-access-v1-0-7fc020e04...@linaro.org

---
Dmitry Baryshkov (6):
  drm/display: dp: implement new access helpers
  drm/display: dp: change drm_dp_dpcd_read_link_status() return value
  drm/display: dp: use new DPCD access helpers
  drm/display: dp-cec: use new DPCD access helpers
  drm/display: dp-mst-topology: use new DPCD access helpers
  drm/display: dp-tunnel: use new DPCD access helpers

 drivers/gpu/drm/amd/amdgpu/atombios_dp.c   |   8 +-
 .../gpu/drm/bridge/cadence/cdns-mhdp8546-core.c|   2 +-
 drivers/gpu/drm/display/drm_dp_cec.c   |  37 ++-
 drivers/gpu/drm/display/drm_dp_helper.c| 307 +
 drivers/gpu/drm/display/drm_dp_mst_topology.c  | 105 ---
 drivers/gpu/drm/display/drm_dp_tunnel.c|  20 +-
 drivers/gpu/drm/hisilicon/hibmc/dp/dp_link.c   |   4 +-
 drivers/gpu/drm/msm/dp/dp_ctrl.c   |  24 +-
 drivers/gpu/drm/msm/dp/dp_link.c   |  18 +-
 drivers/gpu/drm/radeon/atombios_dp.c   |   8 +-
 include/drm/display/drm_dp_helper.h|  92 +-
 11 files changed, 317 insertions(+), 308 deletions(-)
---
base-commit: b0894e40afe2bd05d1fda68cc364665ac2b00e09
change-id: 20241231-drm-rework-dpcd-access-b0fc2e47d613

Best regards,
-- 
Dmitry Baryshkov 



[PATCH RFC v4 4/6] drm/display: dp-cec: use new DPCD access helpers

2025-03-14 Thread Dmitry Baryshkov
From: Dmitry Baryshkov 

Switch drm_dp_cec.c to the new set of DPCD read / write helpers.

Reviewed-by: Lyude Paul 
Acked-by: Jani Nikula 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/display/drm_dp_cec.c | 37 ++--
 1 file changed, 18 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_cec.c 
b/drivers/gpu/drm/display/drm_dp_cec.c
index 
56a4965e518cc237c992a2e31b9f6de05c14766a..ed31471bd0e28826254ecedac48c5c126729d470
 100644
--- a/drivers/gpu/drm/display/drm_dp_cec.c
+++ b/drivers/gpu/drm/display/drm_dp_cec.c
@@ -96,7 +96,7 @@ static int drm_dp_cec_adap_enable(struct cec_adapter *adap, 
bool enable)
u32 val = enable ? DP_CEC_TUNNELING_ENABLE : 0;
ssize_t err = 0;
 
-   err = drm_dp_dpcd_writeb(aux, DP_CEC_TUNNELING_CONTROL, val);
+   err = drm_dp_dpcd_write_byte(aux, DP_CEC_TUNNELING_CONTROL, val);
return (enable && err < 0) ? err : 0;
 }
 
@@ -112,7 +112,7 @@ static int drm_dp_cec_adap_log_addr(struct cec_adapter 
*adap, u8 addr)
la_mask |= adap->log_addrs.log_addr_mask | (1 << addr);
mask[0] = la_mask & 0xff;
mask[1] = la_mask >> 8;
-   err = drm_dp_dpcd_write(aux, DP_CEC_LOGICAL_ADDRESS_MASK, mask, 2);
+   err = drm_dp_dpcd_write_data(aux, DP_CEC_LOGICAL_ADDRESS_MASK, mask, 2);
return (addr != CEC_LOG_ADDR_INVALID && err < 0) ? err : 0;
 }
 
@@ -123,15 +123,14 @@ static int drm_dp_cec_adap_transmit(struct cec_adapter 
*adap, u8 attempts,
unsigned int retries = min(5, attempts - 1);
ssize_t err;
 
-   err = drm_dp_dpcd_write(aux, DP_CEC_TX_MESSAGE_BUFFER,
-   msg->msg, msg->len);
+   err = drm_dp_dpcd_write_data(aux, DP_CEC_TX_MESSAGE_BUFFER,
+msg->msg, msg->len);
if (err < 0)
return err;
 
-   err = drm_dp_dpcd_writeb(aux, DP_CEC_TX_MESSAGE_INFO,
-(msg->len - 1) | (retries << 4) |
-DP_CEC_TX_MESSAGE_SEND);
-   return err < 0 ? err : 0;
+   return drm_dp_dpcd_write_byte(aux, DP_CEC_TX_MESSAGE_INFO,
+ (msg->len - 1) | (retries << 4) |
+ DP_CEC_TX_MESSAGE_SEND);
 }
 
 static int drm_dp_cec_adap_monitor_all_enable(struct cec_adapter *adap,
@@ -144,13 +143,13 @@ static int drm_dp_cec_adap_monitor_all_enable(struct 
cec_adapter *adap,
if (!(adap->capabilities & CEC_CAP_MONITOR_ALL))
return 0;
 
-   err = drm_dp_dpcd_readb(aux, DP_CEC_TUNNELING_CONTROL, &val);
-   if (err >= 0) {
+   err = drm_dp_dpcd_read_byte(aux, DP_CEC_TUNNELING_CONTROL, &val);
+   if (!err) {
if (enable)
val |= DP_CEC_SNOOPING_ENABLE;
else
val &= ~DP_CEC_SNOOPING_ENABLE;
-   err = drm_dp_dpcd_writeb(aux, DP_CEC_TUNNELING_CONTROL, val);
+   err = drm_dp_dpcd_write_byte(aux, DP_CEC_TUNNELING_CONTROL, 
val);
}
return (enable && err < 0) ? err : 0;
 }
@@ -194,7 +193,7 @@ static int drm_dp_cec_received(struct drm_dp_aux *aux)
u8 rx_msg_info;
ssize_t err;
 
-   err = drm_dp_dpcd_readb(aux, DP_CEC_RX_MESSAGE_INFO, &rx_msg_info);
+   err = drm_dp_dpcd_read_byte(aux, DP_CEC_RX_MESSAGE_INFO, &rx_msg_info);
if (err < 0)
return err;
 
@@ -202,7 +201,7 @@ static int drm_dp_cec_received(struct drm_dp_aux *aux)
return 0;
 
msg.len = (rx_msg_info & DP_CEC_RX_MESSAGE_LEN_MASK) + 1;
-   err = drm_dp_dpcd_read(aux, DP_CEC_RX_MESSAGE_BUFFER, msg.msg, msg.len);
+   err = drm_dp_dpcd_read_data(aux, DP_CEC_RX_MESSAGE_BUFFER, msg.msg, 
msg.len);
if (err < 0)
return err;
 
@@ -215,7 +214,7 @@ static void drm_dp_cec_handle_irq(struct drm_dp_aux *aux)
struct cec_adapter *adap = aux->cec.adap;
u8 flags;
 
-   if (drm_dp_dpcd_readb(aux, DP_CEC_TUNNELING_IRQ_FLAGS, &flags) < 0)
+   if (drm_dp_dpcd_read_byte(aux, DP_CEC_TUNNELING_IRQ_FLAGS, &flags) < 0)
return;
 
if (flags & DP_CEC_RX_MESSAGE_INFO_VALID)
@@ -230,7 +229,7 @@ static void drm_dp_cec_handle_irq(struct drm_dp_aux *aux)
 (DP_CEC_TX_ADDRESS_NACK_ERROR | DP_CEC_TX_DATA_NACK_ERROR))
cec_transmit_attempt_done(adap, CEC_TX_STATUS_NACK |
CEC_TX_STATUS_MAX_RETRIES);
-   drm_dp_dpcd_writeb(aux, DP_CEC_TUNNELING_IRQ_FLAGS, flags);
+   drm_dp_dpcd_write_byte(aux, DP_CEC_TUNNELING_IRQ_FLAGS, flags);
 }
 
 /**
@@ -253,13 +252,13 @@ void drm_dp_cec_irq(struct drm_dp_aux *aux)
if (!aux->cec.adap)
goto unlock;
 
-   ret = drm_dp_dpcd_readb(aux, DP_DEVICE_SERVICE_IRQ_VECTOR_ESI1,
-   &cec_irq);
+   ret = drm_dp_dpcd_read_byte(aux, DP_DEVICE_SERVICE_IRQ_VECTOR_ESI1,
+   

[PATCH RFC v4 1/6] drm/display: dp: implement new access helpers

2025-03-14 Thread Dmitry Baryshkov
From: Dmitry Baryshkov 

Existing DPCD access functions return an error code or the number of
bytes read / written in the case of partial access. However, a lot of
drivers either (incorrectly) ignore partial access or mishandle error
codes. In other cases this results in boilerplate code that compares the
returned value with the size.

Implement a new set of DPCD access helpers, which treat partial access
as an error and always return 0 or an error code.

Suggested-by: Jani Nikula 
Acked-by: Jani Nikula 
Reviewed-by: Lyude Paul 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/display/drm_dp_helper.c |  4 ++
 include/drm/display/drm_dp_helper.h | 92 -
 2 files changed, 94 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index 
dbce1c3f49691fc687fee2404b723c73d533f23d..e43a8f4a252dae22eeaae1f4ca94da064303033d
 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -704,6 +704,8 @@ EXPORT_SYMBOL(drm_dp_dpcd_set_powered);
  * function returns -EPROTO. Errors from the underlying AUX channel transfer
  * function, with the exception of -EBUSY (which causes the transaction to
  * be retried), are propagated to the caller.
+ *
+ * In most of the cases you want to use drm_dp_dpcd_read_data() instead.
  */
 ssize_t drm_dp_dpcd_read(struct drm_dp_aux *aux, unsigned int offset,
 void *buffer, size_t size)
@@ -752,6 +754,8 @@ EXPORT_SYMBOL(drm_dp_dpcd_read);
  * function returns -EPROTO. Errors from the underlying AUX channel transfer
  * function, with the exception of -EBUSY (which causes the transaction to
  * be retried), are propagated to the caller.
+ *
+ * In most of the cases you want to use drm_dp_dpcd_write_data() instead.
  */
 ssize_t drm_dp_dpcd_write(struct drm_dp_aux *aux, unsigned int offset,
  void *buffer, size_t size)
diff --git a/include/drm/display/drm_dp_helper.h 
b/include/drm/display/drm_dp_helper.h
index 
5ae4241959f24e2c1fb581d7c7d770485d603099..21e22289d1caebe616b57a304061b12592ad41ea
 100644
--- a/include/drm/display/drm_dp_helper.h
+++ b/include/drm/display/drm_dp_helper.h
@@ -527,6 +527,64 @@ ssize_t drm_dp_dpcd_read(struct drm_dp_aux *aux, unsigned 
int offset,
 ssize_t drm_dp_dpcd_write(struct drm_dp_aux *aux, unsigned int offset,
  void *buffer, size_t size);
 
+/**
+ * drm_dp_dpcd_read_data() - read a series of bytes from the DPCD
+ * @aux: DisplayPort AUX channel (SST or MST)
+ * @offset: address of the (first) register to read
+ * @buffer: buffer to store the register values
+ * @size: number of bytes in @buffer
+ *
+ * Returns zero (0) on success, or a negative error
+ * code on failure. -EIO is returned if the request was NAKed by the sink or
+ * if the retry count was exceeded. If not all bytes were transferred, this
+ * function returns -EPROTO. Errors from the underlying AUX channel transfer
+ * function, with the exception of -EBUSY (which causes the transaction to
+ * be retried), are propagated to the caller.
+ */
+static inline int drm_dp_dpcd_read_data(struct drm_dp_aux *aux,
+   unsigned int offset,
+   void *buffer, size_t size)
+{
+   int ret;
+
+   ret = drm_dp_dpcd_read(aux, offset, buffer, size);
+   if (ret < 0)
+   return ret;
+   if (ret < size)
+   return -EPROTO;
+
+   return 0;
+}
+
+/**
+ * drm_dp_dpcd_write_data() - write a series of bytes to the DPCD
+ * @aux: DisplayPort AUX channel (SST or MST)
+ * @offset: address of the (first) register to write
+ * @buffer: buffer containing the values to write
+ * @size: number of bytes in @buffer
+ *
+ * Returns zero (0) on success, or a negative error
+ * code on failure. -EIO is returned if the request was NAKed by the sink or
+ * if the retry count was exceeded. If not all bytes were transferred, this
+ * function returns -EPROTO. Errors from the underlying AUX channel transfer
+ * function, with the exception of -EBUSY (which causes the transaction to
+ * be retried), are propagated to the caller.
+ */
+static inline int drm_dp_dpcd_write_data(struct drm_dp_aux *aux,
+unsigned int offset,
+void *buffer, size_t size)
+{
+   int ret;
+
+   ret = drm_dp_dpcd_write(aux, offset, buffer, size);
+   if (ret < 0)
+   return ret;
+   if (ret < size)
+   return -EPROTO;
+
+   return 0;
+}
+
 /**
  * drm_dp_dpcd_readb() - read a single byte from the DPCD
  * @aux: DisplayPort AUX channel
@@ -534,7 +592,8 @@ ssize_t drm_dp_dpcd_write(struct drm_dp_aux *aux, unsigned 
int offset,
  * @valuep: location where the value of the register will be stored
  *
  * Returns the number of bytes transferred (1) on success, or a negative
- * error code on failure.
+ * error code on failure

[PATCH RFC v4 2/6] drm/display: dp: change drm_dp_dpcd_read_link_status() return value

2025-03-14 Thread Dmitry Baryshkov
From: Dmitry Baryshkov 

drm_dp_dpcd_read_link_status() follows the "return error code or number
of bytes read" protocol, with the code returning fewer bytes than
requested in case of some errors. However, most drivers interpreted that
as "return error code in case of any error". Switch
drm_dp_dpcd_read_link_status() to drm_dp_dpcd_read_data() and make it
follow that protocol too.

Acked-by: Jani Nikula 
Reviewed-by: Lyude Paul 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/amd/amdgpu/atombios_dp.c   |  8 
 .../gpu/drm/bridge/cadence/cdns-mhdp8546-core.c|  2 +-
 drivers/gpu/drm/display/drm_dp_helper.c|  7 +++
 drivers/gpu/drm/hisilicon/hibmc/dp/dp_link.c   |  4 ++--
 drivers/gpu/drm/msm/dp/dp_ctrl.c   | 24 +-
 drivers/gpu/drm/msm/dp/dp_link.c   | 18 
 drivers/gpu/drm/radeon/atombios_dp.c   |  8 
 7 files changed, 28 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c 
b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
index 
521b9faab18059ed92ebb1dc9a9847e8426e7403..492813ab1b54197ba842075bc2909984c39bd5c1
 100644
--- a/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
+++ b/drivers/gpu/drm/amd/amdgpu/atombios_dp.c
@@ -458,8 +458,8 @@ bool amdgpu_atombios_dp_needs_link_train(struct 
amdgpu_connector *amdgpu_connect
u8 link_status[DP_LINK_STATUS_SIZE];
struct amdgpu_connector_atom_dig *dig = amdgpu_connector->con_priv;
 
-   if (drm_dp_dpcd_read_link_status(&amdgpu_connector->ddc_bus->aux, 
link_status)
-   <= 0)
+   if (drm_dp_dpcd_read_link_status(&amdgpu_connector->ddc_bus->aux,
+link_status) < 0)
return false;
if (drm_dp_channel_eq_ok(link_status, dig->dp_lane_count))
return false;
@@ -616,7 +616,7 @@ amdgpu_atombios_dp_link_train_cr(struct 
amdgpu_atombios_dp_link_train_info *dp_i
drm_dp_link_train_clock_recovery_delay(dp_info->aux, 
dp_info->dpcd);
 
if (drm_dp_dpcd_read_link_status(dp_info->aux,
-dp_info->link_status) <= 0) {
+dp_info->link_status) < 0) {
DRM_ERROR("displayport link status failed\n");
break;
}
@@ -681,7 +681,7 @@ amdgpu_atombios_dp_link_train_ce(struct 
amdgpu_atombios_dp_link_train_info *dp_i
drm_dp_link_train_channel_eq_delay(dp_info->aux, dp_info->dpcd);
 
if (drm_dp_dpcd_read_link_status(dp_info->aux,
-dp_info->link_status) <= 0) {
+dp_info->link_status) < 0) {
DRM_ERROR("displayport link status failed\n");
break;
}
diff --git a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c 
b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
index 
81fad14c2cd598045d989c7d51f292bafb92c144..8d5420a5b691180c4d051a450d5d3d869a558d1a
 100644
--- a/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
+++ b/drivers/gpu/drm/bridge/cadence/cdns-mhdp8546-core.c
@@ -2305,7 +2305,7 @@ static int cdns_mhdp_update_link_status(struct 
cdns_mhdp_device *mhdp)
 * If everything looks fine, just return, as we don't handle
 * DP IRQs.
 */
-   if (ret > 0 &&
+   if (!ret &&
drm_dp_channel_eq_ok(status, mhdp->link.num_lanes) &&
drm_dp_clock_recovery_ok(status, mhdp->link.num_lanes))
goto out;
diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index 
e43a8f4a252dae22eeaae1f4ca94da064303033d..410be0be233ad94702af423262a7d98e21afbfeb
 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -778,14 +778,13 @@ EXPORT_SYMBOL(drm_dp_dpcd_write);
  * @aux: DisplayPort AUX channel
  * @status: buffer to store the link status in (must be at least 6 bytes)
  *
- * Returns the number of bytes transferred on success or a negative error
- * code on failure.
+ * Returns a negative error code on failure or 0 on success.
  */
 int drm_dp_dpcd_read_link_status(struct drm_dp_aux *aux,
 u8 status[DP_LINK_STATUS_SIZE])
 {
-   return drm_dp_dpcd_read(aux, DP_LANE0_1_STATUS, status,
-   DP_LINK_STATUS_SIZE);
+   return drm_dp_dpcd_read_data(aux, DP_LANE0_1_STATUS, status,
+DP_LINK_STATUS_SIZE);
 }
 EXPORT_SYMBOL(drm_dp_dpcd_read_link_status);
 
diff --git a/drivers/gpu/drm/hisilicon/hibmc/dp/dp_link.c 
b/drivers/gpu/drm/hisilicon/hibmc/dp/dp_link.c
index 
f6355c16cc0ab2e28408ab8a7246f4ca17710456..a3b78b0fd53ef854a54edf40fb333766da88f1c6
 100644
--- a/drivers/gpu/drm

[PATCH RFC v4 5/6] drm/display: dp-mst-topology: use new DPCD access helpers

2025-03-14 Thread Dmitry Baryshkov
From: Dmitry Baryshkov 

Switch drm_dp_mst_topology.c to use the new set of DPCD read/write helpers.

Reviewed-by: Lyude Paul 
Acked-by: Jani Nikula 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/display/drm_dp_mst_topology.c | 105 +-
 1 file changed, 51 insertions(+), 54 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_mst_topology.c 
b/drivers/gpu/drm/display/drm_dp_mst_topology.c
index 
3a1f1ffc7b5528b940485e7d96b1913bd85b0153..de3fc6090c906efce3dbaa8462d736ebf9094b34
 100644
--- a/drivers/gpu/drm/display/drm_dp_mst_topology.c
+++ b/drivers/gpu/drm/display/drm_dp_mst_topology.c
@@ -2201,15 +2201,12 @@ static int drm_dp_check_mstb_guid(struct 
drm_dp_mst_branch *mstb, guid_t *guid)
 mstb->port_parent,
 DP_GUID, sizeof(buf), buf);
} else {
-   ret = drm_dp_dpcd_write(mstb->mgr->aux,
-   DP_GUID, buf, sizeof(buf));
+   ret = drm_dp_dpcd_write_data(mstb->mgr->aux,
+DP_GUID, buf, sizeof(buf));
}
}
 
-   if (ret < 16 && ret > 0)
-   return -EPROTO;
-
-   return ret == 16 ? 0 : ret;
+   return ret;
 }
 
 static void build_mst_prop_path(const struct drm_dp_mst_branch *mstb,
@@ -2744,14 +2741,13 @@ static int drm_dp_send_sideband_msg(struct 
drm_dp_mst_topology_mgr *mgr,
do {
tosend = min3(mgr->max_dpcd_transaction_bytes, 16, total);
 
-   ret = drm_dp_dpcd_write(mgr->aux, regbase + offset,
-   &msg[offset],
-   tosend);
-   if (ret != tosend) {
-   if (ret == -EIO && retries < 5) {
-   retries++;
-   goto retry;
-   }
+   ret = drm_dp_dpcd_write_data(mgr->aux, regbase + offset,
+&msg[offset],
+tosend);
+   if (ret == -EIO && retries < 5) {
+   retries++;
+   goto retry;
+   } else if (ret < 0) {
drm_dbg_kms(mgr->dev, "failed to dpcd write %d %d\n", 
tosend, ret);
 
return -EIO;
@@ -3624,7 +3620,7 @@ enum drm_dp_mst_mode drm_dp_read_mst_cap(struct 
drm_dp_aux *aux,
if (dpcd[DP_DPCD_REV] < DP_DPCD_REV_12)
return DRM_DP_SST;
 
-   if (drm_dp_dpcd_readb(aux, DP_MSTM_CAP, &mstm_cap) != 1)
+   if (drm_dp_dpcd_read_byte(aux, DP_MSTM_CAP, &mstm_cap) < 0)
return DRM_DP_SST;
 
if (mstm_cap & DP_MST_CAP)
@@ -3679,10 +3675,10 @@ int drm_dp_mst_topology_mgr_set_mst(struct 
drm_dp_mst_topology_mgr *mgr, bool ms
mgr->mst_primary = mstb;
drm_dp_mst_topology_get_mstb(mgr->mst_primary);
 
-   ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
-DP_MST_EN |
-DP_UP_REQ_EN |
-DP_UPSTREAM_IS_SRC);
+   ret = drm_dp_dpcd_write_byte(mgr->aux, DP_MSTM_CTRL,
+DP_MST_EN |
+DP_UP_REQ_EN |
+DP_UPSTREAM_IS_SRC);
if (ret < 0)
goto out_unlock;
 
@@ -3697,7 +3693,7 @@ int drm_dp_mst_topology_mgr_set_mst(struct 
drm_dp_mst_topology_mgr *mgr, bool ms
mstb = mgr->mst_primary;
mgr->mst_primary = NULL;
/* this can fail if the device is gone */
-   drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL, 0);
+   drm_dp_dpcd_write_byte(mgr->aux, DP_MSTM_CTRL, 0);
ret = 0;
mgr->payload_id_table_cleared = false;
 
@@ -3763,8 +3759,8 @@ EXPORT_SYMBOL(drm_dp_mst_topology_queue_probe);
 void drm_dp_mst_topology_mgr_suspend(struct drm_dp_mst_topology_mgr *mgr)
 {
mutex_lock(&mgr->lock);
-   drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
-  DP_MST_EN | DP_UPSTREAM_IS_SRC);
+   drm_dp_dpcd_write_byte(mgr->aux, DP_MSTM_CTRL,
+  DP_MST_EN | DP_UPSTREAM_IS_SRC);
mutex_unlock(&mgr->lock);
flush_work(&mgr->up_req_work);
flush_work(&mgr->work);
@@ -3813,18 +3809,18 @@ int drm_dp_mst_topology_mgr_resume(struct 
drm_dp_mst_topology_mgr *mgr,
goto out_fail;
}
 
-   ret = drm_dp_dpcd_writeb(mgr->aux, DP_MSTM_CTRL,
-DP_MST_EN |
-DP_UP_REQ_EN |
-DP_UPSTREAM_IS_SRC);
+   ret = drm_dp_dpcd_write_byte(mgr->aux, DP_MSTM_CTRL,

[PATCH] drm/amdgpu: Higher log level for missing PCIe atomics caps

2025-03-14 Thread Daisuke Matsuda
Currently, ROCm requires CPUs that support PCIe atomics. The message is
more urgent for GPGPU users because, without PCIe atomics support, basic
ROCm functionality is not available on the node.

Signed-off-by: Daisuke Matsuda 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 4 ++--
 drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 018dfccd771b..faeef136e272 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4374,7 +4374,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
return r;
}
 
-   /* enable PCIE atomic ops */
+   /* enable PCIe atomic ops */
if (amdgpu_sriov_vf(adev)) {
if (adev->virt.fw_reserve.p_pf2vf)
adev->have_atomics_support = ((struct 
amd_sriov_msg_pf2vf_info *)
@@ -4395,7 +4395,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
}
 
if (!adev->have_atomics_support)
-   dev_info(adev->dev, "PCIE atomic ops is not supported\n");
+   dev_warn(adev->dev, "PCIe atomic ops are not supported\n");
 
/* doorbell bar mapping and doorbell index init*/
amdgpu_doorbell_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h 
b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
index b4f9c2f4e92c..c52605a07597 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -240,7 +240,7 @@ struct amd_sriov_msg_pf2vf_info {
} mm_bw_management[AMD_SRIOV_MSG_RESERVE_VCN_INST];
/* UUID info */
struct amd_sriov_msg_uuid_info uuid_info;
-   /* PCIE atomic ops support flag */
+   /* PCIe atomic ops support flag */
uint32_t pcie_atomic_ops_support_flags;
/* Portion of GPU memory occupied by VF.  MAX value is 65535, but set 
to uint32_t to maintain alignment with reserved size */
uint32_t gpu_capacity;
-- 
2.39.1



[PATCH v1 3/3] drm/ci: arm64.config: mediatek: enable PHY drivers

2025-03-14 Thread Vignesh Raman
The mediatek display driver fails to probe on mt8173-elm-hana and
mt8183-kukui-jacuzzi-juniper-sku16 in v6.14-rc4 due to missing PHY
configurations.

Enable the following PHY drivers for MediaTek platforms:
- CONFIG_PHY_MTK_HDMI=y for HDMI display
- CONFIG_PHY_MTK_MIPI_DSI=y for DSI display

Signed-off-by: Vignesh Raman 
---
 drivers/gpu/drm/ci/arm64.config | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/ci/arm64.config b/drivers/gpu/drm/ci/arm64.config
index a8fca079921b..fddfbd4d2493 100644
--- a/drivers/gpu/drm/ci/arm64.config
+++ b/drivers/gpu/drm/ci/arm64.config
@@ -193,6 +193,8 @@ CONFIG_PWM_MTK_DISP=y
 CONFIG_MTK_CMDQ=y
 CONFIG_REGULATOR_DA9211=y
 CONFIG_DRM_ANALOGIX_ANX7625=y
+CONFIG_PHY_MTK_HDMI=y
+CONFIG_PHY_MTK_MIPI_DSI=y
 
 # For nouveau.  Note that DRM must be a module so that it's loaded after NFS 
is up to provide the firmware.
 CONFIG_ARCH_TEGRA=y
-- 
2.47.2



[PATCH] drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12

2025-03-14 Thread Candice Li
Add EEPROM I2C address support for smu v13_0_12.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 3597ecd9baca34..3de89e95a636c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -161,6 +161,7 @@ static bool __is_ras_eeprom_supported(struct amdgpu_device 
*adev)
case IP_VERSION(13, 0, 10):
return true;
case IP_VERSION(13, 0, 6):
+   case IP_VERSION(13, 0, 12):
case IP_VERSION(13, 0, 14):
return (adev->gmc.is_app_apu) ? false : true;
default:
@@ -223,6 +224,7 @@ static bool __get_eeprom_i2c_addr(struct amdgpu_device 
*adev,
return true;
case IP_VERSION(13, 0, 6):
case IP_VERSION(13, 0, 10):
+   case IP_VERSION(13, 0, 12):
case IP_VERSION(13, 0, 14):
control->i2c_address = EEPROM_I2C_MADDR_4;
return true;
-- 
2.25.1



RE: [PATCH V2 1/2] drm/amdgpu/gfx: fix ref counting for ring based profile handling

2025-03-14 Thread Feng, Kenneth
[AMD Official Use Only - AMD Internal Distribution Only]

Tested-by: Kenneth Feng 
Series is Reviewed-by: Kenneth Feng 

-Original Message-
From: Deucher, Alexander 
Sent: Wednesday, March 12, 2025 10:19 PM
To: amd-gfx@lists.freedesktop.org
Cc: Deucher, Alexander ; Wang, Yang(Kevin) 
; Feng, Kenneth 
Subject: [PATCH V2 1/2] drm/amdgpu/gfx: fix ref counting for ring based profile 
handling

We need to make sure the workload profile ref counts are balanced.  This isn't 
currently the case because we can increment the count on submissions, but the 
decrement may be delayed as work comes in.  Track when we enable the workload 
profile so the references are balanced.

v2: switch to a mutex and active flag

Fixes: 8fdb3958e396 ("drm/amdgpu/gfx: add ring helpers for setting workload 
profile")
Cc: Yang Wang 
Cc: Kenneth Feng 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 30 -  
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h |  2 ++
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 984e6ff6e4632..099329d15b9ff 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -2160,11 +2160,16 @@ void amdgpu_gfx_profile_idle_work_handler(struct 
work_struct *work)
for (i = 0; i < (AMDGPU_MAX_COMPUTE_RINGS * AMDGPU_MAX_GC_INSTANCES); 
++i)
fences += 
amdgpu_fence_count_emitted(&adev->gfx.compute_ring[i]);
if (!fences && !atomic_read(&adev->gfx.total_submission_cnt)) {
-   r = amdgpu_dpm_switch_power_profile(adev, profile, false);
-   if (r)
-   dev_warn(adev->dev, "(%d) failed to disable %s power 
profile mode\n", r,
-profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
-"fullscreen 3D" : "compute");
+   mutex_lock(&adev->gfx.workload_profile_mutex);
+   if (adev->gfx.workload_profile_active) {
+   r = amdgpu_dpm_switch_power_profile(adev, profile, 
false);
+   if (r)
+   dev_warn(adev->dev, "(%d) failed to disable %s 
power profile mode\n", r,
+profile == 
PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
+"fullscreen 3D" : "compute");
+   adev->gfx.workload_profile_active = false;
+   }
+   mutex_unlock(&adev->gfx.workload_profile_mutex);
} else {
schedule_delayed_work(&adev->gfx.idle_work, 
GFX_PROFILE_IDLE_TIMEOUT);
}
@@ -2184,11 +2189,16 @@ void amdgpu_gfx_profile_ring_begin_use(struct 
amdgpu_ring *ring)
atomic_inc(&adev->gfx.total_submission_cnt);

if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
-   r = amdgpu_dpm_switch_power_profile(adev, profile, true);
-   if (r)
-   dev_warn(adev->dev, "(%d) failed to disable %s power 
profile mode\n", r,
-profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
-"fullscreen 3D" : "compute");
+   mutex_lock(&adev->gfx.workload_profile_mutex);
+   if (!adev->gfx.workload_profile_active) {
+   r = amdgpu_dpm_switch_power_profile(adev, profile, 
true);
+   if (r)
+   dev_warn(adev->dev, "(%d) failed to disable %s 
power profile mode\n", r,
+profile == 
PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
+"fullscreen 3D" : "compute");
+   adev->gfx.workload_profile_active = true;
+   }
+   mutex_unlock(&adev->gfx.workload_profile_mutex);
}
 }

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
index ddf4533614bac..75af4f25a133b 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.h
@@ -483,6 +483,8 @@ struct amdgpu_gfx {

atomic_ttotal_submission_cnt;
struct delayed_work idle_work;
+   boolworkload_profile_active;
+   struct mutexworkload_profile_mutex;
 };

 struct amdgpu_gfx_ras_reg_entry {
--
2.48.1



RE: [PATCH] drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12

2025-03-14 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Candice Li
Sent: Friday, March 14, 2025 15:33
To: amd-gfx@lists.freedesktop.org
Cc: Li, Candice 
Subject: [PATCH] drm/amdgpu: Add EEPROM I2C address support for smu v13_0_12

Add EEPROM I2C address support for smu v13_0_12.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
index 3597ecd9baca34..3de89e95a636c5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras_eeprom.c
@@ -161,6 +161,7 @@ static bool __is_ras_eeprom_supported(struct amdgpu_device 
*adev)
case IP_VERSION(13, 0, 10):
return true;
case IP_VERSION(13, 0, 6):
+   case IP_VERSION(13, 0, 12):
case IP_VERSION(13, 0, 14):
return (adev->gmc.is_app_apu) ? false : true;
default:
@@ -223,6 +224,7 @@ static bool __get_eeprom_i2c_addr(struct amdgpu_device 
*adev,
return true;
case IP_VERSION(13, 0, 6):
case IP_VERSION(13, 0, 10):
+   case IP_VERSION(13, 0, 12):
case IP_VERSION(13, 0, 14):
control->i2c_address = EEPROM_I2C_MADDR_4;
return true;
--
2.25.1



[PATCH] drm/amdgpu: Add active_umc_mask to ras init_flags

2025-03-14 Thread Candice Li
Add active_umc_mask to ras init_flags.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 +
 drivers/gpu/drm/amd/amdgpu/ta_ras_if.h  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 79dad75bd0e79f..d3b05b7020c84d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1861,6 +1861,7 @@ int psp_ras_initialize(struct psp_context *psp)
if (adev->gmc.gmc_funcs->query_mem_partition_mode)
ras_cmd->ras_in_message.init_flags.nps_mode =
adev->gmc.gmc_funcs->query_mem_partition_mode(adev);
+   ras_cmd->ras_in_message.init_flags.active_umc_mask = 
adev->umc.active_mask;
 
ret = psp_ta_load(psp, &psp->ras_context.context);
 
diff --git a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h 
b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
index 64891f0993666e..a3b5fda224328b 100644
--- a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
+++ b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
@@ -151,6 +151,7 @@ struct ta_ras_init_flags {
uint16_t xcc_mask;
uint8_t channel_dis_num;
uint8_t nps_mode;
+   uint32_t active_umc_mask;
 };
 
 struct ta_ras_mca_addr {
-- 
2.25.1



Re: [PATCH 4/8] drm/amdgpu: rework how isolation is enforced v2

2025-03-14 Thread Christian König
Hi Siqueira,

as discussed on the call, if you can wrap your head around how the
amdgpu_device_enforce_isolation() function works, it should be trivial to write
a new function, or extend the existing one, to insert a CPU bubble whenever the
ownership of one of the compute rings changes.

IIRC we already do load balancing between the 8 available compute rings anyway, 
so the only part missing is the CPU bubble to ensure that a queue reset only 
affects a single application.

Regards,
Christian.

Am 07.03.25 um 14:48 schrieb Christian König:
> Limiting the number of available VMIDs to enforce isolation causes some
> issues with gang submit and applying certain HW workarounds which
> require multiple VMIDs to work correctly.
>
> So instead start to track all submissions to the relevant engines in a
> per partition data structure and use the dma_fences of the submissions
> to enforce isolation similar to what a VMID limit does.
>
> v2: use ~0l for jobs without isolation to distinct it from kernel
> submissions which uses NULL for the owner. Add some warning when we
> are OOM.
>
> Signed-off-by: Christian König 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu.h| 13 ++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 98 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c| 43 --
>  drivers/gpu/drm/amd/amdgpu/amdgpu_job.c| 16 +++-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c   | 19 +
>  drivers/gpu/drm/amd/amdgpu/amdgpu_sync.h   |  1 +
>  6 files changed, 155 insertions(+), 35 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> index 87062c1adcdf..f68a348dcec9 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
> @@ -1211,9 +1211,15 @@ struct amdgpu_device {
>   booldebug_exp_resets;
>   booldebug_disable_gpu_ring_reset;
>  
> - boolenforce_isolation[MAX_XCP];
> - /* Added this mutex for cleaner shader isolation between GFX and 
> compute processes */
> + /* Protection for the following isolation structure */
>   struct mutexenforce_isolation_mutex;
> + boolenforce_isolation[MAX_XCP];
> + struct amdgpu_isolation {
> + void*owner;
> + struct dma_fence*spearhead;
> + struct amdgpu_sync  active;
> + struct amdgpu_sync  prev;
> + } isolation[MAX_XCP];
>  
>   struct amdgpu_init_level *init_lvl;
>  
> @@ -1499,6 +1505,9 @@ void amdgpu_device_pcie_port_wreg(struct amdgpu_device 
> *adev,
>  struct dma_fence *amdgpu_device_get_gang(struct amdgpu_device *adev);
>  struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
>   struct dma_fence *gang);
> +struct dma_fence *amdgpu_device_enforce_isolation(struct amdgpu_device *adev,
> +   struct amdgpu_ring *ring,
> +   struct amdgpu_job *job);
>  bool amdgpu_device_has_display_hardware(struct amdgpu_device *adev);
>  ssize_t amdgpu_get_soft_full_reset_mask(struct amdgpu_ring *ring);
>  ssize_t amdgpu_show_reset_mask(char *buf, uint32_t supported_reset);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 337543ec615c..3fa7788b4e12 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4272,6 +4272,11 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   mutex_init(&adev->gfx.reset_sem_mutex);
>   /* Initialize the mutex for cleaner shader isolation between GFX and 
> compute processes */
>   mutex_init(&adev->enforce_isolation_mutex);
> + for (i = 0; i < MAX_XCP; ++i) {
> + adev->isolation[i].spearhead = dma_fence_get_stub();
> + amdgpu_sync_create(&adev->isolation[i].active);
> + amdgpu_sync_create(&adev->isolation[i].prev);
> + }
>   mutex_init(&adev->gfx.kfd_sch_mutex);
>  
>   amdgpu_device_init_apu_flags(adev);
> @@ -4770,7 +4775,7 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
>  
>  void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>  {
> - int idx;
> + int i, idx;
>   bool px;
>  
>   amdgpu_device_ip_fini(adev);
> @@ -4778,6 +4783,11 @@ void amdgpu_device_fini_sw(struct amdgpu_device *adev)
>   amdgpu_ucode_release(&adev->firmware.gpu_info_fw);
>   adev->accel_working = false;
>   dma_fence_put(rcu_dereference_protected(adev->gang_submit, true));
> + for (i = 0; i < MAX_XCP; ++i) {
> + dma_fence_put(adev->isolation[i].spearhead);
> + amdgpu_sync_free(&adev->isolation[i].active);
> + amdgpu_sync_free(&adev->isolation[i].prev);
> + }
>  
>   amdgpu_reset_fini(adev);

RE: [PATCH] drm/amdgpu: Add active_umc_mask to ras init_flags

2025-03-14 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-Original Message-
From: amd-gfx  On Behalf Of Candice Li
Sent: Friday, March 14, 2025 15:33
To: amd-gfx@lists.freedesktop.org
Cc: Li, Candice 
Subject: [PATCH] drm/amdgpu: Add active_umc_mask to ras init_flags

Add active_umc_mask to ras init_flags.

Signed-off-by: Candice Li 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 1 +  
drivers/gpu/drm/amd/amdgpu/ta_ras_if.h  | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index 79dad75bd0e79f..d3b05b7020c84d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1861,6 +1861,7 @@ int psp_ras_initialize(struct psp_context *psp)
if (adev->gmc.gmc_funcs->query_mem_partition_mode)
ras_cmd->ras_in_message.init_flags.nps_mode =
adev->gmc.gmc_funcs->query_mem_partition_mode(adev);
+   ras_cmd->ras_in_message.init_flags.active_umc_mask =
+adev->umc.active_mask;

ret = psp_ta_load(psp, &psp->ras_context.context);

diff --git a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h 
b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
index 64891f0993666e..a3b5fda224328b 100644
--- a/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
+++ b/drivers/gpu/drm/amd/amdgpu/ta_ras_if.h
@@ -151,6 +151,7 @@ struct ta_ras_init_flags {
uint16_t xcc_mask;
uint8_t channel_dis_num;
uint8_t nps_mode;
+   uint32_t active_umc_mask;
 };

 struct ta_ras_mca_addr {
--
2.25.1



RE: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by removing dynamic callbacks

2025-03-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Alex Deucher 
> Sent: Thursday, March 13, 2025 10:38 AM
> To: Zhang, Jesse(Jie) 
> Cc: Koenig, Christian ; amd-
> g...@lists.freedesktop.org; Deucher, Alexander ;
> Kim, Jonathan ; Zhu, Jiadong
> 
> Subject: Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by
> removing dynamic callbacks
>
> I think as long as the locking is correct, the src shouldn't matter.
> You just need to stop the kernel queues (and save state) and evict the
> user queues (since HWS is responsible for saving the MQDs of the
> non-guilty user queues).  If KFD detected the hang (e.g., queue
> eviction fails when called for memory pressure, etc.), we just need to
> make sure that it's ok for the sdma reset routine to call evict queues
> again even if it was already called (presumably it should exit early
> since the queues are already evicted).  If KGD initiates the reset, it
> will call into KFD to evict queues.  We just need to make sure the
> evict queues function we call just evicts the queues and doesn't also
> try and reset.

If we're removing the src parameter, we need to be careful we don't end up in a 
double lock scenario in the KFD.
i.e. kgd inits reset -> kfd detects hang on kgd reset trigger and calls back to 
kgd -> amdgpu_amdkfd_suspend gets called again but is blocked on previous 
suspend call from original kgd reset (which is holding a bunch of KFD locks) 
while KFD is trying to clean up immediately.

The safest way to remove the parameter seems to be to move the KFD
suspend/restore outside of the common call and have KGD call suspend/resume
itself when it invokes the common call.

Jon

>
> Alex
>
> On Wed, Mar 12, 2025 at 5:24 AM Zhang, Jesse(Jie) 
> wrote:
> >
> > [AMD Official Use Only - AMD Internal Distribution Only]
> >
> >
> >
> >
> >
> >
> > From: Koenig, Christian 
> > Sent: Wednesday, March 12, 2025 4:39 PM
> > To: Zhang, Jesse(Jie) ; amd-gfx@lists.freedesktop.org
> > Cc: Deucher, Alexander ; Kim, Jonathan
> ; Zhu, Jiadong 
> > Subject: Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by
> removing dynamic callbacks
> >
> >
> >
> > Am 12.03.25 um 09:15 schrieb Zhang, Jesse(Jie):
> >
> > [SNIP9
> >
> > -
> >
> > + gfx_ring->funcs->stop_queue(adev, instance_id);
> >
> >
> >
> > Yeah that starts to look good. Question here is who is calling
> amdgpu_sdma_reset_engine()?
> >
> >
> >
> > If this call comes from engine specific code we might not need the
> start/stop_queue callbacks all together.
> >
> >
> >
> > Kfd and sdma v4/v5/v5_2 will call amdgpu_sdma_reset_engine, and
> start/stop_queue callbacks are only implemented in sdmav4/sdmav5/sdma5_2.
> >
> >
> > Why would the KFD call this as well? Because it detects an issue with a SDMA
> user queue  If yes I would rather suggest that the KFD calls the reset 
> function of
> the paging queue.
> >
> > Since this reset function is specific to the SDMA HW generation anyway you 
> > don't
> need those extra functions to abstract starting and stopping of the queue for 
> each
> HW generation.
> >
> > kfd can't call reset function directly, unless we add a parameter src  to 
> > distinguish
> kfd and kgd in reset function, like this:
> >
> > int (*reset)(struct amdgpu_ring *ring, unsigned int vmid, int src );
> >
> > As Alex said in another thread,
> >
> > We need to distinguish  kfd and kgd  in reset.
> >
> > If kfd triggers a reset, kgd must save healthy jobs and recover jobs after 
> > reset.
> >
> > If kgd triggers a reset, kgd must abandon bad jobs after reset.(and perhaps 
> > kfd
> needs to save its healthy jobs for recovery).
> >
> >
> >
> > If we can add a parameter, I am ok for that solution too.
> >
> >
> >
> > Additionally:
> >
> > For sdma6/7, when a queue reset fails, we may need to fall back to an engine
> reset for a attempt.
> >
> >
> >
> > Thanks
> >
> > Jesse
> >
> >
> > Regards,
> > Christian.
> >
> >
> >
> >
> >
> >
> > Thanks
> >
> > Jesse
> >
> >
> >
> > Regards,
> >
> > Christian.
> >
> >
> >
> >   /* Perform the SDMA reset for the specified instance */
> >
> >   ret = amdgpu_dpm_reset_sdma(adev, 1 << instance_id);
> >
> >   if (ret) {
> >
> > @@ -591,18 +573,7 @@ int amdgpu_sdma_reset_engine(struct amdgpu_device
> *adev, uint32_t instance_id, b
> >
> >   goto exit;
> >
> >   }
> >
> >
> >
> > - /* Invoke all registered post_reset callbacks */
> >
> > - list_for_each_entry(funcs, &adev->sdma.reset_callback_list, list) {
> >
> > - if (funcs->post_reset) {
> >
> > - ret = funcs->post_reset(adev, instance_id);
> >
> > - if (ret) {
> >
> > - dev_err(adev->dev,
> >
> > - "afterReset callback failed for instance %u: 
> > %d\n",
> >
> > - instance_id, ret);
> >
> > - goto exit;
> >
> > - }
> >
> > - }
> >
> > - }
> >
> > + gfx_ring->

[PATCH RFC v4 3/6] drm/display: dp: use new DPCD access helpers

2025-03-14 Thread Dmitry Baryshkov
From: Dmitry Baryshkov 

Switch drm_dp_helper.c to use the new set of DPCD read/write helpers.

Reviewed-by: Lyude Paul 
Acked-by: Jani Nikula 
Signed-off-by: Dmitry Baryshkov 
---
 drivers/gpu/drm/display/drm_dp_helper.c | 296 +---
 1 file changed, 116 insertions(+), 180 deletions(-)

diff --git a/drivers/gpu/drm/display/drm_dp_helper.c 
b/drivers/gpu/drm/display/drm_dp_helper.c
index 
410be0be233ad94702af423262a7d98e21afbfeb..e2439c8a7fefe116b04aaa689b557e2387b05540
 100644
--- a/drivers/gpu/drm/display/drm_dp_helper.c
+++ b/drivers/gpu/drm/display/drm_dp_helper.c
@@ -327,7 +327,7 @@ static int __read_delay(struct drm_dp_aux *aux, const u8 
dpcd[DP_RECEIVER_CAP_SI
if (offset < DP_RECEIVER_CAP_SIZE) {
rd_interval = dpcd[offset];
} else {
-   if (drm_dp_dpcd_readb(aux, offset, &rd_interval) != 1) {
+   if (drm_dp_dpcd_read_byte(aux, offset, &rd_interval) < 0) {
drm_dbg_kms(aux->drm_dev, "%s: failed rd interval 
read\n",
aux->name);
/* arbitrary default delay */
@@ -358,7 +358,7 @@ int drm_dp_128b132b_read_aux_rd_interval(struct drm_dp_aux 
*aux)
int unit;
u8 val;
 
-   if (drm_dp_dpcd_readb(aux, DP_128B132B_TRAINING_AUX_RD_INTERVAL, &val) 
!= 1) {
+   if (drm_dp_dpcd_read_byte(aux, DP_128B132B_TRAINING_AUX_RD_INTERVAL, 
&val) < 0) {
drm_err(aux->drm_dev, "%s: failed rd interval read\n",
aux->name);
/* default to max */
@@ -807,30 +807,20 @@ int drm_dp_dpcd_read_phy_link_status(struct drm_dp_aux 
*aux,
 {
int ret;
 
-   if (dp_phy == DP_PHY_DPRX) {
-   ret = drm_dp_dpcd_read(aux,
-  DP_LANE0_1_STATUS,
-  link_status,
-  DP_LINK_STATUS_SIZE);
-
-   if (ret < 0)
-   return ret;
+   if (dp_phy == DP_PHY_DPRX)
+   return drm_dp_dpcd_read_data(aux,
+DP_LANE0_1_STATUS,
+link_status,
+DP_LINK_STATUS_SIZE);
 
-   WARN_ON(ret != DP_LINK_STATUS_SIZE);
-
-   return 0;
-   }
-
-   ret = drm_dp_dpcd_read(aux,
-  DP_LANE0_1_STATUS_PHY_REPEATER(dp_phy),
-  link_status,
-  DP_LINK_STATUS_SIZE - 1);
+   ret = drm_dp_dpcd_read_data(aux,
+   DP_LANE0_1_STATUS_PHY_REPEATER(dp_phy),
+   link_status,
+   DP_LINK_STATUS_SIZE - 1);
 
if (ret < 0)
return ret;
 
-   WARN_ON(ret != DP_LINK_STATUS_SIZE - 1);
-
/* Convert the LTTPR to the sink PHY link status layout */
memmove(&link_status[DP_SINK_STATUS - DP_LANE0_1_STATUS + 1],
&link_status[DP_SINK_STATUS - DP_LANE0_1_STATUS],
@@ -846,7 +836,7 @@ static int read_payload_update_status(struct drm_dp_aux 
*aux)
int ret;
u8 status;
 
-   ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
+   ret = drm_dp_dpcd_read_byte(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, 
&status);
if (ret < 0)
return ret;
 
@@ -873,21 +863,21 @@ int drm_dp_dpcd_write_payload(struct drm_dp_aux *aux,
int ret;
int retries = 0;
 
-   drm_dp_dpcd_writeb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS,
-  DP_PAYLOAD_TABLE_UPDATED);
+   drm_dp_dpcd_write_byte(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS,
+  DP_PAYLOAD_TABLE_UPDATED);
 
payload_alloc[0] = vcpid;
payload_alloc[1] = start_time_slot;
payload_alloc[2] = time_slot_count;
 
-   ret = drm_dp_dpcd_write(aux, DP_PAYLOAD_ALLOCATE_SET, payload_alloc, 3);
-   if (ret != 3) {
+   ret = drm_dp_dpcd_write_data(aux, DP_PAYLOAD_ALLOCATE_SET, 
payload_alloc, 3);
+   if (ret < 0) {
drm_dbg_kms(aux->drm_dev, "failed to write payload allocation 
%d\n", ret);
goto fail;
}
 
 retry:
-   ret = drm_dp_dpcd_readb(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, &status);
+   ret = drm_dp_dpcd_read_byte(aux, DP_PAYLOAD_TABLE_UPDATE_STATUS, 
&status);
if (ret < 0) {
drm_dbg_kms(aux->drm_dev, "failed to read payload table status 
%d\n", ret);
goto fail;
@@ -1043,15 +1033,15 @@ bool drm_dp_send_real_edid_checksum(struct drm_dp_aux 
*aux,
 {
u8 link_edid_read = 0, auto_test_req = 0, test_resp = 0;
 
-   if (drm_dp_dpcd_read(aux, DP_DEVICE_SERVICE_IRQ_VECTOR,
-&auto_test_req, 1) < 1) {
+   if (drm_dp_dpcd_read_byte(aux, DP_DEVICE_SERVICE_IRQ_VECTOR,
+ &auto_test_re

RE: [PATCH 2/3] drm/amdgpu: adjust drm_firmware_drivers_only() handling

2025-03-14 Thread Russell, Kent
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Kent Russell 



> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Thursday, March 13, 2025 9:02 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 2/3] drm/amdgpu: adjust drm_firmware_drivers_only() handling
>
> Move to probe so we can check the PCI device type and
> only apply the drm_firmware_drivers_only() check for
> PCI DISPLAY classes.  Also add a module parameter to
> override the nomodeset kernel parameter as a workaround
> for platforms that have this hardcoded on their kernel
> command lines.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> index dd86661153582..4e1a6a249bba5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
> @@ -178,6 +178,7 @@ uint amdgpu_sdma_phase_quantum = 32;
>  char *amdgpu_disable_cu;
>  char *amdgpu_virtual_display;
>  bool enforce_isolation;
> +int amdgpu_modeset = -1;
>
>  /* Specifies the default granularity for SVM, used in buffer
>   * migration and restoration of backing memory when handling
> @@ -1040,6 +1041,13 @@ module_param_named(user_partt_mode,
> amdgpu_user_partt_mode, uint, 0444);
>  module_param(enforce_isolation, bool, 0444);
>  MODULE_PARM_DESC(enforce_isolation, "enforce process isolation between
> graphics and compute . enforce_isolation = on");
>
> +/**
> + * DOC: modeset (int)
> + * Override nomodeset (1 = override, -1 = auto). The default is -1 (auto).
> + */
> +MODULE_PARM_DESC(modeset, "Override nomodeset (1 = enable, -1 = auto)");
> +module_param_named(modeset, amdgpu_modeset, int, 0444);
> +
>  /**
>   * DOC: seamless (int)
>   * Seamless boot will keep the image on the screen during the boot process.
> @@ -2270,6 +2278,12 @@ static int amdgpu_pci_probe(struct pci_dev *pdev,
>   int ret, retry = 0, i;
>   bool supports_atomic = false;
>
> + if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA ||
> + (pdev->class >> 8) == PCI_CLASS_DISPLAY_OTHER) {
> + if (drm_firmware_drivers_only() && amdgpu_modeset == -1)
> + return -EINVAL;
> + }
> +
>   /* skip devices which are owned by radeon */
>   for (i = 0; i < ARRAY_SIZE(amdgpu_unsupported_pciidlist); i++) {
>   if (amdgpu_unsupported_pciidlist[i] == pdev->device)
> --
> 2.48.1



Re: [PATCH 1/2] drm/amdgpu/gfx: adjust workload profile handling

2025-03-14 Thread Lazar, Lijo



On 3/14/2025 7:17 PM, Alex Deucher wrote:
> No need to make the workload profile setup dependent
> on the results of cancelling the delayed work thread.
> We have all of the necessary checking in place for the
> workload profile reference counting, so separate the
> two.  As it is now, we can theoretically end up with
> the call from begin_use happening while the worker
> thread is executing which would result in the profile
> not getting set for that submission.  It should not
> affect the reference counting.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 22 +++---
>  1 file changed, 11 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> index 099329d15b9ff..20424f8c4925f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> @@ -2188,18 +2188,18 @@ void amdgpu_gfx_profile_ring_begin_use(struct 
> amdgpu_ring *ring)
>  
>   atomic_inc(&adev->gfx.total_submission_cnt);
>  
> - if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
> - mutex_lock(&adev->gfx.workload_profile_mutex);
> - if (!adev->gfx.workload_profile_active) {
> - r = amdgpu_dpm_switch_power_profile(adev, profile, 
> true);
> - if (r)
> - dev_warn(adev->dev, "(%d) failed to disable %s 
> power profile mode\n", r,
> -  profile == 
> PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> -  "fullscreen 3D" : "compute");
> - adev->gfx.workload_profile_active = true;
> - }
> - mutex_unlock(&adev->gfx.workload_profile_mutex);
> + cancel_delayed_work_sync(&adev->gfx.idle_work);
> +

To avoid locking/unlocking mutex for each begin_use, I think here we
could do like

if (adev->gfx.workload_profile_active)
return;

Thanks,
Lijo

> + mutex_lock(&adev->gfx.workload_profile_mutex);
> + if (!adev->gfx.workload_profile_active) {
> + r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> + if (r)
> + dev_warn(adev->dev, "(%d) failed to disable %s power 
> profile mode\n", r,
> +  profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> +  "fullscreen 3D" : "compute");
> + adev->gfx.workload_profile_active = true;
>   }
> + mutex_unlock(&adev->gfx.workload_profile_mutex);
>  }
>  
>  void amdgpu_gfx_profile_ring_end_use(struct amdgpu_ring *ring)



[PATCH 3/3] drm/amdgpu/discovery: optionally use fw based ip discovery

2025-03-14 Thread Flora Cui
From: Alex Deucher 

On chips without native IP discovery support, use the fw binary
if available, otherwise we can continue without it.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 38 +++
 1 file changed, 30 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index fff438baf64b..cf286fde18d5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -2536,6 +2536,36 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
 {
int r;
 
+   switch (adev->asic_type) {
+   case CHIP_VEGA10:
+   case CHIP_VEGA12:
+   case CHIP_RAVEN:
+   case CHIP_VEGA20:
+   case CHIP_ARCTURUS:
+   case CHIP_ALDEBARAN:
+   /* this is not fatal.  We have a fallback below
+* if the new firmwares are not present.
+*/
+   r = amdgpu_discovery_reg_base_init(adev);
+   if (!r) {
+   amdgpu_discovery_harvest_ip(adev);
+   amdgpu_discovery_get_gfx_info(adev);
+   amdgpu_discovery_get_mall_info(adev);
+   amdgpu_discovery_get_vcn_info(adev);
+   }
+   break;
+   default:
+   r = amdgpu_discovery_reg_base_init(adev);
+   if (r)
+   return -EINVAL;
+
+   amdgpu_discovery_harvest_ip(adev);
+   amdgpu_discovery_get_gfx_info(adev);
+   amdgpu_discovery_get_mall_info(adev);
+   amdgpu_discovery_get_vcn_info(adev);
+   break;
+   }
+
switch (adev->asic_type) {
case CHIP_VEGA10:
vega10_reg_base_init(adev);
@@ -2700,14 +2730,6 @@ int amdgpu_discovery_set_ip_blocks(struct amdgpu_device 
*adev)
adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 1, 0);
break;
default:
-   r = amdgpu_discovery_reg_base_init(adev);
-   if (r)
-   return -EINVAL;
-
-   amdgpu_discovery_harvest_ip(adev);
-   amdgpu_discovery_get_gfx_info(adev);
-   amdgpu_discovery_get_mall_info(adev);
-   amdgpu_discovery_get_vcn_info(adev);
break;
}
 
-- 
2.43.0



RE: [PATCH 3/3] drm/amdgpu: don't free conflicting apertures for non-display devices

2025-03-14 Thread Russell, Kent
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Kent Russell 


> -Original Message-
> From: amd-gfx  On Behalf Of Alex
> Deucher
> Sent: Thursday, March 13, 2025 9:02 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH 3/3] drm/amdgpu: don't free conflicting apertures for 
> non-display
> devices
>
> PCI_CLASS_ACCELERATOR_PROCESSING devices won't ever be
> the sysfb, so there is no need to free conflicting
> apertures.
>
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
>  1 file changed, 11 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 1eff6252f66b4..689f1833d02b7 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> @@ -4401,10 +4401,17 @@ int amdgpu_device_init(struct amdgpu_device *adev,
>   if (r)
>   return r;
>
> - /* Get rid of things like offb */
> - r = aperture_remove_conflicting_pci_devices(adev->pdev,
> amdgpu_kms_driver.name);
> - if (r)
> - return r;
> + /*
> +  * No need to remove conflicting FBs for non-display class devices.
> +	 * This prevents the sysfb from being freed accidentally.
> +  */
> + if ((pdev->class >> 8) == PCI_CLASS_DISPLAY_VGA ||
> + (pdev->class >> 8) == PCI_CLASS_DISPLAY_OTHER) {
> + /* Get rid of things like offb */
> + r = aperture_remove_conflicting_pci_devices(adev->pdev,
> amdgpu_kms_driver.name);
> + if (r)
> + return r;
> + }
>
>   /* Enable TMZ based on IP_VERSION */
>   amdgpu_gmc_tmz_set(adev);
> --
> 2.48.1



Re: [PATCH 1/2] drm/amdgpu/gfx: adjust workload profile handling

2025-03-14 Thread Alex Deucher
On Fri, Mar 14, 2025 at 10:53 AM Lazar, Lijo  wrote:
>
>
>
> On 3/14/2025 7:17 PM, Alex Deucher wrote:
> > No need to make the workload profile setup dependent
> > on the results of cancelling the delayed work thread.
> > We have all of the necessary checking in place for the
> > workload profile reference counting, so separate the
> > two.  As it is now, we can theoretically end up with
> > the call from begin_use happening while the worker
> > thread is executing which would result in the profile
> > not getting set for that submission.  It should not
> > affect the reference counting.
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 22 +++---
> >  1 file changed, 11 insertions(+), 11 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > index 099329d15b9ff..20424f8c4925f 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
> > @@ -2188,18 +2188,18 @@ void amdgpu_gfx_profile_ring_begin_use(struct 
> > amdgpu_ring *ring)
> >
> >   atomic_inc(&adev->gfx.total_submission_cnt);
> >
> > - if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
> > - mutex_lock(&adev->gfx.workload_profile_mutex);
> > - if (!adev->gfx.workload_profile_active) {
> > - r = amdgpu_dpm_switch_power_profile(adev, profile, 
> > true);
> > - if (r)
> > - dev_warn(adev->dev, "(%d) failed to disable 
> > %s power profile mode\n", r,
> > -  profile == 
> > PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
> > -  "fullscreen 3D" : "compute");
> > - adev->gfx.workload_profile_active = true;
> > - }
> > - mutex_unlock(&adev->gfx.workload_profile_mutex);
> > + cancel_delayed_work_sync(&adev->gfx.idle_work);
> > +
>
> To avoid locking/unlocking mutex for each begin_use, I think here we
> could do like
>
> if (adev->gfx.workload_profile_active)
> return;

But that is what the mutex is protecting.

Alex

>
> Thanks,
> Lijo
>
> > + mutex_lock(&adev->gfx.workload_profile_mutex);
> > + if (!adev->gfx.workload_profile_active) {
> > + r = amdgpu_dpm_switch_power_profile(adev, profile, true);
> > + if (r)
> > + dev_warn(adev->dev, "(%d) failed to disable %s power 
> > profile mode\n", r,
> > +  profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D 
> > ?
> > +  "fullscreen 3D" : "compute");
> > + adev->gfx.workload_profile_active = true;
> >   }
> > + mutex_unlock(&adev->gfx.workload_profile_mutex);
> >  }
> >
> >  void amdgpu_gfx_profile_ring_end_use(struct amdgpu_ring *ring)
>


Re: [PATCH 3/3] drm/amdgpu/discovery: optionally use fw based ip discovery

2025-03-14 Thread Lazar, Lijo



On 3/14/2025 3:24 PM, Flora Cui wrote:
> From: Alex Deucher 
> 
> On chips without native IP discovery support, use the fw binary
> if available, otherwise we can continue without it.
> 
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 38 +++
>  1 file changed, 30 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> index fff438baf64b..cf286fde18d5 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
> @@ -2536,6 +2536,36 @@ int amdgpu_discovery_set_ip_blocks(struct 
> amdgpu_device *adev)
>  {
>   int r;
>  
> + switch (adev->asic_type) {
> + case CHIP_VEGA10:
> + case CHIP_VEGA12:
> + case CHIP_RAVEN:
> + case CHIP_VEGA20:
> + case CHIP_ARCTURUS:
> + case CHIP_ALDEBARAN:
> + /* this is not fatal.  We have a fallback below
> +  * if the new firmwares are not present.
> +  */
> + r = amdgpu_discovery_reg_base_init(adev);
> + if (!r) {
> + amdgpu_discovery_harvest_ip(adev);
> + amdgpu_discovery_get_gfx_info(adev);
> + amdgpu_discovery_get_mall_info(adev);
> + amdgpu_discovery_get_vcn_info(adev);
> + }
> + break;
> + default:
> + r = amdgpu_discovery_reg_base_init(adev);
> + if (r)
> + return -EINVAL;
> +
> + amdgpu_discovery_harvest_ip(adev);
> + amdgpu_discovery_get_gfx_info(adev);
> + amdgpu_discovery_get_mall_info(adev);
> + amdgpu_discovery_get_vcn_info(adev);
> + break;
> + }
> +
>   switch (adev->asic_type) {

Looks like this fallback gets executed regardless of the
presence/absence of new firmware.

Thanks,
Lijo

>   case CHIP_VEGA10:
>   vega10_reg_base_init(adev);
> @@ -2700,14 +2730,6 @@ int amdgpu_discovery_set_ip_blocks(struct 
> amdgpu_device *adev)
>   adev->ip_versions[XGMI_HWIP][0] = IP_VERSION(6, 1, 0);
>   break;
>   default:
> - r = amdgpu_discovery_reg_base_init(adev);
> - if (r)
> - return -EINVAL;
> -
> - amdgpu_discovery_harvest_ip(adev);
> - amdgpu_discovery_get_gfx_info(adev);
> - amdgpu_discovery_get_mall_info(adev);
> - amdgpu_discovery_get_vcn_info(adev);
>   break;
>   }
>  



Re: [PATCH v2 4/9] drm/amdkfd: Validate user queue buffers

2025-03-14 Thread Dieter Faulbaum



Hello Philip,

Philip Yang  writes:


On 2025-02-12 17:42, Uwe Kleine-König wrote:

 #regzbot introduced: 68e599db7a549f010a329515f3508d8a8c3467a4
#regzbot monitor: https://bugs.debian.org/1093124

Hello,

On Thu, Jul 18, 2024 at 05:05:53PM -0400, Philip Yang wrote:

 Find user queue rptr, ring buf, eop buffer and cwsr area BOs, and
check BOs are mapped on the GPU with correct size and take the BO
reference.

Signed-off-by: Philip Yang 
Reviewed-by: Felix Kuehling 


This change made it into v6.12-rc1 as 68e599db7a54 ("drm/amdkfd:
Validate user queue buffers"). A Debian user (Dieter Faulbaum, on Cc)
reported that this change introduced a regression using a gfx803 device
resulting in an HSA exception when e.g. darktable is used. I didn't even
try to understand the problem, but maybe one of you has an idea about
the issue?!

Try this patch

https://lore.kernel.org/all/2025013412.29812-1-philip.y...@amd.com/T/


It seems to me that your patch isn't applied in the mainline kernel.

What do you think, will it happen at some point? :-)
Is it falling through the cracks?



With regards
Dieter


Re: [PATCH] drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for smu 14.0.2

2025-03-14 Thread Alex Deucher
Applied.  Thanks!

On Wed, Mar 12, 2025 at 11:28 PM Wang, Yang(Kevin)
 wrote:
>
> [AMD Official Use Only - AMD Internal Distribution Only]
>
> It looks good to me.
>
> Reviewed-by: Yang Wang 
>
> Best Regards,
> Kevin
>
> -Original Message-
> From: amd-gfx  On Behalf Of Tomasz 
> Pakula
> Sent: Wednesday, March 12, 2025 05:39
> To: amd-gfx@lists.freedesktop.org
> Cc: Deucher, Alexander 
> Subject: [PATCH] drm/amdgpu/pm: Handle SCLK offset correctly in overdrive for 
> smu 14.0.2
>
> Currently, it seems like the code was carried over from RDNA3 because it 
> assumes two possible values to set. RDNA4, instead of having:
> 0: min SCLK
> 1: max SCLK
> only has
> 0: SCLK offset
>
> This change makes it so it only reports the current offset value instead of
> showing possible min/max values and their indices. Moreover, it now only
> accepts the offset as a value, without the index.
>
> Additionally, the lower bound was printed as %u by mistake.
>
> Old:
> OD_SCLK_OFFSET:
> 0: -500Mhz
> 1: 1000Mhz
> OD_MCLK:
> 0: 97Mhz
> 1: 1259MHz
> OD_VDDGFX_OFFSET:
> 0mV
> OD_RANGE:
> SCLK_OFFSET:-500Mhz   1000Mhz
> MCLK:  97Mhz   1500Mhz
> VDDGFX_OFFSET:-200mv  0mv
>
> New:
> OD_SCLK_OFFSET:
> 0Mhz
> OD_MCLK:
> 0: 97Mhz
> 1: 1259MHz
> OD_VDDGFX_OFFSET:
> 0mV
> OD_RANGE:
> SCLK_OFFSET:-500Mhz   1000Mhz
> MCLK:  97Mhz   1500Mhz
> VDDGFX_OFFSET:-200mv  0mv
>
> Setting this offset:
> Old: "s 1 "
> New: "s "
>
> Signed-off-by: Tomasz Pakuła 
> ---
>  .../drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c  | 59 ++-
>  1 file changed, 18 insertions(+), 41 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c 
> b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> index 5cad09c5f2ff..62bd9647541a 100644
> --- a/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> +++ b/drivers/gpu/drm/amd/pm/swsmu/smu14/smu_v14_0_2_ppt.c
> @@ -1193,16 +1193,9 @@ static int smu_v14_0_2_print_clk_levels(struct 
> smu_context *smu,
>  
> PP_OD_FEATURE_GFXCLK_BIT))
> break;
>
> -   PPTable_t *pptable = smu->smu_table.driver_pptable;
> -   const OverDriveLimits_t * const overdrive_upperlimits =
> -   
> &pptable->SkuTable.OverDriveLimitsBasicMax;
> -   const OverDriveLimits_t * const overdrive_lowerlimits =
> -   
> &pptable->SkuTable.OverDriveLimitsBasicMin;
> -
> size += sysfs_emit_at(buf, size, "OD_SCLK_OFFSET:\n");
> -   size += sysfs_emit_at(buf, size, "0: %dMhz\n1: %uMhz\n",
> -   overdrive_lowerlimits->GfxclkFoffset,
> -   overdrive_upperlimits->GfxclkFoffset);
> +   size += sysfs_emit_at(buf, size, "%dMhz\n",
> +   
> od_table->OverDriveTable.GfxclkFoffset);
> break;
>
> case SMU_OD_MCLK:
> @@ -1336,13 +1329,9 @@ static int smu_v14_0_2_print_clk_levels(struct 
> smu_context *smu,
> size += sysfs_emit_at(buf, size, "%s:\n", "OD_RANGE");
>
> if (smu_v14_0_2_is_od_feature_supported(smu, 
> PP_OD_FEATURE_GFXCLK_BIT)) {
> -   smu_v14_0_2_get_od_setting_limits(smu,
> - 
> PP_OD_FEATURE_GFXCLK_FMIN,
> - &min_value,
> - NULL);
> smu_v14_0_2_get_od_setting_limits(smu,
>   
> PP_OD_FEATURE_GFXCLK_FMAX,
> - NULL,
> + &min_value,
>   &max_value);
> size += sysfs_emit_at(buf, size, "SCLK_OFFSET: %7dMhz 
> %10uMhz\n",
>   min_value, max_value);
> @@ -2417,36 +2406,24 @@ static int smu_v14_0_2_od_edit_dpm_table(struct 
> smu_context *smu,
> return -ENOTSUPP;
> }
>
> -   for (i = 0; i < size; i += 2) {
> -   if (i + 2 > size) {
> -   dev_info(adev->dev, "invalid number of input 
> parameters %d\n", size);
> -   return -EINVAL;
> -   }
> -
> -   switch (input[i]) {
> -   case 1:
> -   smu_v14_0_2_get_od_setting_limits(smu,
> - 
> PP_OD_FEATURE_GFXCLK_FMAX,
> - &minimum,
> - &ma

Re: [PATCH 3/8] drm/amdgpu: overwrite signaled fence in amdgpu_sync

2025-03-14 Thread Christian König
On 12.03.25 at 16:06, SRINIVASAN SHANMUGAM wrote:
>
> On 3/7/2025 7:18 PM, Christian König wrote:
>> This allows using amdgpu_sync even without peeking into the fences for a
>> long time.
>>
>> Signed-off-by: Christian König 
>> ---
>>   drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 13 +
>>   1 file changed, 9 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> index 86c17a8946f5..bfe12164d27d 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c
>> @@ -135,11 +135,16 @@ static bool amdgpu_sync_add_later(struct amdgpu_sync 
>> *sync, struct dma_fence *f)
>>   struct amdgpu_sync_entry *e;
>>     hash_for_each_possible(sync->fences, e, node, f->context) {
>> -    if (unlikely(e->fence->context != f->context))
>> -    continue;
>> +    if (dma_fence_is_signaled(e->fence)) {
>> +    dma_fence_put(e->fence);
>> +    e->fence = dma_fence_get(f);
>> +    return true;
>> +    }
>>   -    amdgpu_sync_keep_later(&e->fence, f);
>> -    return true;
>> +    if (likely(e->fence->context == f->context)) {
>> +    amdgpu_sync_keep_later(&e->fence, f);
> --> The call to amdgpu_sync_keep_later(&e->fence, f); ensures that the new
> fence is tracked for future synchronization, i.e., if the driver only
> replaced the old fence without keeping a reference to the new one (of the
> next/second job, for example), the synchronization state could be lost.
> That could cause race conditions where one job proceeds before another has
> completed. So amdgpu_sync_keep_later(&e->fence, f) is done for tracking the
> next/second job when we have multiple jobs in gang submissions from the
> same context/client; is my understanding correct here, please?

Your question is not easy to understand, but I think your understanding is
correct.

Basically if you have submissions A,B,C to the same ring buffer and the HW 
guarantees that they execute in order you only need to keep a reference to C to 
wait for A and B as well.

Regards,
Christian.

>> +    return true;
>> +    }
>>   }
>>   return false;
>>   }



Re: [PATCH 6/8] drm/amdgpu: stop reserving VMIDs to enforce isolation

2025-03-14 Thread Christian König
On 12.03.25 at 16:10, SRINIVASAN SHANMUGAM wrote:
> On 3/7/2025 7:18 PM, Christian König wrote:
>> That was quite troublesome for gang submit. Completely drop this
>> approach and enforce the isolation separately.
>>
>> Signed-off-by: Christian König 
>> ---
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  2 +-
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c |  9 +
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 11 +++
>>  drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h |  3 +--
>>  4 files changed, 6 insertions(+), 19 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> index 2ce0c6a152a6..4375e5019418 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c
>> @@ -,7 +,7 @@ static int amdgpu_cs_vm_handling(struct 
>> amdgpu_cs_parser *p)
>>  struct drm_gpu_scheduler *sched = entity->rq->sched;
>>  struct amdgpu_ring *ring = to_amdgpu_ring(sched);
>>  
>> -if (amdgpu_vmid_uses_reserved(adev, vm, ring->vm_hub))
>> +if (amdgpu_vmid_uses_reserved(vm, ring->vm_hub))
>>  return -EINVAL;
>>  }
>>  }
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> index a194bf3347cb..9e5f6b11d923 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
>> @@ -1665,15 +1665,8 @@ static ssize_t 
>> amdgpu_gfx_set_enforce_isolation(struct device *dev,
>>  }
>>  
>>  mutex_lock(&adev->enforce_isolation_mutex);
>> -for (i = 0; i < num_partitions; i++) {
>> -if (adev->enforce_isolation[i] && !partition_values[i])
>> -/* Going from enabled to disabled */
>> -amdgpu_vmid_free_reserved(adev, AMDGPU_GFXHUB(i));
>> -else if (!adev->enforce_isolation[i] && partition_values[i])
>> -/* Going from disabled to enabled */
>> -amdgpu_vmid_alloc_reserved(adev, AMDGPU_GFXHUB(i));
>> +for (i = 0; i < num_partitions; i++)
>>  adev->enforce_isolation[i] = partition_values[i];
>> -}
>>  mutex_unlock(&adev->enforce_isolation_mutex);
>>  
>>  amdgpu_mes_update_enforce_isolation(adev);
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> index 92ab821afc06..4c4e087230ac 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c
>> @@ -411,7 +411,7 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct 
>> amdgpu_ring *ring,
>>  if (r || !idle)
>>  goto error;
>>  
>> -if (amdgpu_vmid_uses_reserved(adev, vm, vmhub)) {
>> +if (amdgpu_vmid_uses_reserved(vm, vmhub)) {
>>  r = amdgpu_vmid_grab_reserved(vm, ring, job, &id, fence);
>>  if (r || !id)
>>  goto error;
>> @@ -464,19 +464,14 @@ int amdgpu_vmid_grab(struct amdgpu_vm *vm, struct 
>> amdgpu_ring *ring,
>>  
>>  /*
>>   * amdgpu_vmid_uses_reserved - check if a VM will use a reserved VMID
>> - * @adev: amdgpu_device pointer
>>   * @vm: the VM to check
>>   * @vmhub: the VMHUB which will be used
>>   *
>>   * Returns: True if the VM will use a reserved VMID.
>>   */
>> -bool amdgpu_vmid_uses_reserved(struct amdgpu_device *adev,
>> -   struct amdgpu_vm *vm, unsigned int vmhub)
>> +bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub)
>>  {
>> -return vm->reserved_vmid[vmhub] ||
>> -(adev->enforce_isolation[(vm->root.bo->xcp_id != 
>> AMDGPU_XCP_NO_PARTITION) ?
>> - vm->root.bo->xcp_id : 0] &&
>> - AMDGPU_IS_GFXHUB(vmhub));
>> +return vm->reserved_vmid[vmhub];
>>  }
>>  
>>  int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h 
>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
>> index 4012fb2dd08a..240fa6751260 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.h
>> @@ -78,8 +78,7 @@ void amdgpu_pasid_free_delayed(struct dma_resv *resv,
>>  
>>  bool amdgpu_vmid_had_gpu_reset(struct amdgpu_device *adev,
>> struct amdgpu_vmid *id);
>> -bool amdgpu_vmid_uses_reserved(struct amdgpu_device *adev,
>> -   struct amdgpu_vm *vm, unsigned int vmhub);
>> +bool amdgpu_vmid_uses_reserved(struct amdgpu_vm *vm, unsigned int vmhub);
>>  int amdgpu_vmid_alloc_reserved(struct amdgpu_device *adev,
>>  unsigned vmhub);
>>  void amdgpu_vmid_free_reserved(struct amdgpu_device *adev,
> here we are trying to remove process isolation based on VMID's  this is 
> because the constraints that we have limited number of vimids, that could be 
> assigned to limited number of clients, now we have s

Re: [PATCH 07/11] drm/amdgpu/gfx11: add support for disable_kq

2025-03-14 Thread Khatri, Sunil

Reviewed-by: Sunil Khatri 

On 3/13/2025 8:11 PM, Alex Deucher wrote:

Plumb in support for disabling kernel queues in
GFX11.  We have to bring up a GFX queue briefly in
order to initialize the clear state.  After that
we can disable it.

v2: use ring counts per Felix' suggestion
v3: fix stream fault handler, enable EOP interrupts
v4: fix MEC interrupt offset (Sunil)

Signed-off-by: Alex Deucher 
---
  drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 191 ++---
  1 file changed, 136 insertions(+), 55 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c 
b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
index 95eefd9a40d28..fde8464cbd3b3 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c
@@ -1145,6 +1145,10 @@ static int gfx_v11_0_gfx_ring_init(struct amdgpu_device 
*adev, int ring_id,
  
  	ring->ring_obj = NULL;

ring->use_doorbell = true;
+   if (adev->gfx.disable_kq) {
+   ring->no_scheduler = true;
+   ring->no_user_submission = true;
+   }
  
  	if (!ring_id)

ring->doorbell_index = adev->doorbell_index.gfx_ring0 << 1;
@@ -1577,7 +1581,7 @@ static void gfx_v11_0_alloc_ip_dump(struct amdgpu_device 
*adev)
  
  static int gfx_v11_0_sw_init(struct amdgpu_ip_block *ip_block)

  {
-   int i, j, k, r, ring_id = 0;
+   int i, j, k, r, ring_id;
int xcc_id = 0;
struct amdgpu_device *adev = ip_block->adev;
  
@@ -1710,37 +1714,42 @@ static int gfx_v11_0_sw_init(struct amdgpu_ip_block *ip_block)

return r;
}
  
-	/* set up the gfx ring */

-   for (i = 0; i < adev->gfx.me.num_me; i++) {
-   for (j = 0; j < adev->gfx.me.num_queue_per_pipe; j++) {
-   for (k = 0; k < adev->gfx.me.num_pipe_per_me; k++) {
-   if (!amdgpu_gfx_is_me_queue_enabled(adev, i, k, 
j))
-   continue;
-
-   r = gfx_v11_0_gfx_ring_init(adev, ring_id,
-   i, k, j);
-   if (r)
-   return r;
-   ring_id++;
+   if (adev->gfx.num_gfx_rings) {
+   ring_id = 0;
+   /* set up the gfx ring */
+   for (i = 0; i < adev->gfx.me.num_me; i++) {
+   for (j = 0; j < adev->gfx.me.num_queue_per_pipe; j++) {
+   for (k = 0; k < adev->gfx.me.num_pipe_per_me; 
k++) {
+   if 
(!amdgpu_gfx_is_me_queue_enabled(adev, i, k, j))
+   continue;
+
+   r = gfx_v11_0_gfx_ring_init(adev, 
ring_id,
+   i, k, j);
+   if (r)
+   return r;
+   ring_id++;
+   }
}
}
}
  
-	ring_id = 0;

-   /* set up the compute queues - allocate horizontally across pipes */
-   for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
-   for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
-   for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
-   if (!amdgpu_gfx_is_mec_queue_enabled(adev, 0, i,
-k, j))
-   continue;
+   if (adev->gfx.num_compute_rings) {
+   ring_id = 0;
+   /* set up the compute queues - allocate horizontally across 
pipes */
+   for (i = 0; i < adev->gfx.mec.num_mec; ++i) {
+   for (j = 0; j < adev->gfx.mec.num_queue_per_pipe; j++) {
+   for (k = 0; k < adev->gfx.mec.num_pipe_per_mec; k++) {
+   if (!amdgpu_gfx_is_mec_queue_enabled(adev, 0, i,
+k, j))
+   continue;
  
-r = gfx_v11_0_compute_ring_init(adev, ring_id,
-   i, k, j);
-   if (r)
-   return r;
+   r = gfx_v11_0_compute_ring_init(adev, ring_id,
+   i, k, j);
+   if (r)
+   return r;
  
-ring_id++;
+   ring_id++;
+   }
}
}
}
@@ -4578,11 +4587,23 @@ static int gfx_v11_0_cp_resume(struct amdgpu_device *ad

[PATCH] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Harish Kasiviswanathan
.config_dequeue_wait_counts can return a no-op case. Modify the return
parameter to reflect that, since the caller also needs to ignore this condition.

Fixes: <98a5af8103f> ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")

Signed-off-by: Harish Kasiviswanathan 
---
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c| 11 ---
 drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c |  9 -
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
index 3f574d82b5fc..47de572741e7 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
@@ -436,19 +436,24 @@ int pm_config_dequeue_wait_counts(struct packet_manager *pm,
 
retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
 cmd, value);
-   if (!retval)
+   if (retval > 0) {
retval = kq_submit_packet(pm->priv_queue);
+
+   /* If default value is modified, cache that in dqm->wait_times */
+   if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
+   update_dqm_wait_times(pm->dqm);
+   }
else
kq_rollback_packet(pm->priv_queue);
}
 
/* If default value is modified, cache that value in dqm->wait_times */
-   if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
+   if (retval > 0 && cmd == KFD_DEQUEUE_WAIT_INIT)
update_dqm_wait_times(pm->dqm);
 
 out:
mutex_unlock(&pm->lock);
-   return retval;
+   return retval < 0 ? retval : 0;
 }
 
 int pm_send_unmap_queue(struct packet_manager *pm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
index d440df602393..af3a18d81900 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
@@ -310,6 +310,13 @@ static inline void pm_build_dequeue_wait_counts_packet_info(struct packet_manage
reg_data);
 }
 
+/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
+ * register/value for configuring dequeue wait counts
+ *
+ * @return: -ve for failure, 0 for nop and +ve for success and buffer is
+ *  filled in with packet
+ *
+ **/
 static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
uint32_t *buffer,
enum kfd_config_dequeue_wait_counts_cmd cmd,
@@ -377,7 +384,7 @@ static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
 
packet->data = reg_data;
 
-   return 0;
+   return sizeof(struct pm4_mec_write_data_mmio);
 }
 
 static int pm_unmap_queues_v9(struct packet_manager *pm, uint32_t *buffer,
-- 
2.34.1



[PATCH 1/3] drm/amdgpu: check ip_discovery fw file available

2025-03-14 Thread Flora Cui
Signed-off-by: Flora Cui 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c | 31 ++-
 1 file changed, 16 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
index 967a992829bd..2b4854e03821 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_discovery.c
@@ -113,8 +113,7 @@
 #include "amdgpu_isp.h"
 #endif
 
-#define FIRMWARE_IP_DISCOVERY "amdgpu/ip_discovery.bin"
-MODULE_FIRMWARE(FIRMWARE_IP_DISCOVERY);
+MODULE_FIRMWARE("amdgpu/ip_discovery.bin");
 
 #define mmIP_DISCOVERY_VERSION  0x16A00
 #define mmRCC_CONFIG_MEMSIZE   0xde3
@@ -297,21 +296,13 @@ static int amdgpu_discovery_read_binary_from_mem(struct amdgpu_device *adev,
return ret;
 }
 
-static int amdgpu_discovery_read_binary_from_file(struct amdgpu_device *adev, uint8_t *binary)
+static int amdgpu_discovery_read_binary_from_file(struct amdgpu_device *adev,
+   uint8_t *binary,
+   const char *fw_name)
 {
const struct firmware *fw;
-   const char *fw_name;
int r;
 
-   switch (amdgpu_discovery) {
-   case 2:
-   fw_name = FIRMWARE_IP_DISCOVERY;
-   break;
-   default:
-   dev_warn(adev->dev, "amdgpu_discovery is not set properly\n");
-   return -EINVAL;
-   }
-
r = request_firmware(&fw, fw_name, adev->dev);
if (r) {
dev_err(adev->dev, "can't load firmware \"%s\"\n",
@@ -404,10 +395,19 @@ static int amdgpu_discovery_verify_npsinfo(struct amdgpu_device *adev,
return 0;
 }
 
+static const char *amdgpu_discovery_get_fw_name(struct amdgpu_device *adev)
+{
+   if (amdgpu_discovery == 2)
+   return "amdgpu/ip_discovery.bin";
+
+   return NULL;
+}
+
 static int amdgpu_discovery_init(struct amdgpu_device *adev)
 {
struct table_info *info;
struct binary_header *bhdr;
+   const char *fw_name;
uint16_t offset;
uint16_t size;
uint16_t checksum;
@@ -419,9 +419,10 @@ static int amdgpu_discovery_init(struct amdgpu_device *adev)
return -ENOMEM;
 
/* Read from file if it is the preferred option */
-   if (amdgpu_discovery == 2) {
+   fw_name = amdgpu_discovery_get_fw_name(adev);
+   if (fw_name != NULL) {
dev_info(adev->dev, "use ip discovery information from file");
-   r = amdgpu_discovery_read_binary_from_file(adev, adev->mman.discovery_bin);
+   r = amdgpu_discovery_read_binary_from_file(adev, adev->mman.discovery_bin, fw_name);
 
if (r) {
dev_err(adev->dev, "failed to read ip discovery binary from file\n");
-- 
2.43.0



[PATCH 1/2] drm/amdgpu/gfx: adjust workload profile handling

2025-03-14 Thread Alex Deucher
No need to make the workload profile setup dependent
on the results of cancelling the delayed work thread.
We have all of the necessary checking in place for the
workload profile reference counting, so separate the
two.  As it is now, we can theoretically end up with
the call from begin_use happening while the worker
thread is executing which would result in the profile
not getting set for that submission.  It should not
affect the reference counting.

v2: bail early if the profile is already active (Lijo)

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c | 29 +++--
 1 file changed, 18 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
index 099329d15b9ff..4beb0609e7034 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c
@@ -2188,18 +2188,25 @@ void amdgpu_gfx_profile_ring_begin_use(struct amdgpu_ring *ring)
 
atomic_inc(&adev->gfx.total_submission_cnt);
 
-   if (!cancel_delayed_work_sync(&adev->gfx.idle_work)) {
-   mutex_lock(&adev->gfx.workload_profile_mutex);
-   if (!adev->gfx.workload_profile_active) {
-   r = amdgpu_dpm_switch_power_profile(adev, profile, true);
-   if (r)
-   dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
-profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
-"fullscreen 3D" : "compute");
-   adev->gfx.workload_profile_active = true;
-   }
-   mutex_unlock(&adev->gfx.workload_profile_mutex);
+   cancel_delayed_work_sync(&adev->gfx.idle_work);
+
+   /* We can safely return early here because we've cancelled the
+* delayed work so there is no one else to set it to false
+* and we don't care if someone else sets it to true.
+*/
+   if (adev->gfx.workload_profile_active)
+   return;
+
+   mutex_lock(&adev->gfx.workload_profile_mutex);
+   if (!adev->gfx.workload_profile_active) {
+   r = amdgpu_dpm_switch_power_profile(adev, profile, true);
+   if (r)
+   dev_warn(adev->dev, "(%d) failed to disable %s power profile mode\n", r,
+profile == PP_SMC_POWER_PROFILE_FULLSCREEN3D ?
+"fullscreen 3D" : "compute");
+   adev->gfx.workload_profile_active = true;
}
+   mutex_unlock(&adev->gfx.workload_profile_mutex);
 }
 
 void amdgpu_gfx_profile_ring_end_use(struct amdgpu_ring *ring)
-- 
2.48.1



Re: [PATCH 1/8] drm/amdgpu: grab an additional reference on the gang fence v2

2025-03-14 Thread SRINIVASAN SHANMUGAM

Thanks for these patches and feedback.

The series is:
Acked-by: Srinivasan Shanmugam 

On 3/14/2025 7:50 PM, Christian König wrote:

Am 14.03.25 um 05:09 schrieb SRINIVASAN SHANMUGAM:

On 3/7/2025 7:18 PM, Christian König wrote:

We keep the gang submission fence around in adev, make sure that it
stays alive.

v2: fix memory leak on retry

Signed-off-by: Christian König 
---
   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 +-
   1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 198d29faa754..337543ec615c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -6889,18 +6889,26 @@ struct dma_fence *amdgpu_device_switch_gang(struct amdgpu_device *adev,
   {
   struct dma_fence *old = NULL;
   +    dma_fence_get(gang);
   do {
   dma_fence_put(old);
   old = amdgpu_device_get_gang(adev);
   if (old == gang)
   break;
   -    if (!dma_fence_is_signaled(old))
+    if (!dma_fence_is_signaled(old)) {

Here, should we need to check ?

The gang is initialized to a dummy fence on bootup. So even when there is never 
any gang submission the old value is never NULL.

Regards,
Christian.


     // Check if old fence isn't signaled
     if (old && !dma_fence_is_signaled(old)) {


+    dma_fence_put(gang);
   return old;
+    }
     } while (cmpxchg((struct dma_fence __force **)&adev->gang_submit,
    old, gang) != old);
   +    /*
+ * Drop it once for the exchanged reference in adev and once for the
+ * thread local reference acquired in amdgpu_device_get_gang().
+ */
+    dma_fence_put(old);

if (old)
   dma_fence_put(old); // Ensure to release old reference only if it is valid?



   dma_fence_put(old);
   return NULL;
   }


RE: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by removing dynamic callbacks

2025-03-14 Thread Kim, Jonathan
[Public]

> -Original Message-
> From: Alex Deucher 
> Sent: Friday, March 14, 2025 5:46 PM
> To: Kim, Jonathan 
> Cc: Zhang, Jesse(Jie) ; Koenig, Christian
> ; amd-gfx@lists.freedesktop.org; Deucher, Alexander
> ; Zhu, Jiadong 
> Subject: Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism by
> removing dynamic callbacks
>
> On Fri, Mar 14, 2025 at 10:43 AM Kim, Jonathan 
> wrote:
> >
> > [Public]
> >
> > > -Original Message-
> > > From: Alex Deucher 
> > > Sent: Thursday, March 13, 2025 10:38 AM
> > > To: Zhang, Jesse(Jie) 
> > > Cc: Koenig, Christian ; amd-
> > > g...@lists.freedesktop.org; Deucher, Alexander
> ;
> > > Kim, Jonathan ; Zhu, Jiadong
> > > 
> > > Subject: Re: [PATCH 1/7] drm/amd/amdgpu: Simplify SDMA reset mechanism
> by
> > > removing dynamic callbacks
> > >
> > > I think as long as the locking is correct, the src shouldn't matter.
> > > You just need to stop the kernel queues (and save state) and evict the
> > > user queues (since HWS is responsible for saving the MQDs of the
> > > non-guilty user queues).  If KFD detected the hang (e.g., queue
> > > eviction fails when called for memory pressure, etc.), we just need to
> > > make sure that it's ok for the sdma reset routine to call evict queues
> > > again even if it was already called (presumably it should exit early
> > > since the queues are already evicted).  If KGD initiates the reset, it
> > > will call into KFD to evict queues.  We just need to make sure the
> > > evict queues function we call just evicts the queues and doesn't also
> > > try and reset.
> >
> > If we're removing the src parameter, we need to be careful we don't end up in a
> > double lock scenario in the KFD.
> > i.e. kgd inits reset -> kfd detects hang on kgd reset trigger and calls back to kgd -
> > amdgpu_amdkfd_suspend gets called again but is blocked on previous suspend call
> > from original kgd reset (which is holding a bunch of KFD locks) while KFD is
> > trying to clean up immediately.
> >
>
> How would this work even with the src parameter?  I think you could
> still get into that case.  E.g., KGD detects a hang and initiates an
> engine reset.  In parallel, KFD tries to unmap all queues for some
> reason and detects a hang, so it tries to reset SDMA.  Assuming there
> is a lock that protects SDMA reset, that should work.  However, it
> requires that the prerequisites on each side don't attempt to reset
> anything.

If the KGD succeeds in suspending the KFD, then KFD will be quiesced and no KFD 
reset is possible.
If the KFD is trying to reset inside a KGD initiated reset due to a KFD suspend 
call failure or is resetting in parallel, the current and any future KGD resets 
calls will be blocked on KFD suspend calls due to KFD locks (assuming suspend 
is the first action and suspend_user_queues == true).
KFD will then always reset with suspend_user_queues == false (and always hold a 
device/suspend or reset lock during this time), which avoids the KFD suspend on 
the KGD callback and avoids deadlocking itself.  Any other KGD future calls 
should be blocked on their own calls to KFD suspend due to KFD locking until 
the KFD is done resetting, avoiding any reset deadlocks.

>
> sdma_engine_reset()
> {
>
> KFD pre reset requirements
> 1. unmap all SDMA queues (CP firmware will save non-guilty state in MQDs)
> 2. detect guilty queue
>
> KGD pre reset requirements:
> 1. stop relevant drm schedulers
> 2. detect guilt queue
> 3. save non-guilty queue state
>
> Do engine reset
>
> KGD post reset requirements:
> 1. restore non-guilty queue state
> 2. start relevant drm schedulers
>
> KFD post reset requirements
> 1. map all non-guilty SDMA queues
>
> }
>
> I think what we need on the KFD side, if we don't have it already, is
> a function to just umap queues and update guilty state, but not to
> attempt to reset anything.  Then on the KFD side, in your normal
> flows, you could call this function to unamp queues and update queue
> state, and then after calling that walk through the queue state and
> trigger any resets based on queues flagged as bad.  On the KFD side,

I'd have to look at this more carefully, but it seems like you're suggesting 
the KFD to defer reset on preemption failure via scheduling similar to GPU 
resets.
The problem is future map/unmap request in the KFD are user requested so 
they're unpredictable.
GPU resets could just "flick-away" user queue update request failures 
subsequent to the initial preemption failure because the device is dead and all 
HSA processes have to die at some point.
You can't do this with SDMA resets since other processes are still good so the 
KFD would have to immediately lock itself down (which could be problematic 
during preemption routines) if it wants to unblock and schedule the reset later.

Jon

> in a normal flow you will have called this unmap and update state
> function and now you have a list of bad queues.  You can then initiate
> an engine reset for the engine the bad que

RE: [PATCH v2] drm/amdkfd: Update return value of config_dequeue_wait_counts

2025-03-14 Thread Kasiviswanathan, Harish
[Public]

-Original Message-
From: Kim, Jonathan 
Sent: Friday, March 14, 2025 5:35 PM
To: Kasiviswanathan, Harish ; 
amd-gfx@lists.freedesktop.org
Subject: RE: [PATCH v2] drm/amdkfd: Update return value of 
config_dequeue_wait_counts

[Public]

> -Original Message-
> From: Kasiviswanathan, Harish 
> Sent: Friday, March 14, 2025 5:04 PM
> To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org
> Subject: RE: [PATCH v2] drm/amdkfd: Update return value of
> config_dequeue_wait_counts
>
> [Public]
>
> -Original Message-
> From: Kim, Jonathan 
> Sent: Friday, March 14, 2025 4:41 PM
> To: Kasiviswanathan, Harish ; amd-
> g...@lists.freedesktop.org
> Cc: Kasiviswanathan, Harish 
> Subject: RE: [PATCH v2] drm/amdkfd: Update return value of
> config_dequeue_wait_counts
>
> [Public]
>
> > -Original Message-
> > From: amd-gfx  On Behalf Of Harish
> > Kasiviswanathan
> > Sent: Friday, March 14, 2025 4:22 PM
> > To: amd-gfx@lists.freedesktop.org
> > Cc: Kasiviswanathan, Harish 
> > Subject: [PATCH v2] drm/amdkfd: Update return value of
> > config_dequeue_wait_counts
> >
> > .config_dequeue_wait_counts returns a nop case. Modify return parameter
> > to reflect that since the caller also needs to ignore this condition.
> >
> > v2: Removed redundant code.
> > Tidy up code based on review comments
> >
> > Fixes: <98a5af8103f> ("drm/amdkfd: Add pm_config_dequeue_wait_counts API")
> >
> > Signed-off-by: Harish Kasiviswanathan 
> > ---
> >  .../gpu/drm/amd/amdkfd/kfd_packet_manager.c   | 14 
> >  .../drm/amd/amdkfd/kfd_packet_manager_v9.c| 36 +++
> >  2 files changed, 29 insertions(+), 21 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > index 3f574d82b5fc..502b89639a9f 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager.c
> > @@ -436,19 +436,19 @@ int pm_config_dequeue_wait_counts(struct
> > packet_manager *pm,
> >
> >   retval = pm->pmf->config_dequeue_wait_counts(pm, buffer,
> >cmd, value);
> > - if (!retval)
> > + if (retval > 0) {
> >   retval = kq_submit_packet(pm->priv_queue);
> > +
> > + /* If default value is modified, cache that in 
> > dqm->wait_times
> > */
> > + if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> > + update_dqm_wait_times(pm->dqm);
> > + }
> >   else
> >   kq_rollback_packet(pm->priv_queue);
> >   }
> > -
> > - /* If default value is modified, cache that value in dqm->wait_times 
> > */
> > - if (!retval && cmd == KFD_DEQUEUE_WAIT_INIT)
> > - update_dqm_wait_times(pm->dqm);
> > -
> >  out:
> >   mutex_unlock(&pm->lock);
> > - return retval;
> > + return retval < 0 ? retval : 0;
> >  }
> >
> >  int pm_send_unmap_queue(struct packet_manager *pm,
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > index d440df602393..3c6134d61b2b 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c
> > @@ -310,6 +310,13 @@ static inline void
> > pm_build_dequeue_wait_counts_packet_info(struct packet_manage
> >   reg_data);
> >  }
> >
> > +/* pm_config_dequeue_wait_counts_v9: Builds WRITE_DATA packet with
> > + *register/value for configuring dequeue wait counts
> > + *
> > + * @return: -ve for failure, 0 for nop and +ve for success and buffer is
> > + *  filled in with packet
> > + *
> > + **/
> >  static int pm_config_dequeue_wait_counts_v9(struct packet_manager *pm,
> >   uint32_t *buffer,
> >   enum kfd_config_dequeue_wait_counts_cmd cmd,
> > @@ -321,24 +328,25 @@ static int pm_config_dequeue_wait_counts_v9(struct
> > packet_manager *pm,
> >
> >   switch (cmd) {
> >   case KFD_DEQUEUE_WAIT_INIT: {
> > - uint32_t sch_wave = 0, que_sleep = 0;
> > - /* Reduce CP_IQ_WAIT_TIME2.QUE_SLEEP to 0x1 from default
> > 0x40.
> > + uint32_t sch_wave = 0, que_sleep = 1;
> > +
> > + if (KFD_GC_VERSION(pm->dqm->dev) < IP_VERSION(9, 4, 1) ||
> > + KFD_GC_VERSION(pm->dqm->dev) > IP_VERSION(10, 0, 0))
> > + return 0;
>
> From my last comment, I suggested to put this at the beginning of the non-v9
> pm_config_dequeue_wait_counts call that calls this function.
> Then you don't have to make the return value more complicated than it 
> currently is.
>
> [HK]: Ah ok. I didn't really want to put asic specific code in there, but in this
> case it is fine since, as you mentioned, we are already overloading these functions.

Right.  Which is why I also suggested that you could create a front loa