[PATCH] drm/amdgpu/vce2: fix ip block reference

2025-03-09 Thread Alex Deucher
Need to use the correct IP block type.  VCE vs VCN.
Fixes mclk issues on Hawaii.

Suggested by selendym.

Fixes: 82ae6619a450a ("drm/amdgpu: update the handle ptr in wait_for_idle")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997
Cc: Sunil Khatri 
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
index 8c8c02606d25a..bee3e904a6bc7 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
@@ -284,7 +284,7 @@ static int vce_v2_0_stop(struct amdgpu_device *adev)
return 0;
}
 
-   ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCN);
+   ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCE);
if (!ip_block)
return -EINVAL;
 
-- 
2.48.1
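
For context, the helper this patch calls is essentially a linear scan over
the device's registered IP blocks. A simplified sketch of
amdgpu_device_ip_get_ip_block() (the real code lives in
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c):

/* Simplified sketch. A CIK part with VCE 2.0 (e.g. Hawaii) registers a
 * VCE block but no VCN block, so a lookup with AMD_IP_BLOCK_TYPE_VCN
 * returns NULL and vce_v2_0_stop() bailed out with -EINVAL before its
 * clock/power programming ran -- consistent with the mclk issues above.
 */
struct amdgpu_ip_block *
amdgpu_device_ip_get_ip_block(struct amdgpu_device *adev,
			      enum amd_ip_block_type type)
{
	int i;

	for (i = 0; i < adev->num_ip_blocks; i++)
		if (adev->ip_blocks[i].version->type == type)
			return &adev->ip_blocks[i];

	return NULL;
}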



Re: SI DMA: clarification on naming convention

2025-03-09 Thread Alex Deucher
On Sat, Mar 8, 2025 at 3:39 AM Alexandre Demers
 wrote:
>
> Hi,
>
> While working on cleaning up sid.h, si_enums.h and some other SI
> related files, I've been scratching my head about why SI DMA files and
> variables were named "DMA", whereas CIK and newer use "SDMA".
>
> While I understand that a new system DMA was introduced with CIK,
> isn't SI DMA also a "system DMA"? Could we use the same naming
> convention and talk about sDMA when naming defines, shifts and masks?
> Could si_dma.c/.h be renamed to si_sdma.c/.h? Or was the DMA engine
> introduced with CIK so different that the naming convention needed
> to change?

The SDMA IP was first added on CIK.  SI used the older paging DMA IP
that first appeared on r6xx parts.  While they are conceptually
similar (general purpose DMA copy/fill engines), the underlying
hardware and firmware were quite different.  I'm not sure what the
advantage of renaming would be at this point other than code churn.

Alex


[PATCH v4 0/3] Uniformize defines between DCE6, DCE8 and DCE10

2025-03-09 Thread Alexandre Demers
Keep where and how values are defined uniform across DCE6, DCE8 and DCE10.
This makes it easier to understand the code, its similarities, and its
modifications.

Since sid.h is being wired up in dce_v6_0.c, duplicated defines need to be
cleaned up in sid.h and si_enums.h.

V4:
* Reorder patches so each one compiles properly.
* Remove drm-amdgpu-add-defines-for-pin_offsets-in-DCE8.patch since it was 
already approved and applied.

A bigger round of cleanup is coming to remove more of the duplicated and
unused defines found in sid.h and si_enums.h and continue the uniformization.

Alexandre Demers (3):
  drm/amdgpu: fix warning and errors caused by duplicated defines in
sid.h
  drm/amdgpu: move and fix X_GB_ADDR_CONFIG_GOLDEN values
  drm/amdgpu: add or move defines for DCE6 in sid.h

 drivers/gpu/drm/amd/amdgpu/dce_v6_0.c |  73 ++---
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c |  15 +-
 drivers/gpu/drm/amd/amdgpu/si.c   |  68 ++---
 drivers/gpu/drm/amd/amdgpu/si_enums.h |  12 -
 drivers/gpu/drm/amd/amdgpu/sid.h  | 369 ++
 5 files changed, 111 insertions(+), 426 deletions(-)

-- 
2.48.1



Re: [PATCH v3 1/4] drm/amdgpu: add or move defines for DCE6 in sid.h

2025-03-09 Thread Alexandre Demers
On Sat, Mar 8, 2025 at 7:32 PM Alexandre Demers
 wrote:
>
> On Thu, Mar 6, 2025 at 10:19 AM Alex Deucher  wrote:
> >
> > On Wed, Mar 5, 2025 at 9:08 PM Alexandre Demers
> >  wrote:
> > >
> > > For coherence with DCE8 and DCE10, add or move some values into sid.h.
> > >
> > > Signed-off-by: Alexandre Demers 
> >
> > This change doesn't build.  Please adjust the order of the patches as
> > needed to make sure they all build.
> >
> > Alex
>
> Yeah, adding sid.h should probably be at the end, once all changes are
> made. I'll look at it.
>
> Thanks for noticing.
> Alexandre

V4 is sent; it compiles at each commit. I did not include the DCE8
patch (4/4) since it was accepted in V3 and applied.

Alexandre
> >
> > > ---
> > >  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c | 63 ++-
> > >  drivers/gpu/drm/amd/amdgpu/si_enums.h |  7 ---
> > >  drivers/gpu/drm/amd/amdgpu/sid.h  | 29 +---
> > >  3 files changed, 55 insertions(+), 44 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> > > index a72fd7220081..185401d66961 100644
> > > --- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> > > +++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
> > > @@ -32,6 +32,7 @@
> > >  #include "amdgpu.h"
> > >  #include "amdgpu_pm.h"
> > >  #include "amdgpu_i2c.h"
> > > +#include "sid.h"
> > >  #include "atom.h"
> > >  #include "amdgpu_atombios.h"
> > >  #include "atombios_crtc.h"
> > > @@ -59,31 +60,31 @@ static void dce_v6_0_set_irq_funcs(struct amdgpu_device *adev);
> > >
> > >  static const u32 crtc_offsets[6] =
> > >  {
> > > -   SI_CRTC0_REGISTER_OFFSET,
> > > -   SI_CRTC1_REGISTER_OFFSET,
> > > -   SI_CRTC2_REGISTER_OFFSET,
> > > -   SI_CRTC3_REGISTER_OFFSET,
> > > -   SI_CRTC4_REGISTER_OFFSET,
> > > -   SI_CRTC5_REGISTER_OFFSET
> > > +   CRTC0_REGISTER_OFFSET,
> > > +   CRTC1_REGISTER_OFFSET,
> > > +   CRTC2_REGISTER_OFFSET,
> > > +   CRTC3_REGISTER_OFFSET,
> > > +   CRTC4_REGISTER_OFFSET,
> > > +   CRTC5_REGISTER_OFFSET
> > >  };
> > >
> > >  static const u32 hpd_offsets[] =
> > >  {
> > > -   mmDC_HPD1_INT_STATUS - mmDC_HPD1_INT_STATUS,
> > > -   mmDC_HPD2_INT_STATUS - mmDC_HPD1_INT_STATUS,
> > > -   mmDC_HPD3_INT_STATUS - mmDC_HPD1_INT_STATUS,
> > > -   mmDC_HPD4_INT_STATUS - mmDC_HPD1_INT_STATUS,
> > > -   mmDC_HPD5_INT_STATUS - mmDC_HPD1_INT_STATUS,
> > > -   mmDC_HPD6_INT_STATUS - mmDC_HPD1_INT_STATUS,
> > > +   HPD0_REGISTER_OFFSET,
> > > +   HPD1_REGISTER_OFFSET,
> > > +   HPD2_REGISTER_OFFSET,
> > > +   HPD3_REGISTER_OFFSET,
> > > +   HPD4_REGISTER_OFFSET,
> > > +   HPD5_REGISTER_OFFSET
> > >  };
> > >
> > >  static const uint32_t dig_offsets[] = {
> > > -   SI_CRTC0_REGISTER_OFFSET,
> > > -   SI_CRTC1_REGISTER_OFFSET,
> > > -   SI_CRTC2_REGISTER_OFFSET,
> > > -   SI_CRTC3_REGISTER_OFFSET,
> > > -   SI_CRTC4_REGISTER_OFFSET,
> > > -   SI_CRTC5_REGISTER_OFFSET,
> > > +   CRTC0_REGISTER_OFFSET,
> > > +   CRTC1_REGISTER_OFFSET,
> > > +   CRTC2_REGISTER_OFFSET,
> > > +   CRTC3_REGISTER_OFFSET,
> > > +   CRTC4_REGISTER_OFFSET,
> > > +   CRTC5_REGISTER_OFFSET,
> > > (0x13830 - 0x7030) >> 2,
> > >  };
> > >
> > > @@ -1359,13 +1360,13 @@ static void dce_v6_0_audio_enable(struct amdgpu_device *adev,
> > >
> > >  static const u32 pin_offsets[7] =
> > >  {
> > > -   (0x1780 - 0x1780),
> > > -   (0x1786 - 0x1780),
> > > -   (0x178c - 0x1780),
> > > -   (0x1792 - 0x1780),
> > > -   (0x1798 - 0x1780),
> > > -   (0x179d - 0x1780),
> > > -   (0x17a4 - 0x1780),
> > > +   AUD0_REGISTER_OFFSET,
> > > +   AUD1_REGISTER_OFFSET,
> > > +   AUD2_REGISTER_OFFSET,
> > > +   AUD3_REGISTER_OFFSET,
> > > +   AUD4_REGISTER_OFFSET,
> > > +   AUD5_REGISTER_OFFSET,
> > > +   AUD6_REGISTER_OFFSET,
> > >  };
> > >
> > >  static int dce_v6_0_audio_init(struct amdgpu_device *adev)
> > > @@ -2876,22 +2877,22 @@ static void dce_v6_0_set_crtc_vblank_interrupt_state(struct amdgpu_device *adev,
> > >
> > > switch (crtc) {
> > > case 0:
> > > -   reg_block = SI_CRTC0_REGISTER_OFFSET;
> > > +   reg_block = CRTC0_REGISTER_OFFSET;
> > > break;
> > > case 1:
> > > -   reg_block = SI_CRTC1_REGISTER_OFFSET;
> > > +   reg_block = CRTC1_REGISTER_OFFSET;
> > > break;
> > > case 2:
> > > -   reg_block = SI_CRTC2_REGISTER_OFFSET;
> > > +   reg_block = CRTC2_REGISTER_OFFSET;
> > > break;
> > > case 3:
> > > -   reg_block = SI_CRTC3_REGISTER_OFFSET;
> > > +   reg_block = CRTC3_REGISTER_OFFSET;
> > > break;
> > > case 4:
> > > -   reg_block = SI_CRTC4_REGISTER_OFFSET;
> > > +   reg_block = CRTC4_REGISTER_OFFSET;
> > > b

Re: SI DMA: clarification on naming convention

2025-03-09 Thread Alexandre Demers
On Sun, Mar 9, 2025 at 12:47 PM Alex Deucher  wrote:
>
> On Sat, Mar 8, 2025 at 3:39 AM Alexandre Demers
>  wrote:
> >
> > Hi,
> >
> > While working on cleaning up sid.h, si_enums.h and some other SI
> > related files, I've been scratching my head about why SI DMA files and
> > variables were named "DMA", whereas CIK and newer use "SDMA".
> >
> > While I understand that a new system DMA was introduced with CIK,
> > isn't SI DMA also a "system DMA"? Could we use the same naming
> > convention and talk about sDMA when naming defines, shifts and masks?
> > Could si_dma.c/.h be renamed to si_sdma.c/.h? Or was the DMA engine
> > introduced with CIK so different that the naming convention needed
> > to change?
>
> The SDMA IP was first added on CIK.  SI used the older paging DMA IP
> that first appeared on r6xx parts.  While they are conceptually
> similar (general purpose DMA copy/fill engines), the underlying
> hardware and firmware were quite different.  I'm not sure what the
> advantage of renaming would be at this point other than code churn.
>
> Alex

The only benefit would be closer define names, making comparisons between
chip families easier. But the differences are not so significant that they
actually prevent understanding what is going on. I'll keep things as they
are, then.

Alexandre


[PATCH v4 2/3] drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6

2025-03-09 Thread Alexandre Demers
By wiring up sid.h in GFX6, we end up with a few duplicated defines, such as
the golden registers. Let's clean this up.

[TAHITI, VERDE, HAINAN]_GB_ADDR_CONFIG_GOLDEN were defined both in sid.h and
in si_enums.h, with different values. Keep the values used under radeon and
move them into gfx_v6_0.c where they are used (as is done under cik).

Signed-off-by: Alexandre Demers 
---
 drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c | 15 +--
 drivers/gpu/drm/amd/amdgpu/si_enums.h |  6 --
 drivers/gpu/drm/amd/amdgpu/sid.h  |  4 
 3 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
index 41f50bf380c4..4476fb816659 100644
--- a/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c
@@ -28,19 +28,30 @@
 #include "amdgpu_gfx.h"
 #include "amdgpu_ucode.h"
 #include "clearstate_si.h"
+#include "si.h"
+#include "sid.h"
+
 #include "bif/bif_3_0_d.h"
 #include "bif/bif_3_0_sh_mask.h"
+
 #include "oss/oss_1_0_d.h"
 #include "oss/oss_1_0_sh_mask.h"
+
 #include "gca/gfx_6_0_d.h"
 #include "gca/gfx_6_0_sh_mask.h"
+#include "gca/gfx_7_2_enum.h"
+
 #include "gmc/gmc_6_0_d.h"
 #include "gmc/gmc_6_0_sh_mask.h"
+
 #include "dce/dce_6_0_d.h"
 #include "dce/dce_6_0_sh_mask.h"
-#include "gca/gfx_7_2_enum.h"
+
 #include "si_enums.h"
-#include "si.h"
+
+#define TAHITI_GB_ADDR_CONFIG_GOLDEN 0x12011003
+#define VERDE_GB_ADDR_CONFIG_GOLDEN  0x12010002
+#define HAINAN_GB_ADDR_CONFIG_GOLDEN 0x02010001
 
 static void gfx_v6_0_set_ring_funcs(struct amdgpu_device *adev);
 static void gfx_v6_0_set_irq_funcs(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/si_enums.h b/drivers/gpu/drm/amd/amdgpu/si_enums.h
index 4e935baa7b91..cd9b0a654991 100644
--- a/drivers/gpu/drm/amd/amdgpu/si_enums.h
+++ b/drivers/gpu/drm/amd/amdgpu/si_enums.h
@@ -128,8 +128,6 @@
 #define SI_CRTC4_REGISTER_OFFSET 0x2c00
 #define SI_CRTC5_REGISTER_OFFSET 0x2f00
 
-#define DMA0_REGISTER_OFFSET 0x000
-#define DMA1_REGISTER_OFFSET 0x200
 #define ES_AND_GS_AUTO   3
 #define RADEON_PACKET_TYPE3  3
 #define CE_PARTITION_BASE3
@@ -161,10 +159,6 @@
 #define RLC_SAVE_AND_RESTORE_STARTING_OFFSET 0x90
 #define RLC_CLEAR_STATE_DESCRIPTOR_OFFSET    0x3D
 
-#define TAHITI_GB_ADDR_CONFIG_GOLDEN 0x12011003
-#define VERDE_GB_ADDR_CONFIG_GOLDEN  0x02010002
-#define HAINAN_GB_ADDR_CONFIG_GOLDEN 0x02011003
-
 #define PACKET3(op, n)  ((RADEON_PACKET_TYPE3 << 30) |  \
  (((op) & 0xFF) << 8) | \
  ((n) & 0x3FFF) << 16)
diff --git a/drivers/gpu/drm/amd/amdgpu/sid.h b/drivers/gpu/drm/amd/amdgpu/sid.h
index 2218fd99ee83..bf228a1dedff 100644
--- a/drivers/gpu/drm/amd/amdgpu/sid.h
+++ b/drivers/gpu/drm/amd/amdgpu/sid.h
@@ -26,10 +26,6 @@
 
 #define TAHITI_RB_BITMAP_WIDTH_PER_SH  2
 
-#define TAHITI_GB_ADDR_CONFIG_GOLDEN 0x12011003
-#define VERDE_GB_ADDR_CONFIG_GOLDEN  0x12010002
-#define HAINAN_GB_ADDR_CONFIG_GOLDEN 0x02010001
-
 #define SI_MAX_SH_GPRS 256
 #define SI_MAX_TEMP_GPRS   16
 #define SI_MAX_SH_THREADS  256
-- 
2.48.1



Re: [PATCH] drm/amdgpu/vce2: fix ip block reference

2025-03-09 Thread Boyuan Zhang


On 2025-03-09 12:31, Alex Deucher wrote:

Need to use the correct IP block type.  VCE vs VCN.
Fixes mclk issues on Hawaii.

Suggested by selendym.

Fixes: 82ae6619a450a ("drm/amdgpu: update the handle ptr in wait_for_idle")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3997
Cc: Sunil Khatri
Signed-off-by: Alex Deucher



Reviewed-by: Boyuan Zhang  





---
  drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
index 8c8c02606d25a..bee3e904a6bc7 100644
--- a/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/vce_v2_0.c
@@ -284,7 +284,7 @@ static int vce_v2_0_stop(struct amdgpu_device *adev)
return 0;
}
  
-	ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCN);
+	ip_block = amdgpu_device_ip_get_ip_block(adev, AMD_IP_BLOCK_TYPE_VCE);
if (!ip_block)
return -EINVAL;
  

[PATCH v3] drm/amd/amdkfd: Evict all queues even HWS remove queue failed

2025-03-09 Thread Yifan Zha
[Why]
If a reset is detected and KFD needs to evict its working queues, HWS queue
removal will fail. The remaining queues are then left active and not evicted.

After the reset completes, KFD uses HWS to terminate the remaining active
queues, but HWS itself has been reset, so removing those queues fails again.

[How]
Keep removing all queues even if HWS reports a failure.
This does not affect cpsch, as it checks reset_domain->sem.

v2: If any queue failed, evict queue returns error.
v3: Declare err inside the if-block.

Signed-off-by: Yifan Zha 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index 885e0e9cf21b..2ed003d3ff0e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -1221,11 +1221,13 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
decrement_queue_count(dqm, qpd, q);
 
if (dqm->dev->kfd->shared_resources.enable_mes) {
-   retval = remove_queue_mes(dqm, q, qpd);
-   if (retval) {
+   int err;
+
+   err = remove_queue_mes(dqm, q, qpd);
+   if (err) {
dev_err(dev, "Failed to evict queue %d\n",
q->properties.queue_id);
-   goto out;
+   retval = err;
}
}
}
-- 
2.25.1



[PATCH v4 1/3] drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10

2025-03-09 Thread Alexandre Demers
Let's begin the cleanup in sid.h to prevent warnings and errors when wiring
sid.h into dce_v6_0.c.

This is a bigger cleanup: many defines found in sid.h have already been
properly moved into the various "_d.h" and "_sh_mask.h" headers, so they
should already have been removed from sid.h and pulled in where needed.

Signed-off-by: Alexandre Demers 
---
 drivers/gpu/drm/amd/amdgpu/dce_v6_0.c |  10 +-
 drivers/gpu/drm/amd/amdgpu/si.c   |  68 +++---
 drivers/gpu/drm/amd/amdgpu/sid.h  | 336 +-
 3 files changed, 43 insertions(+), 371 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
index a72fd7220081..47b0e4848a56 100644
--- a/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
+++ b/drivers/gpu/drm/amd/amdgpu/dce_v6_0.c
@@ -40,18 +40,24 @@
 #include "amdgpu_connectors.h"
 #include "amdgpu_display.h"
 
+#include "dce_v6_0.h"
+
 #include "bif/bif_3_0_d.h"
 #include "bif/bif_3_0_sh_mask.h"
+
 #include "oss/oss_1_0_d.h"
 #include "oss/oss_1_0_sh_mask.h"
+
 #include "gca/gfx_6_0_d.h"
 #include "gca/gfx_6_0_sh_mask.h"
+#include "gca/gfx_7_2_enum.h"
+
 #include "gmc/gmc_6_0_d.h"
 #include "gmc/gmc_6_0_sh_mask.h"
+
 #include "dce/dce_6_0_d.h"
 #include "dce/dce_6_0_sh_mask.h"
-#include "gca/gfx_7_2_enum.h"
-#include "dce_v6_0.h"
+
 #include "si_enums.h"
 
 static void dce_v6_0_set_display_funcs(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 79307ae3e477..a18b3ece635b 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -1124,41 +1124,41 @@ static struct amdgpu_allowed_register_entry si_allowed_read_registers[] = {
{mmCP_STALLED_STAT3},
{GB_ADDR_CONFIG},
{MC_ARB_RAMCFG},
-   {GB_TILE_MODE0},
-   {GB_TILE_MODE1},
-   {GB_TILE_MODE2},
-   {GB_TILE_MODE3},
-   {GB_TILE_MODE4},
-   {GB_TILE_MODE5},
-   {GB_TILE_MODE6},
-   {GB_TILE_MODE7},
-   {GB_TILE_MODE8},
-   {GB_TILE_MODE9},
-   {GB_TILE_MODE10},
-   {GB_TILE_MODE11},
-   {GB_TILE_MODE12},
-   {GB_TILE_MODE13},
-   {GB_TILE_MODE14},
-   {GB_TILE_MODE15},
-   {GB_TILE_MODE16},
-   {GB_TILE_MODE17},
-   {GB_TILE_MODE18},
-   {GB_TILE_MODE19},
-   {GB_TILE_MODE20},
-   {GB_TILE_MODE21},
-   {GB_TILE_MODE22},
-   {GB_TILE_MODE23},
-   {GB_TILE_MODE24},
-   {GB_TILE_MODE25},
-   {GB_TILE_MODE26},
-   {GB_TILE_MODE27},
-   {GB_TILE_MODE28},
-   {GB_TILE_MODE29},
-   {GB_TILE_MODE30},
-   {GB_TILE_MODE31},
+   {mmGB_TILE_MODE0},
+   {mmGB_TILE_MODE1},
+   {mmGB_TILE_MODE2},
+   {mmGB_TILE_MODE3},
+   {mmGB_TILE_MODE4},
+   {mmGB_TILE_MODE5},
+   {mmGB_TILE_MODE6},
+   {mmGB_TILE_MODE7},
+   {mmGB_TILE_MODE8},
+   {mmGB_TILE_MODE9},
+   {mmGB_TILE_MODE10},
+   {mmGB_TILE_MODE11},
+   {mmGB_TILE_MODE12},
+   {mmGB_TILE_MODE13},
+   {mmGB_TILE_MODE14},
+   {mmGB_TILE_MODE15},
+   {mmGB_TILE_MODE16},
+   {mmGB_TILE_MODE17},
+   {mmGB_TILE_MODE18},
+   {mmGB_TILE_MODE19},
+   {mmGB_TILE_MODE20},
+   {mmGB_TILE_MODE21},
+   {mmGB_TILE_MODE22},
+   {mmGB_TILE_MODE23},
+   {mmGB_TILE_MODE24},
+   {mmGB_TILE_MODE25},
+   {mmGB_TILE_MODE26},
+   {mmGB_TILE_MODE27},
+   {mmGB_TILE_MODE28},
+   {mmGB_TILE_MODE29},
+   {mmGB_TILE_MODE30},
+   {mmGB_TILE_MODE31},
{CC_RB_BACKEND_DISABLE, true},
-   {GC_USER_RB_BACKEND_DISABLE, true},
-   {PA_SC_RASTER_CONFIG, true},
+   {mmGC_USER_RB_BACKEND_DISABLE, true},
+   {mmPA_SC_RASTER_CONFIG, true},
 };
 
 static uint32_t si_get_register_value(struct amdgpu_device *adev,
diff --git a/drivers/gpu/drm/amd/amdgpu/sid.h b/drivers/gpu/drm/amd/amdgpu/sid.h
index 9a39cbfe6db9..2218fd99ee83 100644
--- a/drivers/gpu/drm/amd/amdgpu/sid.h
+++ b/drivers/gpu/drm/amd/amdgpu/sid.h
@@ -696,18 +696,6 @@
 #define HDP_REG_COHERENCY_FLUSH_CNTL   0x1528
 
 /* DCE6 ELD audio interface */
-#define AZ_F0_CODEC_ENDPOINT_INDEX               0x1780
-#   define AZ_ENDPOINT_REG_INDEX(x)              (((x) & 0xff) << 0)
-#   define AZ_ENDPOINT_REG_WRITE_EN              (1 << 8)
-#define AZ_F0_CODEC_ENDPOINT_DATA                0x1781
-
-#define AZ_F0_CODEC_PIN_CONTROL_CHANNEL_SPEAKER  0x25
-#define SPEAKER_ALLOCATION(x)                    (((x) & 0x7f) << 0)
-#define SPEAKER_ALLOCATION_MASK                  (0x7f << 0)
-#define SPEAKER_ALLOCATION_SHIFT                 0
-#define HDMI_CONNECTION                          (1 << 16)
-#define DP_CONNECTION                            (1 << 17)
-
 #define AZ_F0_CODEC_PIN_CONTROL_AUDIO_DESCRIPTOR0  0x28 /* LPCM */
 #define AZ_F0_CODEC_PIN_CONTROL_AUDIO_DESCRIPTOR1  0x29 /* AC3 */
 #define AZ_F0_CODEC_PIN_CONTROL_AUDIO_DE

Re: [PATCH v4 0/3] Uniformize defines between DCE6, DCE8 and DCE10

2025-03-09 Thread Alexandre Demers
On Sun, Mar 9, 2025 at 12:49 PM Alexandre Demers
 wrote:
>
> Keep where and how values are defined uniform across DCE6, DCE8 and DCE10.
> This makes it easier to understand the code, its similarities, and its
> modifications.
>
> Since sid.h is being wired up in dce_v6_0.c, duplicated defines need to be
> cleaned up in sid.h and si_enums.h.
>
> V4:
> * Reorder patches so each one compiles properly.
> * Remove drm-amdgpu-add-defines-for-pin_offsets-in-DCE8.patch since it was
> already approved and applied.
>
> A bigger round of cleanup is coming to remove more of the duplicated and
> unused defines found in sid.h and si_enums.h and continue the uniformization.
>
> Alexandre Demers (3):
>   drm/amdgpu: fix warning and errors caused by duplicated defines in
> sid.h
>   drm/amdgpu: move and fix X_GB_ADDR_CONFIG_GOLDEN values
>   drm/amdgpu: add or move defines for DCE6 in sid.h
>
The descriptions of the 3 patches were edited before I sent them, but I
forgot to update the cover letter. They are:
drm/amdgpu: prepare DCE6 uniformisation with DCE8 and DCE10
drm/amdgpu: fix SI's GB_ADDR_CONFIG_GOLDEN values and wire up sid.h in GFX6
drm/amdgpu: finish wiring up sid.h in DCE6

Alexandre

>  drivers/gpu/drm/amd/amdgpu/dce_v6_0.c |  73 ++---
>  drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c |  15 +-
>  drivers/gpu/drm/amd/amdgpu/si.c   |  68 ++---
>  drivers/gpu/drm/amd/amdgpu/si_enums.h |  12 -
>  drivers/gpu/drm/amd/amdgpu/sid.h  | 369 ++
>  5 files changed, 111 insertions(+), 426 deletions(-)
>
> --
> 2.48.1
>


Re: [PATCH v2] drm/amd/amdkfd: Evict all queues even HWS remove queue failed

2025-03-09 Thread Zha, YiFan(Even)
[AMD Official Use Only - AMD Internal Distribution Only]

Thanks Felix. Declaring err inside the if-block is better. I have submitted
the patch; could you please help review it?


Thanks.



Best regards,

Yifan Zha




From: Kuehling, Felix 
Sent: Saturday, March 8, 2025 5:00 AM
To: Zha, YiFan(Even) ; amd-gfx@lists.freedesktop.org 
; Deucher, Alexander 
; Zhang, Hawking 
Cc: Chang, HaiJun ; Chen, Horace ; 
Yin, ZhenGuo (Chris) 
Subject: Re: [PATCH v2] drm/amd/amdkfd: Evict all queues even HWS remove queue 
failed


On 2025-03-07 03:53, Yifan Zha wrote:
> [Why]
> If a reset is detected and KFD needs to evict its working queues, HWS queue
> removal will fail. The remaining queues are then left active and not
> evicted.
>
> After the reset completes, KFD uses HWS to terminate the remaining active
> queues, but HWS itself has been reset, so removing those queues fails again.
>
> [How]
> Keep removing all queues even if HWS reports a failure.
> It will not affect cpsch as it checks reset_domain->sem.
>
> v2: If any queue failed, evict queue returns error.
>
> Signed-off-by: Yifan Zha 
> ---
>   drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 8 
>   1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> index f3f2fd6ee65c..b647745ee0a5 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
> @@ -1189,7 +1189,7 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>struct queue *q;
>struct device *dev = dqm->dev->adev->dev;
>struct kfd_process_device *pdd;
> - int retval = 0;
> + int retval, err = 0;

You should still initialize retval to 0, otherwise the function may
return an uninitialized value if there are no MES queues. err does not
need to be initialized because you're always assigning a value just
before checking it below.

In fact, you could declare err inside the if-block below, since it is
only needed in that scope. It is preferred to declare variables in a
more limited scope if they are not needed outside.

Regards,
   Felix
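
A standalone illustration of that point (hypothetical helpers, not the kfd
code):

/* Hypothetical sketch. retval is returned on every path, so it must start
 * at 0: with a bare "int retval;" the function would return garbage
 * whenever no queue uses MES. err lives inside the if-block, the only
 * scope that needs it. */
struct queue { struct queue *next; int uses_mes; };

static int remove_queue(struct queue *q) { (void)q; return 0; } /* stub */

static int evict_all(struct queue *head)
{
	int retval = 0;		/* returned even if the loop assigns nothing */
	struct queue *q;

	for (q = head; q; q = q->next) {
		if (q->uses_mes) {
			int err = remove_queue(q);	/* narrowest scope */

			if (err)
				retval = err;	/* record failure, keep evicting */
		}
	}

	return retval;
}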


>
>dqm_lock(dqm);
>if (qpd->evicted++ > 0) /* already evicted, do nothing */
> @@ -1219,11 +1219,11 @@ static int evict_process_queues_cpsch(struct device_queue_manager *dqm,
>decrement_queue_count(dqm, qpd, q);
>
>if (dqm->dev->kfd->shared_resources.enable_mes) {
> - retval = remove_queue_mes(dqm, q, qpd);
> - if (retval) {
> + err = remove_queue_mes(dqm, q, qpd);
> + if (err) {
>dev_err(dev, "Failed to evict queue %d\n",
>q->properties.queue_id);
> - goto out;
> + retval = err;
>}
>}
>}


RE: [PATCH v4] drm/amdgpu: Fix the race condition for draining retry fault

2025-03-09 Thread Deng, Emily
[AMD Official Use Only - AMD Internal Distribution Only]



From: Chen, Xiaogang 
Sent: Saturday, March 8, 2025 8:38 AM
To: Deng, Emily ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4] drm/amdgpu: Fix the race condition for draining retry 
fault



On 3/6/2025 7:27 PM, Deng, Emily wrote:

[AMD Official Use Only - AMD Internal Distribution Only]



From: Chen, Xiaogang 
Sent: Friday, March 7, 2025 1:01 AM
To: Deng, Emily ; 
amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH v4] drm/amdgpu: Fix the race condition for draining retry 
fault


Thanks for catching and fixing this race condition. It looks good to me. One
minor thing below:
On 3/6/2025 12:03 AM, Emily Deng wrote:

Issue:
In the scenario where svm_range_restore_pages is called, but
svms->checkpoint_ts has not been set and the retry fault has not been
drained, svm_range_unmap_from_cpu is triggered and calls svm_range_free.
Meanwhile, svm_range_restore_pages continues execution and reaches
svm_range_from_addr. This results in a "failed to find prange..." error,
causing the page recovery to fail.

How to fix:
Move the timestamp check code under the protection of svms->lock.
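
A simplified sketch of the race and of the reordered check (names follow the
patch; the timing diagram is illustrative, not the exact kernel code):

/*   fault handler (GPU retry fault)     unmap path (CPU munmap)
 *   -------------------------------     -----------------------
 *   svm_range_restore_pages()
 *     checkpoint_ts check (no lock)
 *                                       set svms->checkpoint_ts[gpuidx]
 *                                       svm_range_free(prange)
 *     mutex_lock(&svms->lock);
 *     svm_range_from_addr() -> NULL  => "failed to find prange..."
 *
 * Doing the check after taking svms->lock makes the stale-fault test and
 * the prange lookup atomic with respect to the unmap path:
 */
mutex_lock(&svms->lock);
if (svms->checkpoint_ts[gpuidx] &&
    amdgpu_ih_ts_after_or_equal(ts, svms->checkpoint_ts[gpuidx])) {
	r = -EAGAIN;	/* stale fault: drop it, a retry is handled as new */
	goto out_unlock_svms;
}
prange = svm_range_from_addr(svms, addr, NULL);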



v2: Make sure all the right locks are released before going out.
v3: Directly goto out_unlock_svms, and return -EAGAIN.
v4: Refine code.

Signed-off-by: Emily Deng 
---
 drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 30 +++-
 1 file changed, 16 insertions(+), 14 deletions(-)



diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index d04725583f19..83ac14bf7a7a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3008,19 +3008,6 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 		goto out;
 	}
 
-	/* check if this page fault time stamp is before svms->checkpoint_ts */
-	if (svms->checkpoint_ts[gpuidx] != 0) {
-		if (amdgpu_ih_ts_after_or_equal(ts, svms->checkpoint_ts[gpuidx])) {
-			pr_debug("draining retry fault, drop fault 0x%llx\n", addr);
-			r = 0;
-			goto out;
-		} else
-			/* ts is after svms->checkpoint_ts now, reset svms->checkpoint_ts
-			 * to zero to avoid following ts wrap around give wrong comparing
-			 */
-			svms->checkpoint_ts[gpuidx] = 0;
-	}
-
 	if (!p->xnack_enabled) {
 		pr_debug("XNACK not enabled for pasid 0x%x\n", pasid);
 		r = -EFAULT;
@@ -3040,6 +3027,20 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	mmap_read_lock(mm);
 retry_write_locked:
 	mutex_lock(&svms->lock);
+
+	/* check if this page fault time stamp is before svms->checkpoint_ts */
+	if (svms->checkpoint_ts[gpuidx] != 0) {
+		if (amdgpu_ih_ts_after_or_equal(ts, svms->checkpoint_ts[gpuidx])) {
+			pr_debug("draining retry fault, drop fault 0x%llx\n", addr);
+			r = -EAGAIN;

We drop the page fault because it is stale, not because we mean to handle it
again. If we return -EAGAIN we do amdgpu_gmc_filter_faults_remove. If, after
the unmap, the user maps the same range again, we should treat a page fault
on that range as a new one.

Regards

Xiaogang

Sorry, I didn't quite catch that. So you think we shouldn't remove the fault
via amdgpu_gmc_filter_faults_remove?

I think returning "-EAGAIN" means a page fault with a given addr and pasid
was not handled this time. The same page fault will come back and kfd will
handle it, so remove it from the filter to make sure it is handled next time.

Yes, I also think we should remove it to make sure it will be handled next
time. From this point of view, it should be removed from the filter.

Emily Deng
Best Wishes




In this case the page fault is stale and we do not need to, or cannot,
handle it. There is no indication it will come back again and need handling.
I am not sure whether it should be removed from the filter; my concern is
whether we should return "-EAGAIN" in this case.

Regards

Xiaogang
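
For readers following the thread: the filter being discussed behaves roughly
like the model below (a hypothetical, simplified stand-in for
amdgpu_gmc_filter_faults()/amdgpu_gmc_filter_faults_remove(), not the
driver's actual hashing):

#include <stdbool.h>
#include <stdint.h>

#define FILTER_SLOTS 256

struct fault_key { uint64_t addr; uint32_t pasid; bool valid; };
static struct fault_key filter[FILTER_SLOTS];

static unsigned int slot(uint64_t addr, uint32_t pasid)
{
	return (unsigned int)((addr ^ pasid) % FILTER_SLOTS);
}

/* Returns true if this (addr, pasid) fault was seen recently and should
 * be dropped as a duplicate. */
static bool filter_faults(uint64_t addr, uint32_t pasid)
{
	struct fault_key *k = &filter[slot(addr, pasid)];

	if (k->valid && k->addr == addr && k->pasid == pasid)
		return true;
	*k = (struct fault_key){ .addr = addr, .pasid = pasid, .valid = true };
	return false;
}

/* Forget a fault so that a retry of it is handled as new -- this is why
 * the -EAGAIN path wants the entry removed from the filter. */
static void filter_faults_remove(uint64_t addr, uint32_t pasid)
{
	struct fault_key *k = &filter[slot(addr, pasid)];

	if (k->valid && k->addr == addr && k->pasid == pasid)
		k->valid = false;
}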



Emily Deng
Best Wishes






+			goto out_unlock_svms;
+		} else
+			/* ts is after svms->checkpoint_ts now, reset svms->checkpoint_ts
+			 * to zero to avoid following ts wrap around give wrong comparing
+			 */
+			svms->checkpoint_ts[gpuidx] = 0;
+	}
+
 	prange = svm_range_from_addr(svms, addr, NULL);
 	if (!prange) {
 		pr_debug("failed to find prange svms 0x%p address [0x%llx]\n",
@@ -3165,7 +3166,8 @@ svm_range_restore_pages(struct amdgpu_device *adev, unsigned int pasid,
 	mutex_unlock(&svms->lock);