Re: [PATCH v15 1/9] Documentation/driver-api: Add document about WBRF mechanism

2023-12-07 Thread kernel test robot
Hi Ma,

kernel test robot noticed the following build warnings:

[auto build test WARNING on linus/master]
[also build test WARNING on v6.7-rc4 next-20231207]
[cannot apply to drm-misc/drm-misc-next wireless-next/main wireless/main]
[If your patch is applied to the wrong git tree, kindly drop us a note.
When submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Ma-Jun/Documentation-driver-api-Add-document-about-WBRF-mechanism/20231206-153327
base:   linus/master
patch link:    https://lore.kernel.org/r/20231206072947.1331729-2-Jun.Ma2%40amd.com
patch subject: [PATCH v15 1/9] Documentation/driver-api: Add document about WBRF mechanism
reproduce: (https://download.01.org/0day-ci/archive/20231207/202312071941.jxqxsk1c-...@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot
| Closes: https://lore.kernel.org/oe-kbuild-all/202312071941.jxqxsk1c-...@intel.com/

All warnings (new ones prefixed by >>):

>> Documentation/driver-api/wbrf.rst:28: WARNING: Unexpected indentation.
>> Documentation/driver-api/wbrf.rst:61: WARNING: Block quote ends without a blank line; unexpected unindent.
>> Documentation/driver-api/wbrf.rst: WARNING: document isn't included in any toctree

vim +28 Documentation/driver-api/wbrf.rst

25  
26  Producer: such component who can produce high-powered radio frequency
27  Consumer: such component who can adjust its in-use frequency in
  > 28 response to the radio frequencies of other components to
29 mitigate the possible RFI.
30  
31  To make the mechanism function, those producers should notify active use
32  of their particular frequencies so that other consumers can make relative
33  internal adjustments as necessary to avoid this resonance.
34  
35  ACPI interface
36  ==============
37  
38  Although initially used by for wifi + dGPU use cases, the ACPI interface
39  can be scaled to any type of device that a platform designer discovers
40  can cause interference.
41  
42  The GUID used for the _DSM is 7B7656CF-DC3D-4C1C-83E9-66E721DE3070.
43  
44  3 functions are available in this _DSM:
45  
46  * 0: discover # of functions available
47  * 1: record RF bands in use
48  * 2: retrieve RF bands in use
49  
50  Driver programming interface
51  ============================
52  
53  .. kernel-doc:: drivers/platform/x86/amd/wbrf.c
54  
55  Sample Usage
56  =============
57  
58  The expected flow for the producers:
59  1. During probe, call `acpi_amd_wbrf_supported_producer` to check if WBRF
60 can be enabled for the device.
  > 61  2. On using some frequency band, call `acpi_amd_wbrf_add_remove` with 'add'
62 param to get other consumers properly notified.
63  3. Or on stopping using some frequency band, call
64 `acpi_amd_wbrf_add_remove` with 'remove' param to get other consumers notified.
65  
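
For illustration only, the three-step producer flow quoted above could be
sketched as a self-contained userspace mock. Everything prefixed `mock_` or
`MOCK_` is hypothetical: this report does not show the signatures of
`acpi_amd_wbrf_supported_producer` or `acpi_amd_wbrf_add_remove`, so the
types, parameters and return values below are assumptions, not the API in
drivers/platform/x86/amd/wbrf.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names or layouts come from wbrf.c. */
#define MOCK_MAX_BANDS 4

enum mock_wbrf_action { MOCK_WBRF_ADD, MOCK_WBRF_REMOVE };

struct mock_band {
	uint64_t start_hz;
	uint64_t end_hz;
	bool in_use;
};

struct mock_dev {
	bool wbrf_capable;                       /* result of the probe-time check */
	struct mock_band bands[MOCK_MAX_BANDS];  /* bands this producer announced */
};

/* Step 1: during probe, ask whether WBRF can be enabled for the device. */
static bool mock_wbrf_supported_producer(const struct mock_dev *dev)
{
	assert(dev != NULL);
	return dev->wbrf_capable;
}

/* Steps 2 and 3: announce ('add') or withdraw ('remove') one band. */
static int mock_wbrf_add_remove(struct mock_dev *dev, enum mock_wbrf_action action,
				uint64_t start_hz, uint64_t end_hz)
{
	size_t i;

	assert(dev != NULL);
	if (start_hz >= end_hz)
		return -1;

	for (i = 0; i < MOCK_MAX_BANDS; i++) {
		struct mock_band *b = &dev->bands[i];

		if (action == MOCK_WBRF_ADD && !b->in_use) {
			b->start_hz = start_hz;
			b->end_hz = end_hz;
			b->in_use = true;
			return 0;
		}
		if (action == MOCK_WBRF_REMOVE && b->in_use &&
		    b->start_hz == start_hz && b->end_hz == end_hz) {
			b->in_use = false;
			return 0;
		}
	}
	return -1;	/* table full on add, or band not found on remove */
}

/* Number of bands currently announced; consumers would react to these. */
static size_t mock_wbrf_active_bands(const struct mock_dev *dev)
{
	size_t i, n = 0;

	assert(dev != NULL);
	for (i = 0; i < MOCK_MAX_BANDS; i++)
		if (dev->bands[i].in_use)
			n++;
	return n;
}
```

A producer in this mock would check `mock_wbrf_supported_producer()` once,
then pair every 'add' with a later 'remove' for the same band.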

-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki


Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Mario Limonciello

Bin, KH,

Thanks for the confirmation!

Hamza,

I think you can add a Tested-by tag for Bin too.

On 12/7/2023 04:38, Bin Li wrote:
> Hi Mario,
>
>   It's a false alarm from my side, after testing the 6.1.0-oem and
> 6.5.0-oem kernels, this patch works perfectly fine, sorry about that.
>
> On Thu, Dec 7, 2023 at 3:47 PM Bin Li wrote:
>> Hi Mario,
>>
>> I found I missed the part in
>> drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c with kai.heng's review.
>> I will rebuild a new kernel and test it again, and reply later, sorry about
>> that.
>>
>> On Thu, Dec 7, 2023 at 2:58 PM Kai-Heng Feng wrote:
>>> On Thu, Dec 7, 2023 at 10:10 AM Mario Limonciello wrote:
>>>> On 12/6/2023 20:07, Kai-Heng Feng wrote:
>>>>> On Thu, Dec 7, 2023 at 9:57 AM Mario Limonciello wrote:
>>>>>> On 12/6/2023 19:23, Kai-Heng Feng wrote:
>>>>>>> On Wed, Dec 6, 2023 at 4:29 AM Mario Limonciello wrote:
>>>>>>>> On 12/5/2023 14:17, Hamza Mahfooz wrote:
>>>>>>>>> We currently don't support dirty rectangles on hardware rotated modes.
>>>>>>>>> So, if a user is using hardware rotated modes with PSR-SU enabled,
>>>>>>>>> use PSR-SU FFU for all rotated planes (including cursor planes).
>>>>>>>>
>>>>>>>> Here is the email for the original reporter to give an attribution tag.
>>>>>>>>
>>>>>>>> Reported-by: Kai-Heng Feng
>>>>>>>
>>>>>>> For this particular issue,
>>>>>>> Tested-by: Kai-Heng Feng
>>>>>>
>>>>>> Can you confirm what kernel base you tested issue against?
>>>>>>
>>>>>> I ask because Bin Li (+CC) also tested it against 6.1 based LTS kernel
>>>>>> but ran into problems.
>>>>>
>>>>> The patch was tested against ADSN.
>>>>>
>>>>>> I wonder if it's because of other dependency patches.  If that's the
>>>>>> case it would be good to call them out in the Cc: @stable as
>>>>>> dependencies so when Greg or Sasha backport this 6.1 doesn't get broken.
>>>>>
>>>>> Probably. I haven't really tested any older kernel series.
>>>>
>>>> Since you've got a good environment to test it and reproduce it would
>>>> you mind double checking it against 6.7-rc, 6.5 and 6.1 trees?  If we
>>>> don't have confidence it works on the older trees I think we'll need to
>>>> drop the stable tag.
>>>
>>> Not seeing issues here when the patch is applied against 6.5 and 6.1
>>> (which needs to resolve a minor conflict).
>>>
>>> I am not sure what happened for Bin's case.
>>>
>>> Kai-Heng
>>>
>>>>> Kai-Heng
>>>>>
>>>>>> Bin,
>>>>>>
>>>>>> Could you run ./scripts/decode_stacktrace.sh on your kernel trace to
>>>>>> give us a specific line number on the issue you hit?
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>>>>> Cc: sta...@vger.kernel.org
>>>>>>>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
>>>>>>>>> Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
>>>>>>>>> Signed-off-by: Hamza Mahfooz
>>>>>>>>> ---
>>>>>>>>>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c    |  4 ++++
>>>>>>>>>  drivers/gpu/drm/amd/display/dc/dc_hw_types.h         |  1 +
>>>>>>>>>  drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c    | 12 ++++++++++--
>>>>>>>>>  .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
>>>>>>>>>  4 files changed, 17 insertions(+), 3 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>>>>>>> index c146dc9cba92..79f8102d2601 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>>>>>>>>> @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
>>>>>>>>>  bool bb_changed;
>>>>>>>>>  bool fb_changed;
>>>>>>>>>  u32 i = 0;
>>>>>>>>> +
>>>>>>>>
>>>>>>>> Looks like a spurious newline here.
>>>>>>>>
>>>>>>>>>  *dirty_regions_changed = false;
>>>>>>>>>
>>>>>>>>>  /*
>>>>>>>>> @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
>>>>>>>>>  if (plane->type == DRM_PLANE_TYPE_CURSOR)
>>>>>>>>>  return;
>>>>>>>>>
>>>>>>>>> + if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
>>>>>>>>> + goto ffu;
>>>>>>>>> +
>>>>>>>>
>>>>>>>> I noticed that the original report was specifically on 180°.  Since
>>>>>>>> you're also covering 90° and 270° with this check it sounds like it's
>>>>>>>> actually problematic on those too?
>>>>>>>
>>>>>>> 90 & 270 are problematic too. But from what I observed the issue is
>>>>>>> much more than just cursors.
>>>>>>
>>>>>> Got it; thanks.
>>>>>>
>>>>>>> Kai-Heng
>>>>>>>
>>>>>>>>>  num_clips = drm_plane_get_damage_clips_count(new_plane_state);
>>>>>>>>>  clips = drm_plane_get_damage_clips(new_plane_state);
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>>>>>>>>> index 9649934ea186..e2a3aa8812df 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>>>>>>>>> +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>>>>>>>>> @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
>>>>>>>>>  struct fixed31_32 v_scale_ratio;
>>>>>>>>>  enum dc_rotation_angle rotation;
>>>>>>>>>  bool mirror;
>>>>>>>>> + struct dc_stream_state *stream;
>>>>>>>>>  };
>>>>>>>>>
>>>>>>>>>  /* IPP related types */
>>>>>>>>> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
>>>>>>>>> index 139cf31d2e45..89c3bf0fe0c9 100644
>>>>>>>>> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
>>>>>>>>> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
>>>>>>>>> @@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
>>>>>>>>>  if (src_y_offset < 0)
>>>>>>>>>  src_y_offset = 0;
>>>>>>>>>  /* Save necessary cursor info x, y position. w, h is saved in attribute func. */
>>>>>>>>> - hubp->cur_rect.x = src_x_offset + p

Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Hamza Mahfooz

On 12/5/23 15:29, Mario Limonciello wrote:
> On 12/5/2023 14:17, Hamza Mahfooz wrote:
>> We currently don't support dirty rectangles on hardware rotated modes.
>> So, if a user is using hardware rotated modes with PSR-SU enabled,
>> use PSR-SU FFU for all rotated planes (including cursor planes).
>>
>
> Here is the email for the original reporter to give an attribution tag.
>
> Reported-by: Kai-Heng Feng
>
>> Cc: sta...@vger.kernel.org
>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
>> Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
>> Signed-off-by: Hamza Mahfooz
>> ---
>>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c    |  4 ++++
>>   drivers/gpu/drm/amd/display/dc/dc_hw_types.h         |  1 +
>>   drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c    | 12 ++++++++++--
>>   .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
>>   4 files changed, 17 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> index c146dc9cba92..79f8102d2601 100644
>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
>>   bool bb_changed;
>>   bool fb_changed;
>>   u32 i = 0;
>> +
>
> Looks like a spurious newline here.
>
>>   *dirty_regions_changed = false;
>>   /*
>> @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
>>   if (plane->type == DRM_PLANE_TYPE_CURSOR)
>>   return;
>> +    if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
>> +    goto ffu;
>> +
>
> I noticed that the original report was specifically on 180°.  Since
> you're also covering 90° and 270° with this check it sounds like it's
> actually problematic on those too?

Ya it's problematic for 90 and 270 as well, though most mainstream
compositors don't use hardware rotation for those cases under any
circumstances. So, I doubt that many people would encounter this issue in
the wild for them.
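
As a rough sketch of why the single comparison covers 90°, 180° and 270°
at once: the rotation property is a bitmask with one bit per angle, so
anything other than the "no rotation" value takes the full-frame path. The
`SKETCH_` constants below are local copies written so the example compiles
outside the kernel tree (to my knowledge they match the DRM_MODE_ROTATE_*
bit values, but treat that as an assumption), and `sketch_needs_ffu()` is a
hypothetical helper, not a function in amdgpu_dm.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Local stand-ins for the DRM rotation bits (assumed values). */
#define SKETCH_ROTATE_0    (1u << 0)
#define SKETCH_ROTATE_90   (1u << 1)
#define SKETCH_ROTATE_180  (1u << 2)
#define SKETCH_ROTATE_270  (1u << 3)
#define SKETCH_ROTATE_MASK (SKETCH_ROTATE_0 | SKETCH_ROTATE_90 | \
			    SKETCH_ROTATE_180 | SKETCH_ROTATE_270)

/*
 * Hypothetical helper mirroring the check added to fill_dc_dirty_rects():
 * returns true when the plane should skip dirty rectangles and fall back
 * to a full frame update (the 'goto ffu' path in the patch).
 */
static bool sketch_needs_ffu(uint32_t rotation)
{
	/* A valid rotation value carries at least one angle bit. */
	assert((rotation & SKETCH_ROTATE_MASK) != 0);

	return rotation != SKETCH_ROTATE_0;
}
```

Because the comparison is against the whole value rather than a single
bit, any rotated plane takes the fallback path.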




>>   num_clips = drm_plane_get_damage_clips_count(new_plane_state);
>>   clips = drm_plane_get_damage_clips(new_plane_state);
>> diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>> index 9649934ea186..e2a3aa8812df 100644
>> --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>> +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>> @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
>>   struct fixed31_32 v_scale_ratio;
>>   enum dc_rotation_angle rotation;
>>   bool mirror;
>> +    struct dc_stream_state *stream;
>>   };
>>   /* IPP related types */
>> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
>> index 139cf31d2e45..89c3bf0fe0c9 100644
>> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
>> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
>> @@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
>>   if (src_y_offset < 0)
>>   src_y_offset = 0;
>>   /* Save necessary cursor info x, y position. w, h is saved in attribute func. */
>> -    hubp->cur_rect.x = src_x_offset + param->viewport.x;
>> -    hubp->cur_rect.y = src_y_offset + param->viewport.y;
>> +    if (param->stream->link->psr_settings.psr_version >= DC_PSR_VERSION_SU_1 &&
>> +    param->rotation != ROTATION_ANGLE_0) {
>
> Ditto on above about 90° and 270°.
>
>> +    hubp->cur_rect.x = 0;
>> +    hubp->cur_rect.y = 0;
>> +    hubp->cur_rect.w = param->stream->timing.h_addressable;
>> +    hubp->cur_rect.h = param->stream->timing.v_addressable;
>> +    } else {
>> +    hubp->cur_rect.x = src_x_offset + param->viewport.x;
>> +    hubp->cur_rect.y = src_y_offset + param->viewport.y;
>> +    }
>>   }
>>   void hubp2_clk_cntl(struct hubp *hubp, bool enable)
>> diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
>> index 2b8b8366538e..ce5613a76267 100644
>> --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
>> +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
>> @@ -3417,7 +3417,8 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
>>   .h_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.horz,
>>   .v_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.vert,
>>   .rotation = pipe_ctx->plane_state->rotation,
>> -    .mirror = pipe_ctx->plane_state->horizontal_mirror
>> +    .mirror = pipe_ctx->plane_state->horizontal_mirror,
>> +    .stream = pipe_ctx->stream
>
> As a nit; I think it's worth leaving a harmless trailing ',' so that
> there is less ping pong in the future when adding new members to a struct.
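
A small sketch of the trailing-comma nit, using a made-up struct:
`struct sketch_cursor_param` and `sketch_make_param()` are hypothetical and
only loosely modelled on the initializer in the hunk above. C accepts a
comma after the last designated initializer, so a later patch that appends
a member touches one line instead of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down parameter block for illustration only. */
struct sketch_cursor_param {
	int rotation;
	bool mirror;
	const void *stream;
};

static struct sketch_cursor_param
sketch_make_param(int rotation, bool mirror, const void *stream)
{
	/* This sketch requires a stream pointer. */
	assert(stream != NULL);

	struct sketch_cursor_param param = {
		.rotation = rotation,
		.mirror = mirror,
		.stream = stream,	/* trailing comma: the next member is a one-line diff */
	};

	return param;
}
```

Without that comma, adding a member would also rewrite the `.stream` line
just to append the separator, which is the "ping pong" mentioned above.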



>>   };
>>   bool pipe_split_on = false;
>>   bool odm_combine_on = (pipe_ctx->next_odm_pipe != NULL) ||



--
Hamza



[PATCH v2] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Hamza Mahfooz
We currently don't support dirty rectangles on hardware rotated modes.
So, if a user is using hardware rotated modes with PSR-SU enabled,
use PSR-SU FFU for all rotated planes (including cursor planes).

Cc: sta...@vger.kernel.org
Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
Reported-by: Kai-Heng Feng 
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
Tested-by: Kai-Heng Feng 
Tested-by: Bin Li 
Signed-off-by: Hamza Mahfooz 
---
v2: fix style issues and add tags
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c    |  3 +++
 drivers/gpu/drm/amd/display/dc/dc_hw_types.h         |  1 +
 drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c    | 12 ++++++++++--
 .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
 4 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index c146dc9cba92..3cd1d6a8fbdf 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -5217,6 +5217,9 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
if (plane->type == DRM_PLANE_TYPE_CURSOR)
return;
 
+   if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
+   goto ffu;
+
num_clips = drm_plane_get_damage_clips_count(new_plane_state);
clips = drm_plane_get_damage_clips(new_plane_state);
 
diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
index 9649934ea186..e2a3aa8812df 100644
--- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
+++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
@@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
struct fixed31_32 v_scale_ratio;
enum dc_rotation_angle rotation;
bool mirror;
+   struct dc_stream_state *stream;
 };
 
 /* IPP related types */
diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
index 139cf31d2e45..89c3bf0fe0c9 100644
--- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
+++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
@@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
if (src_y_offset < 0)
src_y_offset = 0;
/* Save necessary cursor info x, y position. w, h is saved in attribute func. */
-   hubp->cur_rect.x = src_x_offset + param->viewport.x;
-   hubp->cur_rect.y = src_y_offset + param->viewport.y;
+   if (param->stream->link->psr_settings.psr_version >= DC_PSR_VERSION_SU_1 &&
+   param->rotation != ROTATION_ANGLE_0) {
+   hubp->cur_rect.x = 0;
+   hubp->cur_rect.y = 0;
+   hubp->cur_rect.w = param->stream->timing.h_addressable;
+   hubp->cur_rect.h = param->stream->timing.v_addressable;
+   } else {
+   hubp->cur_rect.x = src_x_offset + param->viewport.x;
+   hubp->cur_rect.y = src_y_offset + param->viewport.y;
+   }
 }
 
 void hubp2_clk_cntl(struct hubp *hubp, bool enable)
diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
index 2b8b8366538e..cdb903116eb7 100644
--- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
+++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
@@ -3417,7 +3417,8 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
.h_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.horz,
.v_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.vert,
.rotation = pipe_ctx->plane_state->rotation,
-   .mirror = pipe_ctx->plane_state->horizontal_mirror
+   .mirror = pipe_ctx->plane_state->horizontal_mirror,
+   .stream = pipe_ctx->stream,
};
bool pipe_split_on = false;
bool odm_combine_on = (pipe_ctx->next_odm_pipe != NULL) ||
-- 
2.43.0
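
For readers skimming the hubp2_cursor_set_position() hunk, here is a
standalone sketch of the rectangle selection it introduces. The structs
and `sketch_cursor_rect()` are hypothetical simplifications: the two
booleans stand in for the `psr_version >= DC_PSR_VERSION_SU_1` and
`rotation != ROTATION_ANGLE_0` tests, and `cursor_x`/`cursor_y` stand in
for the `src_*_offset + viewport.*` sums from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_rect {
	int x, y, w, h;
};

/* Hypothetical, flattened view of the inputs the hunk reads. */
struct sketch_cursor_input {
	bool psr_su_active;	/* PSR-SU (selective update) in use on the link */
	bool rotated;		/* plane uses a hardware rotated mode */
	int cursor_x, cursor_y;	/* cursor position inside the viewport */
	int h_addressable;	/* stream timing: visible width */
	int v_addressable;	/* stream timing: visible height */
};

static void sketch_cursor_rect(const struct sketch_cursor_input *in,
			       struct sketch_rect *rect)
{
	assert(in != NULL && rect != NULL);

	if (in->psr_su_active && in->rotated) {
		/* Cover the whole addressable area instead of the cursor. */
		rect->x = 0;
		rect->y = 0;
		rect->w = in->h_addressable;
		rect->h = in->v_addressable;
	} else {
		/* As in the patch's else branch, only x and y are written. */
		rect->x = in->cursor_x;
		rect->y = in->cursor_y;
	}
}
```

In the rotated PSR-SU case the saved rectangle spans the full screen; in
every other case the width and height are left as they were.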



Re: [PATCH v2] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Mario Limonciello

On 12/7/2023 08:51, Hamza Mahfooz wrote:
> We currently don't support dirty rectangles on hardware rotated modes.
> So, if a user is using hardware rotated modes with PSR-SU enabled,
> use PSR-SU FFU for all rotated planes (including cursor planes).
>
> Cc: sta...@vger.kernel.org
> Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
> Reported-by: Kai-Heng Feng
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
> Tested-by: Kai-Heng Feng
> Tested-by: Bin Li
> Signed-off-by: Hamza Mahfooz

Reviewed-by: Mario Limonciello

> ---
> v2: fix style issues and add tags
> ---
>   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c    |  3 +++
>   drivers/gpu/drm/amd/display/dc/dc_hw_types.h         |  1 +
>   drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c    | 12 ++++++++++--
>   .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
>   4 files changed, 16 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index c146dc9cba92..3cd1d6a8fbdf 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -5217,6 +5217,9 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
> if (plane->type == DRM_PLANE_TYPE_CURSOR)
> return;
>
> +   if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
> +   goto ffu;
> +
> num_clips = drm_plane_get_damage_clips_count(new_plane_state);
> clips = drm_plane_get_damage_clips(new_plane_state);
>
> diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> index 9649934ea186..e2a3aa8812df 100644
> --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
> struct fixed31_32 v_scale_ratio;
> enum dc_rotation_angle rotation;
> bool mirror;
> +   struct dc_stream_state *stream;
>   };
>
>   /* IPP related types */
> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> index 139cf31d2e45..89c3bf0fe0c9 100644
> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> @@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
> if (src_y_offset < 0)
> src_y_offset = 0;
> /* Save necessary cursor info x, y position. w, h is saved in attribute func. */
> -   hubp->cur_rect.x = src_x_offset + param->viewport.x;
> -   hubp->cur_rect.y = src_y_offset + param->viewport.y;
> +   if (param->stream->link->psr_settings.psr_version >= DC_PSR_VERSION_SU_1 &&
> +   param->rotation != ROTATION_ANGLE_0) {
> +   hubp->cur_rect.x = 0;
> +   hubp->cur_rect.y = 0;
> +   hubp->cur_rect.w = param->stream->timing.h_addressable;
> +   hubp->cur_rect.h = param->stream->timing.v_addressable;
> +   } else {
> +   hubp->cur_rect.x = src_x_offset + param->viewport.x;
> +   hubp->cur_rect.y = src_y_offset + param->viewport.y;
> +   }
>   }
>
>   void hubp2_clk_cntl(struct hubp *hubp, bool enable)
> diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> index 2b8b8366538e..cdb903116eb7 100644
> --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> @@ -3417,7 +3417,8 @@ void dcn10_set_cursor_position(struct pipe_ctx *pipe_ctx)
> .h_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.horz,
> .v_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.vert,
> .rotation = pipe_ctx->plane_state->rotation,
> -   .mirror = pipe_ctx->plane_state->horizontal_mirror
> +   .mirror = pipe_ctx->plane_state->horizontal_mirror,
> +   .stream = pipe_ctx->stream,
> };
> bool pipe_split_on = false;
> bool odm_combine_on = (pipe_ctx->next_odm_pipe != NULL) ||




Re: [PATCH] drm/amdgpu: drop the long-double-128 powerpc check/hack

2023-12-07 Thread Christophe Leroy


On 31/03/2023 at 12:53, Michael Ellerman wrote:
> "Daniel Kolesa"  writes:
>> Commit c653c591789b ("drm/amdgpu: Re-enable DCN for 64-bit powerpc")
>> introduced this check as a workaround for the driver not building
>> with toolchains that default to 64-bit long double.
> ...
>> In mainline, this work is now fully done, so this check is fully
>> redundant and does not do anything except preventing AMDGPU DC
>> from being built on systems such as those using musl libc. The
>> last piece of work to enable this was commit c92b7fe0d92a
>> ("drm/amd/display: move remaining FPU code to dml folder")
>> and this has since been backported to 6.1 stable (in 6.1.7).
>>
>> Relevant issue: https://gitlab.freedesktop.org/drm/amd/-/issues/2288
> 
> I looked to pick this up for 6.3 but was still seeing build errors with
> some compilers. I assumed that was due to some fixes coming in
> linux-next that I didn't have.
> 
> But applying the patch on v6.3-rc4 I still see build errors. This is
> building allyesconfig with the kernel.org GCC 12.2.0 / binutils 2.39
> toolchain:
> 
>    powerpc64le-linux-gnu-ld: drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.o uses hard float, arch/powerpc/lib/test_emulate_step.o uses soft float
>    powerpc64le-linux-gnu-ld: failed to merge target specific data of file drivers/gpu/drm/amd/display/dc/dml/display_mode_lib.o
> 
> etc.
> 
> All the conflicts are between test_emulate_step.o and some file in 
> drivers/gpu/drm/amd/display/dc/dml.
> 
> So even with all the hard-float code isolated in the dml folder, we
> still hit build errors, because allyesconfig wants to link those
> hard-float using objects with soft-float objects from elsewhere in the
> kernel.
> 
> It seems like the only workable fix is to force the kernel build to use
> 128-bit long double. I'll send a patch doing that.
> 

Commit 78f0929884d4 ("powerpc/64: Always build with 128-bit long double"),
I guess?

Let's drop this patch from patchwork then.


Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Kai-Heng Feng
On Wed, Dec 6, 2023 at 4:29 AM Mario Limonciello
 wrote:
>
> On 12/5/2023 14:17, Hamza Mahfooz wrote:
> > We currently don't support dirty rectangles on hardware rotated modes.
> > So, if a user is using hardware rotated modes with PSR-SU enabled,
> > use PSR-SU FFU for all rotated planes (including cursor planes).
> >
>
> Here is the email for the original reporter to give an attribution tag.
>
> Reported-by: Kai-Heng Feng 

For this particular issue,
Tested-by: Kai-Heng Feng 

>
> > Cc: sta...@vger.kernel.org
> > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
> > Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
> > Signed-off-by: Hamza Mahfooz 
> > ---
> >   drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c    |  4 ++++
> >   drivers/gpu/drm/amd/display/dc/dc_hw_types.h         |  1 +
> >   drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c    | 12 ++++++++++--
> >   .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
> >   4 files changed, 17 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > index c146dc9cba92..79f8102d2601 100644
> > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct drm_plane 
> > *plane,
> >   bool bb_changed;
> >   bool fb_changed;
> >   u32 i = 0;
> > +
>
> Looks like a spurious newline here.
>
> >   *dirty_regions_changed = false;
> >
> >   /*
> > @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct drm_plane 
> > *plane,
> >   if (plane->type == DRM_PLANE_TYPE_CURSOR)
> >   return;
> >
> > + if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
> > + goto ffu;
> > +
>
> I noticed that the original report was specifically on 180°.  Since
> you're also covering 90° and 270° with this check it sounds like it's
> actually problematic on those too?

90 & 270 are problematic too. But from what I observed the issue is
much more than just cursors.

Kai-Heng

>
> >   num_clips = drm_plane_get_damage_clips_count(new_plane_state);
> >   clips = drm_plane_get_damage_clips(new_plane_state);
> >
> > diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h 
> > b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> > index 9649934ea186..e2a3aa8812df 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> > +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> > @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
> >   struct fixed31_32 v_scale_ratio;
> >   enum dc_rotation_angle rotation;
> >   bool mirror;
> > + struct dc_stream_state *stream;
> >   };
> >
> >   /* IPP related types */
> > diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c 
> > b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> > index 139cf31d2e45..89c3bf0fe0c9 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> > +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> > @@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
> >   if (src_y_offset < 0)
> >   src_y_offset = 0;
> >   /* Save necessary cursor info x, y position. w, h is saved in 
> > attribute func. */
> > - hubp->cur_rect.x = src_x_offset + param->viewport.x;
> > - hubp->cur_rect.y = src_y_offset + param->viewport.y;
> > + if (param->stream->link->psr_settings.psr_version >= 
> > DC_PSR_VERSION_SU_1 &&
> > + param->rotation != ROTATION_ANGLE_0) {
>
> Ditto on above about 90° and 270°.
>
> > + hubp->cur_rect.x = 0;
> > + hubp->cur_rect.y = 0;
> > + hubp->cur_rect.w = param->stream->timing.h_addressable;
> > + hubp->cur_rect.h = param->stream->timing.v_addressable;
> > + } else {
> > + hubp->cur_rect.x = src_x_offset + param->viewport.x;
> > + hubp->cur_rect.y = src_y_offset + param->viewport.y;
> > + }
> >   }
> >
> >   void hubp2_clk_cntl(struct hubp *hubp, bool enable)
> > diff --git a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c 
> > b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> > index 2b8b8366538e..ce5613a76267 100644
> > --- a/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> > +++ b/drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c
> > @@ -3417,7 +3417,8 @@ void dcn10_set_cursor_position(struct pipe_ctx 
> > *pipe_ctx)
> >   .h_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.horz,
> >   .v_scale_ratio = pipe_ctx->plane_res.scl_data.ratios.vert,
> >   .rotation = pipe_ctx->plane_state->rotation,
> > - .mirror = pipe_ctx->plane_state->horizontal_mirror
> > + .mirror = pipe_ctx->plane_state->horizontal_mirror,
> > + .stream = pipe_ctx->stream
>
> As a nit; I think it's worth leaving a harmless trailing ',' so that
> there is less ping pong 

Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Kai-Heng Feng
On Thu, Dec 7, 2023 at 10:10 AM Mario Limonciello
 wrote:
>
> On 12/6/2023 20:07, Kai-Heng Feng wrote:
> > On Thu, Dec 7, 2023 at 9:57 AM Mario Limonciello
> >  wrote:
> >>
> >> On 12/6/2023 19:23, Kai-Heng Feng wrote:
> >>> On Wed, Dec 6, 2023 at 4:29 AM Mario Limonciello
> >>>  wrote:
> >>>>
> >>>> On 12/5/2023 14:17, Hamza Mahfooz wrote:
> >>>>> We currently don't support dirty rectangles on hardware rotated modes.
> >>>>> So, if a user is using hardware rotated modes with PSR-SU enabled,
> >>>>> use PSR-SU FFU for all rotated planes (including cursor planes).
> >>>>>
> >>>>
> >>>> Here is the email for the original reporter to give an attribution tag.
> >>>>
> >>>> Reported-by: Kai-Heng Feng
> >>>
> >>> For this particular issue,
> >>> Tested-by: Kai-Heng Feng 
> >>
> >> Can you confirm what kernel base you tested issue against?
> >>
> >> I ask because Bin Li (+CC) also tested it against 6.1 based LTS kernel
> >> but ran into problems.
> >
> > The patch was tested against ADSN.
> >
> >>
> >> I wonder if it's because of other dependency patches.  If that's the
> >> case it would be good to call them out in the Cc: @stable as
> >> dependencies so when Greg or Sasha backport this 6.1 doesn't get broken.
> >
> > Probably. I haven't really tested any older kernel series.
>
> Since you've got a good environment to test it and reproduce it would
> you mind double checking it against 6.7-rc, 6.5 and 6.1 trees?  If we
> don't have confidence it works on the older trees I think we'll need to
> drop the stable tag.

Not seeing issues here when the patch is applied against 6.5 and 6.1
(which needs to resolve a minor conflict).

I am not sure what happened for Bin's case.

Kai-Heng

> >
> > Kai-Heng
> >
> >>
> >> Bin,
> >>
> >> Could you run ./scripts/decode_stacktrace.sh on your kernel trace to
> >> give us a specific line number on the issue you hit?
> >>
> >> Thanks!
> >>>
> >>>>
> >>>>> Cc: sta...@vger.kernel.org
> >>>>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
> >>>>> Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
> >>>>> Signed-off-by: Hamza Mahfooz
> >>>>> ---
> >>>>> drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c    |  4 ++++
> >>>>> drivers/gpu/drm/amd/display/dc/dc_hw_types.h         |  1 +
> >>>>> drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c    | 12 ++++++++++--
> >>>>> .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
> >>>>> 4 files changed, 17 insertions(+), 3 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>>>> index c146dc9cba92..79f8102d2601 100644
> >>>>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>>>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>>>> @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
> >>>>> bool bb_changed;
> >>>>> bool fb_changed;
> >>>>> u32 i = 0;
> >>>>> +
> >>>>
> >>>> Looks like a spurious newline here.
> >>>>
> >>>>> *dirty_regions_changed = false;
> >>>>>
> >>>>> /*
> >>>>> @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct drm_plane *plane,
> >>>>> if (plane->type == DRM_PLANE_TYPE_CURSOR)
> >>>>> return;
> >>>>>
> >>>>> + if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
> >>>>> + goto ffu;
> >>>>> +
> >>>>
> >>>> I noticed that the original report was specifically on 180°.  Since
> >>>> you're also covering 90° and 270° with this check it sounds like it's
> >>>> actually problematic on those too?
> >>>
> >>> 90 & 270 are problematic too. But from what I observed the issue is
> >>> much more than just cursors.
> >>
> >> Got it; thanks.
> >>
> >>>
> >>> Kai-Heng
> >>>
> >>>>
> >>>>> num_clips = drm_plane_get_damage_clips_count(new_plane_state);
> >>>>> clips = drm_plane_get_damage_clips(new_plane_state);
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> >>>>> index 9649934ea186..e2a3aa8812df 100644
> >>>>> --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> >>>>> +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> >>>>> @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
> >>>>> struct fixed31_32 v_scale_ratio;
> >>>>> enum dc_rotation_angle rotation;
> >>>>> bool mirror;
> >>>>> + struct dc_stream_state *stream;
> >>>>> };
> >>>>>
> >>>>> /* IPP related types */
> >>>>> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> >>>>> index 139cf31d2e45..89c3bf0fe0c9 100644
> >>>>> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> >>>>> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> >>>>> @@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
> >>>>> if (src_y_offset < 0)
> >>>>>

Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Kai-Heng Feng
On Thu, Dec 7, 2023 at 9:57 AM Mario Limonciello
 wrote:
>
> On 12/6/2023 19:23, Kai-Heng Feng wrote:
> > On Wed, Dec 6, 2023 at 4:29 AM Mario Limonciello
> >  wrote:
> >>
> >> On 12/5/2023 14:17, Hamza Mahfooz wrote:
> >>> We currently don't support dirty rectangles on hardware rotated modes.
> >>> So, if a user is using hardware rotated modes with PSR-SU enabled,
> >>> use PSR-SU FFU for all rotated planes (including cursor planes).
> >>>
> >>
> >> Here is the email for the original reporter to give an attribution tag.
> >>
> >> Reported-by: Kai-Heng Feng 
> >
> > For this particular issue,
> > Tested-by: Kai-Heng Feng 
>
> Can you confirm what kernel base you tested issue against?
>
> I ask because Bin Li (+CC) also tested it against 6.1 based LTS kernel
> but ran into problems.

The patch was tested against ADSN.

>
> I wonder if it's because of other dependency patches.  If that's the
> case it would be good to call them out in the Cc: @stable as
> dependencies so when Greg or Sasha backport this 6.1 doesn't get broken.

Probably. I haven't really tested any older kernel series.

Kai-Heng

>
> Bin,
>
> Could you run ./scripts/decode_stacktrace.sh on your kernel trace to
> give us a specific line number on the issue you hit?
>
> Thanks!
> >
> >>
> >>> Cc: sta...@vger.kernel.org
> >>> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
> >>> Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
> >>> Signed-off-by: Hamza Mahfooz 
> >>> ---
> >>>drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c|  4 
> >>>drivers/gpu/drm/amd/display/dc/dc_hw_types.h |  1 +
> >>>drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c| 12 ++--
> >>>.../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
> >>>4 files changed, 17 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> >>> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>> index c146dc9cba92..79f8102d2601 100644
> >>> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> >>> @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct drm_plane 
> >>> *plane,
> >>>bool bb_changed;
> >>>bool fb_changed;
> >>>u32 i = 0;
> >>> +
> >>
> >> Looks like a spurious newline here.
> >>
> >>>*dirty_regions_changed = false;
> >>>
> >>>/*
> >>> @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct drm_plane 
> >>> *plane,
> >>>if (plane->type == DRM_PLANE_TYPE_CURSOR)
> >>>return;
> >>>
> >>> + if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
> >>> + goto ffu;
> >>> +
> >>
> >> I noticed that the original report was specifically on 180°.  Since
> >> you're also covering 90° and 270° with this check it sounds like it's
> >> actually problematic on those too?
> >
> > 90 & 270 are problematic too. But from what I observed the issue is
> > much more than just cursors.
>
> Got it; thanks.
>
> >
> > Kai-Heng
> >
> >>
> >>>num_clips = drm_plane_get_damage_clips_count(new_plane_state);
> >>>clips = drm_plane_get_damage_clips(new_plane_state);
> >>>
> >>> diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h 
> >>> b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> >>> index 9649934ea186..e2a3aa8812df 100644
> >>> --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> >>> +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> >>> @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
> >>>struct fixed31_32 v_scale_ratio;
> >>>enum dc_rotation_angle rotation;
> >>>bool mirror;
> >>> + struct dc_stream_state *stream;
> >>>};
> >>>
> >>>/* IPP related types */
> >>> diff --git a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c 
> >>> b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> >>> index 139cf31d2e45..89c3bf0fe0c9 100644
> >>> --- a/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> >>> +++ b/drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c
> >>> @@ -1077,8 +1077,16 @@ void hubp2_cursor_set_position(
> >>>if (src_y_offset < 0)
> >>>src_y_offset = 0;
> >>>/* Save necessary cursor info x, y position. w, h is saved in 
> >>> attribute func. */
> >>> - hubp->cur_rect.x = src_x_offset + param->viewport.x;
> >>> - hubp->cur_rect.y = src_y_offset + param->viewport.y;
> >>> + if (param->stream->link->psr_settings.psr_version >= 
> >>> DC_PSR_VERSION_SU_1 &&
> >>> + param->rotation != ROTATION_ANGLE_0) {
> >>
> >> Ditto on above about 90° and 270°.
> >>
> >>> + hubp->cur_rect.x = 0;
> >>> + hubp->cur_rect.y = 0;
> >>> + hubp->cur_rect.w = param->stream->timing.h_addressable;
> >>> + hubp->cur_rect.h = param->stream->timing.v_addressable;
> >>> + } else {
> >>> + hubp->cur_rect.x = src_x_offset + param->viewport.x;
> >>> + hubp->cur_re

[PATCH][next] drm/amd/display: Fix spelling mistake "SMC_MSG_AllowZstatesEntr" -> "SMC_MSG_AllowZstatesEntry"

2023-12-07 Thread Colin Ian King
There is a spelling mistake in a smu_print message. Fix it.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c 
b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c
index d6db9d7fced2..6d4a1ffab5ed 100644
--- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c
+++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c
@@ -361,26 +361,26 @@ void dcn35_smu_set_zstate_support(struct clk_mgr_internal 
*clk_mgr, enum dcn_zst
case DCN_ZSTATE_SUPPORT_ALLOW:
msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
param = (1 << 10) | (1 << 9) | (1 << 8);
-   smu_print("%s: SMC_MSG_AllowZstatesEntr msg = ALLOW, param = 
%d\n", __func__, param);
+   smu_print("%s: SMC_MSG_AllowZstatesEntry msg = ALLOW, param = 
%d\n", __func__, param);
break;
 
case DCN_ZSTATE_SUPPORT_DISALLOW:
msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
param = 0;
-   smu_print("%s: SMC_MSG_AllowZstatesEntr msg_id = DISALLOW, 
param = %d\n",  __func__, param);
+   smu_print("%s: SMC_MSG_AllowZstatesEntry msg_id = DISALLOW, 
param = %d\n",  __func__, param);
break;
 
 
case DCN_ZSTATE_SUPPORT_ALLOW_Z10_ONLY:
msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
param = (1 << 10);
-   smu_print("%s: SMC_MSG_AllowZstatesEntr msg = ALLOW_Z10_ONLY, 
param = %d\n", __func__, param);
+   smu_print("%s: SMC_MSG_AllowZstatesEntry msg = ALLOW_Z10_ONLY, 
param = %d\n", __func__, param);
break;
 
case DCN_ZSTATE_SUPPORT_ALLOW_Z8_Z10_ONLY:
msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
param = (1 << 10) | (1 << 8);
-   smu_print("%s: SMC_MSG_AllowZstatesEntr msg = 
ALLOW_Z8_Z10_ONLY, param = %d\n", __func__, param);
+   smu_print("%s: SMC_MSG_AllowZstatesEntry msg = 
ALLOW_Z8_Z10_ONLY, param = %d\n", __func__, param);
break;
 
case DCN_ZSTATE_SUPPORT_ALLOW_Z8_ONLY:
-- 
2.39.2



Re: Regression: Radeon video card does not work with 6.6.4; works fine with 6.6.3

2023-12-07 Thread Bagas Sanjaya
[Cc'ing also amdgpu people]

On Wed, Dec 06, 2023 at 05:22:20PM -0500, Dianne Skoll wrote:
> Hi,
> 
> I had to go back to 6.6.3 because 6.6.4 seems to have broken my Radeon
> video setup.  The full bug report:
> https://bugzilla.kernel.org/show_bug.cgi?id=218238
> 

Can you bisect to find the culprit commit? See
Documentation/admin-guide/bug-bisect.rst in the kernel sources for reference
if you have never done bisection.

Also, can you check whether the latest mainline (currently v6.7-rc4) still has
this regression?

Regardless, please also report on freedesktop tracker [1].

Thanks.

[1]: https://gitlab.freedesktop.org/drm/amd/-/issues

-- 
An old man doll... just what I always wanted! - Clara




Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Bin Li
Hi Mario,

I found I missed the part
in drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c with kai.heng's
review.
I will rebuild the kernel, test it again, and reply later. Sorry about
that.



On Thu, Dec 7, 2023 at 2:58 PM Kai-Heng Feng 
wrote:

> On Thu, Dec 7, 2023 at 10:10 AM Mario Limonciello
>  wrote:
> >
> > On 12/6/2023 20:07, Kai-Heng Feng wrote:
> > > On Thu, Dec 7, 2023 at 9:57 AM Mario Limonciello
> > >  wrote:
> > >>
> > >> On 12/6/2023 19:23, Kai-Heng Feng wrote:
> > >>> On Wed, Dec 6, 2023 at 4:29 AM Mario Limonciello
> > >>>  wrote:
> > 
> >  On 12/5/2023 14:17, Hamza Mahfooz wrote:
> > > We currently don't support dirty rectangles on hardware rotated
> modes.
> > > So, if a user is using hardware rotated modes with PSR-SU enabled,
> > > use PSR-SU FFU for all rotated planes (including cursor planes).
> > >
> > 
> >  Here is the email for the original reporter to give an attribution
> tag.
> > 
> >  Reported-by: Kai-Heng Feng 
> > >>>
> > >>> For this particular issue,
> > >>> Tested-by: Kai-Heng Feng 
> > >>
> > >> Can you confirm what kernel base you tested issue against?
> > >>
> > >> I ask because Bin Li (+CC) also tested it against 6.1 based LTS kernel
> > >> but ran into problems.
> > >
> > > The patch was tested against ADSN.
> > >
> > >>
> > >> I wonder if it's because of other dependency patches.  If that's the
> > >> case it would be good to call them out in the Cc: @stable as
> > >> dependencies so when Greg or Sasha backport this 6.1 doesn't get
> broken.
> > >
> > > Probably. I haven't really tested any older kernel series.
> >
> > Since you've got a good environment to test it and reproduce it would
> > you mind double checking it against 6.7-rc, 6.5 and 6.1 trees?  If we
> > don't have confidence it works on the older trees I think we'll need to
> > drop the stable tag.
>
> Not seeing issues here when the patch is applied against 6.5 and 6.1
> (which needs to resolve a minor conflict).
>
> I am not sure what happened for Bin's case.
>
> Kai-Heng
>
> > >
> > > Kai-Heng
> > >
> > >>
> > >> Bin,
> > >>
> > >> Could you run ./scripts/decode_stacktrace.sh on your kernel trace to
> > >> give us a specific line number on the issue you hit?
> > >>
> > >> Thanks!
> > >>>
> > 
> > > Cc: sta...@vger.kernel.org
> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
> > > Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS
> support")
> > > Signed-off-by: Hamza Mahfooz 
> > > ---
> > > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c|  4 
> > > drivers/gpu/drm/amd/display/dc/dc_hw_types.h |  1 +
> > > drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c| 12
> ++--
> > > .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
> > > 4 files changed, 17 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > index c146dc9cba92..79f8102d2601 100644
> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> > > @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct
> drm_plane *plane,
> > > bool bb_changed;
> > > bool fb_changed;
> > > u32 i = 0;
> > > +
> > 
> >  Looks like a spurious newline here.
> > 
> > > *dirty_regions_changed = false;
> > >
> > > /*
> > > @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct
> drm_plane *plane,
> > > if (plane->type == DRM_PLANE_TYPE_CURSOR)
> > > return;
> > >
> > > + if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
> > > + goto ffu;
> > > +
> > 
> >  I noticed that the original report was specifically on 180°.  Since
> >  you're also covering 90° and 270° with this check it sounds like
> it's
> >  actually problematic on those too?
> > >>>
> > >>> 90 & 270 are problematic too. But from what I observed the issue is
> > >>> much more than just cursors.
> > >>
> > >> Got it; thanks.
> > >>
> > >>>
> > >>> Kai-Heng
> > >>>
> > 
> > > num_clips =
> drm_plane_get_damage_clips_count(new_plane_state);
> > > clips = drm_plane_get_damage_clips(new_plane_state);
> > >
> > > diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> > > index 9649934ea186..e2a3aa8812df 100644
> > > --- a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> > > +++ b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
> > > @@ -465,6 +465,7 @@ struct dc_cursor_mi_param {
> > > struct fixed31_32 v_scale_ratio;
> > > enum dc_rotation_angle rotation;
> > > bool mirror;
> > > + struct dc_stream_state *stream;
> > > 

Re: [PATCH] drm/amd/display: fix hw rotated modes when PSR-SU is enabled

2023-12-07 Thread Bin Li
Hi Mario,

 It was a false alarm on my side. After testing the 6.1.0-oem and
6.5.0-oem kernels, this patch works perfectly fine. Sorry about that.

On Thu, Dec 7, 2023 at 3:47 PM Bin Li  wrote:
>
> Hi Mario,
>
> I found I missed the part in 
> drivers/gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c with kai.heng's 
> review.
> I will rebuild a new kernel and test it again, and reply later, sorry about 
> that.
>
>
>
> On Thu, Dec 7, 2023 at 2:58 PM Kai-Heng Feng  
> wrote:
>>
>> On Thu, Dec 7, 2023 at 10:10 AM Mario Limonciello
>>  wrote:
>> >
>> > On 12/6/2023 20:07, Kai-Heng Feng wrote:
>> > > On Thu, Dec 7, 2023 at 9:57 AM Mario Limonciello
>> > >  wrote:
>> > >>
>> > >> On 12/6/2023 19:23, Kai-Heng Feng wrote:
>> > >>> On Wed, Dec 6, 2023 at 4:29 AM Mario Limonciello
>> > >>>  wrote:
>> > 
>> >  On 12/5/2023 14:17, Hamza Mahfooz wrote:
>> > > We currently don't support dirty rectangles on hardware rotated 
>> > > modes.
>> > > So, if a user is using hardware rotated modes with PSR-SU enabled,
>> > > use PSR-SU FFU for all rotated planes (including cursor planes).
>> > >
>> > 
>> >  Here is the email for the original reporter to give an attribution 
>> >  tag.
>> > 
>> >  Reported-by: Kai-Heng Feng 
>> > >>>
>> > >>> For this particular issue,
>> > >>> Tested-by: Kai-Heng Feng 
>> > >>
>> > >> Can you confirm what kernel base you tested issue against?
>> > >>
>> > >> I ask because Bin Li (+CC) also tested it against 6.1 based LTS kernel
>> > >> but ran into problems.
>> > >
>> > > The patch was tested against ADSN.
>> > >
>> > >>
>> > >> I wonder if it's because of other dependency patches.  If that's the
>> > >> case it would be good to call them out in the Cc: @stable as
>> > >> dependencies so when Greg or Sasha backport this 6.1 doesn't get broken.
>> > >
>> > > Probably. I haven't really tested any older kernel series.
>> >
>> > Since you've got a good environment to test it and reproduce it would
>> > you mind double checking it against 6.7-rc, 6.5 and 6.1 trees?  If we
>> > don't have confidence it works on the older trees I think we'll need to
>> > drop the stable tag.
>>
>> Not seeing issues here when the patch is applied against 6.5 and 6.1
>> (which needs to resolve a minor conflict).
>>
>> I am not sure what happened for Bin's case.
>>
>> Kai-Heng
>>
>> > >
>> > > Kai-Heng
>> > >
>> > >>
>> > >> Bin,
>> > >>
>> > >> Could you run ./scripts/decode_stacktrace.sh on your kernel trace to
>> > >> give us a specific line number on the issue you hit?
>> > >>
>> > >> Thanks!
>> > >>>
>> > 
>> > > Cc: sta...@vger.kernel.org
>> > > Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2952
>> > > Fixes: 30ebe41582d1 ("drm/amd/display: add FB_DAMAGE_CLIPS support")
>> > > Signed-off-by: Hamza Mahfooz 
>> > > ---
>> > > drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c|  4 
>> > > drivers/gpu/drm/amd/display/dc/dc_hw_types.h |  1 +
>> > > drivers/gpu/drm/amd/display/dc/dcn20/dcn20_hubp.c| 12 
>> > > ++--
>> > > .../gpu/drm/amd/display/dc/hwss/dcn10/dcn10_hwseq.c  |  3 ++-
>> > > 4 files changed, 17 insertions(+), 3 deletions(-)
>> > >
>> > > diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
>> > > b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> > > index c146dc9cba92..79f8102d2601 100644
>> > > --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> > > +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
>> > > @@ -5208,6 +5208,7 @@ static void fill_dc_dirty_rects(struct 
>> > > drm_plane *plane,
>> > > bool bb_changed;
>> > > bool fb_changed;
>> > > u32 i = 0;
>> > > +
>> > 
>> >  Looks like a spurious newline here.
>> > 
>> > > *dirty_regions_changed = false;
>> > >
>> > > /*
>> > > @@ -5217,6 +5218,9 @@ static void fill_dc_dirty_rects(struct 
>> > > drm_plane *plane,
>> > > if (plane->type == DRM_PLANE_TYPE_CURSOR)
>> > > return;
>> > >
>> > > + if (new_plane_state->rotation != DRM_MODE_ROTATE_0)
>> > > + goto ffu;
>> > > +
>> > 
>> >  I noticed that the original report was specifically on 180°.  Since
>> >  you're also covering 90° and 270° with this check it sounds like it's
>> >  actually problematic on those too?
>> > >>>
>> > >>> 90 & 270 are problematic too. But from what I observed the issue is
>> > >>> much more than just cursors.
>> > >>
>> > >> Got it; thanks.
>> > >>
>> > >>>
>> > >>> Kai-Heng
>> > >>>
>> > 
>> > > num_clips = 
>> > > drm_plane_get_damage_clips_count(new_plane_state);
>> > > clips = drm_plane_get_damage_clips(new_plane_state);
>> > >
>> > > diff --git a/drivers/gpu/drm/amd/display/dc/dc_hw_types.h 
>> > > b/drivers/gpu/drm/amd/display/dc/dc_hw_types.h
>> > > in

Re: [PATCH] drm/amd/display: Restore guard against default backlight value < 1 nit

2023-12-07 Thread Alex Deucher
On Thu, Dec 7, 2023 at 9:47 AM Mario Limonciello
 wrote:
>
> Mark reports that brightness is not restored after Xorg dpms screen blank.
>
> This behavior was introduced by commit d9e865826c20 ("drm/amd/display:
> Simplify brightness initialization") which dropped the cached backlight
> value in display code, but also removed code for when the default value
> read back was less than 1 nit.
>
> Restore this code so that the backlight brightness is restored to the
> correct default value in this circumstance.
>
> Reported-by: Mark Herbert 
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3031
> Cc: sta...@vger.kernel.org
> Cc: Camille Cho 
> Cc: Krunoslav Kovac 
> Cc: Hamza Mahfooz 
> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization")
> Signed-off-by: Mario Limonciello 

Acked-by: Alex Deucher 

> ---
>  .../amd/display/dc/link/protocols/link_edp_panel_control.c| 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git 
> a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c 
> b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
> index ac0fa88b52a0..bf53a86ea817 100644
> --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
> +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
> @@ -287,8 +287,8 @@ bool set_default_brightness_aux(struct dc_link *link)
> if (link && link->dpcd_sink_ext_caps.bits.oled == 1) {
> if (!read_default_bl_aux(link, &default_backlight))
> default_backlight = 150000;
> -   // if > 5000, it might be wrong readback
> -   if (default_backlight > 5000000)
> +   // if < 1 nits or > 5000, it might be wrong readback
> +   if (default_backlight < 1000 || default_backlight > 5000000)
> default_backlight = 150000;
>
> return edp_set_backlight_level_nits(link, true,
> --
> 2.34.1
>
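
The range check this patch restores can be sketched in isolation. This is
only a minimal sketch, not the driver code: it assumes the backlight value
is expressed in millinits (so 1 nit is 1000 and 5000 nits is 5000000), and
the helper name, the macro names and the 150-nit fallback are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed unit: millinits, so 1 nit == 1000 and 5000 nits == 5000000. */
#define BL_MIN_MILLINITS      1000u
#define BL_MAX_MILLINITS      5000000u
#define BL_FALLBACK_MILLINITS 150000u	/* hypothetical fallback: 150 nits */

/*
 * Return the value to program: the readback when the read succeeded and
 * the value is plausible, otherwise the fallback.
 */
static uint32_t sanitize_default_backlight(bool read_ok, uint32_t readback)
{
	if (!read_ok)
		return BL_FALLBACK_MILLINITS;

	/* Below 1 nit or above 5000 nits is treated as a bad readback. */
	if (readback < BL_MIN_MILLINITS || readback > BL_MAX_MILLINITS)
		return BL_FALLBACK_MILLINITS;

	return readback;
}
```

With only the upper bound in place, a readback of 0 would be passed through
unchanged, which matches the symptom in the report (brightness not restored
after a screen blank).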


[PATCH 1/2] drm/buddy: Implement tracking clear page feature

2023-12-07 Thread Arunpravin Paneer Selvam
- Add a feature to track cleared pages.

- If the driver requests cleared memory, we prefer cleared memory
  but fall back to uncleared memory if we can't find cleared blocks.
  When the driver requests uncleared memory, we try to use uncleared
  memory but fall back to cleared memory if necessary.

- The driver should set the DRM_BUDDY_CLEARED flag if it
  successfully clears the blocks in the free path. DRM buddy then
  marks each block in the freed list as cleared.

- When a block is freed, we clear it and mark the freed block as cleared.
  If its buddy is cleared as well, we can merge them; otherwise, we
  prefer to keep the blocks separate.

- Track the size of the available cleared pages.

Signed-off-by: Arunpravin Paneer Selvam 
Suggested-by: Christian König 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  |   6 +-
 drivers/gpu/drm/drm_buddy.c   | 169 +++---
 drivers/gpu/drm/i915/i915_ttm_buddy_manager.c |   6 +-
 drivers/gpu/drm/tests/drm_buddy_test.c|  10 +-
 include/drm/drm_buddy.h   |  18 +-
 5 files changed, 168 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
index 08916538a615..d0e199cc8f17 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c
@@ -556,7 +556,7 @@ static int amdgpu_vram_mgr_new(struct ttm_resource_manager 
*man,
return 0;
 
 error_free_blocks:
-   drm_buddy_free_list(mm, &vres->blocks);
+   drm_buddy_free_list(mm, &vres->blocks, 0);
mutex_unlock(&mgr->lock);
 error_fini:
ttm_resource_fini(man, &vres->base);
@@ -589,7 +589,7 @@ static void amdgpu_vram_mgr_del(struct ttm_resource_manager 
*man,
 
amdgpu_vram_mgr_do_reserve(man);
 
-   drm_buddy_free_list(mm, &vres->blocks);
+   drm_buddy_free_list(mm, &vres->blocks, 0);
mutex_unlock(&mgr->lock);
 
atomic64_sub(vis_usage, &mgr->vis_usage);
@@ -897,7 +897,7 @@ void amdgpu_vram_mgr_fini(struct amdgpu_device *adev)
kfree(rsv);
 
list_for_each_entry_safe(rsv, temp, &mgr->reserved_pages, blocks) {
-   drm_buddy_free_list(&mgr->mm, &rsv->allocated);
+   drm_buddy_free_list(&mgr->mm, &rsv->allocated, 0);
kfree(rsv);
}
if (!adev->gmc.is_app_apu)
diff --git a/drivers/gpu/drm/drm_buddy.c b/drivers/gpu/drm/drm_buddy.c
index f57e6d74fb0e..d44172f23f05 100644
--- a/drivers/gpu/drm/drm_buddy.c
+++ b/drivers/gpu/drm/drm_buddy.c
@@ -57,6 +57,16 @@ static void list_insert_sorted(struct drm_buddy *mm,
__list_add(&block->link, node->link.prev, &node->link);
 }
 
+static void clear_reset(struct drm_buddy_block *block)
+{
+   block->header &= ~DRM_BUDDY_HEADER_CLEAR;
+}
+
+static void mark_cleared(struct drm_buddy_block *block)
+{
+   block->header |= DRM_BUDDY_HEADER_CLEAR;
+}
+
 static void mark_allocated(struct drm_buddy_block *block)
 {
block->header &= ~DRM_BUDDY_HEADER_STATE;
@@ -223,6 +233,12 @@ static int split_block(struct drm_buddy *mm,
mark_free(mm, block->left);
mark_free(mm, block->right);
 
+   if (drm_buddy_block_is_clear(block)) {
+   mark_cleared(block->left);
+   mark_cleared(block->right);
+   clear_reset(block);
+   }
+
mark_split(block);
 
return 0;
@@ -273,6 +289,13 @@ static void __drm_buddy_free(struct drm_buddy *mm,
if (!drm_buddy_block_is_free(buddy))
break;
 
+   if (drm_buddy_block_is_clear(block) !=
+   drm_buddy_block_is_clear(buddy))
+   break;
+
+   if (drm_buddy_block_is_clear(block))
+   mark_cleared(parent);
+
list_del(&buddy->link);
 
drm_block_free(mm, block);
@@ -295,6 +318,9 @@ void drm_buddy_free_block(struct drm_buddy *mm,
 {
BUG_ON(!drm_buddy_block_is_allocated(block));
mm->avail += drm_buddy_block_size(mm, block);
+   if (drm_buddy_block_is_clear(block))
+   mm->clear_avail += drm_buddy_block_size(mm, block);
+
__drm_buddy_free(mm, block);
 }
 EXPORT_SYMBOL(drm_buddy_free_block);
@@ -305,10 +331,20 @@ EXPORT_SYMBOL(drm_buddy_free_block);
  * @mm: DRM buddy manager
  * @objects: input list head to free blocks
  */
-void drm_buddy_free_list(struct drm_buddy *mm, struct list_head *objects)
+void drm_buddy_free_list(struct drm_buddy *mm,
+struct list_head *objects,
+unsigned long flags)
 {
struct drm_buddy_block *block, *on;
 
+   if (flags & DRM_BUDDY_CLEARED) {
+   list_for_each_entry(block, objects, link)
+   mark_cleared(block);
+   } else {
+   list_for_each_entry(block, objects, link)
+   clear_reset(block);
+   }
+
list_for_each_entry_safe(block, on, objects, l
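
The merge rule added to __drm_buddy_free() in the hunk above can be
illustrated outside the kernel. This is a simplified sketch under stated
assumptions: struct demo_block and both helpers are hypothetical stand-ins,
not the drm_buddy types or API, and only the free/clear state is modelled.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for struct drm_buddy_block. */
struct demo_block {
	bool free;	/* block is on a free list */
	bool clear;	/* block contents are known to be cleared */
};

/*
 * A freed block may merge with its buddy only if the buddy is free and
 * both share the same clear state, mirroring the two "break" checks.
 */
static bool demo_can_merge(const struct demo_block *block,
			   const struct demo_block *buddy)
{
	if (!buddy->free)
		return false;

	return block->clear == buddy->clear;
}

/* The merged parent inherits the (matching) clear state of its children. */
static struct demo_block demo_merge(const struct demo_block *block)
{
	struct demo_block parent = { .free = true, .clear = block->clear };

	return parent;
}
```

Keeping cleared and uncleared buddies apart means a cleared block is never
folded into a larger block whose contents are only partly cleared.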

[PATCH 2/2] drm/amdgpu: Enable clear page functionality

2023-12-07 Thread Arunpravin Paneer Selvam
Add clear page support in vram memory region.

Signed-off-by: Arunpravin Paneer Selvam 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 13 +++--
 .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 25 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 50 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  4 ++
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |  5 ++
 6 files changed, 105 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index cef920a93924..bc4ea87f8b5e 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -39,6 +39,7 @@
 #include "amdgpu.h"
 #include "amdgpu_trace.h"
 #include "amdgpu_amdkfd.h"
+#include "amdgpu_vram_mgr.h"
 
 /**
  * DOC: amdgpu_object
@@ -629,15 +630,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
 
if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
bo->tbo.resource->mem_type == TTM_PL_VRAM) {
-   struct dma_fence *fence;
+   struct dma_fence *fence = NULL;
 
-   r = amdgpu_fill_buffer(bo, 0, bo->tbo.base.resv, &fence, true);
+   r = amdgpu_clear_buffer(bo, bo->tbo.base.resv, &fence, true);
if (unlikely(r))
goto fail_unreserve;
 
-   dma_resv_add_fence(bo->tbo.base.resv, fence,
-  DMA_RESV_USAGE_KERNEL);
-   dma_fence_put(fence);
+   if (fence) {
+   dma_resv_add_fence(bo->tbo.base.resv, fence,
+  DMA_RESV_USAGE_KERNEL);
+   dma_fence_put(fence);
+   }
}
if (!bp->resv)
amdgpu_bo_unreserve(bo);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
index 381101d2bf05..50fcd86e1033 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
@@ -164,4 +164,29 @@ static inline void amdgpu_res_next(struct 
amdgpu_res_cursor *cur, uint64_t size)
}
 }
 
+/**
+ * amdgpu_res_cleared - check if blocks are cleared
+ *
+ * @cur: the cursor to extract the block
+ *
+ * Check if the @cur block is cleared
+ */
+static inline bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
+{
+   struct drm_buddy_block *block;
+
+   switch (cur->mem_type) {
+   case TTM_PL_VRAM:
+   block = cur->node;
+
+   if (!amdgpu_vram_mgr_is_cleared(block))
+   return false;
+   break;
+   default:
+   return false;
+   }
+
+   return true;
+}
+
 #endif
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
index 05991c5c8ddb..6d7514e8f40c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
@@ -,6 +,56 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring *ring, 
uint32_t src_data,
return 0;
 }
 
+int amdgpu_clear_buffer(struct amdgpu_bo *bo,
+   struct dma_resv *resv,
+   struct dma_fence **fence,
+   bool delayed)
+{
+   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
+   struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
+   struct amdgpu_res_cursor cursor;
+   struct dma_fence *f = NULL;
+   u64 addr;
+   int r;
+
+   if (!adev->mman.buffer_funcs_enabled)
+   return -EINVAL;
+
+   amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
+
+   mutex_lock(&adev->mman.gtt_window_lock);
+   while (cursor.remaining) {
+   struct dma_fence *next = NULL;
+   u64 size;
+
+   /* Never clear more than 256MiB at once to avoid timeouts */
+   size = min(cursor.size, 256ULL << 20);
+
+   if (!amdgpu_res_cleared(&cursor)) {
+   r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, 
&cursor,
+ 1, ring, false, &size, &addr);
+   if (r)
+   goto err;
+
+   r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
+   &next, true, delayed);
+   if (r)
+   goto err;
+   }
+   dma_fence_put(f);
+   f = next;
+
+   amdgpu_res_next(&cursor, size);
+   }
+err:
+   mutex_unlock(&adev->mman.gtt_window_lock);
+   if (fence)
+   *fence = dma_fence_get(f);
+   dma_fence_put(f);
+
+   return r;
+}
+
 int amdgpu_fill_buffer(struct amdgpu_bo *bo,
uint32_t src_data,
struct
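
The loop in amdgpu_clear_buffer() walks the resource in chunks of at most
256MiB and only issues a fill for chunks that are not already cleared. A
rough, self-contained sketch of that accounting follows; struct
demo_segment and demo_count_fills() are hypothetical names, and the model
ignores mapping, fences and locking entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same cap as in the patch: never clear more than 256MiB at once. */
#define DEMO_MAX_CLEAR_BYTES (256ULL << 20)

/* Hypothetical stand-in for one block visited by the resource cursor. */
struct demo_segment {
	uint64_t size;		/* size of the block in bytes */
	bool cleared;		/* block is already marked cleared */
};

/* Count how many fill operations the walk would issue. */
static unsigned int demo_count_fills(const struct demo_segment *segs, size_t n)
{
	unsigned int fills = 0;

	for (size_t i = 0; i < n; i++) {
		uint64_t remaining = segs[i].size;

		while (remaining) {
			uint64_t chunk = remaining < DEMO_MAX_CLEAR_BYTES ?
					 remaining : DEMO_MAX_CLEAR_BYTES;

			/* Already-cleared blocks are skipped, not refilled. */
			if (!segs[i].cleared)
				fills++;

			remaining -= chunk;
		}
	}

	return fills;
}
```

For example, a 512MiB uncleared block, a 64MiB cleared block and a 300MiB
uncleared block would need four fills in this model: two for the first
block, none for the second, and two for the third.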

Re: [PATCH] drm/amd/display: Restore guard against default backlight value < 1 nit

2023-12-07 Thread Harry Wentland



On 2023-12-07 10:03, Alex Deucher wrote:
> On Thu, Dec 7, 2023 at 9:47 AM Mario Limonciello
>  wrote:
>>
>> Mark reports that brightness is not restored after Xorg dpms screen blank.
>>
>> This behavior was introduced by commit d9e865826c20 ("drm/amd/display:
>> Simplify brightness initialization") which dropped the cached backlight
>> value in display code, but also removed code for when the default value
>> read back was less than 1 nit.
>>
>> Restore this code so that the backlight brightness is restored to the
>> correct default value in this circumstance.
>>
>> Reported-by: Mark Herbert 
>> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3031
>> Cc: sta...@vger.kernel.org
>> Cc: Camille Cho 
>> Cc: Krunoslav Kovac 
>> Cc: Hamza Mahfooz 
>> Fixes: d9e865826c20 ("drm/amd/display: Simplify brightness initialization")
>> Signed-off-by: Mario Limonciello 
> 
> Acked-by: Alex Deucher 

Reviewed-by: Harry Wentland 

Harry

> 
>> ---
>>  .../amd/display/dc/link/protocols/link_edp_panel_control.c| 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git 
>> a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c 
>> b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
>> index ac0fa88b52a0..bf53a86ea817 100644
>> --- a/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
>> +++ b/drivers/gpu/drm/amd/display/dc/link/protocols/link_edp_panel_control.c
>> @@ -287,8 +287,8 @@ bool set_default_brightness_aux(struct dc_link *link)
>> if (link && link->dpcd_sink_ext_caps.bits.oled == 1) {
>> if (!read_default_bl_aux(link, &default_backlight))
>> default_backlight = 150000;
>> -   // if > 5000, it might be wrong readback
>> -   if (default_backlight > 5000000)
>> +   // if < 1 nits or > 5000, it might be wrong readback
>> +   if (default_backlight < 1000 || default_backlight > 5000000)
>> default_backlight = 150000;
>>
>> return edp_set_backlight_level_nits(link, true,
>> --
>> 2.34.1
>>



[PATCH v1] drm/amdgpu/jpeg: configure doorbell for each playback

2023-12-07 Thread Saleemkhan Jamadar
Doorbell is configured during start of each playback.

v1 - add comment for the doorbell programming change (Veera)

Signed-off-by: Saleemkhan Jamadar 
Acked-by: Leo Liu 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
index 9df011323d4b..6ede85b28cc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
@@ -155,13 +155,6 @@ static int jpeg_v4_0_5_hw_init(void *handle)
struct amdgpu_ring *ring = adev->jpeg.inst->ring_dec;
int r;
 
-   adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
-   (adev->doorbell_index.vcn.vcn_ring0_1 << 1), 0);
-
-   WREG32_SOC15(VCN, 0, regVCN_JPEG_DB_CTRL,
-   ring->doorbell_index << VCN_JPEG_DB_CTRL__OFFSET__SHIFT |
-   VCN_JPEG_DB_CTRL__EN_MASK);
-
r = amdgpu_ring_test_helper(ring);
if (r)
return r;
@@ -336,6 +329,14 @@ static int jpeg_v4_0_5_start(struct amdgpu_device *adev)
if (adev->pm.dpm_enabled)
amdgpu_dpm_enable_jpeg(adev, true);
 
+   /* doorbell programming is done for every playback */
+   adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
+   (adev->doorbell_index.vcn.vcn_ring0_1 << 1), 0);
+
+   WREG32_SOC15(VCN, 0, regVCN_JPEG_DB_CTRL,
+   ring->doorbell_index << VCN_JPEG_DB_CTRL__OFFSET__SHIFT |
+   VCN_JPEG_DB_CTRL__EN_MASK);
+
/* disable power gating */
r = jpeg_v4_0_5_disable_static_power_gating(adev);
if (r)
-- 
2.25.1



RE: [PATCH v1] drm/amdgpu/jpeg: configure doorbell for each playback

2023-12-07 Thread Gopalakrishnan, Veerabadhran (Veera)

Looking good to me.

Reviewed-by: Veerabadhran Gopalakrishnan 

Regards,
Veera

-Original Message-
From: Jamadar, Saleemkhan 
Sent: Thursday, December 7, 2023 9:22 PM
To: Jamadar, Saleemkhan ; Liu, Leo 
; Gopalakrishnan, Veerabadhran (Veera) 
; amd-gfx@lists.freedesktop.org
Cc: Sundararaju, Sathishkumar ; Rao, Srinath 
; Deucher, Alexander ; Koenig, 
Christian 
Subject: [PATCH v1] drm/amdgpu/jpeg: configure doorbell for each playback

Doorbell is configured during start of each playback.

v1 - add comment for the doorbell programming change (Veera)

Signed-off-by: Saleemkhan Jamadar 
Acked-by: Leo Liu 
---
 drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c | 15 ++++++++-------
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c 
b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
index 9df011323d4b..6ede85b28cc8 100644
--- a/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
+++ b/drivers/gpu/drm/amd/amdgpu/jpeg_v4_0_5.c
@@ -155,13 +155,6 @@ static int jpeg_v4_0_5_hw_init(void *handle)
struct amdgpu_ring *ring = adev->jpeg.inst->ring_dec;
int r;

-   adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
-   (adev->doorbell_index.vcn.vcn_ring0_1 << 1), 0);
-
-   WREG32_SOC15(VCN, 0, regVCN_JPEG_DB_CTRL,
-   ring->doorbell_index << VCN_JPEG_DB_CTRL__OFFSET__SHIFT |
-   VCN_JPEG_DB_CTRL__EN_MASK);
-
r = amdgpu_ring_test_helper(ring);
if (r)
return r;
@@ -336,6 +329,14 @@ static int jpeg_v4_0_5_start(struct amdgpu_device *adev)
if (adev->pm.dpm_enabled)
amdgpu_dpm_enable_jpeg(adev, true);

+   /* doorbell programming is done for every playback */
+   adev->nbio.funcs->vcn_doorbell_range(adev, ring->use_doorbell,
+   (adev->doorbell_index.vcn.vcn_ring0_1 << 1), 0);
+
+   WREG32_SOC15(VCN, 0, regVCN_JPEG_DB_CTRL,
+   ring->doorbell_index << VCN_JPEG_DB_CTRL__OFFSET__SHIFT |
+   VCN_JPEG_DB_CTRL__EN_MASK);
+
/* disable power gating */
r = jpeg_v4_0_5_disable_static_power_gating(adev);
if (r)
--
2.25.1



Re: [PATCH 2/2] drm/amdgpu: Enable clear page functionality

2023-12-07 Thread Alex Deucher
On Thu, Dec 7, 2023 at 10:12 AM Arunpravin Paneer Selvam
 wrote:
>
> Add clear page support in vram memory region.
>
> Signed-off-by: Arunpravin Paneer Selvam 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c| 13 +++--
>  .../gpu/drm/amd/amdgpu/amdgpu_res_cursor.h| 25 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c   | 50 +++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.h   |  4 ++
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.c  | 14 +-
>  drivers/gpu/drm/amd/amdgpu/amdgpu_vram_mgr.h  |  5 ++
>  6 files changed, 105 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> index cef920a93924..bc4ea87f8b5e 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
> @@ -39,6 +39,7 @@
>  #include "amdgpu.h"
>  #include "amdgpu_trace.h"
>  #include "amdgpu_amdkfd.h"
> +#include "amdgpu_vram_mgr.h"
>
>  /**
>   * DOC: amdgpu_object
> @@ -629,15 +630,17 @@ int amdgpu_bo_create(struct amdgpu_device *adev,
>
> if (bp->flags & AMDGPU_GEM_CREATE_VRAM_CLEARED &&
> bo->tbo.resource->mem_type == TTM_PL_VRAM) {
> -   struct dma_fence *fence;
> +   struct dma_fence *fence = NULL;
>
> -   r = amdgpu_fill_buffer(bo, 0, bo->tbo.base.resv, &fence, 
> true);
> +   r = amdgpu_clear_buffer(bo, bo->tbo.base.resv, &fence, true);
> if (unlikely(r))
> goto fail_unreserve;
>
> -   dma_resv_add_fence(bo->tbo.base.resv, fence,
> -  DMA_RESV_USAGE_KERNEL);
> -   dma_fence_put(fence);
> +   if (fence) {
> +   dma_resv_add_fence(bo->tbo.base.resv, fence,
> +  DMA_RESV_USAGE_KERNEL);
> +   dma_fence_put(fence);
> +   }
> }
> if (!bp->resv)
> amdgpu_bo_unreserve(bo);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> index 381101d2bf05..50fcd86e1033 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_res_cursor.h
> @@ -164,4 +164,29 @@ static inline void amdgpu_res_next(struct 
> amdgpu_res_cursor *cur, uint64_t size)
> }
>  }
>
> +/**
> + * amdgpu_res_cleared - check if blocks are cleared
> + *
> + * @cur: the cursor to extract the block
> + *
> + * Check if the @cur block is cleared
> + */
> +static inline bool amdgpu_res_cleared(struct amdgpu_res_cursor *cur)
> +{
> +   struct drm_buddy_block *block;
> +
> +   switch (cur->mem_type) {
> +   case TTM_PL_VRAM:
> +   block = cur->node;
> +
> +   if (!amdgpu_vram_mgr_is_cleared(block))
> +   return false;
> +   break;
> +   default:
> +   return false;
> +   }
> +
> +   return true;
> +}
> +
>  #endif
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index 05991c5c8ddb..6d7514e8f40c 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -,6 +,56 @@ static int amdgpu_ttm_fill_mem(struct amdgpu_ring 
> *ring, uint32_t src_data,
> return 0;
>  }
>
> +int amdgpu_clear_buffer(struct amdgpu_bo *bo,

amdgpu_ttm_clear_buffer() for naming consistency.

Alex

> +   struct dma_resv *resv,
> +   struct dma_fence **fence,
> +   bool delayed)
> +{
> +   struct amdgpu_device *adev = amdgpu_ttm_adev(bo->tbo.bdev);
> +   struct amdgpu_ring *ring = adev->mman.buffer_funcs_ring;
> +   struct amdgpu_res_cursor cursor;
> +   struct dma_fence *f = NULL;
> +   u64 addr;
> +   int r;
> +
> +   if (!adev->mman.buffer_funcs_enabled)
> +   return -EINVAL;
> +
> +   amdgpu_res_first(bo->tbo.resource, 0, amdgpu_bo_size(bo), &cursor);
> +
> +   mutex_lock(&adev->mman.gtt_window_lock);
> +   while (cursor.remaining) {
> +   struct dma_fence *next = NULL;
> +   u64 size;
> +
> +   /* Never clear more than 256MiB at once to avoid timeouts */
> +   size = min(cursor.size, 256ULL << 20);
> +
> +   if (!amdgpu_res_cleared(&cursor)) {
> +   r = amdgpu_ttm_map_buffer(&bo->tbo, bo->tbo.resource, 
> &cursor,
> + 1, ring, false, &size, 
> &addr);
> +   if (r)
> +   goto err;
> +
> +   r = amdgpu_ttm_fill_mem(ring, 0, addr, size, resv,
> +   &next, true, delayed);
> +   if (r)
> +   goto err;
> +   }
> +   dma_fence_pu

Re: [PATCH 3/3] drm/amdgpu: add new INFO IOCTL query for input power

2023-12-07 Thread Alex Deucher
On Fri, Nov 10, 2023 at 3:22 AM Lazar, Lijo  wrote:
>
>
>
> On 11/10/2023 3:44 AM, Alex Deucher wrote:
> > Some chips provide both average and input power.  Previously
> > we just exposed average power, add a new query for input
> > power.
> >
>
> Input looks like a misnomer (not the supply side, but the power
> consumed). Better to rename to instantaneous or current power. I guess
> that will require rename of AMDGPU_PP_SENSOR_GPU_INPUT_POWER too.

It aligns with the sysfs naming.  E.g.,
https://www.kernel.org/doc/Documentation/hwmon/sysfs-interface

power[1-*]_input  Instantaneous power use
Unit: microWatt
RO

Alex

>
> Thanks,
> Lijo
>
> > Example userspace:
> > https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power
> >
> > Signed-off-by: Alex Deucher 
> > ---
> >   drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 9 +
> >   include/uapi/drm/amdgpu_drm.h   | 2 ++
> >   2 files changed, 11 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
> > b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > index bf4f48fe438d..48496bb585c7 100644
> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
> > @@ -1114,6 +1114,15 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
> > *data, struct drm_file *filp)
> >   }
> >   ui32 >>= 8;
> >   break;
> > + case AMDGPU_INFO_SENSOR_GPU_INPUT_POWER:
> > + /* get input GPU power */
> > + if (amdgpu_dpm_read_sensor(adev,
> > +
> > AMDGPU_PP_SENSOR_GPU_INPUT_POWER,
> > +(void *)&ui32, 
> > &ui32_size)) {
> > + return -EINVAL;
> > + }
> > + ui32 >>= 8;
> > + break;
> >   case AMDGPU_INFO_SENSOR_VDDNB:
> >   /* get VDDNB in millivolts */
> >   if (amdgpu_dpm_read_sensor(adev,
> > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> > index ad21c613fec8..96e32dafd4f0 100644
> > --- a/include/uapi/drm/amdgpu_drm.h
> > +++ b/include/uapi/drm/amdgpu_drm.h
> > @@ -865,6 +865,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
> >   #define AMDGPU_INFO_SENSOR_PEAK_PSTATE_GFX_SCLK 0xa
> >   /* Subquery id: Query GPU peak pstate memory clock */
> >   #define AMDGPU_INFO_SENSOR_PEAK_PSTATE_GFX_MCLK 0xb
> > + /* Subquery id: Query input GPU power   */
> > + #define AMDGPU_INFO_SENSOR_GPU_INPUT_POWER  0xc
> >   /* Number of VRAM page faults on CPU access. */
> >   #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS0x1E
> >   #define AMDGPU_INFO_VRAM_LOST_COUNTER   0x1F


Re: [PATCH 1/2] drm/amdgpu/atom: fix vram_usagebyfirmware parsing

2023-12-07 Thread Alex Deucher
Ping on this series.

Alex

On Fri, Nov 17, 2023 at 11:17 AM Alex Deucher  wrote:
>
> The changes to support vram_usagebyfirmware v2.2 changed the behavior
> to explicitly match 2.1 for everything older rather than just using it
> by default.  If the version is 2.2 or newer, use the 2.2 parsing, for
> everything else, use the 2.1 parsing.  This restores the previous
> behavior for tables that didn't explicitly match 2.1.
>
> Fixes: 4864f2ee9ee2 ("drm/amdgpu: add vram reservation based on 
> vram_usagebyfirmware_v2_2")
> Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1215802
> Signed-off-by: Alex Deucher 
> ---
>  .../gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c   | 18 +-
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c
> index fb2681dd6b33..d8393e3f2778 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c
> @@ -181,18 +181,18 @@ int amdgpu_atomfirmware_allocate_fb_scratch(struct 
> amdgpu_device *adev)
> int usage_bytes = 0;
>
> if (amdgpu_atom_parse_data_header(ctx, index, NULL, &frev, &crev, 
> &data_offset)) {
> -   if (frev == 2 && crev == 1) {
> -   fw_usage_v2_1 =
> -   (struct vram_usagebyfirmware_v2_1 
> *)(ctx->bios + data_offset);
> -   amdgpu_atomfirmware_allocate_fb_v2_1(adev,
> -   fw_usage_v2_1,
> -   &usage_bytes);
> -   } else if (frev >= 2 && crev >= 2) {
> +   if (frev >= 2 && crev >= 2) {
> fw_usage_v2_2 =
> (struct vram_usagebyfirmware_v2_2 
> *)(ctx->bios + data_offset);
> amdgpu_atomfirmware_allocate_fb_v2_2(adev,
> -   fw_usage_v2_2,
> -   &usage_bytes);
> +fw_usage_v2_2,
> +&usage_bytes);
> +   } else {
> +   fw_usage_v2_1 =
> +   (struct vram_usagebyfirmware_v2_1 
> *)(ctx->bios + data_offset);
> +   amdgpu_atomfirmware_allocate_fb_v2_1(adev,
> +fw_usage_v2_1,
> +&usage_bytes);
> }
> }
>
> --
> 2.41.0
>
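
The selection rule in the quoted commit message can be sketched in isolation. This is a minimal userspace illustration, not the driver code: `enum fw_usage_layout`, `pick_fw_usage_layout()` and `check_layout_selection()` are hypothetical names, and only the `frev`/`crev` comparison is taken from the patch.

```c
#include <assert.h>

/* Hypothetical tags for the two table layouts; not the kernel's types. */
enum fw_usage_layout {
	FW_USAGE_LAYOUT_V2_1,
	FW_USAGE_LAYOUT_V2_2,
};

/*
 * Mirror the version check from the patch: tables whose format and
 * content revisions are both >= 2 use the v2.2 layout, everything
 * else falls back to the v2.1 layout.
 */
static enum fw_usage_layout pick_fw_usage_layout(unsigned int frev,
						 unsigned int crev)
{
	if (frev >= 2 && crev >= 2)
		return FW_USAGE_LAYOUT_V2_2;

	return FW_USAGE_LAYOUT_V2_1;
}

/* Exercise the cases the commit message talks about. */
static void check_layout_selection(void)
{
	/* An exact 2.1 table keeps the v2.1 parsing. */
	assert(pick_fw_usage_layout(2, 1) == FW_USAGE_LAYOUT_V2_1);
	/* 2.2 and newer use the v2.2 parsing. */
	assert(pick_fw_usage_layout(2, 2) == FW_USAGE_LAYOUT_V2_2);
	/* Older tables that never matched 2.1 exactly still get v2.1. */
	assert(pick_fw_usage_layout(1, 5) == FW_USAGE_LAYOUT_V2_1);
}
```

Putting the newer layout first and leaving the older one as the `else` branch is what restores the default behavior for tables that are neither exactly 2.1 nor 2.2 or newer.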


Re: [PATCH 1/2] drm/amdgpu/debugfs: fix error code when smc register accessors are NULL

2023-12-07 Thread Alex Deucher
Ping on this series?

Alex

On Mon, Nov 27, 2023 at 5:52 PM Alex Deucher  wrote:
>
> Should be -EOPNOTSUPP.
>
> Fixes: 5104fdf50d32 ("drm/amdgpu: Fix a null pointer access when the smc_rreg 
> pointer is NULL")
> Signed-off-by: Alex Deucher 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> index 0e61ebdb3f3e..8d4a3ff65c18 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c
> @@ -755,7 +755,7 @@ static ssize_t amdgpu_debugfs_regs_smc_read(struct file 
> *f, char __user *buf,
> int r;
>
> if (!adev->smc_rreg)
> -   return -EPERM;
> +   return -EOPNOTSUPP;
>
> if (size & 0x3 || *pos & 0x3)
> return -EINVAL;
> @@ -814,7 +814,7 @@ static ssize_t amdgpu_debugfs_regs_smc_write(struct file 
> *f, const char __user *
> int r;
>
> if (!adev->smc_wreg)
> -   return -EPERM;
> +   return -EOPNOTSUPP;
>
> if (size & 0x3 || *pos & 0x3)
> return -EINVAL;
> --
> 2.42.0
>
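
The error-code distinction in the quoted patch can be shown with a small userspace sketch. It assumes a hypothetical callback type `smc_read_fn` standing in for the optional `smc_rreg` accessor; `smc_read_precheck()` and `fake_smc_read()` are illustrative names and are not part of the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the optional adev->smc_rreg callback. */
typedef uint32_t (*smc_read_fn)(uint32_t reg);

/*
 * Validate a register read request before touching the hardware:
 *  - no accessor installed: the operation is not supported (-EOPNOTSUPP)
 *  - size or offset not a multiple of 4: bad argument (-EINVAL)
 */
static int smc_read_precheck(smc_read_fn rreg, size_t size, uint64_t pos)
{
	if (!rreg)
		return -EOPNOTSUPP;

	if ((size & 0x3) || (pos & 0x3))
		return -EINVAL;

	return 0;
}

/* Illustrative accessor used only to exercise the non-NULL path. */
static uint32_t fake_smc_read(uint32_t reg)
{
	return reg ^ 0xdeadbeefu;
}

static void check_smc_read_precheck(void)
{
	assert(smc_read_precheck(NULL, 4, 0) == -EOPNOTSUPP);
	assert(smc_read_precheck(fake_smc_read, 3, 0) == -EINVAL);
	assert(smc_read_precheck(fake_smc_read, 8, 4) == 0);
}
```

A missing accessor is a property of the hardware rather than of the caller's privileges, which is why `-EOPNOTSUPP` describes it better than `-EPERM`.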


[PATCH 2/4] drm/amdgpu: fall back to INPUT power for AVG power via INFO IOCTL

2023-12-07 Thread Alex Deucher
For backwards compatibility with userspace.

Fixes: 47f1724db4fe ("drm/amd: Introduce `AMDGPU_PP_SENSOR_GPU_INPUT_POWER`")
Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2897
Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index b5ebafd4a3ad..bf4f48fe438d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1105,7 +1105,12 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file *filp)
if (amdgpu_dpm_read_sensor(adev,
   
AMDGPU_PP_SENSOR_GPU_AVG_POWER,
   (void *)&ui32, &ui32_size)) {
-   return -EINVAL;
+   /* fall back to input power for backwards 
compat */
+   if (amdgpu_dpm_read_sensor(adev,
+  
AMDGPU_PP_SENSOR_GPU_INPUT_POWER,
+  (void *)&ui32, 
&ui32_size)) {
+   return -EINVAL;
+   }
}
ui32 >>= 8;
break;
-- 
2.42.0
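
The fallback order can be sketched outside the driver as follows. This is a rough illustration only: `enum power_sensor`, `read_sensor_fn`, `query_avg_power_watts()` and `input_only_backend()` are hypothetical names, and the backend simply pretends to be a chip that reports input power but not average power.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sensor ids standing in for AMDGPU_PP_SENSOR_GPU_*_POWER. */
enum power_sensor {
	POWER_SENSOR_AVG,
	POWER_SENSOR_INPUT,
};

/* Hypothetical backend hook: returns 0 on success, negative errno otherwise. */
typedef int (*read_sensor_fn)(enum power_sensor which, uint32_t *raw);

/*
 * Answer an "average power" query. If the backend has no average
 * reading, fall back to the input reading so that existing userspace
 * keeps getting a value. The raw value is shifted right by 8, as the
 * ioctl path in the patch does, leaving whole watts.
 */
static int query_avg_power_watts(read_sensor_fn read_sensor, uint32_t *watts)
{
	uint32_t raw = 0;

	assert(read_sensor != NULL && watts != NULL);

	if (read_sensor(POWER_SENSOR_AVG, &raw) &&
	    read_sensor(POWER_SENSOR_INPUT, &raw))
		return -EINVAL;

	*watts = raw >> 8;
	return 0;
}

/* Illustrative backend that only implements the input power sensor. */
static int input_only_backend(enum power_sensor which, uint32_t *raw)
{
	if (which != POWER_SENSOR_INPUT)
		return -EOPNOTSUPP;

	*raw = 45u << 8;	/* 45 W with a zero low byte */
	return 0;
}
```

With `input_only_backend`, the average query fails first and the input reading is returned instead; `-EINVAL` is only reported when both sensors are unavailable.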



[PATCH 4/4] drm/amdgpu/pm: clarify debugfs pm output

2023-12-07 Thread Alex Deucher
On APUs power is SoC power, not just GPU.
Clarify that for UVD/VCE/VCN the IP is powered down,
not disabled, which can be confusing and lead to concerns
that the IP is actually not available.

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/pm/amdgpu_pm.c | 28 ++--
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/amd/pm/amdgpu_pm.c 
b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
index 204e8d8aaace..063a8d09defc 100644
--- a/drivers/gpu/drm/amd/pm/amdgpu_pm.c
+++ b/drivers/gpu/drm/amd/pm/amdgpu_pm.c
@@ -4353,11 +4353,19 @@ static int amdgpu_debugfs_pm_info_pp(struct seq_file 
*m, struct amdgpu_device *a
if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_VDDNB, (void 
*)&value, &size))
seq_printf(m, "\t%u mV (VDDNB)\n", value);
size = sizeof(uint32_t);
-   if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_AVG_POWER, (void 
*)&query, &size))
-   seq_printf(m, "\t%u.%02u W (average GPU)\n", query >> 8, query 
& 0xff);
+   if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_AVG_POWER, (void 
*)&query, &size)) {
+   if (adev->flags & AMD_IS_APU)
+   seq_printf(m, "\t%u.%02u W (average SoC including 
CPU)\n", query >> 8, query & 0xff);
+   else
+   seq_printf(m, "\t%u.%02u W (average SoC)\n", query >> 
8, query & 0xff);
+   }
size = sizeof(uint32_t);
-   if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_INPUT_POWER, 
(void *)&query, &size))
-   seq_printf(m, "\t%u.%02u W (current GPU)\n", query >> 8, query 
& 0xff);
+   if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_GPU_INPUT_POWER, 
(void *)&query, &size)) {
+   if (adev->flags & AMD_IS_APU)
+   seq_printf(m, "\t%u.%02u W (current SoC including 
CPU)\n", query >> 8, query & 0xff);
+   else
+   seq_printf(m, "\t%u.%02u W (current SoC)\n", query >> 
8, query & 0xff);
+   }
size = sizeof(value);
seq_printf(m, "\n");
 
@@ -4383,9 +4391,9 @@ static int amdgpu_debugfs_pm_info_pp(struct seq_file *m, 
struct amdgpu_device *a
/* VCN clocks */
if (!amdgpu_dpm_read_sensor(adev, 
AMDGPU_PP_SENSOR_VCN_POWER_STATE, (void *)&value, &size)) {
if (!value) {
-   seq_printf(m, "VCN: Disabled\n");
+   seq_printf(m, "VCN: Powered down\n");
} else {
-   seq_printf(m, "VCN: Enabled\n");
+   seq_printf(m, "VCN: Powered up\n");
if (!amdgpu_dpm_read_sensor(adev, 
AMDGPU_PP_SENSOR_UVD_DCLK, (void *)&value, &size))
seq_printf(m, "\t%u MHz (DCLK)\n", 
value/100);
if (!amdgpu_dpm_read_sensor(adev, 
AMDGPU_PP_SENSOR_UVD_VCLK, (void *)&value, &size))
@@ -4397,9 +4405,9 @@ static int amdgpu_debugfs_pm_info_pp(struct seq_file *m, 
struct amdgpu_device *a
/* UVD clocks */
if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_UVD_POWER, 
(void *)&value, &size)) {
if (!value) {
-   seq_printf(m, "UVD: Disabled\n");
+   seq_printf(m, "UVD: Powered down\n");
} else {
-   seq_printf(m, "UVD: Enabled\n");
+   seq_printf(m, "UVD: Powered up\n");
if (!amdgpu_dpm_read_sensor(adev, 
AMDGPU_PP_SENSOR_UVD_DCLK, (void *)&value, &size))
seq_printf(m, "\t%u MHz (DCLK)\n", 
value/100);
if (!amdgpu_dpm_read_sensor(adev, 
AMDGPU_PP_SENSOR_UVD_VCLK, (void *)&value, &size))
@@ -4411,9 +4419,9 @@ static int amdgpu_debugfs_pm_info_pp(struct seq_file *m, 
struct amdgpu_device *a
/* VCE clocks */
if (!amdgpu_dpm_read_sensor(adev, AMDGPU_PP_SENSOR_VCE_POWER, 
(void *)&value, &size)) {
if (!value) {
-   seq_printf(m, "VCE: Disabled\n");
+   seq_printf(m, "VCE: Powered down\n");
} else {
-   seq_printf(m, "VCE: Enabled\n");
+   seq_printf(m, "VCE: Powered up\n");
if (!amdgpu_dpm_read_sensor(adev, 
AMDGPU_PP_SENSOR_VCE_ECCLK, (void *)&value, &size))
seq_printf(m, "\t%u MHz (ECCLK)\n", 
value/100);
}
-- 
2.42.0
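
The way the debugfs code splits a power reading into two printed fields can be reproduced in a few lines of userspace C. The helper names below (`power_whole_watts()`, `power_low_byte()`, `format_power()`) are hypothetical; only the shift, the mask and the format string shape come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Whole-watt part: everything above the low byte. */
static uint32_t power_whole_watts(uint32_t query)
{
	return query >> 8;
}

/* Low byte, printed verbatim after the dot (so it ranges over 0..255). */
static uint32_t power_low_byte(uint32_t query)
{
	return query & 0xff;
}

/*
 * Format one reading in the same "%u.%02u W (label)" shape as the
 * patch. Returns the number of characters snprintf() would write.
 */
static int format_power(char *buf, size_t len, uint32_t query,
			const char *label)
{
	assert(buf != NULL && label != NULL);

	return snprintf(buf, len, "\t%u.%02u W (%s)\n",
			(unsigned int)power_whole_watts(query),
			(unsigned int)power_low_byte(query), label);
}
```

For example, a raw value of `(26u << 8) | 5u` formatted with the label `current SoC` yields a line reading `26.05 W (current SoC)`, preceded by a tab.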



[PATCH 1/4] drm/amdgpu: fix avg vs input power reporting on smu7

2023-12-07 Thread Alex Deucher
Hawaii, Bonaire, Fiji, and Tonga support average power; the other
SMU7 chips support current (input) power.

Signed-off-by: Alex Deucher 
---
 .../gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c | 17 -
 1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c 
b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
index 11372fcc59c8..a2c7b2e111fa 100644
--- a/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
+++ b/drivers/gpu/drm/amd/pm/powerplay/hwmgr/smu7_hwmgr.c
@@ -3995,6 +3995,7 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int 
idx,
uint32_t sclk, mclk, activity_percent;
uint32_t offset, val_vid;
struct smu7_hwmgr *data = (struct smu7_hwmgr *)(hwmgr->backend);
+   struct amdgpu_device *adev = hwmgr->adev;
 
/* size must be at least 4 bytes for all sensors */
if (*size < 4)
@@ -4038,7 +4039,21 @@ static int smu7_read_sensor(struct pp_hwmgr *hwmgr, int 
idx,
*size = 4;
return 0;
case AMDGPU_PP_SENSOR_GPU_INPUT_POWER:
-   return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
+   if ((adev->asic_type != CHIP_HAWAII) &&
+   (adev->asic_type != CHIP_BONAIRE) &&
+   (adev->asic_type != CHIP_FIJI) &&
+   (adev->asic_type != CHIP_TONGA))
+   return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
+   else
+   return -EOPNOTSUPP;
+   case AMDGPU_PP_SENSOR_GPU_AVG_POWER:
+   if ((adev->asic_type != CHIP_HAWAII) &&
+   (adev->asic_type != CHIP_BONAIRE) &&
+   (adev->asic_type != CHIP_FIJI) &&
+   (adev->asic_type != CHIP_TONGA))
+   return -EOPNOTSUPP;
+   else
+   return smu7_get_gpu_power(hwmgr, (uint32_t *)value);
case AMDGPU_PP_SENSOR_VDDGFX:
if ((data->vr_config & VRCONF_VDDGFX_MASK) ==
(VR_SVI2_PLANE_2 << VRCONF_VDDGFX_SHIFT))
-- 
2.42.0
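
The per-chip routing above repeats the same four-way comparison in both cases. One way to read it is as a single predicate plus a dispatch, sketched here in standalone C. The enums are local stand-ins (only the four chip names mirror the patch, and `CHIP_OTHER_SMU7` is hypothetical), and `hw_power` stands in for whatever `smu7_get_gpu_power()` would return.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for a few asic types; not the kernel's enum. */
enum chip {
	CHIP_BONAIRE,
	CHIP_HAWAII,
	CHIP_TONGA,
	CHIP_FIJI,
	CHIP_OTHER_SMU7,	/* hypothetical: any other SMU7 part */
};

enum power_sensor {
	POWER_SENSOR_AVG,
	POWER_SENSOR_INPUT,
};

/* True for the four chips the commit message lists as reporting average power. */
static bool chip_reports_avg_power(enum chip chip)
{
	switch (chip) {
	case CHIP_HAWAII:
	case CHIP_BONAIRE:
	case CHIP_FIJI:
	case CHIP_TONGA:
		return true;
	default:
		return false;
	}
}

/*
 * Route a power query: each chip answers exactly one of the two
 * sensors and rejects the other with -EOPNOTSUPP.
 */
static int read_power_sensor(enum chip chip, enum power_sensor which,
			     uint32_t hw_power, uint32_t *value)
{
	bool wants_avg = (which == POWER_SENSOR_AVG);

	assert(value != NULL);

	if (wants_avg != chip_reports_avg_power(chip))
		return -EOPNOTSUPP;

	*value = hw_power;
	return 0;
}
```

The sketch behaves the same as the two `if`/`else` chains in the diff: Fiji answers the average query and rejects the input query, while a chip outside the list does the opposite.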



[PATCH 3/4] drm/amdgpu: add new INFO IOCTL query for input power

2023-12-07 Thread Alex Deucher
Some chips provide both average and input power.  Previously
we exposed only average power; add a new query for input
power.

Example userspace:
https://github.com/Umio-Yasuno/libdrm-amdgpu-sys-rs/tree/input_power

Signed-off-by: Alex Deucher 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 9 +
 include/uapi/drm/amdgpu_drm.h   | 2 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index bf4f48fe438d..48496bb585c7 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -1114,6 +1114,15 @@ int amdgpu_info_ioctl(struct drm_device *dev, void 
*data, struct drm_file *filp)
}
ui32 >>= 8;
break;
+   case AMDGPU_INFO_SENSOR_GPU_INPUT_POWER:
+   /* get input GPU power */
+   if (amdgpu_dpm_read_sensor(adev,
+  
AMDGPU_PP_SENSOR_GPU_INPUT_POWER,
+  (void *)&ui32, &ui32_size)) {
+   return -EINVAL;
+   }
+   ui32 >>= 8;
+   break;
case AMDGPU_INFO_SENSOR_VDDNB:
/* get VDDNB in millivolts */
if (amdgpu_dpm_read_sensor(adev,
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index ad21c613fec8..96e32dafd4f0 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -865,6 +865,8 @@ struct drm_amdgpu_cs_chunk_cp_gfx_shadow {
#define AMDGPU_INFO_SENSOR_PEAK_PSTATE_GFX_SCLK 0xa
/* Subquery id: Query GPU peak pstate memory clock */
#define AMDGPU_INFO_SENSOR_PEAK_PSTATE_GFX_MCLK 0xb
+   /* Subquery id: Query input GPU power   */
+   #define AMDGPU_INFO_SENSOR_GPU_INPUT_POWER  0xc
 /* Number of VRAM page faults on CPU access. */
 #define AMDGPU_INFO_NUM_VRAM_CPU_PAGE_FAULTS   0x1E
 #define AMDGPU_INFO_VRAM_LOST_COUNTER  0x1F
-- 
2.42.0



[PATCH 0/2] fdinfo shared stats

2023-12-07 Thread Alex Deucher
We had a request to add shared buffer stats to fdinfo for amdgpu and
while implementing that, Christian mentioned that just looking at
the GEM handle count doesn't take into account buffers shared with other
subsystems like V4L or RDMA.  Those subsystems don't use GEM, so it
doesn't really matter from a GPU top perspective, but it's more
correct if you actually want to see shared buffers.

Alex Deucher (2):
  drm: update drm_show_memory_stats() for dma-bufs
  drm/amdgpu: add shared fdinfo stats

 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
 drivers/gpu/drm/drm_file.c |  2 +-
 4 files changed, 22 insertions(+), 1 deletion(-)

-- 
2.42.0



[PATCH 1/2] drm: update drm_show_memory_stats() for dma-bufs

2023-12-07 Thread Alex Deucher
Show buffers as shared if they are shared via dma-buf as well
(e.g., shared with v4l or some other subsystem).

Signed-off-by: Alex Deucher 
Cc: Rob Clark 
---
 drivers/gpu/drm/drm_file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/drm_file.c b/drivers/gpu/drm/drm_file.c
index 5ddaffd32586..5d5f93b9c263 100644
--- a/drivers/gpu/drm/drm_file.c
+++ b/drivers/gpu/drm/drm_file.c
@@ -973,7 +973,7 @@ void drm_show_memory_stats(struct drm_printer *p, struct 
drm_file *file)
DRM_GEM_OBJECT_PURGEABLE;
}
 
-   if (obj->handle_count > 1) {
+   if ((obj->handle_count > 1) || obj->dma_buf) {
status.shared += obj->size;
} else {
status.private += obj->size;
-- 
2.42.0
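
The accounting rule changed by this one-line patch can be sketched with mock types. Nothing below is the DRM API: `struct mock_gem_object`, `struct mock_memory_stats`, `object_is_shared()` and `account_object()` are hypothetical, and the sketch assumes that a non-NULL `dma_buf` pointer means the object has a dma-buf attached.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down view of a GEM object. */
struct mock_gem_object {
	size_t size;
	unsigned int handle_count;
	const void *dma_buf;	/* assumed non-NULL when a dma-buf is attached */
};

struct mock_memory_stats {
	size_t shared;
	size_t private_bytes;
};

/* Shared means: more than one GEM handle, or reachable through a dma-buf. */
static bool object_is_shared(const struct mock_gem_object *obj)
{
	return obj->handle_count > 1 || obj->dma_buf != NULL;
}

/* Add the object's size to exactly one of the two counters. */
static void account_object(struct mock_memory_stats *stats,
			   const struct mock_gem_object *obj)
{
	assert(stats != NULL && obj != NULL);

	if (object_is_shared(obj))
		stats->shared += obj->size;
	else
		stats->private_bytes += obj->size;
}
```

An object with a single handle but an attached dma-buf, which the old `handle_count > 1` test counted as private, lands in the shared counter under this rule.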



[PATCH 2/2] drm/amdgpu: add shared fdinfo stats

2023-12-07 Thread Alex Deucher
Add shared stats.  Useful for seeing shared memory.

v2: take dma-buf into account as well

Signed-off-by: Alex Deucher 
Cc: Rob Clark 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c |  4 
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 11 +++
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h |  6 ++
 3 files changed, 21 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
index 5706b282a0c7..c7df7fa3459f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_fdinfo.c
@@ -97,6 +97,10 @@ void amdgpu_show_fdinfo(struct drm_printer *p, struct 
drm_file *file)
   stats.requested_visible_vram/1024UL);
drm_printf(p, "amd-requested-gtt:\t%llu KiB\n",
   stats.requested_gtt/1024UL);
+   drm_printf(p, "drm-shared-vram:\t%llu KiB\n", stats.vram_shared/1024UL);
+   drm_printf(p, "drm-shared-gtt:\t%llu KiB\n", stats.gtt_shared/1024UL);
+   drm_printf(p, "drm-shared-cpu:\t%llu KiB\n", stats.cpu_shared/1024UL);
+
for (hw_ip = 0; hw_ip < AMDGPU_HW_IP_NUM; ++hw_ip) {
if (!usage[hw_ip])
continue;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
index d79b4ca1ecfc..1b37d95475b8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.c
@@ -1287,25 +1287,36 @@ void amdgpu_bo_get_memory(struct amdgpu_bo *bo,
  struct amdgpu_mem_stats *stats)
 {
uint64_t size = amdgpu_bo_size(bo);
+   struct drm_gem_object *obj;
unsigned int domain;
+   bool shared;
 
/* Abort if the BO doesn't currently have a backing store */
if (!bo->tbo.resource)
return;
 
+   obj = &bo->tbo.base;
+   shared = (obj->handle_count > 1) || obj->dma_buf;
+
domain = amdgpu_mem_type_to_domain(bo->tbo.resource->mem_type);
switch (domain) {
case AMDGPU_GEM_DOMAIN_VRAM:
stats->vram += size;
if (amdgpu_bo_in_cpu_visible_vram(bo))
stats->visible_vram += size;
+   if (shared)
+   stats->vram_shared += size;
break;
case AMDGPU_GEM_DOMAIN_GTT:
stats->gtt += size;
+   if (shared)
+   stats->gtt_shared += size;
break;
case AMDGPU_GEM_DOMAIN_CPU:
default:
stats->cpu += size;
+   if (shared)
+   stats->cpu_shared += size;
break;
}
 
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
index d28e21baef16..0503af75dc26 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_object.h
@@ -138,12 +138,18 @@ struct amdgpu_bo_vm {
 struct amdgpu_mem_stats {
/* current VRAM usage, includes visible VRAM */
uint64_t vram;
+   /* current shared VRAM usage, includes visible VRAM */
+   uint64_t vram_shared;
/* current visible VRAM usage */
uint64_t visible_vram;
/* current GTT usage */
uint64_t gtt;
+   /* current shared GTT usage */
+   uint64_t gtt_shared;
/* current system memory usage */
uint64_t cpu;
+   /* current shared system memory usage */
+   uint64_t cpu_shared;
/* sum of evicted buffers, includes visible VRAM */
uint64_t evicted_vram;
/* sum of evicted buffers due to CPU access */
-- 
2.42.0



[PATCH] drm/amd/display: fix cursor-plane-only atomic commits not triggering pageflips

2023-12-07 Thread Xaver Hugl
With VRR, every atomic commit affecting a given display must trigger
a new scanout cycle, so that userspace is able to control the refresh
rate of the display. Before this commit, this was not the case for
atomic commits that only contain cursor plane properties.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Cc: sta...@vger.kernel.org

Signed-off-by: Xaver Hugl 
---
 drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b452796fc6d3..b379c859fbef 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8149,9 +8149,15 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
/* Cursor plane is handled after stream updates */
if (plane->type == DRM_PLANE_TYPE_CURSOR) {
if ((fb && crtc == pcrtc) ||
-   (old_plane_state->fb && old_plane_state->crtc == 
pcrtc))
+   (old_plane_state->fb && old_plane_state->crtc == 
pcrtc)) {
cursor_update = true;
-
+   /*
+* With atomic modesetting, cursor changes must
+* also trigger a new refresh period with vrr
+*/
+   if (!state->legacy_cursor_update)
+   pflip_present = true;
+   }
continue;
}
 
-- 
2.43.0



Re: [PATCH] drm/amdkfd: Fix sparse __rcu annotation warnings

2023-12-07 Thread Felix Kuehling



On 2023-12-05 17:20, Felix Kuehling wrote:

Properly mark kfd_process->ef as __rcu and consistently access it with
rcu_dereference_protected.

Reported-by: kernel test robot 
Closes: 
https://lore.kernel.org/oe-kbuild-all/202312052245.yfpbsgnh-...@intel.com/
Signed-off-by: Felix Kuehling 


ping.

Christian, would you review this patch, please?

Thanks,
  Felix




---
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h   | 2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 4 ++--
  drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 2 +-
  drivers/gpu/drm/amd/amdkfd/kfd_process.c | 6 --
  4 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
index f2e920734c98..20cb266dcedd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd.h
@@ -314,7 +314,7 @@ void amdgpu_amdkfd_gpuvm_unmap_gtt_bo_from_kernel(struct 
kgd_mem *mem);
  int amdgpu_amdkfd_map_gtt_bo_to_gart(struct amdgpu_device *adev, struct 
amdgpu_bo *bo);
  
  int amdgpu_amdkfd_gpuvm_restore_process_bos(void *process_info,

-   struct dma_fence **ef);
+   struct dma_fence __rcu **ef);
  int amdgpu_amdkfd_gpuvm_get_vm_fault_info(struct amdgpu_device *adev,
  struct kfd_vm_fault_info *info);
  int amdgpu_amdkfd_gpuvm_import_dmabuf_fd(struct amdgpu_device *adev, int fd,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index 7d91f99acb59..8ba6f6c8363d 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -2806,7 +2806,7 @@ static void amdgpu_amdkfd_restore_userptr_worker(struct 
work_struct *work)
put_task_struct(usertask);
  }
  
-static void replace_eviction_fence(struct dma_fence **ef,

+static void replace_eviction_fence(struct dma_fence __rcu **ef,
   struct dma_fence *new_ef)
  {
struct dma_fence *old_ef = rcu_replace_pointer(*ef, new_ef, true
@@ -2841,7 +2841,7 @@ static void replace_eviction_fence(struct dma_fence **ef,
   * 7.  Add fence to all PD and PT BOs.
   * 8.  Unreserve all BOs
   */
-int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence **ef)
+int amdgpu_amdkfd_gpuvm_restore_process_bos(void *info, struct dma_fence __rcu 
**ef)
  {
struct amdkfd_process_info *process_info = info;
struct amdgpu_vm *peer_vm;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 45366b4ca976..5a24097a9f28 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -917,7 +917,7 @@ struct kfd_process {
 * fence will be triggered during eviction and new one will be created
 * during restore
 */
-   struct dma_fence *ef;
+   struct dma_fence __rcu *ef;
  
  	/* Work items for evicting and restoring BOs */

struct delayed_work eviction_work;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 71df51fcc1b0..14b11d61f8dd 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1110,6 +1110,8 @@ static void kfd_process_wq_release(struct work_struct 
*work)
  {
struct kfd_process *p = container_of(work, struct kfd_process,
 release_work);
+   struct dma_fence *ef = rcu_dereference_protected(p->ef,
+   kref_read(&p->ref) == 0);
  
  	kfd_process_dequeue_from_all_devices(p);

pqm_uninit(&p->pqm);
@@ -1118,7 +1120,7 @@ static void kfd_process_wq_release(struct work_struct 
*work)
 * destroyed. This allows any BOs to be freed without
 * triggering pointless evictions or waiting for fences.
 */
-   dma_fence_signal(p->ef);
+   dma_fence_signal(ef);
  
  	kfd_process_remove_sysfs(p);
  
@@ -1127,7 +1129,7 @@ static void kfd_process_wq_release(struct work_struct *work)

svm_range_list_fini(p);
  
  	kfd_process_destroy_pdds(p);

-   dma_fence_put(p->ef);
+   dma_fence_put(ef);
  
  	kfd_event_free_process(p);
  


Re: [PATCH] drm/amd/display: fix cursor-plane-only atomic commits not triggering pageflips

2023-12-07 Thread Xaver Hugl
Sorry, it looks like I sent this too soon. I tested the patch on a
second PC and it doesn't fix the issue there.


On Thu, Dec 7, 2023 at 7:25 PM Xaver Hugl wrote:
>
> With VRR, every atomic commit affecting a given display must trigger
> a new scanout cycle, so that userspace is able to control the refresh
> rate of the display. Before this commit, this was not the case for
> atomic commits that only contain cursor plane properties.
>
> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
> Cc: sta...@vger.kernel.org
>
> Signed-off-by: Xaver Hugl 
> ---
>  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
> b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index b452796fc6d3..b379c859fbef 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -8149,9 +8149,15 @@ static void amdgpu_dm_commit_planes(struct 
> drm_atomic_state *state,
> /* Cursor plane is handled after stream updates */
> if (plane->type == DRM_PLANE_TYPE_CURSOR) {
> if ((fb && crtc == pcrtc) ||
> -   (old_plane_state->fb && old_plane_state->crtc == 
> pcrtc))
> +   (old_plane_state->fb && old_plane_state->crtc == 
> pcrtc)) {
> cursor_update = true;
> -
> +   /*
> +* With atomic modesetting, cursor changes 
> must
> +* also trigger a new refresh period with vrr
> +*/
> +   if (!state->legacy_cursor_update)
> +   pflip_present = true;
> +   }
> continue;
> }
>
> --
> 2.43.0
>


Re: [PATCH] drm/amd/display: fix cursor-plane-only atomic commits not triggering pageflips

2023-12-07 Thread Harry Wentland

On 2023-12-07 13:25, Xaver Hugl wrote:

With VRR, every atomic commit affecting a given display must trigger
a new scanout cycle, so that userspace is able to control the refresh
rate of the display. Before this commit, this was not the case for
atomic commits that only contain cursor plane properties.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Cc: sta...@vger.kernel.org

Signed-off-by: Xaver Hugl 


Reviewed-by: Harry Wentland 

Harry


---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b452796fc6d3..b379c859fbef 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8149,9 +8149,15 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
/* Cursor plane is handled after stream updates */
if (plane->type == DRM_PLANE_TYPE_CURSOR) {
if ((fb && crtc == pcrtc) ||
-   (old_plane_state->fb && old_plane_state->crtc == 
pcrtc))
+   (old_plane_state->fb && old_plane_state->crtc == 
pcrtc)) {
cursor_update = true;
-
+   /*
+* With atomic modesetting, cursor changes must
+* also trigger a new refresh period with vrr
+*/
+   if (!state->legacy_cursor_update)
+   pflip_present = true;
+   }
continue;
}
  


Re: [PATCH] drm/amd/display: fix cursor-plane-only atomic commits not triggering pageflips

2023-12-07 Thread Harry Wentland




On 2023-12-07 14:30, Xaver Hugl wrote:

Sorry, it looks like I sent this too soon. I tested the patch on a
second PC and it doesn't fix the issue there.



Ah, too bad. Won't merge it then.

Harry



Am Do., 7. Dez. 2023 um 19:25 Uhr schrieb Xaver Hugl :


With VRR, every atomic commit affecting a given display must trigger
a new scanout cycle, so that userspace is able to control the refresh
rate of the display. Before this commit, this was not the case for
atomic commits that only contain cursor plane properties.

Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3034
Cc: sta...@vger.kernel.org

Signed-off-by: Xaver Hugl 
---
  drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c 
b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index b452796fc6d3..b379c859fbef 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -8149,9 +8149,15 @@ static void amdgpu_dm_commit_planes(struct 
drm_atomic_state *state,
 /* Cursor plane is handled after stream updates */
 if (plane->type == DRM_PLANE_TYPE_CURSOR) {
 if ((fb && crtc == pcrtc) ||
-   (old_plane_state->fb && old_plane_state->crtc == 
pcrtc))
+   (old_plane_state->fb && old_plane_state->crtc == 
pcrtc)) {
 cursor_update = true;
-
+   /*
+* With atomic modesetting, cursor changes must
+* also trigger a new refresh period with vrr
+*/
+   if (!state->legacy_cursor_update)
+   pflip_present = true;
+   }
 continue;
 }

--
2.43.0



Re: [PATCH][next] drm/amd/display: Fix spelling mistake "SMC_MSG_AllowZstatesEntr" -> "SMC_MSG_AllowZstatesEntry"

2023-12-07 Thread Alex Deucher
Applied.  Thanks!

Alex

On Thu, Dec 7, 2023 at 6:32 AM Colin Ian King  wrote:
>
> There is a spelling mistake in a smu_print message. Fix it.
>
> Signed-off-by: Colin Ian King 
> ---
>  drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c | 8 
>  1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c 
> b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c
> index d6db9d7fced2..6d4a1ffab5ed 100644
> --- a/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c
> +++ b/drivers/gpu/drm/amd/display/dc/clk_mgr/dcn35/dcn35_smu.c
> @@ -361,26 +361,26 @@ void dcn35_smu_set_zstate_support(struct 
> clk_mgr_internal *clk_mgr, enum dcn_zst
> case DCN_ZSTATE_SUPPORT_ALLOW:
> msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
> param = (1 << 10) | (1 << 9) | (1 << 8);
> -   smu_print("%s: SMC_MSG_AllowZstatesEntr msg = ALLOW, param = 
> %d\n", __func__, param);
> +   smu_print("%s: SMC_MSG_AllowZstatesEntry msg = ALLOW, param = 
> %d\n", __func__, param);
> break;
>
> case DCN_ZSTATE_SUPPORT_DISALLOW:
> msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
> param = 0;
> -   smu_print("%s: SMC_MSG_AllowZstatesEntr msg_id = DISALLOW, 
> param = %d\n",  __func__, param);
> +   smu_print("%s: SMC_MSG_AllowZstatesEntry msg_id = DISALLOW, 
> param = %d\n",  __func__, param);
> break;
>
>
> case DCN_ZSTATE_SUPPORT_ALLOW_Z10_ONLY:
> msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
> param = (1 << 10);
> -   smu_print("%s: SMC_MSG_AllowZstatesEntr msg = ALLOW_Z10_ONLY, 
> param = %d\n", __func__, param);
> +   smu_print("%s: SMC_MSG_AllowZstatesEntry msg = 
> ALLOW_Z10_ONLY, param = %d\n", __func__, param);
> break;
>
> case DCN_ZSTATE_SUPPORT_ALLOW_Z8_Z10_ONLY:
> msg_id = VBIOSSMC_MSG_AllowZstatesEntry;
> param = (1 << 10) | (1 << 8);
> -   smu_print("%s: SMC_MSG_AllowZstatesEntr msg = 
> ALLOW_Z8_Z10_ONLY, param = %d\n", __func__, param);
> +   smu_print("%s: SMC_MSG_AllowZstatesEntry msg = 
> ALLOW_Z8_Z10_ONLY, param = %d\n", __func__, param);
> break;
>
> case DCN_ZSTATE_SUPPORT_ALLOW_Z8_ONLY:
> --
> 2.39.2
>


[PATCH v2 00/23] Support Host Trap Sampling for gfx941/gfx942

2023-12-07 Thread James Zhu
PC sampling is a form of software profiling, where the threads of an application
are periodically interrupted and the program counter that the threads are
currently attempting to execute is saved out for profiling.

David Yat Sin (4):
  drm/amdkfd/kfd_ioctl: add pc sampling support
  drm/amdkfd: add pc sampling support
  drm/amdkfd: enable pc sampling query
  drm/amdkfd: enable pc sampling create

James Zhu (19):
  drm/amdkfd: add pc sampling mutex
  drm/amdkfd: add trace_id return
  drm/amdkfd: check pcs_enrty valid
  drm/amdkfd: enable pc sampling destroy
  drm/amdkfd: add interface to trigger pc sampling trap
  drm/amdkfd: trigger pc sampling trap for gfx v9
  drm/amdkfd/gfx9: enable host trap
  drm/amdgpu: use trapID 4 for host trap
  drm/amdgpu: add sq host trap status check
  drm/amdkfd: trigger pc sampling trap for arcturus
  drm/amdkfd: trigger pc sampling trap for aldebaran
  drm/amdkfd: use bit operation set debug trap
  drm/amdkfd: add setting trap pc sampling flag
  drm/amdkfd: enable pc sampling stop
  drm/amdkfd: add queue remapping
  drm/amdkfd: enable pc sampling start
  drm/amdkfd: add pc sampling thread to trigger trap
  drm/amdkfd: add pc sampling release when process release
  drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

 .../drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c  |   11 +
 .../drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c   |   14 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |   73 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |7 +
 drivers/gpu/drm/amd/amdkfd/Makefile   |3 +-
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2106 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |   29 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c  |   44 +
 drivers/gpu/drm/amd/amdkfd/kfd_device.c   |   14 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.c |   11 +
 .../drm/amd/amdkfd/kfd_device_queue_manager.h |5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  372 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h  |   35 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h |   43 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c  |   32 +-
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |2 +
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |5 +
 .../gpu/drm/amd/include/kgd_kfd_interface.h   |6 +
 include/uapi/linux/kfd_ioctl.h|   60 +-
 19 files changed, 1813 insertions(+), 1059 deletions(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

-- 
2.25.1



[PATCH v2 02/23] drm/amdkfd: add pc sampling support

2023-12-07 Thread James Zhu
From: David Yat Sin 

Add pc sampling functions in amdkfd.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/Makefile  |  3 +-
 drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 44 +++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 78 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h | 34 +
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 13 
 5 files changed, 171 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
 create mode 100644 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h

diff --git a/drivers/gpu/drm/amd/amdkfd/Makefile 
b/drivers/gpu/drm/amd/amdkfd/Makefile
index a5ae7bcf44eb..790fd028a681 100644
--- a/drivers/gpu/drm/amd/amdkfd/Makefile
+++ b/drivers/gpu/drm/amd/amdkfd/Makefile
@@ -57,7 +57,8 @@ AMDKFD_FILES  := $(AMDKFD_PATH)/kfd_module.o \
$(AMDKFD_PATH)/kfd_int_process_v11.o \
$(AMDKFD_PATH)/kfd_smi_events.o \
$(AMDKFD_PATH)/kfd_crat.o \
-   $(AMDKFD_PATH)/kfd_debug.o
+   $(AMDKFD_PATH)/kfd_debug.o \
+   $(AMDKFD_PATH)/kfd_pc_sampling.o
 
 ifneq ($(CONFIG_DEBUG_FS),)
 AMDKFD_FILES += $(AMDKFD_PATH)/kfd_debugfs.o
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
index f6d4748c1980..1a3a8ded9c93 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_chardev.c
@@ -41,6 +41,7 @@
 #include "kfd_priv.h"
 #include "kfd_device_queue_manager.h"
 #include "kfd_svm.h"
+#include "kfd_pc_sampling.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_smi_events.h"
 #include "amdgpu_dma_buf.h"
@@ -1750,6 +1751,38 @@ static int kfd_ioctl_svm(struct file *filep, struct 
kfd_process *p, void *data)
 }
 #endif
 
+static int kfd_ioctl_pc_sample(struct file *filep,
+  struct kfd_process *p, void __user *data)
+{
+   struct kfd_ioctl_pc_sample_args *args = data;
+   struct kfd_process_device *pdd;
+   int ret;
+
+   if (sched_policy == KFD_SCHED_POLICY_NO_HWS) {
+   pr_err("PC Sampling does not support sched_policy %i", 
sched_policy);
+   return -EINVAL;
+   }
+
+   mutex_lock(&p->mutex);
+   pdd = kfd_process_device_data_by_id(p, args->gpu_id);
+
+   if (!pdd) {
+   pr_debug("could not find gpu id 0x%x.", args->gpu_id);
+   ret = -EINVAL;
+   } else {
+   pdd = kfd_bind_process_to_device(pdd->dev, p);
+   if (IS_ERR(pdd)) {
+   pr_debug("failed to bind process %p with gpu id 0x%x", 
p, args->gpu_id);
+   ret = -ESRCH;
+   } else {
+   ret = kfd_pc_sample(pdd, args);
+   }
+   }
+   mutex_unlock(&p->mutex);
+
+   return ret;
+}
+
 static int criu_checkpoint_process(struct kfd_process *p,
 uint8_t __user *user_priv_data,
 uint64_t *priv_offset)
@@ -3224,6 +3257,9 @@ static const struct amdkfd_ioctl_desc amdkfd_ioctls[] = {
 
AMDKFD_IOCTL_DEF(AMDKFD_IOC_DBG_TRAP,
kfd_ioctl_set_debug_trap, 0),
+
+   AMDKFD_IOCTL_DEF(AMDKFD_IOC_PC_SAMPLE,
+   kfd_ioctl_pc_sample, KFD_IOC_FLAG_PERFMON),
 };
 
 #define AMDKFD_CORE_IOCTL_COUNTARRAY_SIZE(amdkfd_ioctls)
@@ -3300,6 +3336,14 @@ static long kfd_ioctl(struct file *filep, unsigned int 
cmd, unsigned long arg)
}
}
 
+   /* PC Sampling Monitor */
+   if (unlikely(ioctl->flags & KFD_IOC_FLAG_PERFMON)) {
+   if (!capable(CAP_PERFMON) && !capable(CAP_SYS_ADMIN)) {
+   retcode = -EACCES;
+   goto err_i1;
+   }
+   }
+
if (cmd & (IOC_IN | IOC_OUT)) {
if (asize <= sizeof(stack_kdata)) {
kdata = stack_kdata;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
new file mode 100644
index ..a7e78ff42d07
--- /dev/null
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0 OR MIT
+/*
+ * Copyright 2023 Advanced Micro Devices, Inc.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY O

[PATCH v2 01/23] drm/amdkfd/kfd_ioctl: add pc sampling support

2023-12-07 Thread James Zhu
From: David Yat Sin 

Add pc sampling support in kfd_ioctl.

The user-mode code that uses this new kfd_ioctl is available at
https://github.com/RadeonOpenCompute/ROCT-Thunk-Interface
(master branch).

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 include/uapi/linux/kfd_ioctl.h | 57 +-
 1 file changed, 56 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index f0ed68974c54..1bd1347effea 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -1446,6 +1446,58 @@ struct kfd_ioctl_dbg_trap_args {
};
 };
 
+/**
+ * kfd_ioctl_pc_sample_op - PC Sampling ioctl operations
+ *
+ * @KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES: Query device PC Sampling capabilities
+ * @KFD_IOCTL_PCS_OP_CREATE: Register this process with a per-device PC sampler instance
+ * @KFD_IOCTL_PCS_OP_DESTROY: Unregister from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_START: Process begins taking samples from a previously registered PC sampler instance
+ * @KFD_IOCTL_PCS_OP_STOP: Process stops taking samples from a previously registered PC sampler instance
+ */
+enum kfd_ioctl_pc_sample_op {
+   KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES,
+   KFD_IOCTL_PCS_OP_CREATE,
+   KFD_IOCTL_PCS_OP_DESTROY,
+   KFD_IOCTL_PCS_OP_START,
+   KFD_IOCTL_PCS_OP_STOP,
+};
+
+/* Values have to be a power of 2*/
+#define KFD_IOCTL_PCS_FLAG_POWER_OF_2 0x0001
+
+enum kfd_ioctl_pc_sample_method {
+   KFD_IOCTL_PCS_METHOD_HOSTTRAP = 1,
+   KFD_IOCTL_PCS_METHOD_STOCHASTIC,
+};
+
+enum kfd_ioctl_pc_sample_type {
+   KFD_IOCTL_PCS_TYPE_TIME_US,
+   KFD_IOCTL_PCS_TYPE_CLOCK_CYCLES,
+   KFD_IOCTL_PCS_TYPE_INSTRUCTIONS
+};
+
+struct kfd_pc_sample_info {
+   __u64 interval;  /* [IN] if PCS_TYPE_INTERVAL_US: sample interval in us
+ * if PCS_TYPE_CLOCK_CYCLES: sample interval in graphics core clk cycles
+ * if PCS_TYPE_INSTRUCTIONS: sample interval in instructions issued by
+ * graphics compute units
+ */
+   __u64 interval_min;  /* [OUT] */
+   __u64 interval_max;  /* [OUT] */
+   __u64 flags; /* [OUT] indicate potential restrictions e.g FLAG_POWER_OF_2 */
+   __u32 method;/* [IN/OUT] kfd_ioctl_pc_sample_method */
+   __u32 type;  /* [IN/OUT] kfd_ioctl_pc_sample_type */
+};
+
+struct kfd_ioctl_pc_sample_args {
+   __u64 sample_info_ptr;   /* array of kfd_pc_sample_info */
+   __u32 num_sample_info;
+   __u32 op;/* kfd_ioctl_pc_sample_op */
+   __u32 gpu_id;
+   __u32 trace_id;
+};
+
 #define AMDKFD_IOCTL_BASE 'K'
 #define AMDKFD_IO(nr)  _IO(AMDKFD_IOCTL_BASE, nr)
 #define AMDKFD_IOR(nr, type)   _IOR(AMDKFD_IOCTL_BASE, nr, type)
@@ -1566,7 +1618,10 @@ struct kfd_ioctl_dbg_trap_args {
 #define AMDKFD_IOC_DBG_TRAP\
AMDKFD_IOWR(0x26, struct kfd_ioctl_dbg_trap_args)
 
+#define AMDKFD_IOC_PC_SAMPLE   \
+   AMDKFD_IOWR(0x27, struct kfd_ioctl_pc_sample_args)
+
 #define AMDKFD_COMMAND_START   0x01
-#define AMDKFD_COMMAND_END 0x27
+#define AMDKFD_COMMAND_END 0x28
 
 #endif
-- 
2.25.1



[PATCH v2 04/23] drm/amdkfd: add pc sampling mutex

2023-12-07 Thread James Zhu
Add a per-node PC sampling mutex; initialize it in node init and destroy it
in node cleanup.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c | 12 
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h   |  7 +++
 2 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0a9cf9dfc224..0e24e011f66b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -533,6 +533,16 @@ static void kfd_smi_init(struct kfd_node *dev)
spin_lock_init(&dev->smi_lock);
 }
 
+static void kfd_pc_sampling_init(struct kfd_node *dev)
+{
+   mutex_init(&dev->pcs_data.mutex);
+}
+
+static void kfd_pc_sampling_exit(struct kfd_node *dev)
+{
+   mutex_destroy(&dev->pcs_data.mutex);
+}
+
 static int kfd_init_node(struct kfd_node *node)
 {
int err = -1;
@@ -563,6 +573,7 @@ static int kfd_init_node(struct kfd_node *node)
}
 
kfd_smi_init(node);
+   kfd_pc_sampling_init(node);
 
return 0;
 
@@ -593,6 +604,7 @@ static void kfd_cleanup_nodes(struct kfd_dev *kfd, unsigned 
int num_nodes)
kfd_topology_remove_device(knode);
if (knode->gws)
amdgpu_amdkfd_free_gws(knode->adev, knode->gws);
+   kfd_pc_sampling_exit(knode);
kfree(knode);
kfd->nodes[i] = NULL;
}
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 99426182bfc6..cbaa1bccd94b 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,6 +269,11 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+/* Per device PC Sampling data */
+struct kfd_dev_pc_sampling {
+   struct mutex mutex;
+};
+
 struct kfd_node {
unsigned int node_id;
struct amdgpu_device *adev; /* Duplicated here along with keeping
@@ -322,6 +327,8 @@ struct kfd_node {
struct kfd_local_mem_info local_mem_info;
 
struct kfd_dev *kfd;
+
+   struct kfd_dev_pc_sampling pcs_data;
 };
 
 struct kfd_dev {
-- 
2.25.1



[PATCH v2 05/23] drm/amdkfd: enable pc sampling create

2023-12-07 Thread James Zhu
From: David Yat Sin 

Enable pc sampling create.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 53 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h| 10 
 2 files changed, 62 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 49fecbc7013e..7828a6340edf 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -97,7 +97,58 @@ static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   struct kfd_pc_sample_info *supported_format = NULL;
+   struct kfd_pc_sample_info user_info;
+   int ret;
+   int i;
+
+   if (user_args->num_sample_info != 1)
+   return -EINVAL;
+
+   ret = copy_from_user(&user_info, (void __user *) 
user_args->sample_info_ptr,
+   sizeof(struct kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info from user\n");
+   return -EFAULT;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version
+   && user_info.method == 
supported_formats[i].sample_info->method
+   && user_info.type == 
supported_formats[i].sample_info->type
+   && user_info.interval <= 
supported_formats[i].sample_info->interval_max
+   && user_info.interval >= 
supported_formats[i].sample_info->interval_min) {
+   supported_format =
+   (struct kfd_pc_sample_info 
*)supported_formats[i].sample_info;
+   break;
+   }
+   }
+
+   if (!supported_format) {
+   pr_debug("Sampling format is not supported!");
+   return -EOPNOTSUPP;
+   }
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (pdd->dev->pcs_data.hosttrap_entry.base.use_count &&
+   memcmp(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   &user_info, sizeof(user_info))) {
+   ret = copy_to_user((void __user *) user_args->sample_info_ptr,
+   &pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return ret ? -EFAULT : -EEXIST;
+   }
+
+   /* TODO: add trace_id return */
+
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
+
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   return 0;
 }
 
 static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index cbaa1bccd94b..db2d09db8000 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -269,9 +269,19 @@ struct kfd_vmid_info {
 
 struct kfd_dev;
 
+struct kfd_dev_pc_sampling_data {
+   uint32_t use_count; /* Num of PC sampling sessions */
+   struct kfd_pc_sample_info pc_sample_info;
+};
+
+struct kfd_dev_pcs_hosttrap {
+   struct kfd_dev_pc_sampling_data base;
+};
+
 /* Per device PC Sampling data */
 struct kfd_dev_pc_sampling {
struct mutex mutex;
+   struct kfd_dev_pcs_hosttrap hosttrap_entry;
 };
 
 struct kfd_node {
-- 
2.25.1



[PATCH v2 07/23] drm/amdkfd: check pcs_enrty valid

2023-12-07 Thread James Zhu
Check that pcs_entry is valid for the PC sampling ioctl.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 33 ++--
 1 file changed, 30 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index b44dfea15539..e5aa87b2da4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -178,6 +178,24 @@ static int kfd_pc_sample_destroy(struct kfd_process_device 
*pdd, uint32_t trace_
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*args)
 {
+   struct pc_sampling_entry *pcs_entry;
+
+   if (args->op != KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES &&
+   args->op != KFD_IOCTL_PCS_OP_CREATE) {
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pcs_entry = 
idr_find(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   args->trace_id);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   /* pcs_entry is only for this pc sampling process,
+* which has kfd_process->mutex protected here.
+*/
+   if (!pcs_entry ||
+   pcs_entry->pdd != pdd)
+   return -EINVAL;
+   }
+
switch (args->op) {
case KFD_IOCTL_PCS_OP_QUERY_CAPABILITIES:
return kfd_pc_sample_query_cap(pdd, args);
@@ -186,13 +204,22 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
return kfd_pc_sample_create(pdd, args);
 
case KFD_IOCTL_PCS_OP_DESTROY:
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   if (pcs_entry->enabled)
+   return -EBUSY;
+   else
+   return kfd_pc_sample_destroy(pdd, args->trace_id);
 
case KFD_IOCTL_PCS_OP_START:
-   return kfd_pc_sample_start(pdd);
+   if (pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_start(pdd);
 
case KFD_IOCTL_PCS_OP_STOP:
-   return kfd_pc_sample_stop(pdd);
+   if (!pcs_entry->enabled)
+   return -EALREADY;
+   else
+   return kfd_pc_sample_stop(pdd);
}
 
return -EINVAL;
-- 
2.25.1



[PATCH v2 03/23] drm/amdkfd: enable pc sampling query

2023-12-07 Thread James Zhu
From: David Yat Sin 

Enable the PC sampling query of device capabilities.

Co-developed-by: James Zhu 
Signed-off-by: James Zhu 
Signed-off-by: David Yat Sin 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 54 +++-
 1 file changed, 53 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index a7e78ff42d07..49fecbc7013e 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -25,10 +25,62 @@
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
 
+struct supported_pc_sample_info {
+   uint32_t ip_version;
+   const struct kfd_pc_sample_info *sample_info;
+};
+
+const struct kfd_pc_sample_info sample_info_hosttrap_9_0_0 = {
+   0, 1, ~0ULL, 0, KFD_IOCTL_PCS_METHOD_HOSTTRAP, 
KFD_IOCTL_PCS_TYPE_TIME_US };
+
+struct supported_pc_sample_info supported_formats[] = {
+   { IP_VERSION(9, 4, 1), &sample_info_hosttrap_9_0_0 },
+   { IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
+};
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
-   return -EINVAL;
+   uint64_t sample_offset;
+   int num_method = 0;
+   int i;
+
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++)
+   if (KFD_GC_VERSION(pdd->dev) == supported_formats[i].ip_version)
+   num_method++;
+
+   if (!num_method) {
+   pr_debug("PC Sampling not supported on GC_HWIP:0x%x.",
+   pdd->dev->adev->ip_versions[GC_HWIP][0]);
+   return -EOPNOTSUPP;
+   }
+
+   if (!user_args->sample_info_ptr) {
+   user_args->num_sample_info = num_method;
+   return 0;
+   }
+
+   if (user_args->num_sample_info < num_method) {
+   user_args->num_sample_info = num_method;
+   pr_debug("Sample info buffer is not large enough, "
+"ASIC requires space for %d kfd_pc_sample_info 
entries.", num_method);
+   return -ENOSPC;
+   }
+
+   sample_offset = user_args->sample_info_ptr;
+   for (i = 0; i < ARRAY_SIZE(supported_formats); i++) {
+   if (KFD_GC_VERSION(pdd->dev) == 
supported_formats[i].ip_version) {
+   int ret = copy_to_user((void __user *) sample_offset,
+   supported_formats[i].sample_info, sizeof(struct 
kfd_pc_sample_info));
+   if (ret) {
+   pr_debug("Failed to copy PC sampling info to 
user.");
+   return -EFAULT;
+   }
+   sample_offset += sizeof(struct kfd_pc_sample_info);
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_start(struct kfd_process_device *pdd)
-- 
2.25.1



[PATCH v2 10/23] drm/amdkfd: trigger pc sampling trap for gfx v9

2023-12-07 Thread James Zhu
Implement the PC sampling trap trigger for gfx v9.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 36 +++
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h |  7 
 2 files changed, 43 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 5a35a8ca8922..7d8c0e13ac12 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,42 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
+   uint32_t value = 0;
+
+   value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
+   value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
+
+   /* select *target_simd */
+   value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
+   /* select *target_wave_slot */
+   value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x, 0);
+   WREG32_SOC15(GC, 0, mmSQ_CMD, value);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   *target_wave_slot %= max_wave_slot;
+   if (!(*target_wave_slot)) {
+   (*target_simd)++;
+   *target_simd %= max_simd;
+   }
+   } else {
+   pr_debug("PC Sampling method %d not supported.", method);
+   return -EOPNOTSUPP;
+   }
+   return 0;
+}
+
 const struct kfd2kgd_calls gfx_v9_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
index ce424615f59b..b47b926891a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.h
@@ -101,3 +101,10 @@ void kgd_gfx_v9_build_grace_period_packet_info(struct 
amdgpu_device *adev,
   uint32_t grace_period,
   uint32_t *reg_offset,
   uint32_t *reg_data);
+uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t max_wave_slot,
+   uint32_t max_simd,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method);
-- 
2.25.1
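
In the host-trap branch of this patch, each call targets the current (SIMD, wave slot) pair and then advances the wave slot, wrapping to the next SIMD when the wave slots are exhausted. The sketch below isolates that round-robin walk as a pure function, without any register access. `pcs_cursor` and `pcs_cursor_next` are hypothetical names that do not appear in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical pair standing in for *target_simd and *target_wave_slot. */
struct pcs_cursor {
	uint32_t simd;
	uint32_t wave_slot;
};

/*
 * Return the (simd, wave_slot) pair to trap now and advance the cursor:
 * the wave slot moves first, and the SIMD moves when the wave slot wraps.
 */
static struct pcs_cursor pcs_cursor_next(struct pcs_cursor *cur,
					 uint32_t max_simd,
					 uint32_t max_wave_slot)
{
	struct pcs_cursor now = *cur;

	assert(max_simd > 0 && max_wave_slot > 0);

	cur->wave_slot = (cur->wave_slot + 1) % max_wave_slot;
	if (cur->wave_slot == 0)
		cur->simd = (cur->simd + 1) % max_simd;

	return now;
}
```

With max_simd = 2 and max_wave_slot = 3, successive calls visit (0,0), (0,1), (0,2), (1,0), (1,1), (1,2) and then return to (0,0).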



[PATCH v2 06/23] drm/amdkfd: add trace_id return

2023-12-07 Thread James Zhu
Return a trace_id for each new per-device PC sampling session.
Use an IDR to quickly locate the pc_sampling_entry for a given trace_id.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device.c  |  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  6 ++
 3 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
index 0e24e011f66b..bcaeedac8fe0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device.c
@@ -536,10 +536,12 @@ static void kfd_smi_init(struct kfd_node *dev)
 static void kfd_pc_sampling_init(struct kfd_node *dev)
 {
mutex_init(&dev->pcs_data.mutex);
+   idr_init_base(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 1);
 }
 
 static void kfd_pc_sampling_exit(struct kfd_node *dev)
 {
+   idr_destroy(&dev->pcs_data.hosttrap_entry.base.pc_sampling_idr);
mutex_destroy(&dev->pcs_data.mutex);
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 7828a6340edf..b44dfea15539 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -99,6 +99,7 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
 {
struct kfd_pc_sample_info *supported_format = NULL;
struct kfd_pc_sample_info user_info;
+   struct pc_sampling_entry *pcs_entry;
int ret;
int i;
 
@@ -140,7 +141,19 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return ret ? -EFAULT : -EEXIST;
}
 
-   /* TODO: add trace_id return */
+   pcs_entry = kzalloc(sizeof(*pcs_entry), GFP_KERNEL);
+   if (!pcs_entry) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   return -ENOMEM;
+   }
+
+   i = 
idr_alloc_cyclic(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr,
+   pcs_entry, 1, 0, GFP_KERNEL);
+   if (i < 0) {
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   kfree(pcs_entry);
+   return i;
+   }
 
if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info = 
user_info;
@@ -148,6 +161,11 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
pdd->dev->pcs_data.hosttrap_entry.base.use_count++;
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   pcs_entry->pdd = pdd;
+   user_args->trace_id = (uint32_t)i;
+
+   pr_debug("alloc pcs_entry = %p, trace_id = 0x%x on gpu 0x%x", 
pcs_entry, i, pdd->dev->id);
+
return 0;
 }
 
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h 
b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index db2d09db8000..7ca7cc726246 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,7 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
 
@@ -756,6 +757,11 @@ enum kfd_pdd_bound {
  */
 #define SDMA_ACTIVITY_DIVISOR  100
 
+struct pc_sampling_entry {
+   bool enabled;
+   struct kfd_process_device *pdd;
+};
+
 /* Data that is per-process-per device. */
 struct kfd_process_device {
/* The device that owns this data. */
-- 
2.25.1



[PATCH v2 08/23] drm/amdkfd: enable pc sampling destroy

2023-12-07 Thread James Zhu
Enable pc sampling destroy.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 20 +---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index e5aa87b2da4f..18fe06d712c5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -169,10 +169,24 @@ static int kfd_pc_sample_create(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id)
+static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t 
trace_id,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   pr_debug("free pcs_entry = %p, trace_id = 0x%x on gpu 0x%x",
+   pcs_entry, trace_id, pdd->dev->id);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.use_count--;
+   idr_remove(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr, 
trace_id);
 
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.use_count)
+   memset(&pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info, 
0x0,
+   sizeof(struct kfd_pc_sample_info));
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   kvfree(pcs_entry);
+
+   return 0;
 }
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
@@ -207,7 +221,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EBUSY;
else
-   return kfd_pc_sample_destroy(pdd, args->trace_id);
+   return kfd_pc_sample_destroy(pdd, args->trace_id, 
pcs_entry);
 
case KFD_IOCTL_PCS_OP_START:
if (pcs_entry->enabled)
-- 
2.25.1



[PATCH v2 09/23] drm/amdkfd: add interface to trigger pc sampling trap

2023-12-07 Thread James Zhu
Add interface to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/include/kgd_kfd_interface.h | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
index 6d094cf3587d..05b0255aca37 100644
--- a/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
+++ b/drivers/gpu/drm/amd/include/kgd_kfd_interface.h
@@ -33,6 +33,7 @@
 #include 
 #include "amdgpu_irq.h"
 #include "amdgpu_gfx.h"
+#include 
 
 struct pci_dev;
 struct amdgpu_device;
@@ -318,6 +319,11 @@ struct kfd2kgd_calls {
void (*program_trap_handler_settings)(struct amdgpu_device *adev,
uint32_t vmid, uint64_t tba_addr, uint64_t tma_addr,
uint32_t inst);
+   uint32_t (*trigger_pc_sample_trap)(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method method);
 };
 
 #endif /* KGD_KFD_INTERFACE_H_INCLUDED */
-- 
2.25.1



[PATCH v2 14/23] drm/amdkfd: trigger pc sampling trap for arcturus

2023-12-07 Thread James Zhu
Implement trigger pc sampling trap for arcturus.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c| 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
index 0ba15dcbe4e1..10b362e072a6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_arcturus.c
@@ -390,6 +390,17 @@ static uint32_t kgd_arcturus_disable_debug_trap(struct 
amdgpu_device *adev,
 
return 0;
 }
+
+static uint32_t kgd_arcturus_trigger_pc_sample_trap(struct amdgpu_device *adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 10, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls arcturus_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -418,5 +429,6 @@ const struct kfd2kgd_calls arcturus_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.get_cu_occupancy = kgd_gfx_v9_get_cu_occupancy,
-   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings
+   .program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_arcturus_trigger_pc_sample_trap
 };
-- 
2.25.1



[PATCH v2 11/23] drm/amdkfd/gfx9: enable host trap

2023-12-07 Thread James Zhu
Enable host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 63 +++
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm | 24 ---
 2 files changed, 52 insertions(+), 35 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index df75863393fc..747426bd5181 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,14 +274,14 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf820258,
+   0xbf820001, 0xbf82025e,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -294,7 +294,7 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -303,13 +303,16 @@ static const uint32_t cwsr_trap_gfx9_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1098,14 +1101,14 @@ static const uint32_t cwsr_trap_nv1x_hex[] = {
 };
 
 static const uint32_t cwsr_trap_arcturus_hex[] = {
-   0xbf820001, 0xbf8202d4,
+   0xbf820001, 0xbf8202da,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1118,7 +1121,7 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1127,13 +1130,16 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc0031c3d, 0x0010,
+   0xc0071bbd, 0x,
0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x86ee6e6e,
+   0xbf8cc07f, 0x8671ff6d,
+   0x0100, 0xbf840004,
+   0x92f1ff70, 0x00010001,
+   0xbf840016, 0xbf820005,
+   0x86708170, 0x8e709770,
+   0x8977ff77, 0x0080,
+   0x8077, 0x86ee6e6e,
0xbf840001, 0xbe801d6e,
0x866eff6d, 0x01ff,
0xbf850005, 0x8778ff78,
@@ -1578,14 +1584,14 @@ static const uint32_t cwsr_trap_arcturus_hex[] = {
 };
 
 static const uint32_t cwsr_trap_aldebaran_hex[] = {
-   0xbf820001, 0xbf8202df,
+   0xbf820001, 0xbf8202e5,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
0x00ff, 0xbf85001e,
0x866eff7b, 0x0400,
-   0xbf850055, 0xbf8e0010,
+   0xbf85005b, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
0xbf850015, 0x866eff7b,
@@ -1598,7 +1604,7 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xbf850007, 0xb8eef801,
0x866eff6e, 0x0800,
0xbf850003, 0x866eff7b,
-   0x0400, 0xbf85003a,
+   0x0400, 0xbf850040,
0xb8faf807, 0x867aff7a,
0x001f8000, 0x8e7a8b7a,
0x8977ff77, 0xfc00,
@@ -1607,13 +1613,16 @@ static const uint32_t cwsr_trap_aldebaran_hex[] = {
0xb8fbf813, 0x8efa887a,
0xbf0d8f7b, 0xbf840002,
0x877bff7b, 0x,
-   0xc0031bbd, 0x0010,
-   0xbf8cc07f, 0x8e6e976e,
-   0x8977ff77, 0x0080,
-   0x87776e77, 0xc0071bbd,
-   0x, 0xbf8cc07f,
+   0xc

[PATCH v2 13/23] drm/amdgpu: add sq host trap status check

2023-12-07 Thread James Zhu
Before firing a new host trap, check the host trap status.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c | 35 +++
 .../amd/include/asic_reg/gc/gc_9_0_offset.h   |  2 ++
 .../amd/include/asic_reg/gc/gc_9_0_sh_mask.h  |  5 +++
 3 files changed, 42 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index adfe5e5585e5..43edd62df5fe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1144,6 +1144,35 @@ void kgd_gfx_v9_program_trap_handler_settings(struct 
amdgpu_device *adev,
kgd_gfx_v9_unlock_srbm(adev, inst);
 }
 
+static uint32_t kgd_aldebaran_get_hosttrap_status(struct amdgpu_device *adev)
+{
+   uint32_t sq_hosttrap_status = 0x0;
+   int i, j;
+
+   mutex_lock(&adev->grbm_idx_mutex);
+   for (i = 0; i < adev->gfx.config.max_shader_engines; i++) {
+   for (j = 0; j < adev->gfx.config.max_sh_per_se; j++) {
+   amdgpu_gfx_select_se_sh(adev, i, j, 0x, 0);
+   sq_hosttrap_status = RREG32_SOC15(GC, 0, 
mmSQ_HOSTTRAP_STATUS);
+
+   if (sq_hosttrap_status & 
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK) {
+   WREG32_SOC15(GC, 0, mmSQ_HOSTTRAP_STATUS,
+   
SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK);
+   sq_hosttrap_status = 0x0;
+   continue;
+   }
+   if (sq_hosttrap_status)
+   goto out;
+   }
+   }
+
+out:
+   amdgpu_gfx_select_se_sh(adev, 0x, 0x, 0x, 0);
+   mutex_unlock(&adev->grbm_idx_mutex);
+
+   return sq_hosttrap_status;
+}
+
 uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct amdgpu_device *adev,
uint32_t vmid,
uint32_t max_wave_slot,
@@ -1154,6 +1183,12 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
 {
if (method == KFD_IOCTL_PCS_METHOD_HOSTTRAP) {
uint32_t value = 0;
+   uint32_t sq_hosttrap_status = 0x0;
+
+   sq_hosttrap_status = kgd_aldebaran_get_hosttrap_status(adev);
+   /* skip when last host trap request is still pending to 
complete */
+   if (sq_hosttrap_status)
+   return 0;
 
value = REG_SET_FIELD(value, SQ_CMD, CMD, SQ_IND_CMD_CMD_TRAP);
value = REG_SET_FIELD(value, SQ_CMD, MODE, 
SQ_IND_CMD_MODE_SINGLE);
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
index 12d451e5475b..5b17d9066452 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_offset.h
@@ -462,6 +462,8 @@
 #define mmSQ_IND_DATA_BASE_IDX 
0
 #define mmSQ_CMD   
0x037b
 #define mmSQ_CMD_BASE_IDX  
0
+#define mmSQ_HOSTTRAP_STATUS   
0x0376
+#define mmSQ_HOSTTRAP_STATUS_BASE_IDX  
0
 #define mmSQ_TIME_HI   
0x037c
 #define mmSQ_TIME_HI_BASE_IDX  
0
 #define mmSQ_TIME_LO   
0x037d
diff --git a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
index efc16ddf274a..3dfe4ab31421 100644
--- a/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
+++ b/drivers/gpu/drm/amd/include/asic_reg/gc/gc_9_0_sh_mask.h
@@ -2616,6 +2616,11 @@
 //SQ_CMD_TIMESTAMP
 #define SQ_CMD_TIMESTAMP__TIMESTAMP__SHIFT 
   0x0
 #define SQ_CMD_TIMESTAMP__TIMESTAMP_MASK   
   0x00FFL
+//SQ_HOSTTRAP_STATUS
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT__SHIFT  
   0x0
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE__SHIFT  
   0x8
+#define SQ_HOSTTRAP_STATUS__HTPENDINGCOUNT_MASK
   0x00FFL
+#define SQ_HOSTTRAP_STATUS__HTPENDING_OVERRIDE_MASK
   0x

[PATCH v2 20/23] drm/amdkfd: enable pc sampling start

2023-12-07 Thread James Zhu
Enable pc sampling start.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 26 +---
 1 file changed, 23 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 7d0722498bf5..49b5d4c9f7e0 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -84,9 +84,29 @@ static int kfd_pc_sample_query_cap(struct kfd_process_device 
*pdd,
return 0;
 }
 
-static int kfd_pc_sample_start(struct kfd_process_device *pdd)
+static int kfd_pc_sample_start(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_start = false;
+
+   pcs_entry->enabled = true;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count)
+   pc_sampling_start = true;
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count++;
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+
+   while (pc_sampling_start) {
+   if 
(READ_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable)) {
+   usleep_range(1000, 2000);
+   } else {
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+   break;
+   }
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -252,7 +272,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_start(pdd);
+   return kfd_pc_sample_start(pdd, pcs_entry);
 
case KFD_IOCTL_PCS_OP_STOP:
if (!pcs_entry->enabled)
-- 
2.25.1



[PATCH v2 17/23] drm/amdkfd: add setting trap pc sampling flag

2023-12-07 Thread James Zhu
Add setting trap pc sampling flag.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  2 ++
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 13 +
 2 files changed, 15 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 7ca7cc726246..b9a36891d099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -1198,6 +1198,8 @@ void kfd_process_set_trap_handler(struct 
qcm_process_device *qpd,
  uint64_t tma_addr);
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled);
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled);
 
 /* CWSR initialization */
 int kfd_process_init_cwsr_apu(struct kfd_process *process, struct file *filep);
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 1a31b556a5ff..6bc9dcfad484 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1460,6 +1460,19 @@ void kfd_process_set_trap_debug_flag(struct 
qcm_process_device *qpd,
}
 }
 
+void kfd_process_set_trap_pc_sampling_flag(struct qcm_process_device *qpd,
+enum kfd_ioctl_pc_sample_method method, 
bool enabled)
+{
+   if (qpd->cwsr_kaddr) {
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(method, &tma[2]);
+   else
+   clear_bit(method, &tma[2]);
+   }
+}
+
 /*
  * On return the kfd_process is fully operational and will be freed when the
  * mm is released
-- 
2.25.1



[PATCH v2 16/23] drm/amdkfd: use bit operation set debug trap

2023-12-07 Thread James Zhu
The 2nd byte of the 1st-level TMA is used for the trap type setting;
use bit operations so that only the selected bit is changed.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_process.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 71df51fcc1b0..1a31b556a5ff 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -1440,13 +1440,23 @@ bool kfd_process_xnack_mode(struct kfd_process *p, bool 
supported)
return true;
 }
 
+/* bit offset in 1st-level TMA's 2nd byte which used for KFD_TRAP_TYPE_BIT */
+enum KFD_TRAP_TYPE_BIT {
+   KFD_TRAP_TYPE_DEBUG = 0,/* bit 0 for debug trap */
+   KFD_TRAP_TYPE_HOST,
+   KFD_TRAP_TYPE_STOCHASTIC,
+};
+
 void kfd_process_set_trap_debug_flag(struct qcm_process_device *qpd,
 bool enabled)
 {
if (qpd->cwsr_kaddr) {
-   uint64_t *tma =
-   (uint64_t *)(qpd->cwsr_kaddr + KFD_CWSR_TMA_OFFSET);
-   tma[2] = enabled;
+   volatile unsigned long *tma =
+   (volatile unsigned long *)(qpd->cwsr_kaddr + 
KFD_CWSR_TMA_OFFSET);
+   if (enabled)
+   set_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
+   else
+   clear_bit(KFD_TRAP_TYPE_DEBUG, &tma[2]);
}
 }
 
-- 
2.25.1
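
For illustration only, not part of the patch: the difference between the
old whole-word assignment (tma[2] = enabled) and the new per-bit update can
be sketched in user space as below. This is a minimal sketch under stated
assumptions: set_flag_bit() and clear_flag_bit() are hypothetical,
non-atomic stand-ins for the kernel's set_bit()/clear_bit(), and a plain
uint64_t stands in for the TMA word. The enum mirrors KFD_TRAP_TYPE_BIT
from the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions mirroring enum KFD_TRAP_TYPE_BIT from the patch. */
enum trap_type_bit {
	TRAP_TYPE_DEBUG = 0,
	TRAP_TYPE_HOST,
	TRAP_TYPE_STOCHASTIC,
};

/* Hypothetical, non-atomic stand-ins for set_bit()/clear_bit(). */
static void set_flag_bit(unsigned int nr, uint64_t *word)
{
	*word |= (uint64_t)1 << nr;
}

static void clear_flag_bit(unsigned int nr, uint64_t *word)
{
	*word &= ~((uint64_t)1 << nr);
}

/* Old behaviour: overwrite the whole word, clobbering every other flag. */
static void set_debug_flag_overwrite(uint64_t *word, bool enabled)
{
	*word = enabled;
}

/* New behaviour: touch only the debug bit and leave the rest alone. */
static void set_debug_flag_bitop(uint64_t *word, bool enabled)
{
	if (enabled)
		set_flag_bit(TRAP_TYPE_DEBUG, word);
	else
		clear_flag_bit(TRAP_TYPE_DEBUG, word);
}
```

With the host-trap bit already set, set_debug_flag_bitop() leaves that bit
intact, while set_debug_flag_overwrite() would silently clear it. That is
the reason the word can later be shared with the PC sampling flags.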



[PATCH v2 18/23] drm/amdkfd: enable pc sampling stop

2023-12-07 Thread James Zhu
Enable pc sampling stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 28 +---
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  4 +++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 18fe06d712c5..29a6f9f40f83 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -88,10 +88,32 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd)
return -EINVAL;
 }
 
-static int kfd_pc_sample_stop(struct kfd_process_device *pdd)
+static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
+   struct pc_sampling_entry *pcs_entry)
 {
-   return -EINVAL;
+   bool pc_sampling_stop = false;
+
+   pcs_entry->enabled = false;
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.active_count--;
+   if (!pdd->dev->pcs_data.hosttrap_entry.base.active_count) {
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, 
true);
+   pc_sampling_stop = true;
+   }
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
 
+   if (pc_sampling_stop) {
+   kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,
+   
pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+
+   mutex_lock(&pdd->dev->pcs_data.mutex);
+   pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
+   pdd->dev->pcs_data.hosttrap_entry.base.target_wave_slot = 0;
+   WRITE_ONCE(pdd->dev->pcs_data.hosttrap_entry.base.stop_enable, 
false);
+   mutex_unlock(&pdd->dev->pcs_data.mutex);
+   }
+
+   return 0;
 }
 
 static int kfd_pc_sample_create(struct kfd_process_device *pdd,
@@ -233,7 +255,7 @@ int kfd_pc_sample(struct kfd_process_device *pdd,
if (!pcs_entry->enabled)
return -EALREADY;
else
-   return kfd_pc_sample_stop(pdd);
+   return kfd_pc_sample_stop(pdd, pcs_entry);
}
 
return -EINVAL;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index b9a36891d099..0839a0ca3099 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
@@ -271,6 +271,10 @@ struct kfd_dev;
 
 struct kfd_dev_pc_sampling_data {
uint32_t use_count; /* Num of PC sampling sessions */
+   uint32_t active_count;  /* Num of active sessions */
+   uint32_t target_simd;   /* target simd for trap */
+   uint32_t target_wave_slot;  /* target wave slot for trap */
+   bool stop_enable;   /* pc sampling stop in process */
struct idr pc_sampling_idr;
struct kfd_pc_sample_info pc_sample_info;
 };
-- 
2.25.1
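
For illustration only, not part of the patch: the active_count bookkeeping
used by kfd_pc_sample_stop() can be modelled in user space as below. This
is a minimal sketch under stated assumptions: struct sampling_state,
session_start() and session_stop() are hypothetical names, locking is
omitted, and the start side is included only to make the model complete.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Simplified stand-in for the counters in struct kfd_dev_pc_sampling_data. */
struct sampling_state {
	uint32_t active_count;      /* number of started sessions */
	uint32_t target_simd;       /* next SIMD to trap */
	uint32_t target_wave_slot;  /* next wave slot to trap */
};

/* Returns true when this call started the first active session. */
static bool session_start(struct sampling_state *s)
{
	bool first = (s->active_count == 0);

	s->active_count++;
	return first;
}

/*
 * Returns true when this call stopped the last active session. Only in
 * that case are the trap targets reset, as in the patch.
 */
static bool session_stop(struct sampling_state *s)
{
	s->active_count--;
	if (s->active_count != 0)
		return false;

	s->target_simd = 0;
	s->target_wave_slot = 0;
	return true;
}
```

Only the caller that sees session_stop() return true would go on to clear
the trap flag; earlier stops just drop the count.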



[PATCH v2 15/23] drm/amdkfd: trigger pc sampling trap for aldebaran

2023-12-07 Thread James Zhu
Implement trigger pc sampling trap for aldebaran.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
index aff08321e976..27eda75ceecb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c
@@ -163,6 +163,16 @@ static uint32_t kgd_gfx_aldebaran_set_address_watch(
return watch_address_cntl;
 }
 
+static uint32_t kgd_aldebaran_trigger_pc_sample_trap(struct amdgpu_device 
*adev,
+   uint32_t vmid,
+   uint32_t *target_simd,
+   uint32_t *target_wave_slot,
+   enum kfd_ioctl_pc_sample_method 
method)
+{
+   return kgd_gfx_v9_trigger_pc_sample_trap(adev, vmid, 8, 4,
+   target_simd, target_wave_slot, method);
+}
+
 const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.program_sh_mem_settings = kgd_gfx_v9_program_sh_mem_settings,
.set_pasid_vmid_mapping = kgd_gfx_v9_set_pasid_vmid_mapping,
@@ -191,4 +201,5 @@ const struct kfd2kgd_calls aldebaran_kfd2kgd = {
.get_iq_wait_times = kgd_gfx_v9_get_iq_wait_times,
.build_grace_period_packet_info = 
kgd_gfx_v9_build_grace_period_packet_info,
.program_trap_handler_settings = 
kgd_gfx_v9_program_trap_handler_settings,
+   .trigger_pc_sample_trap = kgd_aldebaran_trigger_pc_sample_trap,
 };
-- 
2.25.1



[PATCH v2 19/23] drm/amdkfd: add queue remapping

2023-12-07 Thread James Zhu
Add queue remapping to ensure that any waves executing the PC sampling
part of the trap handler are done before kfd_pc_sample_stop returns,
and that no new waves enter that part of the trap handler afterwards.
This avoids race conditions that could lead to use-after-free. Unmapping
and remapping the queues either waits for the waves to drain, or preempts
them with CWSR, which itself executes a trap and waits for previous traps
to finish.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c | 11 +++
 drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h |  5 +
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c  |  3 +++
 3 files changed, 19 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
index c0e71543389a..a3f57be63f4f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.c
@@ -3155,6 +3155,17 @@ int debug_refresh_runlist(struct device_queue_manager 
*dqm)
return debug_map_and_unlock(dqm);
 }
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period)
+{
+   dqm_lock(dqm);
+   if (!dqm->dev->kfd->shared_resources.enable_mes)
+   execute_queues_cpsch(dqm, filter, filter_param, grace_period);
+   dqm_unlock(dqm);
+}
+
 #if defined(CONFIG_DEBUG_FS)
 
 static void seq_reg_dump(struct seq_file *m,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
index cf7e182588f8..f8aae3747a36 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_device_queue_manager.h
@@ -303,6 +303,11 @@ int debug_lock_and_unmap(struct device_queue_manager *dqm);
 int debug_map_and_unlock(struct device_queue_manager *dqm);
 int debug_refresh_runlist(struct device_queue_manager *dqm);
 
+void remap_queue(struct device_queue_manager *dqm,
+   enum kfd_unmap_queues_filter filter,
+   uint32_t filter_param,
+   uint32_t grace_period);
+
 static inline unsigned int get_sh_mem_bases_32(struct kfd_process_device *pdd)
 {
return (pdd->lds_base >> 16) & 0xFF;
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 29a6f9f40f83..7d0722498bf5 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -24,6 +24,7 @@
 #include "kfd_priv.h"
 #include "amdgpu_amdkfd.h"
 #include "kfd_pc_sampling.h"
+#include "kfd_device_queue_manager.h"
 
 struct supported_pc_sample_info {
uint32_t ip_version;
@@ -105,6 +106,8 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
if (pc_sampling_stop) {
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
+   remap_queue(pdd->dev->dqm,
+   KFD_UNMAP_QUEUES_FILTER_ALL_QUEUES, 0, 
USE_DEFAULT_GRACE_PERIOD);
 
mutex_lock(&pdd->dev->pcs_data.mutex);
pdd->dev->pcs_data.hosttrap_entry.base.target_simd = 0;
-- 
2.25.1



[PATCH v2 23/23] drm/amdkfd: bump kfd ioctl minor version for pc sampling availability

2023-12-07 Thread James Zhu
Bump the minor version to declare that the pc sampling feature is now
available.

Signed-off-by: James Zhu 
---
 include/uapi/linux/kfd_ioctl.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi/linux/kfd_ioctl.h
index 1bd1347effea..62d8642d3d1c 100644
--- a/include/uapi/linux/kfd_ioctl.h
+++ b/include/uapi/linux/kfd_ioctl.h
@@ -40,9 +40,10 @@
  * - 1.12 - Add DMA buf export ioctl
  * - 1.13 - Add debugger API
  * - 1.14 - Update kfd_event_data
+ * - 1.15 - Add PC Sampling ioctl
  */
 #define KFD_IOCTL_MAJOR_VERSION 1
-#define KFD_IOCTL_MINOR_VERSION 14
+#define KFD_IOCTL_MINOR_VERSION 15
 
 struct kfd_ioctl_get_version_args {
__u32 major_version;/* from KFD */
-- 
2.25.1
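
For illustration only, not part of the patch: user space could gate its use
of PC sampling on the version reported in struct kfd_ioctl_get_version_args
roughly as below. This is a minimal sketch under stated assumptions:
kfd_version_has_pc_sampling() is a hypothetical helper, and treating any
major version above 1 as also supporting the feature is an assumption of
this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* First KFD ioctl version that advertises PC sampling, per this patch. */
#define PCS_MIN_MAJOR 1u
#define PCS_MIN_MINOR 15u

/*
 * Hypothetical helper: true if the reported version is at least 1.15.
 * Assumption: a later major version keeps the feature.
 */
static bool kfd_version_has_pc_sampling(uint32_t major, uint32_t minor)
{
	if (major != PCS_MIN_MAJOR)
		return major > PCS_MIN_MAJOR;

	return minor >= PCS_MIN_MINOR;
}
```

A caller would pass in major_version and minor_version as filled in by the
kernel and fall back to running without PC sampling when the check fails.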



[PATCH v2 21/23] drm/amdkfd: add pc sampling thread to trigger trap

2023-12-07 Thread James Zhu
Add a kthread to trigger pc sampling trap.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 68 +++-
 drivers/gpu/drm/amd/amdkfd/kfd_priv.h|  1 +
 2 files changed, 68 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 49b5d4c9f7e0..04cc25c79a76 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -39,6 +39,66 @@ struct supported_pc_sample_info supported_formats[] = {
{ IP_VERSION(9, 4, 2), &sample_info_hosttrap_9_0_0 },
 };
 
+static int kfd_pc_sample_thread(void *param)
+{
+   struct amdgpu_device *adev;
+   struct kfd_node *node = param;
+   uint32_t timeout = 0;
+
+   mutex_lock(&node->pcs_data.mutex);
+   if (node->pcs_data.hosttrap_entry.base.active_count &&
+   node->pcs_data.hosttrap_entry.base.pc_sample_info.interval &&
+   node->kfd2kgd->trigger_pc_sample_trap) {
+   switch (node->pcs_data.hosttrap_entry.base.pc_sample_info.type) 
{
+   case KFD_IOCTL_PCS_TYPE_TIME_US:
+   timeout = 
(uint32_t)node->pcs_data.hosttrap_entry.base.pc_sample_info.interval;
+   break;
+   default:
+   pr_debug("PC Sampling type %d not supported.",
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.type);
+   }
+   }
+   mutex_unlock(&node->pcs_data.mutex);
+   if (!timeout)
+   return -EINVAL;
+
+   adev = node->adev;
+
+   allow_signal(SIGKILL);
+   while (!kthread_should_stop() ||
+   
!READ_ONCE(node->pcs_data.hosttrap_entry.base.stop_enable)) {
+   node->kfd2kgd->trigger_pc_sample_trap(adev, 
node->vm_info.last_vmid_kfd,
+   &node->pcs_data.hosttrap_entry.base.target_simd,
+   
&node->pcs_data.hosttrap_entry.base.target_wave_slot,
+   
node->pcs_data.hosttrap_entry.base.pc_sample_info.method);
+   pr_debug_ratelimited("triggered a host trap.");
+
+   if 
(signal_pending(node->pcs_data.hosttrap_entry.base.pc_sample_thread))
+   break;
+   usleep_range(timeout, timeout + 10);
+   }
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+
+   return 0;
+}
+
+static int kfd_pc_sample_thread_start(struct kfd_node *node)
+{
+   char thread_name[32];
+   int ret = 0;
+
+   snprintf(thread_name, 32, "pc_sampling_%08x", node->id);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread =
+   kthread_run(kfd_pc_sample_thread, node, thread_name);
+   if (IS_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread)) {
+   ret = 
PTR_ERR(node->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   node->pcs_data.hosttrap_entry.base.pc_sample_thread = NULL;
+   pr_debug("Failed to create pc sample thread for %s.\n", 
thread_name);
+   }
+
+   return ret;
+}
+
 static int kfd_pc_sample_query_cap(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user 
*user_args)
 {
@@ -88,6 +148,7 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
struct pc_sampling_entry *pcs_entry)
 {
bool pc_sampling_start = false;
+   int ret = 0;
 
pcs_entry->enabled = true;
mutex_lock(&pdd->dev->pcs_data.mutex);
@@ -102,11 +163,13 @@ static int kfd_pc_sample_start(struct kfd_process_device 
*pdd,
} else {
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, true);
+   if 
(!pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   ret = kfd_pc_sample_thread_start(pdd->dev);
break;
}
}
 
-   return 0;
+   return ret;
 }
 
 static int kfd_pc_sample_stop(struct kfd_process_device *pdd,
@@ -124,6 +187,9 @@ static int kfd_pc_sample_stop(struct kfd_process_device 
*pdd,
mutex_unlock(&pdd->dev->pcs_data.mutex);
 
if (pc_sampling_stop) {
+   
kthread_stop(pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread);
+   while (pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_thread)
+   usleep_range(1000, 2000);
kfd_process_set_trap_pc_sampling_flag(&pdd->qpd,

pdd->dev->pcs_data.hosttrap_entry.base.pc_sample_info.method, false);
remap_queue(pdd->dev->dqm,
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_priv.h b/drivers/gpu/drm/amd/amdkfd/kfd_priv.h
index 0839a0ca3099..f69d76c7a388 100644
--- a/dr

[PATCH v2 12/23] drm/amdgpu: use trapID 4 for host trap

2023-12-07 Thread James Zhu
Since TRAPSTS.HOST_TRAP won't work pre-gfx943, use
TTMP1 (bit 24: HT) and (bits 16-23: trapID) to identify
the host trap.

Signed-off-by: James Zhu 
---
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c |2 +
 .../gpu/drm/amd/amdkfd/cwsr_trap_handler.h| 2117 +
 .../drm/amd/amdkfd/cwsr_trap_handler_gfx9.asm |5 +
 3 files changed, 1070 insertions(+), 1054 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
index 7d8c0e13ac12..adfe5e5585e5 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gfx_v9.c
@@ -1162,6 +1162,8 @@ uint32_t kgd_gfx_v9_trigger_pc_sample_trap(struct 
amdgpu_device *adev,
value = REG_SET_FIELD(value, SQ_CMD, SIMD_ID, *target_simd);
/* select *target_wave_slot */
value = REG_SET_FIELD(value, SQ_CMD, WAVE_ID, 
(*target_wave_slot)++);
+   /* set TrapID 4 for HOSTTRAP */
+   value = REG_SET_FIELD(value, SQ_CMD, DATA, 0x4);
 
mutex_lock(&adev->grbm_idx_mutex);
amdgpu_gfx_select_se_sh(adev, 0x, 0x, 
0x, 0);
diff --git a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
index 747426bd5181..44955838f307 100644
--- a/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
+++ b/drivers/gpu/drm/amd/amdkfd/cwsr_trap_handler.h
@@ -274,155 +274,263 @@ static const uint32_t cwsr_trap_gfx8_hex[] = {
 
 
 static const uint32_t cwsr_trap_gfx9_hex[] = {
-   0xbf820001, 0xbf82025e,
+   0xbf820001, 0xbf820263,
0xb8f8f802, 0x8978ff78,
0x00020006, 0xb8fbf803,
0x866eff78, 0x2000,
0xbf840009, 0x866eff6d,
-   0x00ff, 0xbf85001e,
+   0x00ff, 0xbf850023,
0x866eff7b, 0x0400,
-   0xbf85005b, 0xbf8e0010,
+   0xbf850060, 0xbf8e0010,
0xb8fbf803, 0xbf82fffa,
0x866eff7b, 0x03c00900,
-   0xbf850015, 0x866eff7b,
-   0x71ff, 0xbf840008,
-   0x866fff7b, 0x7080,
-   0xbf840001, 0xbeee1a87,
-   0xb8eff801, 0x8e6e8c6e,
-   0x866e6f6e, 0xbf85000a,
-   0x866eff6d, 0x00ff,
-   0xbf850007, 0xb8eef801,
-   0x866eff6e, 0x0800,
-   0xbf850003, 0x866eff7b,
-   0x0400, 0xbf850040,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xb8faf812,
-   0xb8fbf813, 0x8efa887a,
-   0xbf0d8f7b, 0xbf840002,
-   0x877bff7b, 0x,
-   0xc0031c3d, 0x0010,
-   0xc0071bbd, 0x,
-   0xc0071ebd, 0x0008,
-   0xbf8cc07f, 0x8671ff6d,
-   0x0100, 0xbf840004,
-   0x92f1ff70, 0x00010001,
-   0xbf840016, 0xbf820005,
-   0x86708170, 0x8e709770,
-   0x8977ff77, 0x0080,
-   0x8077, 0x86ee6e6e,
-   0xbf840001, 0xbe801d6e,
-   0x866eff6d, 0x01ff,
-   0xbf850005, 0x8778ff78,
-   0x2000, 0x80ec886c,
-   0x82ed806d, 0xbf820005,
-   0x866eff6d, 0x0100,
-   0xbf850002, 0x806c846c,
-   0x826d806d, 0x866dff6d,
-   0x, 0x8f7a8b77,
+   0xbf85001a, 0x866eff6d,
+   0x01ff, 0xbf06ff6e,
+   0x0104, 0xbf850015,
+   0x866eff7b, 0x71ff,
+   0xbf840008, 0x866fff7b,
+   0x7080, 0xbf840001,
+   0xbeee1a87, 0xb8eff801,
+   0x8e6e8c6e, 0x866e6f6e,
+   0xbf85000a, 0x866eff6d,
+   0x00ff, 0xbf850007,
+   0xb8eef801, 0x866eff6e,
+   0x0800, 0xbf850003,
+   0x866eff7b, 0x0400,
+   0xbf850040, 0xb8faf807,
0x867aff7a, 0x001f8000,
-   0xb97af807, 0x86fe7e7e,
-   0x86ea6a6a, 0x8f6e8378,
-   0xb96ee0c2, 0xbf82,
-   0xb9780002, 0xbe801f6c,
+   0x8e7a8b7a, 0x8977ff77,
+   0xfc00, 0x8a77,
+   0xba7ff807, 0x,
+   0xb8faf812, 0xb8fbf813,
+   0x8efa887a, 0xbf0d8f7b,
+   0xbf840002, 0x877bff7b,
+   0x, 0xc0031c3d,
+   0x0010, 0xc0071bbd,
+   0x, 0xc0071ebd,
+   0x0008, 0xbf8cc07f,
+   0x8671ff6d, 0x0100,
+   0xbf840004, 0x92f1ff70,
+   0x00010001, 0xbf840016,
+   0xbf820005, 0x86708170,
+   0x8e709770, 0x8977ff77,
+   0x0080, 0x8077,
+   0x86ee6e6e, 0xbf840001,
+   0xbe801d6e, 0x866eff6d,
+   0x01ff, 0xbf850005,
+   0x8778ff78, 0x2000,
+   0x80ec886c, 0x82ed806d,
+   0xbf820005, 0x866eff6d,
+   0x0100, 0xbf850002,
+   0x806c846c, 0x826d806d,
0x866dff6d, 0x,
-   0xbefa0080, 0xb97a0283,
-   0xb8faf807, 0x867aff7a,
-   0x001f8000, 0x8e7a8b7a,
-   0x8977ff77, 0xfc00,
-   0x8a77, 0xba7ff807,
-   0x, 0xbeee007e,
-   0xbeef007f, 0xbefe0180,
-   0xbf94, 0x877a8478,
-   0xb97af802, 0xbf8e0002,
-   0xbf88fffe, 0xb8fa2a05,
-   0x807a817a, 0x8e7a8a7a,
- 

[PATCH v2 22/23] drm/amdkfd: add pc sampling release when process release

2023-12-07 Thread James Zhu
Add PC sampling release on process release. This forces all active
sessions belonging to the process to stop.

Signed-off-by: James Zhu 
---
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c | 21 
 drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h |  1 +
 drivers/gpu/drm/amd/amdkfd/kfd_process.c |  3 +++
 3 files changed, 25 insertions(+)
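
Note for readers: the release path described above can be illustrated with a
minimal user-space sketch. Everything in it is an assumption made for
illustration: `struct session`, `session_stop()`, `session_destroy()` and
`release_sessions()` are hypothetical stand-ins, a plain array replaces the
driver's IDR, and locking is omitted entirely. It only shows the intended
order of operations (skip foreign sessions, stop if running, then destroy).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's per-session bookkeeping. */
struct session {
	int  owner;    /* id of the owning process (stand-in for pcs_entry->pdd) */
	bool in_use;   /* slot holds a live session */
	bool enabled;  /* sampling is currently running */
};

/* Stop a running session (stand-in for kfd_pc_sample_stop()). */
static void session_stop(struct session *s)
{
	s->enabled = false;
}

/* Free the slot (stand-in for kfd_pc_sample_destroy()). */
static void session_destroy(struct session *s)
{
	s->in_use = false;
}

/*
 * Release every live session owned by 'owner' and return how many were
 * released. Sessions that belong to other owners are left untouched.
 */
static size_t release_sessions(struct session *table, size_t n, int owner)
{
	size_t released = 0;

	assert(table != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		struct session *s = &table[i];

		if (!s->in_use || s->owner != owner)
			continue;
		if (s->enabled)
			session_stop(s);
		session_destroy(s);
		released++;
	}
	return released;
}
```

Calling `release_sessions()` a second time for the same owner releases
nothing, which is the behaviour one would want if the release path can be
reached more than once.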

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
index 04cc25c79a76..a05dd8b1a7da 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.c
@@ -300,6 +300,27 @@ static int kfd_pc_sample_destroy(struct kfd_process_device *pdd, uint32_t trace_
return 0;
 }
 
+void kfd_pc_sample_release(struct kfd_process_device *pdd)
+{
+	struct pc_sampling_entry *pcs_entry;
+	struct idr *idp;
+	uint32_t id;
+
+	/* force release of all PC sampling sessions for this process */
+	idp = &pdd->dev->pcs_data.hosttrap_entry.base.pc_sampling_idr;
+	mutex_lock(&pdd->dev->pcs_data.mutex);
+	idr_for_each_entry(idp, pcs_entry, id) {
+		if (pcs_entry->pdd != pdd)
+			continue;
+		mutex_unlock(&pdd->dev->pcs_data.mutex);
+		if (pcs_entry->enabled)
+			kfd_pc_sample_stop(pdd, pcs_entry);
+		kfd_pc_sample_destroy(pdd, id, pcs_entry);
+		mutex_lock(&pdd->dev->pcs_data.mutex);
+	}
+	mutex_unlock(&pdd->dev->pcs_data.mutex);
+}
+
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *args)
 {
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
index 4eeded4ea5b6..6175563ca9be 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_pc_sampling.h
@@ -30,5 +30,6 @@
 
 int kfd_pc_sample(struct kfd_process_device *pdd,
struct kfd_ioctl_pc_sample_args __user *args);
+void kfd_pc_sample_release(struct kfd_process_device *pdd);
 
 #endif /* KFD_PC_SAMPLING_H_ */
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_process.c b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
index 6bc9dcfad484..1f8d6098dfb2 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_process.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_process.c
@@ -43,6 +43,7 @@ struct mm_struct;
 #include "kfd_svm.h"
 #include "kfd_smi_events.h"
 #include "kfd_debug.h"
+#include "kfd_pc_sampling.h"
 
 /*
  * List of struct kfd_process (field kfd_process).
@@ -1021,6 +1022,8 @@ static void kfd_process_destroy_pdds(struct kfd_process *p)
pr_debug("Releasing pdd (topology id %d) for process (pasid 0x%x)\n",
pdd->dev->id, p->pasid);
 
+   kfd_pc_sample_release(pdd);
+
kfd_process_device_destroy_cwsr_dgpu(pdd);
kfd_process_device_destroy_ib_mem(pdd);
 
-- 
2.25.1



[PATCH] drm/amdgpu: xgmi_fill_topology_info

2023-12-07 Thread Vignesh Chander
1. Use the mirrored topology info to fill links for VF.
The new solution is required to simplify and optimize host driver logic.
Only use the new solution for VFs that support full duplex and
extended_peer_link_info; otherwise the info would be incomplete.

2. Avoid calling extended_link_info on VF, as it's not supported.

Signed-off-by: Vignesh Chander 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c  |  4 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c | 58 
 2 files changed, 52 insertions(+), 10 deletions(-)
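
Note for readers: the "mirrored topology" idea in point 1 can be sketched in
plain C as follows. This is an illustration under stated assumptions, not the
driver code: `struct node_info`, `struct topology`, `struct device`,
`find_node()`, `mirror_link()` and `MAX_NODES` are simplified, hypothetical
stand-ins for the PSP topology structures, and unlike the function added by
the patch the sketch returns a bool so the caller can tell whether both
entries were found.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_NODES 8  /* arbitrary bound chosen for this sketch */

/* Simplified, hypothetical view of one peer entry in a topology table. */
struct node_info {
	uint64_t node_id;
	uint8_t  num_hops;
	uint8_t  num_links;
	bool     is_sharing_enabled;
};

struct topology {
	uint32_t num_nodes;
	struct node_info nodes[MAX_NODES];
};

struct device {
	uint64_t node_id;
	struct topology top;
};

/* Return the entry describing 'id' in 't', or NULL if it is not listed. */
static struct node_info *find_node(struct topology *t, uint64_t id)
{
	assert(t->num_nodes <= MAX_NODES);
	for (uint32_t i = 0; i < t->num_nodes; i++) {
		if (t->nodes[i].node_id == id)
			return &t->nodes[i];
	}
	return NULL;
}

/*
 * Copy what 'dev' knows about its link to 'peer' into the entry that
 * 'peer' keeps for 'dev', so the peer does not need to query it again.
 * Returns false when either side lacks an entry for the other.
 */
static bool mirror_link(struct device *dev, struct device *peer)
{
	struct node_info *src = find_node(&dev->top, peer->node_id);
	struct node_info *dst = find_node(&peer->top, dev->node_id);

	if (!src || !dst)
		return false;

	dst->num_hops = src->num_hops;
	dst->is_sharing_enabled = src->is_sharing_enabled;
	dst->num_links = src->num_links;
	return true;
}
```

The copy relies on the link being symmetric, which is why the commit message
limits this path to VFs that support full duplex.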

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
index a21045d018f2..1bf975b8d083 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c
@@ -1433,8 +1433,8 @@ int psp_xgmi_get_topology_info(struct psp_context *psp,
 get_extended_data) ||
amdgpu_ip_version(psp->adev, MP0_HWIP, 0) ==
IP_VERSION(13, 0, 6);
-   bool ta_port_num_support = psp->xgmi_context.xgmi_ta_caps &
-   EXTEND_PEER_LINK_INFO_CMD_FLAG;
+   bool ta_port_num_support = amdgpu_sriov_vf(psp->adev) ? 0 :
+   psp->xgmi_context.xgmi_ta_caps & EXTEND_PEER_LINK_INFO_CMD_FLAG;
 
/* popluate the shared output buffer rather than the cmd input buffer
 * with node_ids as the input for GET_PEER_LINKS command execution.
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
index 44d8c1a11e1b..dd82d73daed6 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_xgmi.c
@@ -823,6 +823,28 @@ static int amdgpu_xgmi_initialize_hive_get_data_partition(struct amdgpu_hive_inf
return 0;
 }
 
+void amdgpu_xgmi_fill_topology_info(struct amdgpu_device *adev,
+	struct amdgpu_device *peer_adev)
+{
+	struct psp_xgmi_topology_info *top_info = &adev->psp.xgmi_context.top_info;
+	struct psp_xgmi_topology_info *peer_info = &peer_adev->psp.xgmi_context.top_info;
+
+	for (int i = 0; i < peer_info->num_nodes; i++) {
+		if (peer_info->nodes[i].node_id == adev->gmc.xgmi.node_id) {
+			for (int j = 0; j < top_info->num_nodes; j++) {
+				if (top_info->nodes[j].node_id == peer_adev->gmc.xgmi.node_id) {
+					peer_info->nodes[i].num_hops = top_info->nodes[j].num_hops;
+					peer_info->nodes[i].is_sharing_enabled =
+							top_info->nodes[j].is_sharing_enabled;
+					peer_info->nodes[i].num_links =
+							top_info->nodes[j].num_links;
+					return;
+				}
+			}
+		}
+	}
+}
+
 int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
 {
struct psp_xgmi_topology_info *top_info;
@@ -897,18 +919,38 @@ int amdgpu_xgmi_add_device(struct amdgpu_device *adev)
goto exit_unlock;
}
 
-   /* get latest topology info for each device from psp */
-   list_for_each_entry(tmp_adev, &hive->device_list, gmc.xgmi.head) {
-   ret = psp_xgmi_get_topology_info(&tmp_adev->psp, count,
-   &tmp_adev->psp.xgmi_context.top_info, false);
+   if (amdgpu_sriov_vf(adev) &&
+   psp->xgmi_context.xgmi_ta_caps & EXTEND_PEER_LINK_INFO_CMD_FLAG) {
+   /* only get topology for VF being init if it can support full duplex */
+   ret = psp_xgmi_get_topology_info(&adev->psp, count,
+   &adev->psp.xgmi_context.top_info, false);
if (ret) {
-   dev_err(tmp_adev->dev,
+   dev_err(adev->dev,
"XGMI: Get topology failure on device %llx, hive %llx, ret %d",
-   tmp_adev->gmc.xgmi.node_id,
-   tmp_adev->gmc.xgmi.hive_id, ret);
-   /* To do : continue with some node failed or disable the whole hive */
+   adev->gmc.xgmi.node_id,
+   adev->gmc.xgmi.hive_id, ret);
+   /* To do: continue with some node failed or disable the whole hive */
goto exit_unlock;
}
+
+   /* fill the topology info for peers instead of getting from PSP */
+   list_for_each_entry(tmp_adev, &hive->device_list, 
gmc.xgmi