There is no hole on 32-bit unfortunately. It looks like the hole on 64-bit
is now ABI.

I moved the field to replace _pad1. The patch is attached (with your Rb).
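
For context, here is a minimal illustration (a hypothetical struct, not
the real drm_amdgpu_info_device layout) of why such a hole exists on
64-bit but not on 32-bit x86: __u64 is 8-byte aligned on x86_64 but only
4-byte aligned on i386, so a lone __u32 followed by a __u64 leaves 4
bytes of padding only on 64-bit:

struct example {
	__u32 a;	/* offset 0, size 4 */
	/* 4-byte hole here on x86_64; no hole on i386 */
	__u64 b;	/* offset 8 on x86_64, offset 4 on i386 */
};

The layout can be checked with e.g. "pahole -C drm_amdgpu_info_device
vmlinux".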

Marek

On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeuc...@gmail.com> wrote:

> On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <mar...@gmail.com> wrote:
> >
> > I've added the comments, and indeed pahole shows the hole as expected.
>
> What about on 32-bit?
>
> Alex
>
> >
> > Marek
> >
> > On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher <alexdeuc...@gmail.com> wrote:
> >>
> >> On Thu, Jan 12, 2023 at 6:50 AM Christian König
> >> <christian.koe...@amd.com> wrote:
> >> >
> >> > On 11.01.23 at 21:48, Alex Deucher wrote:
> >> > >> On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák <mar...@gmail.com> wrote:
> >> > >> Yes, it's meant to be like a spec sheet. We are not interested in
> >> > >> the current bandwidth utilization.
> >> > > After chatting with Marek on IRC and thinking about this more, I
> >> > > think this patch is fine.  It's not really meant for bandwidth per
> >> > > se, but rather as a limit to determine what the driver should do in
> >> > > certain cases (i.e., when does it make sense to copy to VRAM vs
> >> > > not).  It's not straightforward for userspace to parse the full
> >> > > topology to determine which links may be slow.  I guess one
> >> > > potential pitfall would be that if you pass the device into a VM,
> >> > > the driver may report the wrong values.  Generally the VM doesn't
> >> > > get the full view up to the root port.  I don't know whether
> >> > > hypervisors report properly for pcie_bandwidth_available() in a VM
> >> > > or whether it just shows the info about the endpoint in the VM.
> >> >
> >> > So this basically doesn't return the gen and lanes of the device,
> >> > but rather what was negotiated between the device and the upstream
> >> > root port?
> >>
> >> Correct. It exposes the max gen and lanes of the slowest link between
> >> the device and the root port.
> >>
> >> >
> >> > If I got that correctly, then we should probably document it,
> >> > because otherwise somebody will try to "fix" it at some point.
> >>
> >> Good point.
> >>
> >> Alex
> >>
> >> >
> >> > Christian.
> >> >
> >> > >
> >> > > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
> >> > >
> >> > > Alex
> >> > >
> >> > >> Marek
> >> > >>
> >> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo <lijo.la...@amd.com> wrote:
> >> > >>>
> >> > >>> To clarify, with DPM in place, the current bandwidth will change
> >> > >>> based on the load.
> >> > >>>
> >> > >>> If apps/UMDs already have a way to know the current bandwidth
> >> > >>> utilisation, then the possible maximum could also be part of the
> >> > >>> same API. Otherwise, this only looks like duplicate information;
> >> > >>> we have the same information in the sysfs DPM nodes.
> >> > >>>
> >> > >>> BTW, I don't know to what extent apps/UMDs really make use of
> >> > >>> this. Take the memory frequency as an example (I'm reading it as
> >> > >>> 16 GHz). It only looks like a spec sheet.
> >> > >>>
> >> > >>> Thanks,
> >> > >>> Lijo
> >> > >>> ________________________________
> >> > >>> From: Marek Olšák <mar...@gmail.com>
> >> > >>> Sent: Wednesday, January 4, 2023 8:40:00 PM
> >> > >>> To: Lazar, Lijo <lijo.la...@amd.com>
> >> > >>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> >> > >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes from the INFO
> >> > >>>
> >> > >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo <lijo.la...@amd.com> wrote:
> >> > >>>
> >> > >>>
> >> > >>>
> >> > >>> On 1/4/2023 7:43 PM, Marek Olšák wrote:
> >> > >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo <lijo.la...@amd.com> wrote:
> >> > >>>>
> >> > >>>>
> >> > >>>>
> >> > >>>>      On 1/4/2023 4:11 AM, Marek Olšák wrote:
> >> > >>>>       > I see. Well, those sysfs files are not usable, and I
> >> > >>>>       > don't think it would be important even if they were
> >> > >>>>       > usable, but for completeness:
> >> > >>>>       >
> >> > >>>>       > The ioctl returns:
> >> > >>>>       >      pcie_gen = 1
> >> > >>>>       >      pcie_num_lanes = 16
> >> > >>>>       >
> >> > >>>>       > Theoretical bandwidth from those values: 4.0 GB/s
> >> > >>>>       > My DMA test shows this write bandwidth: 3.5 GB/s
> >> > >>>>       > It matches the expectation.
> >> > >>>>       >
> >> > >>>>       > Let's see the devices (there is only 1 Navi21 GPU in the system):
> >> > >>>>       > $ lspci |egrep '(PCI|VGA).*Navi'
> >> > >>>>       > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
> >> > >>>>       > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
> >> > >>>>       > 0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)
> >> > >>>>       >
> >> > >>>>       > Let's read sysfs:
> >> > >>>>       >
> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_width
> >> > >>>>       > 16
> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_width
> >> > >>>>       > 16
> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_width
> >> > >>>>       > 16
> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
> >> > >>>>       > 2.5 GT/s PCIe
> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
> >> > >>>>       > 16.0 GT/s PCIe
> >> > >>>>       > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
> >> > >>>>       > 16.0 GT/s PCIe
> >> > >>>>       >
> >> > >>>>       > Problem 1: None of the speed numbers match 4 GB/s.
> >> > >>>>
> >> > >>>>      US bridge = 2.5 GT/s means operating at PCIe Gen 1 speed.
> >> > >>>>      The total theoretical bandwidth is then derived from the
> >> > >>>>      encoding and the total number of lanes.
> >> > >>>>
> >> > >>>>       > Problem 2: Userspace doesn't know the bus index of the
> >> > >>>>       > bridges, and it's not clear which bridge should be used.
> >> > >>>>
> >> > >>>>      In general, modern ones have this architecture: US->DS->EP.
> >> > >>>>      The US is the one connected to the physical link.
> >> > >>>>
> >> > >>>>       > Problem 3: The PCIe gen number is missing.
> >> > >>>>
> >> > >>>>      Current link speed is based on whether it's Gen1/2/3/4/5.
> >> > >>>>
> >> > >>>>      BTW, your patch makes use of capability flags, which give
> >> > >>>>      the maximum speed/width supported by the device. That may
> >> > >>>>      not necessarily reflect the currently negotiated
> >> > >>>>      speed/width. I guess on NV, this info is already obtained
> >> > >>>>      from the PMFW and made available through the metrics table.
> >> > >>>>
> >> > >>>>
> >> > >>>> It computes the minimum of the device PCIe gen and the
> >> > >>>> motherboard/slot PCIe gen to get the final value. These 2 lines
> >> > >>>> do that. The low 16 bits of the mask contain the device PCIe gen
> >> > >>>> mask, and the high 16 bits contain the slot PCIe gen mask:
> >> > >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
> >> > >>>> + dev_info->pcie_gen = fls(pcie_gen_mask);
> >> > >>>>
> >> > >>> With DPM in place on some ASICs, how much does this static info
> >> > >>> help upper-level apps?
> >> > >>>
> >> > >>>
> >> > >>> It helps UMDs make better decisions if they know the maximum
> >> > >>> achievable bandwidth. UMDs also compute the maximum memory
> >> > >>> bandwidth and compute performance (FLOPS). Right now it's printed
> >> > >>> by Mesa to give users detailed information about their GPU. For
> >> > >>> example:
> >> > >>>
> >> > >>> $ AMD_DEBUG=info glxgears
> >> > >>> Device info:
> >> > >>>      name = NAVI21
> >> > >>>      marketing_name = AMD Radeon RX 6800
> >> > >>>      num_se = 3
> >> > >>>      num_rb = 12
> >> > >>>      num_cu = 60
> >> > >>>      max_gpu_freq = 2475 MHz
> >> > >>>      max_gflops = 19008 GFLOPS
> >> > >>>      l0_cache_size = 16 KB
> >> > >>>      l1_cache_size = 128 KB
> >> > >>>      l2_cache_size = 4096 KB
> >> > >>>      l3_cache_size = 128 MB
> >> > >>>      memory_channels = 16 (TCC blocks)
> >> > >>>      memory_size = 16 GB (16384 MB)
> >> > >>>      memory_freq = 16 GHz
> >> > >>>      memory_bus_width = 256 bits
> >> > >>>      memory_bandwidth = 512 GB/s
> >> > >>>      pcie_gen = 1
> >> > >>>      pcie_num_lanes = 16
> >> > >>>      pcie_bandwidth = 4.0 GB/s
> >> > >>>
> >> > >>> Marek
> >> >
>
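
For reference, a sketch of the theoretical-bandwidth math discussed
above (an illustrative helper, not code from the patch): gen 1/2 links
use 8b/10b encoding (20% overhead), gen 3 and newer use 128b/130b
(~1.5% overhead). For gen 1 x16 this gives 2.5 GT/s * 0.8 * 16 / 8 =
4.0 GB/s, matching the numbers above.

static double pcie_bandwidth_gbps(unsigned int gen, unsigned int lanes)
{
	/* GT/s per lane for PCIe gens 1..5 */
	static const double gts[] = { 2.5, 5.0, 8.0, 16.0, 32.0 };
	/* encoding efficiency: 8b/10b for gen 1/2, 128b/130b for gen 3+ */
	double efficiency = gen <= 2 ? 8.0 / 10.0 : 128.0 / 130.0;

	if (gen < 1 || gen > 5)
		return 0.0;
	/* GT/s * efficiency = Gbit/s per lane; divide by 8 for bytes */
	return gts[gen - 1] * efficiency * (double)lanes / 8.0;
}
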
From 6220395fb0b9c10c92ea67b80e09120e6f92a499 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.ol...@amd.com>
Date: Sat, 24 Dec 2022 17:44:26 -0500
Subject: [PATCH] drm/amdgpu: return the PCIe gen and lanes from the INFO ioctl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For computing PCIe bandwidth in userspace and troubleshooting PCIe
bandwidth issues.

For example, my Navi21 has been limited to PCIe gen 1, and I only
noticed it after 2 years.

Note that this intentionally fills a hole and padding
in drm_amdgpu_info_device.

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 14 +++++++++++++-
 include/uapi/drm/amdgpu_drm.h           |  6 ++++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 06aba201d4db..a75dba2caeca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -106,9 +106,10 @@
  * - 3.49.0 - Add gang submit into CS IOCTL
  * - 3.50.0 - Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock
  *            Update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock
+ * - 3.51.0 - Return the PCIe gen and lanes from the INFO ioctl
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	50
+#define KMS_DRIVER_MINOR	51
 #define KMS_DRIVER_PATCHLEVEL	0
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 903e8770e275..fba306e0ef87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -42,6 +42,7 @@
 #include "amdgpu_gem.h"
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
+#include "amd_pcie.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -766,6 +767,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	case AMDGPU_INFO_DEV_INFO: {
 		struct drm_amdgpu_info_device *dev_info;
 		uint64_t vm_size;
+		uint32_t pcie_gen_mask;
 		int ret;
 
 		dev_info = kzalloc(sizeof(*dev_info), GFP_KERNEL);
@@ -798,7 +800,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		dev_info->num_rb_pipes = adev->gfx.config.max_backends_per_se *
 			adev->gfx.config.max_shader_engines;
 		dev_info->num_hw_gfx_contexts = adev->gfx.config.max_hw_contexts;
-		dev_info->_pad = 0;
 		dev_info->ids_flags = 0;
 		if (adev->flags & AMD_IS_APU)
 			dev_info->ids_flags |= AMDGPU_IDS_FLAGS_FUSION;
@@ -852,6 +853,17 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 
 		dev_info->tcc_disabled_mask = adev->gfx.config.tcc_disabled_mask;
 
+		/* Combine the chip gen mask with the platform (CPU/mobo) mask. */
+		pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
+		dev_info->pcie_gen = fls(pcie_gen_mask);
+		dev_info->pcie_num_lanes =
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X32 ? 32 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X16 ? 16 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X12 ? 12 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X8 ? 8 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4 ? 4 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X2 ? 2 : 1;
+
 		ret = copy_to_user(out, dev_info,
 				   min((size_t)size, sizeof(*dev_info))) ? -EFAULT : 0;
 		kfree(dev_info);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index fe7f871e3080..973af6d06626 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1053,7 +1053,8 @@ struct drm_amdgpu_info_device {
 	__u32 enabled_rb_pipes_mask;
 	__u32 num_rb_pipes;
 	__u32 num_hw_gfx_contexts;
-	__u32 _pad;
+	/* PCIe version (the smaller of the GPU and the CPU/motherboard) */
+	__u32 pcie_gen;
 	__u64 ids_flags;
 	/** Starting virtual address for UMDs. */
 	__u64 virtual_address_offset;
@@ -1100,7 +1101,8 @@ struct drm_amdgpu_info_device {
 	__u32 gs_prim_buffer_depth;
 	/* max gs wavefront per vgt*/
 	__u32 max_gs_waves_per_vgt;
-	__u32 _pad1;
+	/* PCIe number of lanes (the smaller of the GPU and the CPU/motherboard) */
+	__u32 pcie_num_lanes;
 	/* always on cu bitmap */
 	__u32 cu_ao_bitmap[4][4];
 	/** Starting high virtual address for UMDs. */
-- 
2.34.1
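
As a usage note, here is a minimal userspace sketch (assuming an open
amdgpu render node fd and the updated amdgpu_drm.h header; the function
name is illustrative and error handling is trimmed) of how a UMD could
read the new fields:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/amdgpu_drm.h>

static int query_pcie_link(int fd)
{
	struct drm_amdgpu_info request;
	struct drm_amdgpu_info_device dev_info;

	memset(&request, 0, sizeof(request));
	memset(&dev_info, 0, sizeof(dev_info));
	request.return_pointer = (uintptr_t)&dev_info;
	request.return_size = sizeof(dev_info);
	request.query = AMDGPU_INFO_DEV_INFO;

	if (ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &request))
		return -1;

	printf("pcie_gen = %u, pcie_num_lanes = %u\n",
	       dev_info.pcie_gen, dev_info.pcie_num_lanes);
	return 0;
}

On kernels older than 3.51 both fields read back as 0 (they were
zero-filled padding), so userspace can treat 0 as "not reported".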
