There is no hole on 32-bit unfortunately. It looks like the hole on 64-bit is now ABI.
I moved the field to replace _pad1. The patch is attached (with your Rb).

Marek

On Fri, Jan 13, 2023 at 4:20 PM Alex Deucher <alexdeuc...@gmail.com> wrote:
> On Fri, Jan 13, 2023 at 4:02 PM Marek Olšák <mar...@gmail.com> wrote:
> >
> > i've added the comments and indeed pahole shows the hole as expected.
>
> What about on 32-bit?
>
> Alex
>
> >
> > Marek
> >
> > On Thu, Jan 12, 2023 at 11:44 AM Alex Deucher <alexdeuc...@gmail.com> wrote:
> >>
> >> On Thu, Jan 12, 2023 at 6:50 AM Christian König <christian.koe...@amd.com> wrote:
> >> >
> >> > Am 11.01.23 um 21:48 schrieb Alex Deucher:
> >> > > On Wed, Jan 4, 2023 at 3:17 PM Marek Olšák <mar...@gmail.com> wrote:
> >> > >> Yes, it's meant to be like a spec sheet. We are not interested in the current bandwidth utilization.
> >> > > After chatting with Marek on IRC and thinking about this more, I think
> >> > > this patch is fine. It's not really meant for bandwidth per se, but
> >> > > rather as a limit to determine what the driver should do in certain
> >> > > cases (i.e., when does it make sense to copy to vram vs not). It's
> >> > > not straightforward for userspace to parse the full topology to
> >> > > determine what links may be slow. I guess one potential pitfall would
> >> > > be that if you pass the device into a VM, the driver may report the
> >> > > wrong values. Generally in a VM the VM doesn't get the full view up
> >> > > to the root port. I don't know if the hypervisors report properly for
> >> > > pcie_bandwidth_available() in a VM or if it just shows the info about
> >> > > the endpoint in the VM.
> >> >
> >> > So this basically doesn't return the gen and lanes of the device, but
> >> > rather what was negotiated between the device and the upstream root port?
> >>
> >> Correct. It exposes the max gen and lanes of the slowest link between
> >> the device and the root port.
> >>
> >> >
> >> > If I got that correctly then we should probably document that cause
> >> > otherwise somebody will try to "fix" it at some time.
> >>
> >> Good point.
> >>
> >> Alex
> >>
> >> >
> >> > Christian.
> >> >
> >> > >
> >> > > Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
> >> > >
> >> > > Alex
> >> > >
> >> > >> Marek
> >> > >>
> >> > >> On Wed, Jan 4, 2023 at 10:33 AM Lazar, Lijo <lijo.la...@amd.com> wrote:
> >> > >>> [AMD Official Use Only - General]
> >> > >>>
> >> > >>>
> >> > >>> To clarify, with DPM in place, the current bandwidth will be changing based on the load.
> >> > >>>
> >> > >>> If apps/umd already has a way to know the current bandwidth utilisation, then possible maximum also could be part of the same API. Otherwise, this only looks like duplicate information. We have the same information in sysfs DPM nodes.
> >> > >>>
> >> > >>> BTW, I don't know to what extent app/umd really makes use of this. Take that memory frequency as an example (I'm reading it as 16GHz). It only looks like a spec sheet.
> >> >>>
> >> >>> Thanks,
> >> >>> Lijo
> >> >>> ________________________________
> >> >>> From: Marek Olšák <mar...@gmail.com>
> >> >>> Sent: Wednesday, January 4, 2023 8:40:00 PM
> >> >>> To: Lazar, Lijo <lijo.la...@amd.com>
> >> >>> Cc: amd-gfx@lists.freedesktop.org <amd-gfx@lists.freedesktop.org>
> >> >>> Subject: Re: [PATCH 1/2] drm/amdgpu: return the PCIe gen and lanes from the INFO
> >> >>>
> >> >>> On Wed, Jan 4, 2023 at 9:19 AM Lazar, Lijo <lijo.la...@amd.com> wrote:
> >> >>>
> >> >>>
> >> >>> On 1/4/2023 7:43 PM, Marek Olšák wrote:
> >> >>>> On Wed, Jan 4, 2023 at 6:50 AM Lazar, Lijo <lijo.la...@amd.com <mailto:lijo.la...@amd.com>> wrote:
> >> >>>>
> >> >>>>
> >> >>>> On 1/4/2023 4:11 AM, Marek Olšák wrote:
> >> >>>> > I see. Well, those sysfs files are not usable, and I don't think it would be important even if they were usable, but for completeness:
> >> >>>> >
> >> >>>> > The ioctl returns:
> >> >>>> > pcie_gen = 1
> >> >>>> > pcie_num_lanes = 16
> >> >>>> >
> >> >>>> > Theoretical bandwidth from those values: 4.0 GB/s
> >> >>>> > My DMA test shows this write bandwidth: 3.5 GB/s
> >> >>>> > It matches the expectation.
> >> >>>> >
> >> >>>> > Let's see the devices (there is only 1 GPU Navi21 in the system):
> >> >>>> > $ lspci |egrep '(PCI|VGA).*Navi'
> >> >>>> > 0a:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Upstream Port of PCI Express Switch (rev c3)
> >> >>>> > 0b:00.0 PCI bridge: Advanced Micro Devices, Inc. [AMD/ATI] Navi 10 XL Downstream Port of PCI Express Switch
> >> >>>> > 0c:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Navi 21 [Radeon RX 6800/6800 XT / 6900 XT] (rev c3)
> >> >>>> >
> >> >>>> > Let's read sysfs:
> >> >>>> >
> >> >>>> > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_width
> >> >>>> > 16
> >> >>>> > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_width
> >> >>>> > 16
> >> >>>> > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_width
> >> >>>> > 16
> >> >>>> > $ cat /sys/bus/pci/devices/0000:0a:00.0/current_link_speed
> >> >>>> > 2.5 GT/s PCIe
> >> >>>> > $ cat /sys/bus/pci/devices/0000:0b:00.0/current_link_speed
> >> >>>> > 16.0 GT/s PCIe
> >> >>>> > $ cat /sys/bus/pci/devices/0000:0c:00.0/current_link_speed
> >> >>>> > 16.0 GT/s PCIe
> >> >>>> >
> >> >>>> > Problem 1: None of the speed numbers match 4 GB/s.
> >> >>>>
> >> >>>> US bridge = 2.5GT/s means operating at PCIe Gen 1 speed. Total theoretical bandwidth is then derived based on encoding and total number of lanes.
> >> >>>>
> >> >>>> > Problem 2: Userspace doesn't know the bus index of the bridges, and it's not clear which bridge should be used.
> >> >>>>
> >> >>>> In general, modern ones have this arch= US->DS->EP. US is the one connected to physical link.
> >> >>>>
> >> >>>> > Problem 3: The PCIe gen number is missing.
> >> >>>>
> >> >>>> Current link speed is based on whether it's Gen1/2/3/4/5.
> >> >>>>
> >> >>>> BTW, your patch makes use of capabilities flags which gives the maximum supported speed/width by the device. It may not necessarily reflect the current speed/width negotiated.
> >> >>>> I guess in NV, this info is already obtained from PMFW and made available through metrics table.
> >> >>>>
> >> >>>>
> >> >>>> It computes the minimum of the device PCIe gen and the motherboard/slot PCIe gen to get the final value. These 2 lines do that. The low 16 bits of the mask contain the device PCIe gen mask. The high 16 bits of the mask contain the slot PCIe gen mask.
> >> >>>> + pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
> >> >>>> + dev_info->pcie_gen = fls(pcie_gen_mask);
> >> >>>>
> >> >>> With DPM in place on some ASICs, how much does this static info help for upper level apps?
> >> >>>
> >> >>>
> >> >>> It helps UMDs make better decisions if they know the maximum achievable bandwidth. UMDs also compute the maximum memory bandwidth and compute performance (FLOPS). Right now it's printed by Mesa to give users detailed information about their GPU. For example:
> >> >>>
> >> >>> $ AMD_DEBUG=info glxgears
> >> >>> Device info:
> >> >>> name = NAVI21
> >> >>> marketing_name = AMD Radeon RX 6800
> >> >>> num_se = 3
> >> >>> num_rb = 12
> >> >>> num_cu = 60
> >> >>> max_gpu_freq = 2475 MHz
> >> >>> max_gflops = 19008 GFLOPS
> >> >>> l0_cache_size = 16 KB
> >> >>> l1_cache_size = 128 KB
> >> >>> l2_cache_size = 4096 KB
> >> >>> l3_cache_size = 128 MB
> >> >>> memory_channels = 16 (TCC blocks)
> >> >>> memory_size = 16 GB (16384 MB)
> >> >>> memory_freq = 16 GHz
> >> >>> memory_bus_width = 256 bits
> >> >>> memory_bandwidth = 512 GB/s
> >> >>> pcie_gen = 1
> >> >>> pcie_num_lanes = 16
> >> >>> pcie_bandwidth = 4.0 GB/s
> >> >>>
> >> >>> Marek
> >> > >
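
For reference, the pcie_bandwidth figure quoted above can be derived from the two new INFO fields alone. The following is a minimal userspace sketch (not Mesa's actual code); it assumes the standard per-generation encoding overhead of 8b/10b for gen 1/2 and 128b/130b for gen 3 and newer:

/* Hypothetical helper, not Mesa's actual code: theoretical PCIe bandwidth
 * in GB/s from the two fields returned by AMDGPU_INFO_DEV_INFO. */
#include <stdio.h>

static double pcie_bandwidth_gbs(unsigned int pcie_gen, unsigned int pcie_num_lanes)
{
	/* Usable GB/s per lane after encoding overhead:
	 * gen1/2 use 8b/10b (2.5/5 GT/s), gen3+ use 128b/130b (8/16/32 GT/s). */
	static const double gbs_per_lane[] = {
		0.0,   /* unknown */
		0.25,  /* gen1 */
		0.5,   /* gen2 */
		0.985, /* gen3 */
		1.969, /* gen4 */
		3.938, /* gen5 */
	};

	if (pcie_gen == 0 || pcie_gen > 5)
		return 0.0;

	return gbs_per_lane[pcie_gen] * pcie_num_lanes;
}

int main(void)
{
	/* The example from this thread: gen1 x16 -> 4.0 GB/s. */
	printf("pcie_bandwidth = %.1f GB/s\n", pcie_bandwidth_gbs(1, 16));
	return 0;
}

With pcie_gen = 1 and pcie_num_lanes = 16 this yields the 4.0 GB/s shown in the Mesa output above.
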
From 6220395fb0b9c10c92ea67b80e09120e6f92a499 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Marek=20Ol=C5=A1=C3=A1k?= <marek.ol...@amd.com>
Date: Sat, 24 Dec 2022 17:44:26 -0500
Subject: [PATCH] drm/amdgpu: return the PCIe gen and lanes from the INFO ioctl
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

For computing PCIe bandwidth in userspace and troubleshooting PCIe
bandwidth issues. For example, my Navi21 has been limited to PCIe gen 1
and this is the first time I noticed it after 2 years.

Note that this intentionally fills a hole and padding in
drm_amdgpu_info_device.

Signed-off-by: Marek Olšák <marek.ol...@amd.com>
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c |  3 ++-
 drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 14 +++++++++++++-
 include/uapi/drm/amdgpu_drm.h           |  6 ++++--
 3 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 06aba201d4db..a75dba2caeca 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -106,9 +106,10 @@
  * - 3.49.0 - Add gang submit into CS IOCTL
  * - 3.50.0 - Update AMDGPU_INFO_DEV_INFO IOCTL for minimum engine and memory clock
  *            Update AMDGPU_INFO_SENSOR IOCTL for PEAK_PSTATE engine and memory clock
+ * 3.51.0 - Return the PCIe gen and lanes from the INFO ioctl
  */
 #define KMS_DRIVER_MAJOR	3
-#define KMS_DRIVER_MINOR	50
+#define KMS_DRIVER_MINOR	51
 #define KMS_DRIVER_PATCHLEVEL	0
 
 unsigned int amdgpu_vram_limit = UINT_MAX;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index 903e8770e275..fba306e0ef87 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
@@ -42,6 +42,7 @@
 #include "amdgpu_gem.h"
 #include "amdgpu_display.h"
 #include "amdgpu_ras.h"
+#include "amd_pcie.h"
 
 void amdgpu_unregister_gpu_instance(struct amdgpu_device *adev)
 {
@@ -766,6 +767,7 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 	case AMDGPU_INFO_DEV_INFO: {
 		struct drm_amdgpu_info_device *dev_info;
 		uint64_t vm_size;
+		uint32_t pcie_gen_mask;
 		int ret;
 
 		dev_info = kzalloc(sizeof(*dev_info), GFP_KERNEL);
@@ -798,7 +800,6 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 		dev_info->num_rb_pipes = adev->gfx.config.max_backends_per_se *
 			adev->gfx.config.max_shader_engines;
 		dev_info->num_hw_gfx_contexts = adev->gfx.config.max_hw_contexts;
-		dev_info->_pad = 0;
 		dev_info->ids_flags = 0;
 		if (adev->flags & AMD_IS_APU)
 			dev_info->ids_flags |= AMDGPU_IDS_FLAGS_FUSION;
@@ -852,6 +853,17 @@ int amdgpu_info_ioctl(struct drm_device *dev, void *data, struct drm_file *filp)
 
 		dev_info->tcc_disabled_mask = adev->gfx.config.tcc_disabled_mask;
 
+		/* Combine the chip gen mask with the platform (CPU/mobo) mask. */
+		pcie_gen_mask = adev->pm.pcie_gen_mask & (adev->pm.pcie_gen_mask >> 16);
+		dev_info->pcie_gen = fls(pcie_gen_mask);
+		dev_info->pcie_num_lanes =
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X32 ? 32 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X16 ? 16 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X12 ? 12 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X8 ? 8 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X4 ? 4 :
+			adev->pm.pcie_mlw_mask & CAIL_PCIE_LINK_WIDTH_SUPPORT_X2 ? 2 : 1;
+
 		ret = copy_to_user(out, dev_info,
 				   min((size_t)size, sizeof(*dev_info))) ? -EFAULT : 0;
 		kfree(dev_info);
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index fe7f871e3080..973af6d06626 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -1053,7 +1053,8 @@ struct drm_amdgpu_info_device {
 	__u32 enabled_rb_pipes_mask;
 	__u32 num_rb_pipes;
 	__u32 num_hw_gfx_contexts;
-	__u32 _pad;
+	/* PCIe version (the smaller of the GPU and the CPU/motherboard) */
+	__u32 pcie_gen;
 	__u64 ids_flags;
 	/** Starting virtual address for UMDs. */
 	__u64 virtual_address_offset;
@@ -1100,7 +1101,8 @@ struct drm_amdgpu_info_device {
 	__u32 gs_prim_buffer_depth;
 	/* max gs wavefront per vgt*/
 	__u32 max_gs_waves_per_vgt;
-	__u32 _pad1;
+	/* PCIe number of lanes (the smaller of the GPU and the CPU/motherboard) */
+	__u32 pcie_num_lanes;
 	/* always on cu bitmap */
 	__u32 cu_ao_bitmap[4][4];
 	/** Starting high virtual address for UMDs. */
-- 
2.34.1
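
Not part of the patch: a minimal sketch of how userspace could read the new fields back through the raw INFO ioctl, assuming a uapi header that already contains pcie_gen/pcie_num_lanes. The render node path and the include path are assumptions (with libdrm headers it may be <libdrm/amdgpu_drm.h>), and real UMDs normally go through libdrm rather than the raw ioctl.

/* Hypothetical test program, not part of the patch: query pcie_gen and
 * pcie_num_lanes via the amdgpu INFO ioctl using the uapi header above. */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <drm/amdgpu_drm.h>   /* include path may vary per distro */

int main(void)
{
	struct drm_amdgpu_info_device dev_info;
	struct drm_amdgpu_info request;
	int fd = open("/dev/dri/renderD128", O_RDWR); /* assumed render node */

	if (fd < 0)
		return 1;

	memset(&request, 0, sizeof(request));
	memset(&dev_info, 0, sizeof(dev_info));
	request.return_pointer = (uintptr_t)&dev_info;
	request.return_size = sizeof(dev_info);
	request.query = AMDGPU_INFO_DEV_INFO;

	if (ioctl(fd, DRM_IOCTL_AMDGPU_INFO, &request) == 0)
		printf("pcie_gen = %u, pcie_num_lanes = %u\n",
		       dev_info.pcie_gen, dev_info.pcie_num_lanes);

	close(fd);
	return 0;
}

On kernels older than KMS 3.51.0 both fields read back as 0, since they were padding there.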