On 2025-03-19 03:02, Daisuke Matsuda (Fujitsu) wrote:
On Tue, Mar 18, 2025 5:35 AM Felix Kuehling wrote:
On 2025-03-17 15:07, Deucher, Alexander wrote:
[Public]

-----Original Message-----
From: Daisuke Matsuda <matsuda-dais...@fujitsu.com>
Sent: Thursday, March 13, 2025 9:18 PM
To: amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org; Deucher,
Alexander <alexander.deuc...@amd.com>; Koenig, Christian
<christian.koe...@amd.com>
Cc: airl...@gmail.com; sim...@ffwll.ch; Daisuke Matsuda <matsuda-
dais...@fujitsu.com>
Subject: [PATCH] drm/amdgpu: Higher log level for missing PCIe atomics caps

Currently, ROCm requires CPUs that support PCIe atomics. The message is more
urgent for GPGPU users, meaning basic functionalities of ROCm are not available
on the node.

+ Felix

My understanding is that PCIe atomics are not strictly required, but there are 
some features that are not available
without them.  Warning seems a bit overkill and potentially confusing to users 
that have an existing system that
otherwise works fine.

I could see either argument. The GPU is capable of PCIe atomics, but the system 
configuration (chipset, PCIe switch, etc.)
is preventing them from working, which has an impact on some 
application-visible functionality. A naive application that
depends on atomics and is not carefully written to provide fallbacks or fail 
gracefully in case atomics are unavailable,
would fail silently or produce incorrect results. So maybe that justifies a 
warning message.
The Linux kernel commonly uses "warn level" messages to alert users when an 
incompatibility restricts functionality, and this approach aligns with that practice.
To address the concern about the message being too general, how about making it 
more specific like this:
   dev_warn(adev->dev, "PCIe atomic ops are not supported.  ROCm performance and 
functionality may be limited.\n");
This is more direct and focused on the impact to ROCm users.

I think it's not a performance problem. There is no general lower-performing fallback I'm aware of. The functionality impact is limited to kernels that try to do fine-grained synchronization with atomic operations to system memory. So I'd be wary of making the message too scary without giving any detail about how to avoid problems. Anyone who reads this message and doesn't know what PCIe atomics are, is probably not affected.

Regards,
  Felix



Regards,
Daisuke

Regards,
   Felix


Alex


Signed-off-by: Daisuke Matsuda <matsuda-dais...@fujitsu.com>
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 018dfccd771b..faeef136e272 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4374,7 +4374,7 @@ int amdgpu_device_init(struct amdgpu_device *adev,
                       return r;
       }

-     /* enable PCIE atomic ops */
+     /* enable PCIe atomic ops */
       if (amdgpu_sriov_vf(adev)) {
               if (adev->virt.fw_reserve.p_pf2vf)
                       adev->have_atomics_support = ((struct
amd_sriov_msg_pf2vf_info *) @@ -4395,7 +4395,7 @@ int
amdgpu_device_init(struct amdgpu_device *adev,
       }

       if (!adev->have_atomics_support)
-             dev_info(adev->dev, "PCIE atomic ops is not supported\n");
+             dev_warn(adev->dev, "PCIe atomic ops are not supported\n");

       /* doorbell bar mapping and doorbell index init*/
       amdgpu_doorbell_init(adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
index b4f9c2f4e92c..c52605a07597 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgv_sriovmsg.h
@@ -240,7 +240,7 @@ struct amd_sriov_msg_pf2vf_info {
       } mm_bw_management[AMD_SRIOV_MSG_RESERVE_VCN_INST];
       /* UUID info */
       struct amd_sriov_msg_uuid_info uuid_info;
-     /* PCIE atomic ops support flag */
+     /* PCIe atomic ops support flag */
       uint32_t pcie_atomic_ops_support_flags;
       /* Portion of GPU memory occupied by VF.  MAX value is 65535, but set to
uint32_t to maintain alignment with reserved size */
       uint32_t gpu_capacity;
--
2.39.1

Reply via email to