On 8/28/2024 05:59, WangYuli wrote:
From: wenlunpeng <wenlunp...@uniontech.com>

The quirk is for reboot-stability.

A device reboot stress test has been observed to cause
random system hangs when amdgpu_dpm is enabled.

Disabling amdgpu_dpm can fix this.

However, a boot-param can still overwrite it to enable
amdgpu_dpm.

Serial log when error occurs:
...
Console: switching to colour frame buffer device 160x45
amdgpu 0000:01:00.0: fb0: amdgpudrmfb frame buffer device
[drm:amdgpu_device_ip_late_init] *ERROR* late_init of IP block <si_dpm> failed -22
amdgpu 0000:01:00.0: amdgpu_device_ip_late_init failed
amdgpu 0000:01:00.0: Fatal error during GPU init
[drm] amdgpu: finishing device.
Console: switching to colour dummy device 80x25
...

Is this production hardware?

Have you already checked whether a BIOS upgrade for the device could help this issue?


Signed-off-by: wenlunpeng <wenlunp...@uniontech.com>
Signed-off-by: WangYuli <wangy...@uniontech.com>

Just to clarify: did you two work on this patch together, or are you submitting it on behalf of wenlunpeng? Right now it shows up as you submitting on behalf of wenlunpeng. If you co-developed it, you should also add a Co-developed-by tag.
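
For illustration, if you did co-develop it, the tags could look roughly like the block below. This assumes wenlunpeng stays the From: author and you are the co-developer sending the patch; the addresses are kept as they appear above.

```
Signed-off-by: wenlunpeng <wenlunp...@uniontech.com>
Co-developed-by: WangYuli <wangy...@uniontech.com>
Signed-off-by: WangYuli <wangy...@uniontech.com>
```

Each Co-developed-by line has to be followed immediately by that co-author's Signed-off-by, and the last Signed-off-by should belong to whoever submits the patch.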

---
  drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 23 +++++++++++++++++++++++
  1 file changed, 23 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 094498a0964b..81716fcac7cd 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -32,6 +32,7 @@
  #include <drm/drm_vblank.h>
  #include <linux/cc_platform.h>
+#include <linux/dmi.h>
  #include <linux/dynamic_debug.h>
  #include <linux/module.h>
  #include <linux/mmu_notifier.h>
@@ -3023,10 +3024,32 @@ static struct pci_driver amdgpu_kms_pci_driver = {
        .dev_groups = amdgpu_sysfs_groups,
  };
+static int quirk_set_amdgpu_dpm_0(const struct dmi_system_id *dmi)
+{
+       amdgpu_dpm = 0;
+       pr_info("Identified '%s', set amdgpu_dpm to 0.\n", dmi->ident);
+       return 1;
+}
+
+static const struct dmi_system_id amdgpu_quirklist[] = {
+       {
+               .ident = "DS25 Desktop",
+               .matches = {
+                       DMI_MATCH(DMI_BOARD_NAME, "THTF-SW831-1W-DS25_MB"),

As this is suspected to be a BIOS issue, I would like to better understand whether a BIOS upgrade fixes it. If it does but you would still like a quirk for the system, the match should include the BIOS version here.
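
As a rough, untested sketch of what I mean, the entry could carry a second match on the BIOS version. The version string "X.YZ" is a placeholder I made up, not the real affected version; everything else is taken from your patch.

```c
#include <linux/dmi.h>

/* Sketch only: "X.YZ" is a hypothetical placeholder for the affected BIOS. */
static const struct dmi_system_id amdgpu_quirklist[] = {
	{
		.ident = "DS25 Desktop",
		.matches = {
			DMI_MATCH(DMI_BOARD_NAME, "THTF-SW831-1W-DS25_MB"),
			/* Limit the quirk to the BIOS version known to misbehave. */
			DMI_EXACT_MATCH(DMI_BIOS_VERSION, "X.YZ"),
		},
		.callback = quirk_set_amdgpu_dpm_0,
	},
	{}
};
```

DMI_MATCH does a substring comparison while DMI_EXACT_MATCH requires the whole string to match, so the exact form keeps the quirk from also applying to a later, fixed BIOS whose version string merely contains the old one.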

+               },
+               .callback = quirk_set_amdgpu_dpm_0,
+       },
+       {}
+};
+
  static int __init amdgpu_init(void)
  {
        int r;
+       /* quirks for some hardware, applied only when it's untouched */
+       if (amdgpu_dpm == -1)
+               dmi_check_system(amdgpu_quirklist);
+
        if (drm_firmware_drivers_only())
                return -EINVAL;
