amdgpu_detect_virtualization reads register, amdgpu_device_rreg access
adev->reset_domain->sem if kernel defined CONFIG_LOCKDEP, below is the
random boot hang backtrace on Vega10. It may get random NULL pointer
access backtrace if amdgpu_sriov_runtime is true too.

Move amdgpu_reset_create_reset_domain before calling to RREG32.

 BUG: kernel NULL pointer dereference, address:
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: 0000 [#1] PREEMPT SMP NOPTI
 Workqueue: events work_for_cpu_fn
 RIP: 0010:down_read_trylock+0x13/0xf0
 Call Trace:
  <TASK>
  amdgpu_device_skip_hw_access+0x38/0x80 [amdgpu]
  amdgpu_device_rreg+0x1b/0x170 [amdgpu]
  amdgpu_detect_virtualization+0x73/0x100 [amdgpu]
  amdgpu_device_init.cold.60+0xbe/0x16b1 [amdgpu]
  ? pci_bus_read_config_word+0x43/0x70
  amdgpu_driver_load_kms+0x15/0x120 [amdgpu]
  amdgpu_pci_probe+0x1a1/0x3a0 [amdgpu]

Fixes: 1228996f0 ("drm/amdgpu: Move reset sem into reset_domain")
Signed-off-by: Philip Yang <philip.y...@amd.com>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 71876ff466df..a35c6acfe2ab 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -3665,6 +3665,15 @@ int amdgpu_device_init(struct amdgpu_device *adev,
        if (amdgpu_mes && adev->asic_type >= CHIP_NAVI10)
                adev->enable_mes = true;
 
+       /*
+        * Reset domain needs to be present early, before XGMI hive discovered
+        * (if any) and intitialized to use reset sem and in_gpu reset flag
+        * early on during init and before calling to RREG32.
+        */
+       adev->reset_domain = amdgpu_reset_create_reset_domain(SINGLE_DEVICE, 
"amdgpu-reset-dev");
+       if (!adev->reset_domain)
+               return -ENOMEM;
+
        /* detect hw virtualization here */
        amdgpu_detect_virtualization(adev);
 
@@ -3674,15 +3683,6 @@ int amdgpu_device_init(struct amdgpu_device *adev,
                return r;
        }
 
-       /*
-        * Reset domain needs to be present early, before XGMI hive discovered
-        * (if any) and intitialized to use reset sem and in_gpu reset flag
-        * early on during init.
-        */
-       adev->reset_domain = amdgpu_reset_create_reset_domain(SINGLE_DEVICE, 
"amdgpu-reset-dev");
-       if (!adev->reset_domain)
-               return -ENOMEM;
-
        /* early init functions */
        r = amdgpu_device_ip_early_init(adev);
        if (r)
-- 
2.35.1

Reply via email to