Enhance error handling in function amdgpu_pci_probe() to avoid
possible resource leakage.
Signed-off-by: Jiang Liu
Reviewed-by: Mario Limonciello
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd
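A common kernel idiom for this kind of fix is goto-based unwinding. Each failure jumps to a label that releases only what was already acquired, in reverse order. The following is a minimal userspace sketch of that idiom, not the actual amdgpu_pci_probe() change. `struct fake_dev`, `probe_device()`, `remove_device()` and the three placeholder resources are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical device with three resources acquired in order. */
struct fake_dev {
	void *regs;   /* step 1 */
	void *ring;   /* step 2 */
	void *sysfs;  /* step 3 */
};

/* Counts live allocations so a caller can check for leaks. */
static int live_allocs;

static void *res_alloc(int fail)
{
	void *p;

	if (fail)
		return NULL;
	p = malloc(1);
	if (p)
		live_allocs++;
	return p;
}

static void res_free(void *p)
{
	if (p) {
		live_allocs--;
		free(p);
	}
}

/*
 * fail_step selects which step (1..3) fails; 0 means none fails.
 * On failure, everything acquired so far is released in reverse order.
 */
static int probe_device(struct fake_dev *dev, int fail_step)
{
	dev->regs = res_alloc(fail_step == 1);
	if (!dev->regs)
		goto err_out;

	dev->ring = res_alloc(fail_step == 2);
	if (!dev->ring)
		goto err_free_regs;

	dev->sysfs = res_alloc(fail_step == 3);
	if (!dev->sysfs)
		goto err_free_ring;

	return 0;

err_free_ring:
	res_free(dev->ring);
	dev->ring = NULL;
err_free_regs:
	res_free(dev->regs);
	dev->regs = NULL;
err_out:
	return -ENOMEM;
}

/* Normal teardown mirrors probe in reverse order. */
static void remove_device(struct fake_dev *dev)
{
	res_free(dev->sysfs);
	res_free(dev->ring);
	res_free(dev->regs);
}
```

For example, probe_device(&dev, 2) makes the second step fail and leaves live_allocs at zero. That is the property the error path must preserve.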
Introduce new interface amdgpu_xcp_drm_dev_free() to free a specific
drm_device created by amdgpu_xcp_drm_dev_alloc(), which will be used
for error recovery.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 63 +
drivers/gpu/drm/amd/amdxcp
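A caller can only unwind a partially completed loop of allocations if it has a routine that frees one specific object. Below is a userspace sketch of that pattern, assuming a fixed-size pool. `struct xcp_dev`, `xcp_dev_alloc()`, `xcp_dev_free()`, `xcp_alloc_all()`, `pool_in_use()` and `MAX_XCP_DEVS` are hypothetical. They are only loosely modeled on the amdgpu_xcp_drm_dev_alloc()/amdgpu_xcp_drm_dev_free() pair.

```c
#include <stddef.h>
#include <stdlib.h>

#define MAX_XCP_DEVS 8  /* hypothetical pool size for the sketch */

/* Hypothetical stand-in for the drm_device objects handed out by amdxcp. */
struct xcp_dev {
	int id;
};

static struct xcp_dev *pool[MAX_XCP_DEVS];

/* Allocate the device for slot idx; returns NULL on failure. */
static struct xcp_dev *xcp_dev_alloc(int idx)
{
	struct xcp_dev *d;

	if (idx < 0 || idx >= MAX_XCP_DEVS || pool[idx])
		return NULL;
	d = malloc(sizeof(*d));
	if (!d)
		return NULL;
	d->id = idx;
	pool[idx] = d;
	return d;
}

/* Free one specific device; a NULL pointer is ignored. */
static void xcp_dev_free(struct xcp_dev *d)
{
	if (!d)
		return;
	pool[d->id] = NULL;
	free(d);
}

/* Allocate n devices; if any allocation fails, free the ones already made. */
static int xcp_alloc_all(struct xcp_dev **devs, int n)
{
	int i;

	for (i = 0; i < n; i++) {
		devs[i] = xcp_dev_alloc(i);
		if (!devs[i])
			goto err;
	}
	return 0;
err:
	while (--i >= 0) {
		xcp_dev_free(devs[i]);
		devs[i] = NULL;
	}
	return -1;
}

/* Number of occupied pool slots, for leak checks. */
static int pool_in_use(void)
{
	int i, n = 0;

	for (i = 0; i < MAX_XCP_DEVS; i++)
		n += pool[i] != NULL;
	return n;
}
```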
a/0x30
[ 90.092742]
[ 90.252277] ---[ end trace ]---
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c
index 272954
is unnecessary.
3) added amdgpu_xcp_drm_dev_free() in patch 0003 to enhance the amdxcp
driver to better support device removal and error handling.
4) reworked patch 0005 to fix it in amdgpu instead of in the drm core.
Jiang Liu (5):
drm/amdxcp: introduce new API amdgpu_xcp_drm_dev_free()
drm/amdgpu: fix
Introduce amdgpu_device_fini_schedulers() to clean up scheduler-related
resources and avoid possible invalid memory access.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 29 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 ---
2 files changed
+0x176/0x310
[16002.344324] do_syscall_64+0x5d/0x170
[16002.348858] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[16002.354956] RIP: 0033:0x7f2736a620cb
Fix it by removing xcp drm devices when probing the GPU device fails.
Signed-off-by: Jiang Liu
Tested-by: Shuo Liu
Reviewed-by: Lijo Lazar
Introduce amdgpu_device_fini_schedulers() to clean up scheduler-related
resources and avoid possible invalid memory access.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 35 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 --
2 files changed
amdgpu_xcp_drm_dev_free() in patch 0003 to enhance the amdxcp
driver to better support device removal and error handling.
4) reworked patch 0005 to fix it in amdgpu instead of in the drm core.
Jiang Liu (5):
drm/amdxcp: introduce new API amdgpu_xcp_drm_dev_free()
drm/amdgpu: fix use after free bug related to
Introduce helper amdgpu_bo_get_pinned_gpu_addr(), which will be
used to update the GPU addresses of pinned kernel BOs during resume.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c | 9 +
drivers/gpu/drm/amd/amdgpu/amdgpu_object.h | 1 +
drivers/gpu/drm/amd/amdgpu
so we can't test
our hypothesis, and we are not sure whether other blockers remain for
resuming on a different AMD SR-IOV vGPU.
Help is needed to identify the remaining work items required to support
resume on a different AMD SR-IOV vGPU :)
Jiang Liu (2):
drm/amdgpu: update cached vram base addres
When resuming on a different SR-IOV vGPU device, the VRAM base addresses
may have changed, so the cached addresses need to be updated.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 15 +++
drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h| 6 --
drivers/gpu
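One way to keep such caches valid, assuming each cached address is the VRAM base plus a fixed offset, is to rebase every cached address once the new base is known on resume. `struct vram_cache` and `vram_cache_rebase()` are hypothetical. The actual patch may recompute the addresses differently.

```c
#include <stdint.h>

/* Hypothetical cache of GPU addresses derived from the VRAM base. */
struct vram_cache {
	uint64_t vram_base;    /* base address seen before suspend */
	uint64_t fw_buf_addr;  /* cached: vram_base + fixed offset */
};

/*
 * On resume, preserve each cached address's offset from the old base
 * and re-apply it on top of the new base.
 */
static void vram_cache_rebase(struct vram_cache *c, uint64_t new_base)
{
	uint64_t off = c->fw_buf_addr - c->vram_base;

	c->vram_base = new_base;
	c->fw_buf_addr = new_base + off;
}
```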
: amdgpu_device_ip_resume failed
(-110).
[ 555.126965] PM: dpm_run_callback(): pci_pm_resume+0x0/0xe0 returns -110
[ 555.126966] PM: Device :0a:00.0 failed to resume async: error -110
This fix has been tested on Mi308X.
Signed-off-by: Jiang Liu
Tested-by: Shuo Liu
---
drivers/gpu/drm/amd/amdgpu/soc15.c
() AMDGPU_IP_STATE_SW
.sw_fini() AMDGPU_IP_STATE_EARLY
.late_fini() AMDGPU_IP_STATE_INVALID
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 48 ++-
drivers/gpu/drm/amd/amdgpu
Add a flag to track ras debugfs creation status, to avoid possibly
incorrect reference count management of the ras block object in
amdgpu_ras_aca_is_supported().
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9
- refine the way to define status markers
- split amdgpu_dm related change into a dedicated patch
- add patch 13 to walk ip blocks in reverse order during shutdown
Jiang Liu (15):
drm/amdgpu: add helper functions to track status for ras manager
drm/amdgpu: add a flag to track ras debugfs creation
d-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 44 +-
2 files changed, 31 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_
` callback.
4) call amdgpu_ras_fini() before invoking ip_blocks[i].late_fini.
One task is left: analyzing the state machine transitions related to
GPU reset.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 22 --
1 file changed, 20 inser
Introduce the following IP block iterators to reduce duplicated code:
- amdgpu_for_each_ip_block
- amdgpu_for_each_ip_block_reverse
- amdgpu_for_each_ip_block_valid
- amdgpu_for_each_ip_block_valid_reverse
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/aldebaran.c| 46
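Below is a minimal userspace sketch of what such iterators can look like. The macro names, `struct ip_block`, `struct fake_adev` and the `valid` flag are hypothetical simplifications, not the definitions from the patch. The `if (!cond) {} else` tail is a common trick for adding a filter to a for-loop macro without dangling-else surprises. The index must be a signed int for the reverse variants to terminate.

```c
#include <stdbool.h>

/* Hypothetical, simplified IP block and device. */
struct ip_block {
	int type;
	bool valid;
};

struct fake_adev {
	int num_ip_blocks;
	struct ip_block ip_blocks[8];
};

/* Visit every block in registration order. */
#define for_each_ip_block(adev, i) \
	for ((i) = 0; (i) < (adev)->num_ip_blocks; (i)++)

/* Visit every block in reverse order (i must be a signed int). */
#define for_each_ip_block_reverse(adev, i) \
	for ((i) = (adev)->num_ip_blocks - 1; (i) >= 0; (i)--)

/* Same as above, but skip blocks whose valid flag is clear. */
#define for_each_ip_block_valid(adev, i) \
	for_each_ip_block(adev, i) \
		if (!(adev)->ip_blocks[(i)].valid) {} else

#define for_each_ip_block_valid_reverse(adev, i) \
	for_each_ip_block_reverse(adev, i) \
		if (!(adev)->ip_blocks[(i)].valid) {} else
```

Teardown code would typically use the `_reverse` variants so that blocks are finalized in the opposite order to their initialization.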
Currently the refcount on the ras block object for features is tracked by
checking `if (obj && amdgpu_ras_is_feature_enabled(adev, head))`,
which is somewhat unreliable. Introduce a dedicated flag to track
the reference count instead.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/
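One plausible failure mode, assuming the feature-enabled state can change between the get and the put, is that a put keyed on the enabled check no longer matches the earlier get. Recording on the holder whether it actually took a reference keeps the two balanced. The following userspace sketch uses hypothetical names (`struct ras_obj`, `struct feature`, `feature_get()`, `feature_put()`).

```c
#include <stdbool.h>

/* Hypothetical shared object whose refcount several features may hold. */
struct ras_obj {
	int refcount;
};

/* Per-feature state: remembers whether this feature holds a reference. */
struct feature {
	struct ras_obj *obj;
	bool enabled;    /* may change independently of the reference */
	bool holds_ref;  /* dedicated flag: set iff a reference was taken */
};

static void feature_get(struct feature *f)
{
	if (f->obj && !f->holds_ref) {
		f->obj->refcount++;
		f->holds_ref = true;
	}
}

static void feature_put(struct feature *f)
{
	/* Decide by the flag, not by f->enabled, which may have changed. */
	if (f->obj && f->holds_ref) {
		f->obj->refcount--;
		f->holds_ref = false;
	}
}
```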
amdgpu_nbio_ras_early_fini() to undo work done by
amdgpu_nbio_ras_late_init().
2) remove the call to amdgpu_irq_put() in _hw_fini().
3) record whether a reference count is held for a specific irq.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_nbio.c | 16 +++-
drivers/gpu/drm/amd
Walk IP blocks in reverse order in amdgpu_device_ip_fini_early() and
amdgpu_device_smu_fini_early(), for consistency with the other fini
functions.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff
().
3) call xgpu_nv_mailbox_put_irq() for nv.c to avoid possible resource
leakage.
4) use flags to track irq reference count usage.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/nv.c| 14 +++-
drivers/gpu/drm/amd/amdgpu/soc15.c | 22 +++
drivers/gpu/drm/amd
Rename amdgpu_ras_pre_fini() to amdgpu_ras_early_fini(), for consistency
with the naming style of other code.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++---
drivers/gpu/drm/amd/amdgpu/amdgpu_gfx.c| 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_jpeg.c | 2 +-
drivers/gpu
true
sw_init:     sw = true
hw_init:     hw = true
late_init:   late_initialized = true
early_fini:  late_initialized = false
hw_fini:     hw = false
sw_fini:     sw = false
late_fini:   valid = false
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.
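The table above implies that teardown can be driven purely by these flags. Each fini stage runs only if the matching init stage completed, so the same fini path is safe after a partial init. Below is a userspace sketch based on that reading. `struct ip_status`, `ip_init()`, `ip_fini()`, the `fail_stage` parameter and the `RAN_*` bits are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical per-IP-block status, mirroring the flags listed above. */
struct ip_status {
	bool valid;
	bool sw;
	bool hw;
	bool late_initialized;
};

/* Bits recording which teardown stages actually ran. */
enum {
	RAN_EARLY_FINI = 1 << 0,
	RAN_HW_FINI    = 1 << 1,
	RAN_SW_FINI    = 1 << 2,
	RAN_LATE_FINI  = 1 << 3,
};

/*
 * fail_stage: 1 = sw_init fails, 2 = hw_init fails, 3 = late_init fails,
 * 0 = all succeed. Each flag is set only after its stage succeeds.
 */
static int ip_init(struct ip_status *st, int fail_stage)
{
	st->valid = true;                 /* early_init */
	if (fail_stage == 1)
		return -1;
	st->sw = true;                    /* sw_init */
	if (fail_stage == 2)
		return -1;
	st->hw = true;                    /* hw_init */
	if (fail_stage == 3)
		return -1;
	st->late_initialized = true;      /* late_init */
	return 0;
}

/* Undo exactly the stages whose flags are set, in reverse order. */
static unsigned int ip_fini(struct ip_status *st)
{
	unsigned int ran = 0;

	if (st->late_initialized) {       /* early_fini */
		st->late_initialized = false;
		ran |= RAN_EARLY_FINI;
	}
	if (st->hw) {                     /* hw_fini */
		st->hw = false;
		ran |= RAN_HW_FINI;
	}
	if (st->sw) {                     /* sw_fini */
		st->sw = false;
		ran |= RAN_SW_FINI;
	}
	if (st->valid) {                  /* late_fini */
		st->valid = false;
		ran |= RAN_LATE_FINI;
	}
	return ran;
}
```

Calling ip_fini() a second time returns 0, because every flag has already been cleared.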
Enhance amdgpu_dm_early_fini() so it can be called during power
management operations.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
b/drivers/gpu/drm/amd
Free all allocated resources on the error recovery path in function
amdgpu_ras_init().
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 19 ++-
1 file changed, 14 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
b/drivers/gpu
Enhance amdgpu_ras_block_late_fini() to revert what has been done
by amdgpu_ras_block_late_init(), and fix a possible resource leak
in amdgpu_ras_block_late_init().
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++--
1 file changed, 10
Add helper functions to track status for ras manager and ip blocks.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 38 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 37
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 10 +++
3
Function detects initialization status by checking sched->ops, so set
sched->ops to non-NULL just before returning from
amdgpu_fence_driver_sw_fini() and amdgpu_device_init_schedulers(),
to avoid possible invalid memory access on the error recovery path.
Signed-off-by: Jiang Liu
---
drive
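The ordering matters because a fini routine that treats a non-NULL sched->ops as "initialized" must never see ops set on a half-initialized scheduler. The following is a userspace sketch of that marker pattern. `struct sched`, `sched_init()` and `sched_fini()` are hypothetical and single-threaded, unlike drm_sched_init()/drm_sched_fini().

```c
#include <stddef.h>
#include <stdlib.h>

struct sched_ops {
	void (*run_job)(void);
};

/* Hypothetical scheduler; ops doubles as the "initialized" marker. */
struct sched {
	const struct sched_ops *ops;
	int *job_queue;
};

static void noop_run_job(void) { }
static const struct sched_ops default_ops = { .run_job = noop_run_job };

/* fail_alloc simulates an allocation failure partway through init. */
static int sched_init(struct sched *s, int fail_alloc)
{
	s->ops = NULL;
	s->job_queue = fail_alloc ? NULL : calloc(16, sizeof(int));
	if (!s->job_queue)
		return -1;  /* ops stays NULL, so fini will skip this sched */
	/* Publish ops only once every other field is valid. */
	s->ops = &default_ops;
	return 0;
}

static void sched_fini(struct sched *s)
{
	if (!s->ops)  /* never fully initialized: nothing to undo */
		return;
	free(s->job_queue);
	s->job_queue = NULL;
	s->ops = NULL;
}
```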
0
[ 1802.213878]
[ 1802.213879] ---[ end trace ]---
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 13 +
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu
Introduce new interface amdgpu_xcp_drm_dev_free() to free a specific
drm_device created by amdgpu_xcp_drm_dev_alloc(), which will be used
for error recovery.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 65 +
drivers/gpu/drm/amd/amdxcp
Enhance error handling in function amdgpu_pci_probe() to avoid
possible resource leakage.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 12 +---
1 file changed, 9 insertions(+), 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu
drm core.
Jiang Liu (5):
drm/amdgpu: clear adev->in_suspend flag when fails to suspend
drm/amdxcp: introduce new API amdgpu_xcp_drm_dev_free()
drm/amdgpu: fix use after free bug related to
amdgpu_driver_release_kms()
drm/amdgpu: enhance error handling in function amdgpu_pci_pr
Add helper functions to track status for ras manager and ip blocks.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 38 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 38 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 10
d-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 ++-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c| 45 ++
2 files changed, 32 insertions(+), 19 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_
etc., to follow the new design. So far we have only converted nbio and
asic as examples of the proposed changes. Once we have confirmed that
this is the right direction, we will handle the remaining subsystems.
This is at an early stage and we are requesting comments; any comments and
suggestions
Add a flag to track ras debugfs creation status, to avoid possibly
incorrect reference count management of the ras block object in
amdgpu_ras_aca_is_supported().
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 9
_init:       sw = true
hw_init:     hw = true
late_init:   late_initialized = true
early_fini:  late_initialized = false
hw_fini:     hw = false
sw_fini:     sw = false
late_fini:   valid = false
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 -
_init(amdgpu_irq_get), but
sdma_v4_4_2_xcp_suspend() invokes amdgpu_irq_put(), which causes an
unbalanced irq reference count. Fix it by calling amdgpu_irq_get()
in sdma_v4_4_2_xcp_resume().
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 2 +-
drivers/gpu/drm/amd/a
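The invariant being restored is that every put has a matching get over the whole lifecycle, including each suspend/resume cycle. Below is a userspace sketch with hypothetical names (`struct irq_src`, `irq_get()`, `irq_put()`, `engine_*()`), not the sdma_v4_4_2 code. Without the get in the resume hook, the final put in fini would find the count already at zero.

```c
/* Hypothetical interrupt source with a usage count. */
struct irq_src {
	int refcount;
};

static void irq_get(struct irq_src *s)
{
	s->refcount++;
}

static int irq_put(struct irq_src *s)
{
	if (s->refcount <= 0)
		return -1;  /* unbalanced put: report it instead of going negative */
	s->refcount--;
	return 0;
}

/* Lifecycle hooks for one hypothetical engine. */
static void engine_late_init(struct irq_src *s) { irq_get(s); }
static int engine_suspend(struct irq_src *s) { return irq_put(s); }
/* Re-take the reference that suspend dropped. */
static void engine_resume(struct irq_src *s) { irq_get(s); }
static int engine_fini(struct irq_src *s) { return irq_put(s); }
```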
The adev->ip_blocks array is not indexed by AMD_IP_BLOCK_TYPE_xxx;
instead, call amdgpu_device_ip_get_ip_block() to get the corresponding
IP block object.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 8 ++--
1 file changed, 6 insertions(+), 2 deleti
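This bug class arises when a type enum is used as an array index but the array is filled in registration order. The following userspace sketch shows lookup by type. The enum values, the `ip_blocks[]` contents and `get_ip_block()` are hypothetical and do not reproduce the signature of amdgpu_device_ip_get_ip_block().

```c
#include <stddef.h>

/* Hypothetical block types; the values do not match array positions. */
enum ip_type { IP_COMMON, IP_GMC, IP_IH, IP_PSP, IP_SMC };

struct ip_block {
	enum ip_type type;
	const char *name;
};

/* Blocks are stored in registration order, not in enum order. */
static const struct ip_block ip_blocks[] = {
	{ IP_COMMON, "common" },
	{ IP_GMC,    "gmc" },
	{ IP_PSP,    "psp" },
	{ IP_IH,     "ih" },
};

static const size_t num_ip_blocks = sizeof(ip_blocks) / sizeof(ip_blocks[0]);

/* Look a block up by its type instead of using the type as an index. */
static const struct ip_block *get_ip_block(enum ip_type type)
{
	size_t i;

	for (i = 0; i < num_ip_blocks; i++)
		if (ip_blocks[i].type == type)
			return &ip_blocks[i];
	return NULL;
}
```

In this layout, ip_blocks[IP_PSP] would return the "ih" entry, while get_ip_block(IP_PSP) returns the "psp" entry.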
_fini.
One task is left: analyzing the state machine transitions related to
GPU reset.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c| 22 +--
.../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 3 +++
2 files changed, 23 insertions(+), 2 deletion
301408] ret_from_fork+0x1f/0x30
[ 1209.301410] ---[ end trace 733f120fe2ab13e5 ]---
[ 1209.301418] [ cut here ]
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ring.h | 1 +
2 files changed, 8 insertions
348858] entry_SYSCALL_64_after_hwframe+0x76/0x7e
2024-12-26 16:17:46 [16002.354956] RIP: 0033:0x7f2736a620cb
Fix it by removing xcp drm devices when probing the GPU device fails.
Signed-off-by: Jiang Liu
Tested-by: Shuo Liu
Reviewed-by: Lijo Lazar
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +-
drive
Introduce new interface amdgpu_xcp_drm_dev_free() to free a specific
drm_device created by amdgpu_xcp_drm_dev_alloc(), which will be used
for error recovery.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdxcp/amdgpu_xcp_drv.c | 76 +
drivers/gpu/drm/amd/amdxcp
amdxcp
driver to better support device removal and error handling.
4) reworked patch 0005 to fix it in amdgpu instead of in the drm core.
Jiang Liu (6):
drm/amdgpu: clear adev->in_suspend flag when fails to suspend
drm/amdxcp: introduce new API amdgpu_xcp_drm_dev_free()
drm/amdgpu: fix use af
348858] entry_SYSCALL_64_after_hwframe+0x76/0x7e
2024-12-26 16:17:46 [16002.354956] RIP: 0033:0x7f2736a620cb
Fix it by unplugging xcp drm devices when probing the GPU device fails.
Signed-off-by: Jiang Liu
Tested-by: Shuo Liu
Reviewed-by: Lijo Lazar
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 +++-
Introduce new interface amdgpu_xcp_drm_dev_free() to free a specific
drm_device created by amdgpu_xcp_drm_dev_alloc(), which will be used
for error recovery.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.c | 11 +++-
drivers/gpu/drm/amd/amdgpu/amdgpu_xcp.h | 1
Function detects initialization status by checking sched->ops, so set
sched->ops to non-NULL just before returning from drm_sched_init()
to avoid possible invalid memory access on the error recovery path.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 1 +
drive
amd-staging-drm-next.
2) removed the first patch, which is unnecessary.
3) added amdgpu_xcp_drm_dev_free() in patch 0003 to enhance the amdxcp
driver to better support device removal and error handling.
4) reworked patch 0005 to fix it in amdgpu instead of in the drm core.
Jiang Liu (6):
amdgpu: fix invalid
Fix a possible resource leak on the error recovery path in function
kgd2kfd_device_init().
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 9 +
1 file changed, 5 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_device.c
b/drivers/gpu/drm
Function detects initialization status by checking sched->ops, so set
sched->ops to non-NULL just before returning from drm_sched_init()
to avoid possible invalid memory access on the error recovery path.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/scheduler/sched_main.c | 3 +++
1 file c
01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 b0
00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 75 6d 19 00 f7 d8 64 89 01
48
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdkfd/kfd_device.c | 7 ++-
1 file changed, 6 insertions(+), 1 deletion(-)
diff --g
This patchset tries to fix several memory leaks and invalid memory
accesses on error handling paths during GPU driver loading/unloading.
It applies to:
https://github.com/ROCm/ROCK-Kernel-Driver/tree/master/drivers
Jiang Liu (6):
amdgpu: add flags to track sysfs initialization status
amdgpu
348858] entry_SYSCALL_64_after_hwframe+0x76/0x7e
2024-12-26 16:17:46 [16002.354956] RIP: 0033:0x7f2736a620cb
Fix it by unplugging xcp drm devices when probing the GPU device fails.
Signed-off-by: Jiang Liu
Tested-by: Shuo Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 4 +++-
drivers/gpu/drm/amd/
Add flags to track sysfs initialization status, so we can correctly
clean them up on error recovery paths.
Signed-off-by: Jiang Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h| 3 ++
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 34 +-
2 files changed, 30 insertions(+), 7