From: Vitaly Prosyak
User queues are disabled before GEM objects are released
(protecting against user app crashes).
No races with PCI hot-unplug (because drm_dev_enter prevents cleanup
if device is being removed).
Cc: Christian König
Cc: Alex Deucher
Signed-off-by: Vitaly Prosyak
---
driver
From: Vitaly Prosyak
User queues are disabled before GEM objects are released
(protecting against user app crashes).
No races with PCI hot-unplug (because drm_dev_enter prevents cleanup
if device is being removed).
Signed-off-by: Vitaly Prosyak
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 +
From: Vitaly Prosyak
[ +0.20] BUG: KASAN: slab-use-after-free in
amdgpu_userq_suspend+0x51a/0x5a0 [amdgpu]
[ +0.000817] Read of size 8 at addr 88812eec8c58 by task
amd_pci_unplug/1733
[ +0.27] CPU: 10 UID: 0 PID: 1733 Comm: amd_pci_unplug Tainted: GW
6.14.0+ #2
From: Vitaly Prosyak
This reverts commit 0203ef5eb3b2a3a10dd31bac8fc2fa3b439cbb09.
The original patch moved `amdgpu_userq_mgr_fini()` to the driver's `postclose`
callback, which is called after `drm_gem_release()` in the DRM file cleanup
sequence. If a user application crashes or aborts without
From: Vitaly Prosyak
The issue was reproduced on NV10 using IGT pci_unplug test.
It is expected that `amdgpu_driver_postclose_kms()` is called prior to
`amdgpu_drm_release()`.
However, the bug is that `amdgpu_fpriv` was freed in
`amdgpu_driver_postclose_kms()`, and then
later accessed in `amdgp
From: Vitaly Prosyak
The issue was reproduced on NV10 using IGT pci_unplug test.
It is expected that `amdgpu_driver_postclose_kms()` is called prior to
`amdgpu_drm_release()`.
However, the bug is that `amdgpu_fpriv` was freed in
`amdgpu_driver_postclose_kms()`, and then
later accessed in `amdgp
From: Vitaly Prosyak
The cleaner shader is a piece of GPU code that is used to clear or
initialize certain GPU resources, such as Local Data Share (LDS), Vector
General Purpose Registers (VGPRs), and Scalar General Purpose Registers
(SGPRs).
Cc: Christian König
Cc: Alex Deucher
Signed-off-by:
From: Vitaly Prosyak
[ +0.21] BUG: KASAN: slab-use-after-free in
drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.27] Read of size 8 at addr 8881b8605f88 by task
amd_pci_unplug/2147
[ +0.23] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1
[ +0.16] Hardwa
From: Vitaly Prosyak
[ +0.21] BUG: KASAN: slab-use-after-free in
drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.27] Read of size 8 at addr 8881b8605f88 by task
amd_pci_unplug/2147
[ +0.23] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1
[ +0.16] Hardwa
From: Vitaly Prosyak
[ +0.21] BUG: KASAN: slab-use-after-free in
drm_sched_entity_flush+0x6cb/0x7a0 [gpu_sched]
[ +0.27] Read of size 8 at addr 8881b8605f88 by task
amd_pci_unplug/2147
[ +0.23] CPU: 6 PID: 2147 Comm: amd_pci_unplug Not tainted 6.10.0+ #1
[ +0.16] Hardwa
From: Vitaly Prosyak
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to
dispose of a job when
the parent/hw fence is NULL. This results in drm_sched_job_done being called
with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult to
disti
From: Vitaly Prosyak
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to
dispose of a job when
the parent/hw fence is NULL. This results in drm_sched_job_done being called
with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult to
disti
From: Vitaly Prosyak
The current implementation of drm_sched_start uses a hardcoded -ECANCELED to
dispose of a job when
the parent/hw fence is NULL. This results in drm_sched_job_done being called
with -ECANCELED for
each job with a NULL parent in the pending list, making it difficult to
disti
From: Vitaly Prosyak
In the case of a queue reset, we need the ability to customize the
error code from -ECANCELED to -ENODATA for scenarios where the queue
reset is successful. It was decided to use -ECANCELED for GPU reset cases
and -ENODATA for queue reset cases. This change introduces an erro
From: Vitaly Prosyak
[ +0.006038] BUG: kernel NULL pointer dereference, address: 0000000000000028
[ +0.006969] #PF: supervisor read access in kernel mode
[ +0.005139] #PF: error_code(0x0000) - not-present page
[ +0.005139] PGD 0 P4D 0
[ +0.002530] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ +0.0043
From: Vitaly Prosyak
The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASICs with valid context.
The bug was reported by Joonkyo Jung.
For example the following code:
static void Syzkaller2(int fd)
{
union drm_amdgpu_ctx arg1;
un
From: Vitaly Prosyak
The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASICs with valid context.
The bug was reported by Joonkyo Jung.
For example the following code:
static void Syzkaller2(int fd)
{
union drm_amdgpu_ctx arg1;
un
From: Vitaly Prosyak
The bug can be triggered by sending an amdgpu_cs_wait_ioctl
to the AMDGPU DRM driver on any ASICs with valid context.
The bug was reported by Joonkyo Jung.
For example the following code:
static void Syzkaller2(int fd)
{
union drm_amdgpu_ctx arg1;
un
From: Vitaly Prosyak
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
to the AMDGPU DRM driver on any ASICs with an invalid address and size.
The bug was reported by Joonkyo Jung.
For example the following code:
static void Syzkaller1(int fd)
{
struct drm_amdgpu_gem
From: Vitaly Prosyak
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
to the AMDGPU DRM driver on any ASICs with an invalid address and size.
The bug was reported by Joonkyo Jung.
For example the following code:
static void Syzkaller1(int fd)
{
struct drm_amdgpu_gem
From: Vitaly Prosyak
The bug can be triggered by sending a single amdgpu_gem_userptr_ioctl
to the AMDGPU DRM driver on any ASICs with an invalid address and size.
The bug was reported by Joonkyo Jung.
For example the following code:
static void Syzkaller1(int fd)
{
struct drm_amdgpu_gem
From: Vitaly Prosyak
The issue started to appear after the following commit
11b3b9f461c5c4f700f6c8da202fcc2fd6418e1f (scheduler to variable number
of run-queues). The scheduler flag ready (ring->sched.ready) could not be
used to validate multiple scenarios, for example, check job is running
From: Vitaly Prosyak
When software 'pci unplug' using IGT is executed we got a sysfs directory
entry is NULL for different RAS blocks like hdp, umc, etc.
Before call 'sysfs_remove_file_from_group' and 'sysfs_remove_group'
check that 'sd' is not NULL.
[ +0.01] RIP: 0010:sysfs_remove_group+0
From: Vitaly Prosyak
During an IGT GPU reset test we again see an oops despite
commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling
timeout handling).
It uses ready condition whether to call drm_sched_fault which unwind
the TDR leads to GPU reset.
However it looks the ready con
From: Vitaly Prosyak
During an IGT GPU reset test we again see an oops despite
commit 0c8c901aaaebc9 (drm/sched: Check scheduler ready before calling
timeout handling).
It uses ready condition whether to call drm_sched_fault which unwind
the TDR leads to GPU reset.
However it looks the ready con
From: Vitaly Prosyak
During an IGT GPU reset test we again see an oops despite
commit 0c8c901aaaebc9bf8bf189ffc116e678f7a2dc16
drm/sched: Check scheduler ready before calling timeout handling.
It uses ready condition whether to call drm_sched_fault which unwind
the TDR leads to GPU reset.
Howeve
From: Vitaly Prosyak
We allow sending PSP messages LOAD_ASD and UNLOAD_TA without
acquiring a lock in drm_dev_enter during driver unload
because we must call drm_dev_unplug as the beginning
of unload driver sequence.
Added WARNING if other PSP messages are sent without a lock.
After this commit,
From: Vitaly Prosyak
We allow sending PSP messages LOAD_ASD and UNLOAD_TA without
acquiring a lock in drm_dev_enter during driver unload
because we must call drm_dev_unplug as the beginning
of unload driver sequence.
Added WARNING if other PSP messages are sent without a lock.
After this commit,
From: Vitaly Prosyak
This reverts commit fac53471d0ea9693d314aa2df08d62b2e7e3a0f8.
The following change: move the drm_dev_unplug call after
amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is
the following: amdgpu_pci_remove calls drm_dev_unregister
and it should be called first to ensur
From: Vitaly Prosyak
We allow sending PSP messages LOAD_ASD and UNLOAD_TA without
acquiring a lock in drm_dev_enter during driver unload
because we must call drm_dev_unplug as the beginning
of unload driver sequence.
Added WARNING if other PSP messages are sent without a lock.
After this commit,
From: Vitaly Prosyak
We allow sending PSP messages LOAD_ASD and UNLOAD_TA without
acquiring a lock in drm_dev_enter during driver unload
because we must call drm_dev_unplug as the beginning
of unload driver sequence.
Added WARNING if other PSP messages are sent without a lock.
After this commit,
From: Vitaly Prosyak
Added condition for pci_dev_is_disconnected and keeps
drm_dev_is_unplugged to check whether we should unmap MMIO.
Suggested by Alex regarding pci_dev_is_disconnected.
Suggested by Christian keeping drm_dev_is_unplugged.
Signed-off-by: Vitaly Prosyak
Reviewed-by: Alex Deucher
From: Vitaly Prosyak
This reverts commit fac53471d0ea9693d314aa2df08d62b2e7e3a0f8.
The following change: move the drm_dev_unplug call after
amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is
the following: amdgpu_pci_remove calls drm_dev_unregister
and it should be called first to ensur
From: Vitaly Prosyak
We allow sending PSP messages LOAD_ASD and UNLOAD_TA without acquiring a lock
in drm_dev_enter during driver unload because we must call drm_dev_unplug as the
beginning of unload driver sequence.
Added WARNING if other PSP messages are sent without a lock.
After this commit,
From: Vitaly Prosyak
This reverts commit fac53471d0ea9693d314aa2df08d62b2e7e3a0f8.
The following change: move the drm_dev_unplug call after
amdgpu_driver_unload_kms in amdgpu_pci_remove. The reason is
the following: amdgpu_pci_remove calls drm_dev_unregister
and it should be called first to ensur
From: Vitaly Prosyak
Revert the following change: move drm_dev_unplug call after
amdgpu_driver_unload_kms in amdgpu_pci_remove.
The reason is following: amdgpu_pci_remove calls drm_dev_unregister
and it should be called first to ensure userspace can't access the
device instance anymore. If we cal
36 matches