After looking more
"Failed to initialize parser !" is expected since the reason is jobs
from guilty context (context causing the GPU hang) are canceled.
Locking imbalance is actually gone with Monk's patches.
Thanks,
Andrey
On 02/28/2018 11:40 AM, Andrey Grodzovsky wrote:
No new issues found with those patches, testing GPU reset using libdrm
deadlock detection test on Ellsmire.
The patches are Tested-By: Andrey Grodzovsky <andrey.grodzov...@amd.com>
P.S
Noticed existing issues (before Monk's patches)
Multiple [drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize
parser !
And occasional unlock imbalance forn amdgpu_cs_ioctl
DEBUG_LOCKS_WARN_ON(depth <= 0)
[ 93.069011 < 0.000017>] WARNING: CPU: 3 PID: 2215 at
kernel/locking/lockdep.c:3682 lock_release+0x2e8/0x360
On CZ full reset hangs the system.
Gonna take a look at those issues.
Thanks,
Andrey
From
On 02/28/2018 08:31 AM, Liu, Monk wrote:
Already sent
-----Original Message-----
From: Grodzovsky, Andrey
Sent: 2018年2月28日 21:31
To: Koenig, Christian <christian.koe...@amd.com>; Liu, Monk
<monk....@amd.com>; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/4] drm/amdgpu: stop all rings before doing gpu
recover
Will do once Monk sends V2 for [PATCH 4/4] drm/amdgpu: try again kiq
access if not in IRQ
Andrey
On 02/28/2018 07:20 AM, Christian König wrote:
Andrey please give this set a good testing as well.
Am 28.02.2018 um 08:21 schrieb Monk Liu:
found recover_vram_from_shadow sometimes get executed in paralle with
SDMA scheduler, should stop all schedulers before doing gpu
reset/recover
Change-Id: Ibaef3e3c015f3cf88f84b2eaf95cda95ae1a64e3
Signed-off-by: Monk Liu <monk....@amd.com>
For now this patch is Reviewed-by: Christian König
<christian.koe...@amd.com>.
Regards,
Christian.
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 40
+++++++++++-------------------
1 file changed, 15 insertions(+), 25 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 75d1733..e9d81a8 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2649,22 +2649,23 @@ int amdgpu_device_gpu_recover(struct
amdgpu_device *adev,
/* block TTM */
resched = ttm_bo_lock_delayed_workqueue(&adev->mman.bdev);
+
/* store modesetting */
if (amdgpu_device_has_dc_support(adev))
state = drm_atomic_helper_suspend(adev->ddev);
- /* block scheduler */
+ /* block all schedulers and reset given job's ring */
for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
struct amdgpu_ring *ring = adev->rings[i];
if (!ring || !ring->sched.thread)
continue;
- /* only focus on the ring hit timeout if &job not NULL */
+ kthread_park(ring->sched.thread);
+
if (job && job->ring->idx != i)
continue;
- kthread_park(ring->sched.thread);
drm_sched_hw_job_reset(&ring->sched, &job->base);
/* after all hw jobs are reset, hw fence is meaningless,
so force_completion */ @@ -2707,33 +2708,22 @@ int
amdgpu_device_gpu_recover(struct amdgpu_device *adev,
}
dma_fence_put(fence);
}
+ }
- for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
- struct amdgpu_ring *ring = adev->rings[i];
-
- if (!ring || !ring->sched.thread)
- continue;
+ for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
+ struct amdgpu_ring *ring = adev->rings[i];
- /* only focus on the ring hit timeout if &job not NULL
*/
- if (job && job->ring->idx != i)
- continue;
+ if (!ring || !ring->sched.thread)
+ continue;
+ /* only need recovery sched of the given job's ring
+ * or all rings (in the case @job is NULL)
+ * after above amdgpu_reset accomplished
+ */
+ if ((!job || job->ring->idx == i) && !r)
drm_sched_job_recovery(&ring->sched);
- kthread_unpark(ring->sched.thread);
- }
- } else {
- for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
- struct amdgpu_ring *ring = adev->rings[i];
- if (!ring || !ring->sched.thread)
- continue;
-
- /* only focus on the ring hit timeout if &job not NULL
*/
- if (job && job->ring->idx != i)
- continue;
-
- kthread_unpark(adev->rings[i]->sched.thread);
- }
+ kthread_unpark(ring->sched.thread);
}
if (amdgpu_device_has_dc_support(adev)) {
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx