Yeah, that is a known issue which came to front again because Andrey's
patch is slightly buggy.
Please test and review the attached (only compile tested) fix for
Andrey's patch.
Still working on finding the root cause, but so far didn't had time for
that.
Regards,
Christian.
Am 19.12.2017 um 18:26 schrieb Michel Dänzer:
On 2017-12-13 08:44 PM, Andrey Grodzovsky wrote:
With introduction of amdgpu_gpu_recovery we don't need any more
to rely on amdgpu_lockup_timeout == 0 for disabling GPU reset.
Signed-off-by: Andrey Grodzovsky <andrey.grodzov...@amd.com>
Since this change landed, I'm once again unable to finish a piglit run
on my development machine, see the attached dmesg output (happens pretty
quickly, after ~5% of piglit tests have run). I realized that with
lockup_timeout != 0, the
WARN_ON_ONCE(bo->tbo.mem.mem_type == TTM_PL_SYSTEM);
at the top of amdgpu_bo_gpu_offset has been triggering since the 4.15
development cycle. See the bisection result below. Note that I'm not
100% sure this is the correct guilty commit, since it's probably been
the most painful bisection I've ever done so far (14 skips, had to
revert 4 commits causing other regressions). But I'm quite sure this
regression happened in the
84d43463a2d09c28c9222fbb7d1082c078e2523a..3f3333f8a0e90ac26f84ed7b0aa344efce695c08
range.
3f3333f8a0e90ac26f84ed7b0aa344efce695c08 is the first bad commit
commit 3f3333f8a0e90ac26f84ed7b0aa344efce695c08
Author: Christian König <christian.koe...@amd.com>
Date: Thu Aug 3 14:02:13 2017 +0200
drm/amdgpu: track evicted page tables v2
Instead of validating all page tables when one was evicted,
track which one needs a validation.
v2: simplify amdgpu_vm_ready as well
Signed-off-by: Christian König <christian.koe...@amd.com>
Reviewed-by: Alex Deucher <alexander.deuc...@amd.com> (v1)
Reviewed-by: Chunming Zhou <david1.z...@amd.com>
>From 769eb5d08910e16058902fada499d21655ccdba4 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Christian=20K=C3=B6nig?= <christian.koe...@amd.com>
Date: Tue, 19 Dec 2017 18:44:43 +0100
Subject: [PATCH] drm/amdgpu: fix test for shadow page tables
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
They don't work 100% correctly at the moment.
Signed-off-by: Christian König <christian.koe...@amd.com>
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 3c87b7f902c9..9fac284315cb 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2833,7 +2833,11 @@ bool amdgpu_need_backup(struct amdgpu_device *adev)
if (adev->flags & AMD_IS_APU)
return false;
- return amdgpu_gpu_recovery;
+ if (amdgpu_gpu_recovery == 0 ||
+ (amdgpu_gpu_recovery == -1 && !amdgpu_sriov_vf(adev)))
+ return false;
+
+ return true;
}
static int amdgpu_recover_vram_from_shadow(struct amdgpu_device *adev,
--
2.11.0
_______________________________________________
amd-gfx mailing list
amd-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/amd-gfx