On 2017-05-08 11:51 PM, Liu, Monk wrote:
Can you explain your reasoning behind your current position that the KIQ
shouldn't be used by baremetal amdgpu?
[ML] I didn't mean KIQ shouldn't be leveraged by bare-metal; rather, how it is
used by bare-metal is of no interest to me ...
I mean it better not be used under SR-IOV case by other cli
On 2017-05-05 22:27, Alex Deucher wrote:
Need to use the atomfirmware interface rather than atombios since
soc15 is atomfirmware based.
Signed-off-by: Alex Deucher
The series is Reviewed-by: Chunming Zhou
---
drivers/gpu/drm/amd/amdgpu/soc15.c | 6 +++---
1 file changed, 3 insertions(+
On 2017-05-06 06:57, Felix Kuehling wrote:
We ran into a similar problem when we played with priorities on KFD
queues. You can't change an MQD of a currently mapped queue. To change a
queue priority we need to unmap it, update the MQD, and then map it again.
I wonder if there is similar requi
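The unmap, update, map sequence described in this message can be sketched roughly as follows. This is a minimal illustration only: `struct hw_queue`, `struct mqd`, and every function name below are hypothetical stand-ins, not the real KFD or amdgpu types or APIs.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; not the real KFD/amdgpu types. */
struct mqd {
	int priority;
};

struct hw_queue {
	bool mapped;	/* true while the hardware owns the MQD */
	struct mqd mqd;
};

/* Take the queue off the hardware so that its MQD may be edited. */
static int queue_unmap(struct hw_queue *q)
{
	if (!q->mapped)
		return -EINVAL;
	q->mapped = false;
	return 0;
}

/* Hand the queue, with its current MQD contents, back to the hardware. */
static int queue_map(struct hw_queue *q)
{
	if (q->mapped)
		return -EBUSY;
	q->mapped = true;
	return 0;
}

/* Refuse to edit the MQD of a mapped queue, mirroring the constraint. */
static int mqd_set_priority(struct hw_queue *q, int priority)
{
	if (q->mapped)
		return -EBUSY;
	q->mqd.priority = priority;
	return 0;
}

/* Change the priority of a mapped queue: unmap, update the MQD, map again. */
static int queue_change_priority(struct hw_queue *q, int priority)
{
	int r;

	r = queue_unmap(q);
	if (r)
		return r;

	r = mqd_set_priority(q, priority);
	if (r)
		return r;

	return queue_map(q);
}
```

In this sketch a direct `mqd_set_priority()` call on a mapped queue fails with `-EBUSY`, while `queue_change_priority()` succeeds and leaves the queue mapped with the new priority.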
>
> - /* block scheduler */
> - for (i = 0; i < AMDGPU_MAX_RINGS; ++i) {
> - ring = adev->rings[i];
> + /* we start from the ring trigger GPU hang */
> + j = job ? job->ring->idx : 0;
> +
> + if (job)
> + if (amd_sched_invalidate_job(&job->base, amdgpu
On 05/08/2017 10:32 AM, Alex Xie wrote:
> Signed-off-by: Alex Xie
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> ind
On 4 May 2017 at 18:16, Chris Wilson wrote:
> On Wed, Apr 26, 2017 at 01:28:29PM +1000, Dave Airlie wrote:
>> +#include
>
> I wonder if Daniel has already split everything used here into its own
> headers?
Not sure if drm_file is out there yet. I'll find out when I rebase
this onto something ne
On 09/05/17 10:26 AM, Alex Xie wrote:
> Signed-off-by: Alex Xie
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8
> 1 file changed, 4 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 2
Hi Michel,
The code change has been submitted into our internal git server.
I have a follow up commit in another email thread.
The commit fixes more errors in comments.
Thanks,
Alex Bin
From: Michel Dänzer
Sent: Monday, May 8, 2017 9:13 PM
To: Xie, AlexBin
Signed-off-by: Alex Xie
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 2704f88..480f3cd 100644
--- a/drivers/gpu/drm/amd/amdgp
On 09/05/17 12:32 AM, Alex Xie wrote:
> Signed-off-by: Alex Xie
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++
> 1 file changed, 3 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> index 66
On 2017-05-08 02:32 PM, Alex Deucher wrote:
On Fri, May 5, 2017 at 10:27 AM, Alex Deucher wrote:
Update the scratch reg for when the engine is hung.
Signed-off-by: Alex Deucher
ping on this series.
I'm not an expert on this and haven't had a chance to look up the
atomfirmware definition
On 2017-05-08 03:07 PM, Dave Airlie wrote:
On 9 May 2017 at 04:54, Harry Wentland wrote:
Hi Daniel,
Thanks for taking the time to look at DC.
I had a couple more questions/comments in regard to the patch you posted on
IRC: http://paste.debian.net/plain/930704
My impression is that this ite
On 9 May 2017 at 04:54, Harry Wentland wrote:
> Hi Daniel,
>
> Thanks for taking the time to look at DC.
>
> I had a couple more questions/comments in regard to the patch you posted on
> IRC: http://paste.debian.net/plain/930704
>
> My impression is that this item is the most important next step f
Hi Daniel,
Thanks for taking the time to look at DC.
I had a couple more questions/comments in regard to the patch you posted
on IRC: http://paste.debian.net/plain/930704
My impression is that this item is the most important next step for us:
From a quick glance I think what we want ins
On Mon, May 8, 2017 at 9:25 AM, Christian König wrote:
> From: Christian König
>
> This kind of reset handling was removed a long time ago.
>
> Signed-off-by: Christian König
Reviewed-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 45
> -
>
On Fri, May 5, 2017 at 10:27 AM, Alex Deucher wrote:
> Update the scratch reg for when the engine is hung.
>
> Signed-off-by: Alex Deucher
ping on this series.
Alex
> ---
> drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.c | 13 +
> drivers/gpu/drm/amd/amdgpu/amdgpu_atomfirmware.h
On Wed, Apr 5, 2017 at 9:01 AM, Nath, Arindam wrote:
>
> >-----Original Message-----
> >From: Daniel Drake [mailto:dr...@endlessm.com]
> >Sent: Thursday, March 30, 2017 7:15 PM
> >To: Nath, Arindam
> >Cc: j...@8bytes.org; Deucher, Alexander; Bridgman, John; amd-
> >g...@lists.freedesktop.org; io..
Local variable use_doorbell is assigned to a constant value and it is never
updated again. Remove this variable and the dead code it guards.
Addresses-Coverity-ID: 1401828
Signed-off-by: Gustavo A. R. Silva
---
drivers/gpu/drm/amd/amdgpu/gfx_v8_0.c | 53 +--
1 fil
> -----Original Message-----
> From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf
> Of Alex Xie
> Sent: Monday, May 08, 2017 11:32 AM
> To: amd-gfx@lists.freedesktop.org
> Cc: Xie, AlexBin
> Subject: [PATCH] drm/amdgpu: fix errors in comments.
>
> Signed-off-by: Alex Xie
Re
Local variable use_doorbell is assigned to a constant value and it is never
updated again. Remove this variable and the dead code it guards.
Addresses-Coverity-ID: 1401837
Signed-off-by: Gustavo A. R. Silva
---
drivers/gpu/drm/amd/amdgpu/gfx_v7_0.c | 20 ++--
1 file changed, 6 in
Unfortunately, further testing shows that this doesn't actually fix the
problem. FWIW, that test runs very reliably on SI with the radeon drm,
but with the amdgpu drm it fails. VI is fine on amdgpu, which is why I
was sent down this road.
Anyway, back to trying to figure this out :/
Cheers,
N
On 2017-05-08 02:08 AM, Liu, Monk wrote:
> Andres
>
> Some previous patches, like moving the KIQ mutex lock from amdgpu_virt to a
> common place, went in over my NAK, but from a technical perspective that
> doesn't matter anyway. But this patch and the following patches are heading
> for a dead end:
>
> 1, Don't use KIQ
Signed-off-by: Alex Xie
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 7 +++
1 file changed, 3 insertions(+), 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 66bb60e..aab3206 100644
--- a/drivers/gpu/drm/amd/amdgpu
From: Christian König
This kind of reset handling was removed a long time ago.
Signed-off-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_gem.c | 45 -
1 file changed, 11 insertions(+), 34 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ge
On 08.05.2017 at 09:01, Liu, Monk wrote:
@Christian
This one is changed to the guilty job scheme in accordance with your response
BR Monk
-----Original Message-----
From: Monk Liu [mailto:monk@amd.com]
Sent: Monday, May 08, 2017 3:00 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject:
Because we can always rely on TDR and the hypervisor to detect a GPU hang and
resubmit malicious jobs or even kick them out later,
and the GPU reset will eventually be invoked, there is no reason to manually
and voluntarily call GPU reset in the SR-IOV case.
Well there is a rather good reason, we det
The VM fault interrupt or illegal instruction will be delivered to the GPU
whether it is the SR-IOV or the bare-metal case.
I stopped them from invoking GPU reset for the same reason:
don't trigger GPU reset in the SR-IOV case if possible; always be aware that
triggering GPU reset under SR-IOV is a heavy
Sounds good, but what do we do with the amdgpu_irq_reset_work_func?
Please note that I find that calling amdgpu_gpu_reset() here is a bad
idea in the first place.
Instead we should consider the scheduler as faulting and let the
scheduler handle that in the same way as a job timeout.
But
On 08.05.2017 at 11:28, Monk Liu wrote:
Change-Id: Ie9730852da54ceb8b4c2c44acac2df3556a32d17
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers
From: Nicolai Hähnle
Bring the code in line with what the radeon module does.
Without this change, the fence following the IB may be signalled
to the CPU even though some data written by shaders may not have
been written back yet.
This change fixes the OpenGL CTS test
GL45-CTS.gtf32.GL3Tests.pa
Change-Id: Ie9730852da54ceb8b4c2c44acac2df3556a32d17
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c | 9 +++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_fence.c
index fe17
I agree with disabling debugfs for amdgpu_reset when SRIOV detected.
-----Original Message-----
From: Christian König [mailto:deathsim...@vodafone.de]
Sent: Monday, May 08, 2017 5:20 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 1/4] drm/amdgpu:don't invoke srio-gpu-reset
You know that GPU reset under SR-IOV will have a very big impact on all other VFs
...
Mhm, good argument. But in this case we need to give at least some
warning message instead of doing nothing.
Or even better, disable creating the amdgpu_reset debugfs file
altogether. This way nobody will wonde
Yeah, my mistake, thanks for the catch.
-----Original Message-----
From: Christian König [mailto:deathsim...@vodafone.de]
Sent: Monday, May 08, 2017 5:11 PM
To: Liu, Monk ; amd-gfx@lists.freedesktop.org
Subject: Re: [PATCH 3/4] drm/amdgpu:only call flr_work under infinite timeout
On 08.05.2017 at 08:
On 08.05.2017 at 08:51, Monk Liu wrote:
Change-Id: I541aa5109f4fcab06ece4761a09dc7e053ec6837
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/mxgpu_vi.c
b/drivers/
For the SR-IOV use case, we call GPU reset only when we have no choice ...
So many places, like debugfs, shouldn't be a good reason to trigger GPU reset.
You know that GPU reset under SR-IOV will have a very big impact on all other VFs
...
BR Monk
-----Original Message-----
From: Christian König [ma
On 08.05.2017 at 08:51, Monk Liu wrote:
That way we can know which job caused the hang and
can do per-sched reset/recovery instead of resetting all
scheds.
Change-Id: Ifc98cd74b2d93823c489de6a89087ba188957eff
Signed-off-by: Monk Liu
Reviewed-by: Christian König
---
drivers/gpu/drm/amd/amdgpu/amdgpu_dev
On 08.05.2017 at 08:51, Monk Liu wrote:
Because we don't want to do sriov-gpu-reset in certain
cases, just split those two functions and don't invoke
the SR-IOV one from the bare-metal one.
Change-Id: I641126c241e2ee2dfd54e6d16c389b159f99cfe0
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgp
@Christian
This one is changed to the guilty job scheme in accordance with your response
BR Monk
-----Original Message-----
From: Monk Liu [mailto:monk@amd.com]
Sent: Monday, May 08, 2017 3:00 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [PATCH] drm/amdgpu/SRIOV:implement guilty
Sorry, drop this one; it doesn't remove the debug code.
I'll send another one after cleanups.
-----Original Message-----
From: amd-gfx [mailto:amd-gfx-boun...@lists.freedesktop.org] On Behalf Of Monk
Liu
Sent: Monday, May 08, 2017 2:51 PM
To: amd-gfx@lists.freedesktop.org
Cc: Liu, Monk
Subject: [
1, TDR will kick out a guilty job if its hangs exceed the threshold
given by the kernel parameter "job_hang_limit"; that
way a bad command stream will not cause GPU hangs indefinitely.
By default this threshold is 1, so a job will be kicked out
after it hangs.
2, if a job times out, the TDR routine will not
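The threshold check in point 1 can be sketched roughly as below. This is only an illustration under stated assumptions: `struct job`, its `hang_count` field, and `job_hang_exceeds_limit()` are hypothetical names, and the `>=` comparison is an assumption chosen so that the default limit of 1 kicks a job out after its first hang, as the message states. Only the parameter name `job_hang_limit` comes from the message.

```c
#include <stdbool.h>

/* Hypothetical job record; the field name is illustrative only. */
struct job {
	unsigned int hang_count;	/* hangs attributed to this job so far */
};

/*
 * Record one more hang for this job and report whether it has reached
 * the limit and should be kicked out instead of being resubmitted.
 */
static bool job_hang_exceeds_limit(struct job *job, unsigned int job_hang_limit)
{
	job->hang_count++;
	return job->hang_count >= job_hang_limit;
}
```

With a limit of 1 the first hang already returns true; with a limit of 3 the job would be resubmitted twice and kicked out on its third hang.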