[PATCH 16/18] drm/amdgpu: increase mailbox polling timeout to 12s.

2017-09-17 Monk Liu
From: Horace Chen
Because multiple FLRs (function-level resets) may be waiting to complete, the wait for events can be long, so increase the timeout to 12 s to reduce timeout failures.
Change-Id: I6b33170ba7dedf781b99ba6095127efce403af81
Signed-off-by: Horace Chen
---
drivers/gpu/drm/amd/amdgpu/mxgpu_ai.h | 2 +-
drive…
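To show what the timeout bounds, here is a minimal userspace sketch of a bounded polling loop. It assumes the driver peeks the mailbox at a fixed interval until the event arrives or the budget runs out. The names (`mailbox_poll`, `MAILBOX_POLL_STEP_MS`, the test doubles) and the 10 ms step are hypothetical and are not taken from mxgpu_ai.h.

```c
#include <stdbool.h>

/* Illustrative values: a 12 s budget (as in the patch subject), polled every 10 ms. */
#define MAILBOX_TIMEOUT_MS   12000u
#define MAILBOX_POLL_STEP_MS 10u

/* peek() reports whether the awaited event has arrived; sleep_ms() would
 * wrap msleep() in the kernel or nanosleep() in userspace. */
typedef bool (*peek_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned int ms);

/* Poll until peek() succeeds or timeout_ms has been spent sleeping.
 * Returns 0 on success and -1 on timeout. */
int mailbox_poll(peek_fn peek, void *ctx, sleep_fn sleep_ms, unsigned int timeout_ms)
{
    unsigned int waited = 0;

    for (;;) {
        if (peek(ctx))
            return 0;
        if (waited >= timeout_ms)
            return -1;
        sleep_ms(MAILBOX_POLL_STEP_MS);
        waited += MAILBOX_POLL_STEP_MS;
    }
}

/* Test doubles: an event that arrives after `remaining` failed peeks,
 * and a sleep that only records how long it would have slept. */
struct countdown { int remaining; };

bool countdown_peek(void *ctx)
{
    struct countdown *c = ctx;

    if (c->remaining == 0)
        return true;
    c->remaining--;
    return false;
}

unsigned int fake_slept_ms;

void fake_sleep(unsigned int ms)
{
    fake_slept_ms += ms;
}
```

Raising the budget trades a longer worst-case stall for fewer spurious failures while other FLRs are still pending.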

[PATCH 17/18] drm/amdgpu:fix uvd ring fini routine

2017-09-17 Monk Liu
Fix the missing fini of the UVD enc_ring and the incorrect fini of the UVD ring.
Change-Id: Ib74237ca5adcb3b128c9b751fced0b7db7b09e86
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_uvd.c | 12 +++-
1 file changed, 11 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_…

[PATCH 08/18] drm/amdgpu:halt when vm fault

2017-09-17 Monk Liu
Halting on a VM fault is the only way we can debug the VMC page fault issue.
Change-Id: Ifc8373c3c3c40d54ae94dedf1be74d6314faeb10
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 6 ++
drivers/gpu/drm/amd/amdgpu/mmhub_v1_0.c | 7 +++
2 files changed, 13 insertions(+)
diff --git a/dri…

[PATCH 14/18] drm/amdgpu: Fix amdgpu reload failure under SRIOV

2017-09-17 Monk Liu
From: Horace Chen
The kernel sets the PCI power state to UNKNOWN after the driver is unloaded. Because SRIOV uses a faked PCI config space, the UNKNOWN state is kept forever, and if the power state is still UNKNOWN when the driver is reloaded, enabling MSI fails. Force the state to D0 under SRIOV to work around this kernel flaw…
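The workaround's control flow can be sketched as follows, assuming a simplified model in which MSI setup refuses to run while the cached power state is UNKNOWN. The enum, struct, and functions are hypothetical stand-ins, not the kernel's `pci_power_t` or the PCI core API.

```c
#include <stdbool.h>

/* Hypothetical subset of PCI power states; not the kernel's pci_power_t. */
enum fake_pci_power { FAKE_PCI_D0, FAKE_PCI_D3HOT, FAKE_PCI_UNKNOWN };

/* Hypothetical device record holding only the cached power state. */
struct fake_pdev {
    enum fake_pci_power current_state;
};

/* Models the failure described above: MSI setup fails while the cached
 * power state is UNKNOWN. Returns 0 on success, -1 on failure. */
int fake_enable_msi(const struct fake_pdev *pdev)
{
    return pdev->current_state == FAKE_PCI_UNKNOWN ? -1 : 0;
}

/* Reload path: under SRIOV the faked config space never moves the state
 * out of UNKNOWN, so force it to D0 before enabling MSI. */
int fake_reload(struct fake_pdev *pdev, bool is_sriov)
{
    if (is_sriov && pdev->current_state == FAKE_PCI_UNKNOWN)
        pdev->current_state = FAKE_PCI_D0;
    return fake_enable_msi(pdev);
}
```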

[PATCH 11/18] drm/amdgpu:add vgt_flush for gfx9

2017-09-17 Monk Liu
Change-Id: I584572cfb9145ee1b8d11d69ba2989bd6acfd706
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 ++
1 file changed, 14 insertions(+)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c
index 3306667..f201510 100644
-…

[PATCH 07/18] drm/amdgpu:add hdp golden setting register name hint

2017-09-17 Monk Liu
Change-Id: I3a43901f5757b9fab629824a74ad9a4770a47b38
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 20 ++--
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7c…

[PATCH 03/18] drm/amdgpu/sriov:move in_reset to adev and rename

2017-09-17 Monk Liu
Currently in_reset is used only for SRIOV GPU reset, but it will later also be used by other non-GFX hardware components such as PSP, so it makes more sense to move it from gfx to adev and rename it to in_sriov_reset.
Change-Id: Ibb8546f6e4635a1cca740e57f6244f158c70a1e6
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/a…

[PATCH 00/18] *** misc patches for SRIOV ***

2017-09-17 Monk Liu
Found a lot of patches that were missed in 4.12 staging.
Horace Chen (2):
  drm/amdgpu: Fix amdgpu reload failure under SRIOV
  drm/amdgpu: increase mailbox polling timeout to 12s.
Monk Liu (16):
  drm/amdgpu/sriov:fix missing error handling
  drm/amdgpu:no kiq in IH
  drm/amdgpu/sriov:move in_reset to adev…

[PATCH 18/18] drm/amdgpu/sriov:init csb for gfxv9

2017-09-17 Monk Liu
Under SRIOV, the RLC needs the CSB registers to be initialized for world switch; otherwise the clear state buffer behavior is not restored to the current VF's scheme after switching back.
Change-Id: I3afd82875564c233060b740724bd8031095780f6
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 4 +++-
1…

[PATCH 13/18] drm/amdgpu:fix driver unloading bug

2017-09-17 Monk Liu
[SWDEV-126631] - Fix the hypervisor save_vf failure that occurred after the driver was removed:
1. Because the KIQ and KCQ were not unmapped, save_vf fails if the driver has freed the MQDs of the KIQ and KCQ.
2. The KIQ can't be unmapped because the RLCV always needs it, so the bo_free on the KIQ should be skipped.
3. The KCQ can be unmapped, and…

[PATCH 10/18] drm/amdgpu:hdp flush should be put it initialized

2017-09-17 Monk Liu
Change-Id: I635271ba4c89189017daa302a7fe5cd65c3eef06
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c | 12 ++--
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c b/drivers/gpu/drm/amd/amdgpu/gmc_v9_0.c
index 7a20ba8..3d0…

[PATCH 12/18] drm/amdgpu:use formal register to trigger hdp invalidate

2017-09-17 Monk Liu
Change-Id: I61dc02ea6a450f9acfa3bae07aa20244261f5369
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 2 +-
drivers/gpu/drm/amd/amdgpu/sdma_v4_0.c | 2 +-
2 files changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c b/drivers/gpu/drm/am…

[PATCH 15/18] drm/amdgpu/sriov: fix page fault issue of driver unload

2017-09-17 Monk Liu
Calling bo_free on the CSA from amdgpu_fini is too late, because TTM has already been torn down by then; move it earlier to avoid the page fault.
Change-Id: Id9c3f6aa8720cabbc9936ce21d8cf98af6e23bee
Signed-off-by: Monk Liu
Signed-off-by: Horace Chen
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 4 +---
d…

[PATCH 09/18] drm/amdgpu:insert TMZ_BEGIN

2017-09-17 Monk Liu
FRAME_CONTROL(begin) is needed on Vega10 because of a ucode logic change; it can fix some random CTS failures when GFX preemption is enabled.
Change-Id: I0442337f6cde13ed2a33f033badcb522e0f35e2d
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c | 14 --
1 file changed, 8 in…

[PATCH 01/18] drm/amdgpu/sriov:fix missing error handling

2017-09-17 Monk Liu
Change-Id: Ifc6942ed0221f3134bfba4d66fde743484191da3
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 5 -
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c
index e390c01..d1ac27…

[PATCH 06/18] drm/amdgpu/sriov:fix memory leak after gpu reset

2017-09-17 Monk Liu
A GPU reset reruns every hw_init, so ucode_init_bo is invoked again; skip the fw_buf allocation during SRIOV GPU reset to avoid a memory leak.
Change-Id: I31131eda1bd45ea2f5bdc50c5da5fc5a9fe9027d
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 ++…
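The leak and its fix can be modeled in a few lines, assuming the init routine is simply re-entered on every reset. `struct fake_dev`, `ucode_init_buf`, and the allocation counter are hypothetical; only the `in_sriov_reset` flag name comes from this series.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the device struct; field names are illustrative. */
struct fake_dev {
    bool in_sriov_reset;  /* set while an SRIOV GPU reset is in progress */
    void *fw_buf;         /* firmware buffer allocated on first init */
    int allocations;      /* counts allocations so a leak would be visible */
};

/* Re-run on every hw_init. Returns 0 on success, -1 on failure. */
int ucode_init_buf(struct fake_dev *dev, size_t size)
{
    /* During an SRIOV reset the buffer from the first init is still valid,
     * so reuse it instead of allocating (and leaking) a new one. */
    if (dev->in_sriov_reset)
        return dev->fw_buf ? 0 : -1;

    dev->fw_buf = malloc(size);
    if (!dev->fw_buf)
        return -1;
    dev->allocations++;
    return 0;
}
```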

[PATCH 04/18] drm/amdgpu/sriov:don't load psp fw during gpu reset

2017-09-17 Monk Liu
At least under SRIOV, we found that reloading the PSP firmware during GPU reset causes a PSP hang.
Change-Id: I5f273187a10bb8571b77651dfba7656ce0429af0
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 15 +--
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/drivers/gpu/drm/…

[PATCH 05/18] drm/amdgpu:make ctx_add_fence interruptible

2017-09-17 Monk Liu
Otherwise a GPU hang makes the application impossible to kill.
Change-Id: I6051b5b3ae1188983f49325a2438c84a6c12374a
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/amdgpu.h | 4 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c | 12 ++--
drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 1…

[PATCH 02/18] drm/amdgpu:no kiq in IH

2017-09-17 Monk Liu
Change-Id: I4deb65675d2531236b2f4e2bc6f015c657546464
Signed-off-by: Monk Liu
---
drivers/gpu/drm/amd/amdgpu/vega10_ih.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/vega10_ih.c b/drivers/gpu/drm/amd/amdgpu/vega10_ih.c
index 67610f7..c291e33 1…

Re: [PATCH 1/1] amdgpu: move asic id table to a separate file

2017-09-17 Zhang, Jerry (Junwei)
Looks fine to me; feel free to add my RB.
Reviewed-by: Junwei Zhang
BTW, we also have one or two patches to improve the name parsing. Please take a look at those as well.
Jerry
On 05/11/2017 05:10 AM, Li, Samuel wrote:
> Also attach a sample ids file for reference. The names are from marketing, not related to so…

Re: [PATCH] drm/amdgpu/psp: declare raven psp firmware

2017-09-17 Zhang, Jerry (Junwei)
On 09/16/2017 05:37 AM, Alex Deucher wrote:
> So it gets picked up properly by the kernel.
>
> Signed-off-by: Alex Deucher
Reviewed-by: Junwei Zhang
> ---
> drivers/gpu/drm/amd/amdgpu/psp_v10_0.c | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/psp_v10_0.c b/driver…

Re: [PATCH] drm/amdkfd: check for null dev to avoid a null pointer dereference

2017-09-17 Oded Gabbay
On Fri, Sep 8, 2017 at 5:13 PM, Colin King wrote:
> From: Colin Ian King
>
> The call to kfd_device_by_id can potentially return null, so check that
> dev is null and return with -EINVAL to avoid a null pointer dereference.
>
> Detected by CoverityScan CID#1454629 ("Dereference null return value"…
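The pattern under review is a NULL check on a lookup result before it is used. Here is a minimal userspace sketch, with a hypothetical `fake_device_by_id` standing in for `kfd_device_by_id` and a hypothetical caller `fake_get_value`.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical device record and table standing in for the KFD topology. */
struct fake_kfd_dev { unsigned int id; int value; };

static struct fake_kfd_dev devices[] = { { 1, 10 }, { 2, 20 } };

/* Lookup that can fail: returns NULL when no device has this id. */
struct fake_kfd_dev *fake_device_by_id(unsigned int id)
{
    for (size_t i = 0; i < sizeof(devices) / sizeof(devices[0]); i++)
        if (devices[i].id == id)
            return &devices[i];
    return NULL;
}

/* Caller: bail out with -EINVAL instead of dereferencing a NULL result. */
int fake_get_value(unsigned int id, int *out)
{
    struct fake_kfd_dev *dev = fake_device_by_id(id);

    if (!dev)
        return -EINVAL;
    *out = dev->value;
    return 0;
}
```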

Re: [PATCH 10/11] drm/amdkfd: Print event limit messages only once per process

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling wrote:
> To avoid spamming the log.
>
> Signed-off-by: Felix Kuehling
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_events.c | 5 -
> drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 1 +
> 2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/dri…
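A per-process "print once" can be sketched with a flag stored in the process structure, assuming that is roughly what the one-line kfd_priv.h change adds (the excerpt does not show it). `struct fake_process`, the flag name, and the message text are hypothetical. A once-per-call-site macro such as `pr_warn_once` would instead silence the message for every process after the first.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-process state; only the flag this sketch needs. */
struct fake_process {
    bool event_limit_reported;
};

/* Prints the limit message at most once per process.
 * Returns true if this call printed it. */
bool report_event_limit(struct fake_process *p)
{
    if (p->event_limit_reported)
        return false;
    p->event_limit_reported = true;
    fprintf(stderr, "event limit reached for this process\n");
    return true;
}
```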

Re: [PATCH 09/11] drm/amdkfd: Fix kernel-queue wrapping bugs

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> Avoid intermediate negative numbers when doing calculations with a mix
> of signed and unsigned variables where implicit conversions can lead
> to unexpected results.
>
> When kernel queue buffer wraps around to 0, we ne…
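The hazard can be illustrated with ring-buffer free space. This sketch assumes read and write pointers measured in dwords, both smaller than the queue size. With unsigned operands, `rptr - wptr - 1` wraps to a huge value whenever `rptr < wptr`, while adding the queue size before subtracting keeps every intermediate value non-negative. Both functions are hypothetical, not the kfd_kernel_queue.c code.

```c
#include <stdint.h>

/* Free dwords in a ring of queue_size_dw dwords, keeping one slot empty so
 * that rptr == wptr means "empty". Assumes rptr, wptr < queue_size_dw.
 * Adding queue_size_dw first means the subtraction never goes below zero. */
uint32_t ring_free_dw(uint32_t rptr, uint32_t wptr, uint32_t queue_size_dw)
{
    return (rptr + queue_size_dw - wptr - 1) % queue_size_dw;
}

/* Naive variant for contrast: when rptr < wptr the unsigned subtraction
 * wraps modulo 2^32 instead of producing a negative number. */
uint32_t ring_free_dw_naive(uint32_t rptr, uint32_t wptr)
{
    return rptr - wptr - 1;
}
```

For example, with a 16-dword ring, `rptr = 4`, and `wptr = 10`, the naive form yields 4294967289 instead of 9.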

Re: [PATCH 06/11] drm/amdkfd: Use VMID bitmap from KGD

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> The hard-coded values related to VMID were removed in KFD, as those
> values can be calculated in the KFD initialization function.
>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
> drivers/gpu/drm/a…
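Deriving the VMID range from a bitmap at init time might look like the following sketch, assuming the KFD-owned VMIDs form one contiguous run of set bits. The struct, the function, and the example bitmap 0xFF00 (VMIDs 8 to 15) are illustrative, not values from the patch.

```c
#include <stdint.h>

/* Hypothetical summary of the VMIDs reserved for KFD. */
struct vmid_info {
    unsigned int first_vmid;
    unsigned int last_vmid;
    unsigned int num_vmids;
};

/* Derive the range from a bitmap in which bit n set means VMID n belongs
 * to KFD. Assumes the set bits are contiguous. Returns -1 for an empty map. */
int vmid_info_from_bitmap(uint32_t bitmap, struct vmid_info *out)
{
    unsigned int first = 0, last = 31;

    if (bitmap == 0)
        return -1;
    while (!(bitmap & (UINT32_C(1) << first)))  /* find lowest set bit */
        first++;
    while (!(bitmap & (UINT32_C(1) << last)))   /* find highest set bit */
        last--;
    out->first_vmid = first;
    out->last_vmid = last;
    out->num_vmids = last - first + 1;
    return 0;
}
```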

Re: [PATCH 08/11] drm/amdkfd: Drop _nocpsch suffix from shared functions

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> Several functions in DQM are shared between cpsch and nocpsch code.
> Remove the misleading _nocpsch suffix from their names.
>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
> .../gpu/drm/amd/amdkfd…

Re: [PATCH 05/11] drm/amdkfd: Fix incorrect destroy_mqd parameter

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> When uninitializing a kernel queue.
>
> Signed-off-by: Yong Zhao
> Signed-off-by: Felix Kuehling
> ---
> drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/dr…

Re: [PATCH 04/11] drm/amdkfd: Adjust dequeue latencies and timeouts

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> Adjust latencies and timeouts for dequeueing with HWS and consolidate
> them in one place. Make them longer to allow long running waves to
> complete without causing a timeout. The timeout is twice as long as the
> latency plus some buffer t…
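Consolidating the values in one place can be as simple as deriving the timeout from the latency, so that retuning one keeps the other consistent. This sketch reads the truncated sentence as timeout = 2 × latency + buffer; that interpretation, the macro names, and all numbers below are assumptions.

```c
/* Illustrative numbers only; the patch's actual values are not in the excerpt. */
#define DEQUEUE_LATENCY_MS 4000u  /* hypothetical worst-case dequeue latency */
#define DEQUEUE_BUFFER_MS  1000u  /* hypothetical safety margin */

/* Single source of truth: the timeout is derived from the latency, read here
 * as twice the latency plus a buffer. */
#define DEQUEUE_TIMEOUT_MS (2u * DEQUEUE_LATENCY_MS + DEQUEUE_BUFFER_MS)

/* Compile-time guard so a retune cannot make the timeout shorter than
 * one full latency period. */
_Static_assert(DEQUEUE_TIMEOUT_MS > DEQUEUE_LATENCY_MS,
               "dequeue timeout must exceed the dequeue latency");
```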

Re: [PATCH 02/11] drm/amdkfd: Fix suspend/resume issue on Carrizo

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> When we do suspend/resume through "sudo pm-suspend" while there is
> HSA activity running, upon resume we will encounter HWS hanging, which
> is caused by memory read/write failures. The root cause is that when
> suspend…

Re: [PATCH 01/11] drm/amdkfd: Reorganize kfd resume code

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> The idea is to let kfd init and resume function share the same code path
> as much as possible, rather than to have two copies of almost identical
> code. That way improves the code readability and maintainability.
>
> S…
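Here is a sketch of the shared-path structure, assuming init performs some one-time software setup and then runs the same bring-up that resume runs. Every name here is hypothetical; the real kfd code is not shown in the excerpt.

```c
#include <stdbool.h>

/* Hypothetical device state; only what the sketch needs. */
struct fake_kfd {
    bool sw_ready;    /* one-time software setup done (init only) */
    bool hw_running;  /* hardware-facing state brought up */
    int start_calls;  /* how many times the shared path ran */
};

/* Shared path: everything that must happen on both first init and resume. */
static int kfd_start_common(struct fake_kfd *kfd)
{
    kfd->hw_running = true;
    kfd->start_calls++;
    return 0;
}

int fake_kfd_init(struct fake_kfd *kfd)
{
    kfd->sw_ready = true;          /* init-only work */
    return kfd_start_common(kfd);  /* then the same path resume uses */
}

void fake_kfd_suspend(struct fake_kfd *kfd)
{
    kfd->hw_running = false;
}

int fake_kfd_resume(struct fake_kfd *kfd)
{
    if (!kfd->sw_ready)
        return -1;                 /* resume before init is an error */
    return kfd_start_common(kfd);
}
```

Keeping one bring-up function means a fix to it applies to both the boot and resume paths.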

Re: [PATCH 07/11] drm/amdkfd: Reuse CHIP_* from amdgpu

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:42 AM, Felix Kuehling wrote:
> From: Yong Zhao
>
> There are already CHIP_* definitions under amd_shared.h file on amdgpu
> side, so KFD should reuse them rather than defining new ones.
>
> Using enum for asic type requires default cases on switch statements
> to prevent…
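As a general C point (not necessarily where the truncated sentence was heading), GCC's `-Wswitch`, enabled by `-Wall`, warns when a `switch` on an enum value lacks a `case` for some enumerator and has no `default` label. The four-value enum below is a hypothetical subset, not the real list in amd_shared.h.

```c
#include <stdbool.h>

/* Hypothetical four-value subset; the real CHIP_* enum in amd_shared.h is longer. */
enum fake_asic_type { CHIP_KAVERI, CHIP_CARRIZO, CHIP_TONGA, CHIP_FIJI };

/* Only the APUs are listed explicitly. Without the default label, gcc -Wall
 * (via -Wswitch) would warn that CHIP_TONGA and CHIP_FIJI are not handled. */
bool is_apu(enum fake_asic_type asic)
{
    switch (asic) {
    case CHIP_KAVERI:
    case CHIP_CARRIZO:
        return true;
    default:
        return false;
    }
}
```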

Re: [PATCH 11/11] drm/amdkfd: Set /dev/kfd permissions to 0666 by default

2017-09-17 Oded Gabbay
On Sat, Sep 16, 2017 at 2:43 AM, Felix Kuehling wrote:
> From: Andres Rodriguez
>
> Set the default permissions of /dev/kfd to be more than just root
> accessible 600.
>
I don't think that's acceptable. You need to use a udev rules file for that.
Oded
> Signed-off-by: Andres Rodriguez
> Reviewed…