Re: [PATCH 1/1] drm/amdkfd: Don't touch the hardware in pre_reset callback

2019-12-19 Thread Felix Kuehling
You should be able to save the plain text email and pass that to "git am". It's trivial with Thunderbird on a Linux system. If you're using Outlook, I'm not sure. Anyway, I'm already reworking the patch based on Shaoyun's suggestion and some ideas it gave me. Regards, Felix On 2019-12-19

[PATCH 3/5] drm/amdgpu: GPU TLB flush API moved to amdgpu_amdkfd

2019-12-19 Thread Alex Sierra
[Why] The TLB flush method using the kfd2kgd interface has been deprecated. This implementation is now in the amdgpu_amdkfd API. [How] TLB flush functions are now implemented in amdgpu_amdkfd. Change-Id: Ic51cccdfe6e71288d78da772b6e1b6ced72f8ef7 Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/amd

[PATCH 2/5] drm/amdgpu: export function to flush TLB via pasid

2019-12-19 Thread Alex Sierra
This can be used directly from amdgpu and amdkfd to invalidate TLB through pasid. It supports gmc v7, v8, v9 and v10. Change-Id: I6563a8eba2e42d1a67fa2547156c20da41d1e490 Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.h | 6 ++ drivers/gpu/drm/amd/amdgpu/gmc_v10_0.c | 81

[PATCH 5/5] drm/amdgpu: invalidate BO during userptr mapping

2019-12-19 Thread Alex Sierra
This is required for HMM functionality only on GFXv9 GPU, which supports recoverable page faults. [Why] Instead of stopping all user mode queues during a userptr mapping, the GFXv9 recoverable page fault is used to revalidate userptr mappings. Now, this will be done in the page fault handler. [Ho

[PATCH 4/5] drm/amdgpu: flush TLB functions removal from kfd2kgd interface

2019-12-19 Thread Alex Sierra
[Why] kfd2kgd interface will be deprecated. This removal only covers TLB invalidation for now. They have been replaced in amdgpu_amdkfd API. [How] TLB invalidate functions removed from the different amdkfd_gfx_v* versions. Change-Id: Ic2c7d4a0d19fe1e884dee1ff10a520d31252afee Signed-off-by: Alex S

[PATCH 1/5] drm/amdgpu: Avoid reclaim fs while eviction lock

2019-12-19 Thread Alex Sierra
[Why] Avoid filesystem reclaim while the eviction lock, which is taken from the MMU notifier, is held. [How] Set the PF_MEMALLOC_NOFS flag while the eviction mutex is locked, using the memalloc_nofs_save / memalloc_nofs_restore API. Change-Id: I5531c9337836e7d4a430df3f16dcc82888e8018c Signed-off-by: Alex Sierra --- dri
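The scoped save/restore pattern described in this patch summary can be sketched in plain userspace C. This is only a model under stated assumptions: the kernel's real `memalloc_nofs_save()` / `memalloc_nofs_restore()` operate on the current task's flags, whereas every `model_*` name below, the flag value, and the lock structure are hypothetical stand-ins and not the driver's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the task flag; the value is arbitrary. */
#define MODEL_PF_MEMALLOC_NOFS 0x1u

/* Models the flags word of the current task. */
static unsigned int model_task_flags;

/* Set the NOFS flag and return its previous state so scopes can nest. */
static unsigned int model_nofs_save(void)
{
    unsigned int old = model_task_flags & MODEL_PF_MEMALLOC_NOFS;

    model_task_flags |= MODEL_PF_MEMALLOC_NOFS;
    return old;
}

/* Put the NOFS flag back to the state returned by model_nofs_save(). */
static void model_nofs_restore(unsigned int old)
{
    model_task_flags = (model_task_flags & ~MODEL_PF_MEMALLOC_NOFS) | old;
}

/* An allocation may recurse into filesystem reclaim only outside a scope. */
static bool model_alloc_may_enter_fs(void)
{
    return !(model_task_flags & MODEL_PF_MEMALLOC_NOFS);
}

/* Hypothetical eviction lock that carries the saved flag state. */
struct model_eviction_lock {
    bool locked;
    unsigned int saved_flags;
};

static void model_eviction_lock_acquire(struct model_eviction_lock *l)
{
    assert(!l->locked);
    l->locked = true;
    /* Enter the NOFS scope for as long as the lock is held. */
    l->saved_flags = model_nofs_save();
}

static void model_eviction_lock_release(struct model_eviction_lock *l)
{
    assert(l->locked);
    /* Leave the NOFS scope before dropping the lock. */
    model_nofs_restore(l->saved_flags);
    l->locked = false;
}
```

Returning the old flag state from the save call, rather than clearing the flag unconditionally on restore, is what keeps an outer NOFS scope intact when the lock is taken inside one.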

RE: [PATCH 1/1] drm/amdkfd: Don't touch the hardware in pre_reset callback

2019-12-19 Thread Liu, Monk
Hi Felix Do you know how I can get a "xxx.patch" file from the email from you ?? _ Monk Liu|GPU Virtualization Team |AMD -Original Message- From: Kuehling, Felix Sent: Friday, December 20, 2019 10:09 AM To: amd-gfx@lists.freedesktop.org Cc: Liu, Sha

Re: [PATCH 1/1] drm/amdkfd: Don't touch the hardware in pre_reset callback

2019-12-19 Thread Felix Kuehling
On 2019-12-19 21:34, Liu, Shaoyun wrote: Would it look cleaner if we kept a pre_reset flag in the per-device structure and checked it in the functions that talk to the hw? I was briefly considering that when I saw how many functions needed that pre_reset flag. But this could lead to race conditions with o

RE: [PATCwH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread Liu, Monk
>>>. For kiq, there is no return for WREG3 We can make amdgpu_virt_kiq_wreg() return a value if really needed, e.g.: return if this write success _ Monk Liu|GPU Virtualization Team |AMD [sig-cloud-gpu] From: Liu, Shaoyun Sent: Friday, December 20, 2019 12:59

Re: [PATCH 1/1] drm/amdkfd: Don't touch the hardware in pre_reset callback

2019-12-19 Thread Liu, Shaoyun
Would it look cleaner if we kept a pre_reset flag in the per-device structure and checked it in the functions that talk to the hw? Regards Shaoyun.liu From: Kuehling, Felix Sent: December 19, 2019 9:21:08 PM To: amd-gfx@lists.freedesktop.org ; Liu, Monk ; Liu, Shaoyun ; Gr

Re: [PATCH 1/1] drm/amdkfd: Don't touch the hardware in pre_reset callback

2019-12-19 Thread Felix Kuehling
[+Andrey] Hi Shaoyun, Monk, and Andrey, I tested this on my bare-metal Vega10 system. GPU reset (using BACO) is flaky on this system with and without this patch. The first reset seems to work OK, the second one fails in different ways. In theory this change should be an improvement as it elim

[PATCH 1/1] drm/amdkfd: Don't touch the hardware in pre_reset callback

2019-12-19 Thread Felix Kuehling
The intention of the pre_reset callback is to update the driver state to reflect that all user mode queues are preempted and the HIQ is destroyed. However we should not actually preempt any queues or otherwise touch the hardware because it is presumably hanging. The impending reset will take care o

Re: [PATCH 1/2] drm/amdgpu: update the method to get fb_loc of memory training(V4)

2019-12-19 Thread Yin, Tianci (Rico)
[AMD Official Use Only - Internal Distribution Only] Hi Luben, May I have your Review-by? Thanks a lot! Rico From: Tuikov, Luben Sent: Friday, December 20, 2019 3:47 To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org Cc: Koenig, Christian ; Deucher, Alexan

Re: [PATCH] drm/dp_mst: clear time slots for ports invalid

2019-12-19 Thread Lin, Wayne
[AMD Official Use Only - Internal Distribution Only] Pinged. Hi, can someone help to review please. Thanks a lot. Regards, Wayne From: Wayne Lin Sent: Friday, December 6, 2019 16:39 To: dri-de...@lists.freedesktop.org; amd-gfx@lists.freedesktop.org Cc:

Re: [PATCwH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread Felix Kuehling
I can prepare a patch, but I can't test it thoroughly. It would need to be tested on bare-metal and SRIOV. Regards, Felix On 2019-12-19 18:25, shaoyunl wrote: How do we prevent the user queue from submitting on the following FLR if we didn't unmap the user queues? It's possible that CP sti

Re: [PATCwH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread shaoyunl
How do we prevent the user queue from submitting on the following FLR if we didn't unmap the user queues? It's possible that the CP is still not hung when another part of the HW hangs and needs a reset. Om, but probably it's ok since after FLR, all the hqd will be reset to unmapped by default by HW an

Re: [PATCwH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread Felix Kuehling
I'm thinking, if we know we're preparing for a GPU reset, maybe we shouldn't even try to suspend processes and stop the HIQ. kfd_suspend_all_processes, stop_cpsch and other functions up that call chain up to kgd2kfd_suspend could have a parameter (bool pre_reset) that would update the driver st
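The idea of threading a `bool pre_reset` parameter down the suspend call chain can be illustrated with a small userspace model. It is a sketch under assumptions: the struct, the access counter, and all `model_*` functions are hypothetical and only mirror the shape of the proposal, not the actual kgd2kfd_suspend, kfd_suspend_all_processes, or stop_cpsch code.

```c
#include <stdbool.h>

/* Hypothetical device model: software state plus a hardware-access counter. */
struct model_dev {
    bool queues_active;  /* driver's view of the user mode queues */
    int hw_accesses;     /* number of simulated register accesses */
};

/* Stands in for any operation that would touch the (possibly hung) hardware. */
static void model_hw_unmap_queues(struct model_dev *dev)
{
    dev->hw_accesses++;
}

/* Innermost step: skip the hardware when a reset is about to happen. */
static void model_stop_sched(struct model_dev *dev, bool pre_reset)
{
    if (!pre_reset)
        model_hw_unmap_queues(dev);

    /* The software state is updated in both cases. */
    dev->queues_active = false;
}

/* Outer entry point simply forwards the flag down the call chain. */
static void model_suspend(struct model_dev *dev, bool pre_reset)
{
    model_stop_sched(dev, pre_reset);
}
```

Calling `model_suspend(&dev, true)` leaves `hw_accesses` untouched while still marking the queues inactive, which is the behaviour the message describes for the pre-reset path.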

Re: [PATCH] drm/amdgpu: attempt xgmi perfmon re-arm on failed arm

2019-12-19 Thread Felix Kuehling
On 2019-12-19 4:30 p.m., Jonathan Kim wrote: The DF routines to arm xGMI performance will attempt to re-arm both on performance monitoring start and read on initial failure to arm. v3: Addressing nit-picks. v2: Roll back reset_perfmon_cntr to void return since new perfmon counters are now safe

RE: [PATCH] drm/amdgpu: attempt xgmi perfmon re-arm on failed arm

2019-12-19 Thread Kim, Jonathan
[AMD Official Use Only - Internal Distribution Only] -Original Message- From: Kim, Jonathan Sent: Thursday, December 19, 2019 4:31 PM To: amd-gfx@lists.freedesktop.org Cc: felix.khuel...@amd.com; Kim, Jonathan ; Kim, Jonathan Subject: [PATCH] drm/amdgpu: attempt xgmi perfmon re-arm o

[PATCH] drm/amdgpu: attempt xgmi perfmon re-arm on failed arm

2019-12-19 Thread Jonathan Kim
The DF routines to arm xGMI performance will attempt to re-arm both on performance monitoring start and read on initial failure to arm. v3: Addressing nit-picks. v2: Roll back reset_perfmon_cntr to void return since new perfmon counters are now safe to write to during DF C-States. Do single perf

Re: [PATCH 1/2] drm/amdgpu: update the method to get fb_loc of memory training(V4)

2019-12-19 Thread Luben Tuikov
Yep! That's perfect--good job! Regards, Luben On 2019-12-19 04:16, Tianci Yin wrote: > From: "Tianci.Yin" > > The method of getting fb_loc changed from parsing VBIOS to > taking certain offset from top of VRAM > > Change-Id: I053b42fdb1d822722fa7980b2cd9f86b3fdce539 > Signed-off-by: Tianci.Yin

Re: [PATCH] drm/amd/display: replace BUG_ON with WARN_ON

2019-12-19 Thread Aditya Pakki
On 12/19/19 10:29 AM, Mikita Lipski wrote: > > > On 12/18/19 11:15 AM, Aditya Pakki wrote: >> In skip_modeset label within dm_update_crtc_state(), the dc stream >> cannot be NULL. Using BUG_ON as an assertion is not required and >> can be removed. The patch replaces the check with a WARN_ON in ca

Re: [PATCwH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread shaoyunl
After checking the code: on the KFD side it should be simple, just add the check in the stop_cpsch code. For kiq, there is no return for WREG32, so there is no easy way to check the return value. Maybe we can add kiq_status in struct amdgpu_kiq to indicate whether the kiq is hung or not, in hqd_destroy function check

Re: [PATCH] drm/amd/display: replace BUG_ON with WARN_ON

2019-12-19 Thread Mikita Lipski
On 12/18/19 11:15 AM, Aditya Pakki wrote: In skip_modeset label within dm_update_crtc_state(), the dc stream cannot be NULL. Using BUG_ON as an assertion is not required and can be removed. The patch replaces the check with a WARN_ON in case dm_new_crtc_state->stream is NULL. Signed-off-by: A
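The difference between the two assertion styles discussed in this thread can be modelled in userspace C. This is a hedged sketch: in the kernel, BUG_ON() stops execution while WARN_ON() reports and evaluates to the condition so the caller can continue. `MODEL_WARN_ON`, the struct, and the early return of -1 are hypothetical illustrations, and the snippet does not show whether the actual patch returns early.

```c
#include <stddef.h>
#include <stdio.h>

/* Counts how many warnings fired, so callers can observe them. */
static int model_warn_count;

/* Report the failed expectation and hand the condition back to the caller. */
static int model_warn_on(int cond, const char *expr)
{
    if (cond) {
        model_warn_count++;
        fprintf(stderr, "WARNING: %s\n", expr);
    }
    return cond;
}

/* Hypothetical userspace stand-in for the kernel's WARN_ON(). */
#define MODEL_WARN_ON(cond) model_warn_on(!!(cond), #cond)

/* Hypothetical reduced CRTC state with only the field under discussion. */
struct model_crtc_state {
    void *stream;
};

/* Warn and bail out instead of halting when the stream is unexpectedly NULL. */
static int model_skip_modeset(struct model_crtc_state *state)
{
    if (MODEL_WARN_ON(state->stream == NULL))
        return -1; /* assumed error path, for illustration only */

    return 0;
}
```

Because the warning macro yields the condition, the check and the recovery path fit in one `if`, and the rest of the system keeps running.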

Re: [PATCwH 2/2] drm/amdgpu: fix KIQ ring test fail in TDR of SRIOV

2019-12-19 Thread Liu, Shaoyun
I see, thanks for the detailed information. Normally when the CP is hung, the hiq access to unmap the queue will fail before the driver calls hqd_destroy. I think the driver should add code to check the return value and directly finish the pre_reset in this case. If the hiq does not hang but kiq

Re: [PATCH] drm/amdgpu: always reset asic when going into suspend

2019-12-19 Thread Alex Deucher
On Mon, Dec 16, 2019 at 4:00 AM Daniel Drake wrote: > > Hi Alex, > > On Mon, Nov 25, 2019 at 1:17 PM Daniel Drake wrote: > > Unfortunately not. The original issue still exists (dead gfx after > > resume from s2idle) and also when I trigger execution of the suspend > > or runtime suspend routines

RE: [PATCH] drm/amdgpu/vcn2.5: Silence a compiler warning

2019-12-19 Thread Liu, Leo
Reviewed-by: Leo Liu -Original Message- From: amd-gfx On Behalf Of Alex Deucher Sent: December 18, 2019 11:52 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander Subject: [PATCH] drm/amdgpu/vcn2.5: Silence a compiler warning Set r to 0 as a default value. Signed-off-by: Alex D

Re: [PATCH 1/1] drm/amdgpu: fix ctx init failure for asics without gfx ring

2019-12-19 Thread Nirmoy
Reviewed-by: Nirmoy Das On 12/19/19 12:42 PM, Le Ma wrote: This workaround does not affect other asics because amdgpu only needs to expose one gfx sched to the user for now. Change-Id: Ica92b8565a89899aebe0eba7b2b5a25159b411d3 Signed-off-by: Le Ma --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 ++-

[PATCH 1/1] drm/amdgpu: fix ctx init failure for asics without gfx ring

2019-12-19 Thread Le Ma
This workaround does not affect other asics because amdgpu only needs to expose one gfx sched to the user for now. Change-Id: Ica92b8565a89899aebe0eba7b2b5a25159b411d3 Signed-off-by: Le Ma --- drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/d

[PATCH 2/2] drm/amdgpu: add psp session ID get interface for sriov

2019-12-19 Thread Frank . Min
on sriov, psp vf ring running depends on interrupt, so have to move the xgmi TA loading after IH hw init. Change-Id: Ieffb3a94107c437f54abc0c41238c6f40274b35d Signed-off-by: Frank.Min --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 4 ++-- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.h | 2 +- driver

[PATCH 1/2] drm/amdgpu: remove FB location config for sriov

2019-12-19 Thread Frank . Min
FB location is already programmed by the HV driver for Arcturus, so remove this part. Change-Id: Ia357ae716bfc3084a4dd277ade219e57092f9b42 Signed-off-by: Frank.Min --- drivers/gpu/drm/amd/amdgpu/gfxhub_v1_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/mmhub_v9_4.c | 16 2 files changed, 1 in

Re: [PATCH 1/2] drm/amdgpu: update the method to get fb_loc of memory training(V3)

2019-12-19 Thread Yin, Tianci (Rico)
[AMD Official Use Only - Internal Distribution Only] Hi Luben, What a brilliant thought! Concise and easy for eyes! Thanks so much! Rico From: Tuikov, Luben Sent: Thursday, December 19, 2019 11:10 To: Yin, Tianci (Rico) ; amd-gfx@lists.freedesktop.org Cc: Koe

[PATCH 1/2] drm/amdgpu: update the method to get fb_loc of memory training(V4)

2019-12-19 Thread Tianci Yin
From: "Tianci.Yin" The method of getting fb_loc changed from parsing VBIOS to taking certain offset from top of VRAM Change-Id: I053b42fdb1d822722fa7980b2cd9f86b3fdce539 Signed-off-by: Tianci.Yin --- drivers/gpu/drm/amd/amdgpu/amdgpu.h | 3 +- drivers/gpu/drm/amd/amdgpu/amdgpu_atomb

[PATCH 2/2] drm/amdgpu: remove memory training p2c buffer reservation(V2)

2019-12-19 Thread Tianci Yin
From: "Tianci.Yin" IP discovery TMR(occupied the top VRAM with size DISCOVERY_TMR_SIZE) has been reserved, and the p2c buffer is in the range of this TMR, so the p2c buffer reservation is unnecessary. Change-Id: Ib1f2f2b4a1f3869c03ffe22e2836cdbee17ba99f Reviewed-by: Kevin Wang Reviewed-by: Xiao