RE: Suspecting corrupted VBIOS after update of AMDGPU on AMD7870

2020-01-30 Thread Liu, Zhan
Okay I see. From your attached dmesg.log, issue comes from here: [ 26.265638] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring sdma0 timeout, signaled seq=1, emitted seq=2 [ 26.265764] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process pid 0 thread pid 0 [ 26.265771]

RE: [Patch v1 5/5] drm/amdkfd: refactor runtime pm for baco

2020-01-30 Thread Zeng, Oak
[AMD Official Use Only - Internal Distribution Only] Hi Felix, See one inline comment Regards, Oak -Original Message- From: amd-gfx On Behalf Of Felix Kuehling Sent: Thursday, January 30, 2020 6:24 PM To: Alex Deucher Cc: Deucher, Alexander ; Bhardwaj, Rajneesh ; amd-gfx list Subje

Re: [Patch v1 5/5] drm/amdkfd: refactor runtime pm for baco

2020-01-30 Thread Felix Kuehling
On 2020-01-30 17:11, Alex Deucher wrote: On Thu, Jan 30, 2020 at 4:55 PM Felix Kuehling wrote: On 2020-01-30 14:01, Bhardwaj, Rajneesh wrote: Hello Felix, Thanks for your time to review and for your feedback. On 1/29/2020 5:52 PM, Felix Kuehling wrote: Hi Rajneesh, See comments inline ...

RE: [PATCH] drm/amdkfd: Add queue information to sysfs

2020-01-30 Thread Kasiviswanathan, Harish
[AMD Official Use Only - Internal Distribution Only] One minor comment. -Original Message- From: amd-gfx On Behalf Of Amber Lin Sent: Thursday, January 30, 2020 12:46 AM To: amd-gfx@lists.freedesktop.org Cc: Lin, Amber Subject: [PATCH] drm/amdkfd: Add queue information to sysfs Provide

Re: [PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-01-30 Thread Yong Zhao
True. It is a bug too. I am looking into it. Yong On 2020-01-30 5:51 p.m., Felix Kuehling wrote: On 2020-01-30 17:29, Yong Zhao wrote: The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whethe

Re: [PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-01-30 Thread Felix Kuehling
On 2020-01-30 17:29, Yong Zhao wrote: The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whether over_subscription is triggered. With the previous code, when a SDMA queue is created, compute_que

Re: [PATCH 5/5] drm/amdgpu: rework synchronization of VM updates v4

2020-01-30 Thread Felix Kuehling
On 2020-01-30 7:49, Christian König wrote: If provided we only sync to the BOs reservation object and no longer to the root PD. v2: update comment, cleanup amdgpu_bo_sync_wait_resv v3: use correct reservation object while clearing v4: fix typo in amdgpu_bo_sync_wait_resv Signed-off-by: Christia

Re: [PATCH 4/5] drm/amdgpu: simplify and fix amdgpu_sync_resv

2020-01-30 Thread Felix Kuehling
On 2020-01-30 7:49, Christian König wrote: No matter what we always need to sync to moves. Signed-off-by: Christian König Tested-by: Tom St Denis Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletion

[PATCH] drm/amdkfd: Fix a bug in SDMA RLC queue counting under HWS mode

2020-01-30 Thread Yong Zhao
The sdma_queue_count increment should be done before execute_queues_cpsch(), which calls pm_calc_rlib_size() where sdma_queue_count is used to calculate whether over_subscription is triggered. With the previous code, when a SDMA queue is created, compute_queue_count in pm_calc_rlib_size() is one m
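The ordering problem described in this patch can be sketched in plain C. This is only an illustration under stated assumptions: the struct, the function names and the "total minus SDMA" derivation of the compute queue count are hypothetical simplifications, not the actual KFD code, and the limit value is arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the scheduler counters. */
struct queue_counts {
    unsigned int total_queues; /* all active queues, compute and SDMA */
    unsigned int sdma_queues;  /* SDMA queues only */
};

/* Assumed check: the compute queue count is derived from the two counters
 * and compared against a limit, loosely modelled on the role the snippet
 * describes for pm_calc_rlib_size(). */
static bool is_over_subscribed(const struct queue_counts *c,
                               unsigned int max_compute_queues)
{
    assert(c->sdma_queues <= c->total_queues);
    unsigned int compute_queues = c->total_queues - c->sdma_queues;
    return compute_queues > max_compute_queues;
}

/* Problematic ordering: the check runs while sdma_queues is still stale,
 * so the new SDMA queue is counted as a compute queue. */
static bool add_sdma_queue_count_late(struct queue_counts *c, unsigned int max)
{
    c->total_queues++;
    bool over = is_over_subscribed(c, max);
    c->sdma_queues++;
    return over;
}

/* Ordering described by the patch: update the SDMA counter first, then
 * run the check that depends on it. */
static bool add_sdma_queue_count_first(struct queue_counts *c, unsigned int max)
{
    c->total_queues++;
    c->sdma_queues++;
    return is_over_subscribed(c, max);
}
```

Starting from two compute queues and a limit of two, the late variant reports over-subscription when an SDMA queue is added, while the early variant does not.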

Re: [PATCH 2/5] drm/amdgpu: return EINVAL instead of ENOENT in the VM code

2020-01-30 Thread Felix Kuehling
On 2020-01-30 7:49, Christian König wrote: That we can't find a PD above the root is expected and can only happen if we try to update a larger range than actually managed by the VM. Signed-off-by: Christian König Tested-by: Tom St Denis Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdg

Re: [PATCH 3/5] drm/amdgpu: allow higher level PD invalidations

2020-01-30 Thread Felix Kuehling
On 2020-01-30 7:49, Christian König wrote: Allow partial invalidation on unallocated PDs. This is useful when we need to silence faults to stop interrupt floods on Vega. Signed-off-by: Christian König Tested-by: Tom St Denis I already reviewed this a week ago. With two style nit-picks fixed,

Re: [Patch v1 5/5] drm/amdkfd: refactor runtime pm for baco

2020-01-30 Thread Alex Deucher
On Thu, Jan 30, 2020 at 4:55 PM Felix Kuehling wrote: > > On 2020-01-30 14:01, Bhardwaj, Rajneesh wrote: > > Hello Felix, > > > > Thanks for your time to review and for your feedback. > > > > On 1/29/2020 5:52 PM, Felix Kuehling wrote: > >> Hi Rajneesh, > >> > >> See comments inline ... > >> > >>

Re: [PATCH 1/5] drm/amdgpu: fix braces in amdgpu_vm_update_ptes

2020-01-30 Thread Felix Kuehling
On 2020-01-30 7:49, Christian König wrote: For the root PD mask can be 0x as well which would overrun to 0 if we don't cast it before we add one. You're fixing parentheses, not braces. Parentheses: () Brackets: [] Braces: {} With the title fixed, this patch is Reviewed-by: Felix Kueh

Re: [Patch v1 5/5] drm/amdkfd: refactor runtime pm for baco

2020-01-30 Thread Felix Kuehling
On 2020-01-30 14:01, Bhardwaj, Rajneesh wrote: Hello Felix, Thanks for your time to review and for your feedback. On 1/29/2020 5:52 PM, Felix Kuehling wrote: Hi Rajneesh, See comments inline ... And a general question: Why do you need to set the autosuspend_delay in so many places? Amdgpu o

Re: [PATCH 3/3] drm/amdgpu/smu_v11_0: Correct behavior of restoring default tables (v2)

2020-01-30 Thread Matt Coffin
It's worth noting here that I don't have a vega20 card to test with, so it might be prudent to get a Tested-by from someone that has access to one. It *should* work since it's so similar to the navi10 code, but it is moderately un-tested. On 1/29/20 11:17 AM, Alex Deucher wrote: > From: Matt Coff

Re: [PATCH 6/6] drm/amd/display: REFERENCE for srm interface patches

2020-01-30 Thread Harry Wentland
Thanks for providing more documentation and this reference. The patch set (1-5) is Reviewed-by: Harry Wentland Harry On 2020-01-22 4:05 p.m., Bhawanpreet Lakha wrote: > This is just a reference for the patches. not to be merged > > Signed-off-by: Bhawanpreet Lakha > --- > REFERENCE | 49

RE: Suspecting corrupted VBIOS after update of AMDGPU on AMD7870

2020-01-30 Thread Liu, Zhan
Hi Jacob, Thank you for your bug report. I saw you attached xorg.log, which is great. Could you also grab dmesg.log via SSH? Thanks, Zhan From: amd-gfx On Behalf Of Jacob Hrbek Sent: 2020/January/30, Thursday 12:18 PM To: amd-gfx@lists.freedesktop.org Subject: Suspecting corrupted VBIOS a

Re: [Patch v1 5/5] drm/amdkfd: refactor runtime pm for baco

2020-01-30 Thread Bhardwaj, Rajneesh
On 1/28/2020 3:09 PM, Zeng, Oak wrote: [AMD Official Use Only - Internal Distribution Only] Regards, Oak -Original Message- From: amd-gfx On Behalf Of Rajneesh Bhardwaj Sent: Monday, January 27, 2020 8:29 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Kuehling, Feli

Re: [Patch v1 1/5] drm/amdgpu: always enable runtime power management

2020-01-30 Thread Bhardwaj, Rajneesh
On 1/28/2020 3:14 PM, Alex Deucher wrote: [CAUTION: External Email] On Mon, Jan 27, 2020 at 8:30 PM Rajneesh Bhardwaj wrote: This allows runtime power management to kick in on amdgpu driver when the underlying hardware supports either BOCO or BACO. This can still be avoided if boot arg amdgp

Re: [Patch v1 3/5] drm/amdkfd: Introduce debugfs option to disable baco

2020-01-30 Thread Bhardwaj, Rajneesh
Hi Alex Thanks for your time and feedback! On 1/28/2020 3:22 PM, Alex Deucher wrote: [CAUTION: External Email] On Mon, Jan 27, 2020 at 8:30 PM Rajneesh Bhardwaj wrote: When BACO is enabled by default, sometimes it can cause additional trouble to debug KFD issues. This debugfs override allo

Re: [Patch v1 4/5] drm/amdkfd: show warning when kfd is locked

2020-01-30 Thread Bhardwaj, Rajneesh
On 1/28/2020 5:42 PM, Felix Kuehling wrote: On 2020-01-27 20:29, Rajneesh Bhardwaj wrote: During system suspend the kfd driver acquires a lock that prohibits further kfd actions unless the gpu is resumed. This adds some info which can be useful while debugging. Signed-off-by: Rajneesh Bhardwaj

Re: [Patch v1 5/5] drm/amdkfd: refactor runtime pm for baco

2020-01-30 Thread Bhardwaj, Rajneesh
Hello Felix, Thanks for your time to review and for your feedback. On 1/29/2020 5:52 PM, Felix Kuehling wrote: Hi Rajneesh, See comments inline ... And a general question: Why do you need to set the autosuspend_delay in so many places? Amdgpu only has a single call to this function during i

Re: [PATCH] drm/amd/display: Only enable cursor on pipes that need it

2020-01-30 Thread Harry Wentland
On 2020-01-30 1:29 p.m., Nicholas Kazlauskas wrote: > [Why] > In current code we're essentially drawing the cursor on every pipe > that contains it. This only works when the planes have the same > scaling for src to dest rect, otherwise we'll get "double cursor" where > one cursor is incorrectly fi

Re: [PATCH] drm/amdgpu: Fix implicit enum conversion in gfx_v9_4_ras_error_inject

2020-01-30 Thread Alex Deucher
On Thu, Jan 30, 2020 at 3:33 AM Nathan Chancellor wrote: > > Clang warns: > > ../drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c:967:35: warning: implicit > conversion from enumeration type 'enum amdgpu_ras_block' to different > enumeration type 'enum ta_ras_block' [-Wenum-conversion] > block_info.b

[PATCH] drm/amd/display: Only enable cursor on pipes that need it

2020-01-30 Thread Nicholas Kazlauskas
[Why] In current code we're essentially drawing the cursor on every pipe that contains it. This only works when the planes have the same scaling for src to dest rect, otherwise we'll get "double cursor" where one cursor is incorrectly filtered and offset from the real position. [How] Without dedic

[PATCH] drm/amdgpu/vcn: leave all dpg mode unpause in idle work

2020-01-30 Thread James Zhu
Leave all dpg mode unpause in idle work to fix VCN2.* timeout issue during transcoding. Signed-off-by: James Zhu --- drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c | 13 ++--- 1 file changed, 2 insertions(+), 11 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_vcn.c b/drivers/gpu/d

Suspecting corrupted VBIOS after update of AMDGPU on AMD7870

2020-01-30 Thread Jacob Hrbek
*Hello,* I believe that a system update that included amdgpu on Debian testing (but I am on LFS) corrupted my VBIOS on AMD7870 (+- 4 hours after the update the GPU using AMDGPU/Radeon drivers resulted in no output). I'm sending this email to report a possible bug, with my findings on https://gis

[PATCH 3/5] drm/amdgpu: allow higher level PD invalidations

2020-01-30 Thread Christian König
Allow partial invalidation on unallocated PDs. This is useful when we need to silence faults to stop interrupt floods on Vega. Signed-off-by: Christian König Tested-by: Tom St Denis --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 23 ++- 1 file changed, 18 insertions(+), 5 dele

[PATCH 2/5] drm/amdgpu: return EINVAL instead of ENOENT in the VM code

2020-01-30 Thread Christian König
That we can't find a PD above the root is expected and can only happen if we try to update a larger range than actually managed by the VM. Signed-off-by: Christian König Tested-by: Tom St Denis --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --

[PATCH 5/5] drm/amdgpu: rework synchronization of VM updates v4

2020-01-30 Thread Christian König
If provided we only sync to the BOs reservation object and no longer to the root PD. v2: update comment, cleanup amdgpu_bo_sync_wait_resv v3: use correct reservation object while clearing v4: fix typo in amdgpu_bo_sync_wait_resv Signed-off-by: Christian König Tested-by: Tom St Denis --- driver

[PATCH 4/5] drm/amdgpu: simplify and fix amdgpu_sync_resv

2020-01-30 Thread Christian König
No matter what we always need to sync to moves. Signed-off-by: Christian König Tested-by: Tom St Denis --- drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c | 15 +++ 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_sync.c b/drivers/gpu/drm/am

[PATCH 1/5] drm/amdgpu: fix braces in amdgpu_vm_update_ptes

2020-01-30 Thread Christian König
For the root PD mask can be 0x as well which would overrun to 0 if we don't cast it before we add one. Signed-off-by: Christian König Tested-by: Tom St Denis --- drivers/gpu/drm/amd/amdgpu/amdgpu_vm.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/am
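The wraparound this patch describes can be illustrated with a small, self-contained C sketch. It assumes a 32-bit mask with all bits set (the exact value is truncated in the snippet above). The function names are hypothetical and are not taken from amdgpu_vm.c.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the addition is done in 32-bit arithmetic, so an
 * all-ones mask wraps around to 0 before the result is widened. */
static uint64_t count_without_cast(uint32_t mask)
{
    return (uint32_t)(mask + 1u);
}

/* Hypothetical helper: the mask is widened first, so the addition is done
 * in 64-bit arithmetic and cannot wrap for any 32-bit input. */
static uint64_t count_with_cast(uint32_t mask)
{
    return (uint64_t)mask + 1u;
}

/* Small self-check contrasting the two variants. */
void check_wraparound(void)
{
    assert(count_without_cast(UINT32_MAX) == 0);
    assert(count_with_cast(UINT32_MAX) == (uint64_t)UINT32_MAX + 1);
    /* For small masks both variants agree. */
    assert(count_without_cast(7) == count_with_cast(7));
}
```

Calling check_wraparound() from a test program exercises both variants. Only the placement of the cast differs between them, which is why the parentheses matter.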

Re: [PATCH] drm/amd/powerplay: fix navi10 system intermittent reboot issue

2020-01-30 Thread Xu, Feifei
Reviewed-by: Feifei Xu > On Jan 30, 2020, at 16:59, Evan Quan wrote: > > This workaround is needed only for Navi10 12 Gbps SKUs. > > Change-Id: I4bfcb8a8dbff785a159e6a1ed413d93063403ab3 > Signed-off-by: Evan Quan > --- > drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 18 +++ > .../gpu/

Re: [PATCH 1/2] drm/amdgpu: enable GPU reset by default on Navi

2020-01-30 Thread Pierre-Eric Pelloux-Prayer
Hi Alex, I had one issue while testing this patch on a RX5700. After a gfx hang a reset is executed. Switching to a VT and restarting gdm works fine but the clocks seem messed up: - lots of graphical artifacts (underflows?) - pp_dpm_sclk and pp_dpm_socclk have strange values (see attached f

RE: [PATCH] drm/amd/dm/mst: Ignore payload update failures on disable

2020-01-30 Thread Lin, Wayne
[AMD Public Use] Hi Lyude, Thanks for the patch! I'm wondering if this error still occurs with this patch applied https://patchwork.kernel.org/patch/11274363/ I tried to clean up all mgr->proposed_vcpis[] in this patch so drm_dp_update_payload_part1() will skip all invalid ports. However, I'm al

[PATCH] drm/amd/powerplay: fix navi10 system intermittent reboot issue

2020-01-30 Thread Evan Quan
This workaround is needed only for Navi10 12 Gbps SKUs. Change-Id: I4bfcb8a8dbff785a159e6a1ed413d93063403ab3 Signed-off-by: Evan Quan --- drivers/gpu/drm/amd/powerplay/amdgpu_smu.c| 18 +++ .../gpu/drm/amd/powerplay/inc/amdgpu_smu.h| 1 + drivers/gpu/drm/amd/powerplay/inc/smu_types.

[PATCH] drm/amdgpu: Fix implicit enum conversion in gfx_v9_4_ras_error_inject

2020-01-30 Thread Nathan Chancellor
Clang warns: ../drivers/gpu/drm/amd/amdgpu/gfx_v9_4.c:967:35: warning: implicit conversion from enumeration type 'enum amdgpu_ras_block' to different enumeration type 'enum ta_ras_block' [-Wenum-conversion] block_info.block_id = info->head.block; ~ ~~~^~
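The class of warning quoted above can be reproduced and avoided in a few lines of C. This is a sketch under assumptions: both enums and the mapping function below are hypothetical stand-ins, and it is not necessarily how the patch itself resolves the warning (the snippet is truncated before the fix).

```c
#include <assert.h>

/* Hypothetical driver-side and firmware-side enums. The numeric values
 * deliberately differ to show why a plain assignment can be wrong. */
enum driver_block   { DRIVER_BLOCK_GFX, DRIVER_BLOCK_SDMA };
enum firmware_block { FIRMWARE_BLOCK_SDMA, FIRMWARE_BLOCK_GFX };

/* Explicit mapping between the two types. Assigning a driver_block value
 * directly to a firmware_block variable is what -Wenum-conversion flags;
 * a switch keeps the conversion visible and value-correct. */
static enum firmware_block to_firmware_block(enum driver_block block)
{
    switch (block) {
    case DRIVER_BLOCK_GFX:
        return FIRMWARE_BLOCK_GFX;
    case DRIVER_BLOCK_SDMA:
        return FIRMWARE_BLOCK_SDMA;
    }
    /* Only reached if a value outside the enum is passed in. */
    assert(!"unhandled driver_block value");
    return FIRMWARE_BLOCK_GFX;
}
```

An explicit cast would also silence the warning, but it is only correct when both enums use the same numeric values. The mapping function does not rely on that.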