Re: [PATCH] drm/amdgpu: Move reset domain locking in DPC handler

2022-04-13 Thread Christian König
On 13.04.22 at 21:31, Andrey Grodzovsky wrote: Lock reset domain unconditionally because on resume we unlock it unconditionally. This solves a mutex deadlock when handling both FATAL and non-FATAL PCI errors one after another. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/

Re: [PATCH v2] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Christian König
On 14.04.22 at 07:03, xinhui pan wrote: VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. We see the calltrace below. Fix it by keeping the last flush fence around and wait for it to signal BUG kmalloc-4k (Not tainted): Poison overwritten 0x9c88630414e8-0x9c88630414e8 @

Re: [PATCH] drm/radeon: Add build directory to include path

2022-04-13 Thread Christian König
On 13.04.22 at 18:14, Michel Dänzer wrote: From: Michel Dänzer Fixes compile errors with out-of-tree builds, e.g. ../drivers/gpu/drm/radeon/r420.c:38:10: fatal error: r420_reg_safe.h: No such file or directory 38 | #include "r420_reg_safe.h" | ^ Well st

Re: [PATCH v2] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Paul Menzel
Dear Xinhui, Thank you for rerolling the patch. On 14.04.22 at 07:03, xinhui pan wrote: VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. We see the calltrace below. Fix it by keeping the last flush fence around and wait for it to signal Nit: Please add a dot/period to the e

[PATCH v2] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread xinhui pan
VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. We see the calltrace below. Fix it by keeping the last flush fence around and wait for it to signal BUG kmalloc-4k (Not tainted): Poison overwritten 0x9c88630414e8-0x9c88630414e8 @offset=5352. First byte 0x6c instead of 0x6

[pull] amdgpu drm-fixes-5.18

2022-04-13 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 5.18. The following changes since commit 88711fa9a14f6f473f4a7645155ca51386e36c21: Merge tag 'drm-misc-fixes-2022-04-07' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes (2022-04-08 09:22:16 +1000) are available in the Git repository at: https://git

[PATCH] drm/amdgpu: don't runtime suspend if there are displays attached (v2)

2022-04-13 Thread Alex Deucher
We normally runtime suspend when there are displays attached if they are in the DPMS off state; however, if something wakes the GPU, we send a hotplug event on resume (in case any displays were connected while the GPU was in suspend), which can cause userspace to light up the displays again soon afte
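A minimal sketch of the suspend policy this patch describes, assuming a simplified connector model (the struct and function names are hypothetical, not amdgpu code): runtime suspend is refused whenever any display is attached, regardless of its DPMS state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical connector state; the real driver walks DRM connectors. */
struct connector {
	bool display_attached;
	bool dpms_off;
};

/* Skip runtime suspend whenever any display is attached, even in DPMS off,
 * because a later wake-up sends a hotplug event that may make userspace
 * light the display up again. */
static bool may_runtime_suspend(const struct connector *c, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (c[i].display_attached)
			return false;	/* dpms_off is deliberately not consulted */
	}
	return true;
}
```

For example, a list holding one attached display that is in DPMS off still yields false, while a list with no attached displays yields true.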

Re: AMDGPU: regression on 5.17.1

2022-04-13 Thread Michele Ballabio
On Wed, 13 Apr 2022 14:14:42 -0400 Alex Deucher wrote: > On Wed, Apr 13, 2022 at 1:33 PM Michele Ballabio wrote: > > On Mon, 11 Apr 2022 14:34:37 -0400 Alex Deucher wrote: > > > On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio wrote: > > > > On Tue, 5 Apr 2022 1

[PATCH] drm/amdgpu: Move reset domain locking in DPC handler

2022-04-13 Thread Andrey Grodzovsky
Lock reset domain unconditionally because on resume we unlock it unconditionally. This solves a mutex deadlock when handling both FATAL and non-FATAL PCI errors one after another. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 14 +++--- 1 file changed, 7
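The invariant behind this change can be sketched with a hypothetical depth counter standing in for the real reset-domain lock (none of these names are the amdgpu API): because the resume callback unlocks unconditionally, the error-detected callback must lock unconditionally too, whatever the error severity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the reset-domain lock: a depth counter that
 * must be back at zero after every error-detected/resume cycle. */
struct reset_domain {
	int lock_depth;
};

static void reset_domain_lock(struct reset_domain *d)
{
	d->lock_depth++;
}

static void reset_domain_unlock(struct reset_domain *d)
{
	assert(d->lock_depth > 0);	/* unlocking a lock nobody holds is a bug */
	d->lock_depth--;
}

/* Error-detected path: take the lock for every severity, because the
 * resume path releases it without looking at the severity. */
static void pci_error_detected(struct reset_domain *d, bool fatal)
{
	reset_domain_lock(d);
	if (fatal) {
		/* ... prepare for a full reset (omitted) ... */
	}
}

/* Resume path: always releases the lock. */
static void pci_resume(struct reset_domain *d)
{
	reset_domain_unlock(d);
}
```

If the lock were taken only for one severity, the unconditional unlock in pci_resume() would run without a matching lock, and the assert in reset_domain_unlock() would catch the imbalance.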

Re: AMDGPU: regression on 5.17.1

2022-04-13 Thread Alex Deucher
On Wed, Apr 13, 2022 at 1:33 PM Michele Ballabio wrote: > On Mon, 11 Apr 2022 14:34:37 -0400 Alex Deucher wrote: > > On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio wrote: > > > On Tue, 5 Apr 2022 10:23:16 -0400 Alex Deucher wrote: > > > > On Mon, Apr 4, 2022 at 3:3

RE: [PATCH] drm/amd/amdgpu: Not request init data for MS_HYPERV with vega10

2022-04-13 Thread Michael Kelley (LINUX)
From: Alex Deucher Sent: Tuesday, April 12, 2022 7:13 AM > On Tue, Apr 12, 2022 at 4:01 AM Paul Menzel wrote: > > [Cc: +x86 folks] > > Dear Alex, dear x86 folks, > > x86 folks, can you think of alternatives to access `X86_HYPER_MS_HYPERV` from `arch/x86/include/asm/hypervis

Re: AMDGPU: regression on 5.17.1

2022-04-13 Thread Michele Ballabio
On Mon, 11 Apr 2022 14:34:37 -0400 Alex Deucher wrote: > On Sat, Apr 9, 2022 at 12:28 PM Michele Ballabio wrote: > > On Tue, 5 Apr 2022 10:23:16 -0400 Alex Deucher wrote: > > > On Mon, Apr 4, 2022 at 3:39 PM Michele Ballabio wrote: > > > > On Mon, 4 Apr 2022 13:

Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-04-13 Thread Andrey Grodzovsky
On 2022-04-13 12:03, Shuotao Xu wrote: On Apr 11, 2022, at 11:52 PM, Andrey Grodzovsky wrote: On 2022-04-08 21:28, Sh

[PATCH] drm/radeon: Add build directory to include path

2022-04-13 Thread Michel Dänzer
From: Michel Dänzer Fixes compile errors with out-of-tree builds, e.g. ../drivers/gpu/drm/radeon/r420.c:38:10: fatal error: r420_reg_safe.h: No such file or directory 38 | #include "r420_reg_safe.h" | ^ Signed-off-by: Michel Dänzer --- drivers/gpu/drm/radeon

Re: [EXTERNAL] [PATCH 2/2] drm/amdkfd: Add PCIe Hotplug Support for AMDKFD

2022-04-13 Thread Shuotao Xu
On Apr 11, 2022, at 11:52 PM, Andrey Grodzovsky wrote:

[PATCH 2/2] drm/amdkfd: CRIU add support for GWS queues

2022-04-13 Thread David Yat Sin
Add support for checkpointing/restoring GWS (Global Wave Sync) queues. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 4 ++-- .../amd/amdkfd/kfd_process_queue_manager.c| 22 ++- 3 files chan

[PATCH 1/2] drm/amdkfd: Fix GWS queue count

2022-04-13 Thread David Yat Sin
A queue can be inactive during process termination, which would cause dqm->gws_queue_count not to be decremented. There can only be one GWS queue per device process, so move the logic out of the loop. Signed-off-by: David Yat Sin --- .../gpu/drm/amd/amdkfd/kfd_device_queue_manager.c| 12 ++--
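One way to express the described fix, as a sketch over hypothetical simplified structures (not the real kfd types): the counter is decremented once per process, outside the loop, and no longer depends on whether the GWS queue was active at termination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model; not the real kfd structures. */
struct queue {
	bool is_gws;
	bool is_active;
};

struct dqm {
	int gws_queue_count;
};

/* Tear down a process's queues. A process has at most one GWS queue per
 * device, so the counter is adjusted once, after the loop, and is_active
 * is intentionally not consulted for that decision. */
static void destroy_process_queues(struct dqm *dqm, const struct queue *qs, size_t n)
{
	bool had_gws = false;

	for (size_t i = 0; i < n; i++) {
		if (qs[i].is_gws)
			had_gws = true;
		/* ... per-queue teardown (omitted) ... */
	}

	if (had_gws)
		dqm->gws_queue_count--;
	assert(dqm->gws_queue_count >= 0);
}
```

A process whose only GWS queue was already inactive still decrements the counter here, which is the case the commit message says was previously missed.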

Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Nathan Chancellor
Hi Richard, On Tue, Apr 12, 2022 at 04:50:00PM -0500, Richard Gong wrote: > Active State Power Management (ASPM) feature is enabled since kernel 5.14. > There are some AMD GFX cards (such as WX3200 and RX640) that won't work > with ASPM-enabled Intel Alder Lake based systems. Using these GFX cards
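A rough sketch of the kind of decision this quirk adds, assuming an amdgpu-style ASPM module parameter (-1 = auto, 0 = off, 1 = on) and a boolean that reports whether the host is Alder Lake based. How the real patch detects Alder Lake is not shown in this preview, and letting the quirk override an explicit "on" is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed parameter convention: -1 = auto, 0 = off, 1 = on. */
static bool should_use_aspm(int aspm_param, bool link_supports_aspm,
			    bool host_is_alder_lake)
{
	/* Quirk: never enable ASPM on Alder Lake based hosts. */
	if (host_is_alder_lake)
		return false;

	switch (aspm_param) {
	case 0:
		return false;
	case 1:
		return true;
	case -1:
		return link_supports_aspm;	/* auto: follow the PCIe link */
	default:
		return false;
	}
}
```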

[PATCH] drm/amd/amdgpu: Remove static from variable in RLCG Reg RW.

2022-04-13 Thread Gavin Wan
[why] These static variables save the RLC scratch register addresses. When multiple GPUs are installed (for example, in an XGMI setup) and they call the function at the same time, they change each other's RLC scratch register addresses. This then causes reading/writing to wro
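The bug class described here can be shown deterministically in a single thread, using a hypothetical per-GPU struct (the real RLCG code differs): a function-scope static is one object for every GPU, so a second call for another GPU clobbers what the first call stored. Two back-to-back calls stand in for two GPUs calling concurrently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-GPU data; the real amdgpu structures differ. */
struct gpu {
	uint32_t scratch_reg0;	/* address of RLC scratch register 0 on this GPU */
	uint32_t scratch_reg1;
};

/* Before: the static array exists once for the whole driver, so every
 * GPU that calls this function gets a pointer to the same storage. */
static const uint32_t *scratch_addrs_static(const struct gpu *g)
{
	static uint32_t addrs[2];

	addrs[0] = g->scratch_reg0;
	addrs[1] = g->scratch_reg1;
	return addrs;
}

/* After: the caller supplies its own storage (e.g. a plain local), so
 * calls for different GPUs cannot overwrite each other's addresses. */
static void scratch_addrs(const struct gpu *g, uint32_t out[2])
{
	out[0] = g->scratch_reg0;
	out[1] = g->scratch_reg1;
}
```

Calling scratch_addrs_static() for GPU 0 and then for GPU 1 returns the same pointer twice, and GPU 0's view now holds GPU 1's addresses; with scratch_addrs() each caller keeps its own values.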

Re: [PATCH] drm/amdkfd: potential NULL dereference in kfd_set/reset_event()

2022-04-13 Thread Felix Kuehling
On 2022-04-13 at 03:36, Dan Carpenter wrote: If lookup_event_by_id() returns a NULL "ev" pointer then the spin_lock(&ev->lock) will crash. This was detected by Smatch: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event() error: we previously assumed 'ev' could be n

Re: [PATCH v3 1/1] amdgpu/pm: Clarify documentation of error handling in send_smc_mesg

2022-04-13 Thread Luben Tuikov
Looks good! Thanks. Reviewed-by: Luben Tuikov On 2022-04-13 01:08, Darren Powell wrote: > Clarify the smu_cmn_send_smc_msg_with_param documentation to mention two > cases exist where messages are silently dropped with no error returned. > These cases occur in unusual situations where either: >

RE: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Limonciello, Mario
> On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote: > > Dear Richard, > > Thank you for sending out v4. > > On 12.04.22 at 23:50, Richard Gong wrote: > > > Active State Power Management (ASPM) feature is enabled since kernel 5.14. > > > There are some AMD GFX card

Re: [PATCH v2] Revert "drm/amd/display: Pass HostVM enable flag into DCN3.1 DML"

2022-04-13 Thread Alex Deucher
On Tue, Apr 12, 2022 at 5:03 PM Rodrigo Siqueira wrote: > > This reverts commit 367b3e934f578f6c0d5d8ca5987dc6ac4cd6831d. > > While we were testing DCN3.1 with a hub, we noticed that only one of 2 > connected displays lights up when using some specific display > resolution. In summary, this was th

Re: gcc inserts __builtin_popcount, causes 'modpost: "__popcountdi2" ... amdgpu.ko] undefined'

2022-04-13 Thread Sergei Trofimovich
On Mon, Apr 11, 2022 at 10:08:15PM +0100, Sergei Trofimovich wrote: > Current linux-5.17.1 on fresh gcc-12 fails to build with errors like: > > ERROR: modpost: "__popcountdi2" > [drivers/net/ethernet/broadcom/bnx2x/bnx2x.ko] undefined! > ERROR: modpost: "__popcountdi2" [drivers/gpu/drm/am
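For illustration only (the preview does not show which kernel code triggered the error): a loop such as the one below may be recognized by GCC as a population count and replaced with __builtin_popcountll. When no native popcount instruction is enabled for the target, that builtin is lowered to a libgcc helper such as __popcountdi2, which the modules here could not resolve, as the modpost errors show.

```c
#include <assert.h>

/* Kernighan-style bit count: each iteration clears the lowest set bit.
 * An optimizing compiler may turn this whole loop into a single
 * __builtin_popcountll() call, which can in turn become a call to a
 * libgcc helper (e.g. __popcountdi2) on targets without a popcount insn. */
static int count_bits(unsigned long long x)
{
	int n = 0;

	while (x) {
		x &= x - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}
```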

Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Alex Deucher
On Wed, Apr 13, 2022 at 3:43 AM Paul Menzel wrote: > Dear Richard, > Thank you for sending out v4. > On 12.04.22 at 23:50, Richard Gong wrote: > > Active State Power Management (ASPM) feature is enabled since kernel 5.14. > > There are some AMD GFX cards (such as WX3200 and RX640) that wo

Re: [PATCH] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Daniel Vetter
On Tue, 12 Apr 2022 at 14:11, Christian König wrote: > On 12.04.22 at 14:03, xinhui pan wrote: > > VM might already be freed when amdgpu_vm_tlb_seq_cb() is called. > > We see the calltrace below. > > Fix it by keeping the last flush fence around and wait for it to signal > > BUG kmalloc

Re: Re: [PATCH] drm/amdgpu: Fix one use-after-free of VM

2022-04-13 Thread Christian König
I think for now we should just have the following code in amdgpu_vm_fini: dma_fence_wait(vm->last_tlb_flush, false); /* Make sure that all fence callbacks have completed */ spin_lock(vm->last_tlb_flush->lock); spin_unlock(vm->last_tlb_flush->lock); dma_fence_put(vm->last_tlb_flush); Cleaning that

Re: Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Christian König
That warning is a bit more than a little annoying. Before we stop the delayed delete worker we *must* absolutely make sure that there is nothing going on the hardware any more. Otherwise we could easily run into use after free issues. There should somewhere be an amdgpu_fence_wait_empty() befo

Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Pan, Xinhui
The log from the tester says it is the DRM framebuffer BO that is busy. I suspect there is simply not enough time for its fence to be signaled, as adding a delay also works in my test. But the warning is a little annoying. From: Koenig, Christian Sent:

Re: [PATCHv4] drm/amdgpu: disable ASPM on Intel Alder Lake based systems

2022-04-13 Thread Paul Menzel
Dear Richard, Thank you for sending out v4. On 12.04.22 at 23:50, Richard Gong wrote: Active State Power Management (ASPM) feature is enabled since kernel 5.14. There are some AMD GFX cards (such as WX3200 and RX640) that won't work with ASPM-enabled Intel Alder Lake based systems. Using thes

[PATCH] drm/amdkfd: potential NULL dereference in kfd_set/reset_event()

2022-04-13 Thread Dan Carpenter
If lookup_event_by_id() returns a NULL "ev" pointer then the spin_lock(&ev->lock) will crash. This was detected by Smatch: drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_events.c:644 kfd_set_event() error: we previously assumed 'ev' could be null (see line 639) Fixes: 5273e82c5f47 ("drm/amdkfd
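A sketch of the check this patch adds, using a hypothetical array-backed event table (lookup_event() below is a stand-in, not the kfd lookup_event_by_id()): the NULL result is handled before any field of ev, such as ev->lock, is touched. Returning -EINVAL is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified event table; not the real kfd_events code. */
struct event {
	unsigned int id;
	int signaled;
};

/* Stand-in lookup: returns NULL for an unknown id. */
static struct event *lookup_event(struct event *tbl, size_t n, unsigned int id)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].id == id)
			return &tbl[i];
	return NULL;
}

static int set_event(struct event *tbl, size_t n, unsigned int id)
{
	struct event *ev = lookup_event(tbl, n, id);

	if (!ev)		/* bail out before dereferencing ev */
		return -EINVAL;
	/* In the kernel, spin_lock(&ev->lock) would only be taken here. */
	ev->signaled = 1;
	return 0;
}
```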

Re: [PATCH v2] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Paul Menzel
Dear xinhui, On 13.04.22 at 08:46, xinhui pan wrote: ttm_device_delayed_workqueue would reschedule itself if there is pending BO to be destroyed. So just one flush + cancel_sync is not enough. We still see lru_list not empty warnging. warning (`scripts/checkpatch.pl --codespell` should have

Re: [PATCH] drm/amdgpu: Make sure ttm delayed work finished

2022-04-13 Thread Koenig, Christian
We don't need that. TTM only reschedules when the BOs are still busy. And if the BOs are still busy when you unload the driver we have much bigger problems than this TTM worker :) Regards, Christian From: Pan, Xinhui Sent: Wednesday, April 13, 2022 05:08 To:
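A small model of the point made in this exchange, with hypothetical names (not the TTM API): the delayed-delete worker requeues itself only while some buffer objects (BOs) are still busy, so once the hardware is idle a single pass drains everything, but before that it does not.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a delayed-delete worker; not TTM code. */
struct delayed_delete {
	int busy_bos;	/* BOs whose fences have not signaled yet */
	int idle_bos;	/* BOs that can be freed right away */
	bool queued;	/* worker scheduled to run again */
};

/* One worker pass: free all idle BOs; requeue only if some are busy. */
static void worker_pass(struct delayed_delete *w)
{
	w->queued = false;
	w->idle_bos = 0;
	if (w->busy_bos > 0)
		w->queued = true;
}

/* Stand-in for waiting until the hardware has finished all outstanding
 * work: every busy BO becomes idle. */
static void wait_hw_idle(struct delayed_delete *w)
{
	w->idle_bos += w->busy_bos;
	w->busy_bos = 0;
}
```

Running worker_pass() while a BO is still busy leaves the worker queued; calling wait_hw_idle() first lets one pass finish with nothing left to requeue.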

Re: Vega 56 failing to process EDID from VR Headset

2022-04-13 Thread Paul Menzel
Dear James, On 13.04.22 at 00:13, James Dutton wrote: On Tue, 12 Apr 2022 at 07:13, Paul Menzel wrote: On 11.04.22 at 23:39, James Dutton wrote: So, did you do any changes to Linux? Why do you think the EDID is at fault? […] I suggest to analyze, why `No DP link bandwidth` is logged. The m

Re: [PATCH 1/2] drm/amd/amdgpu: Update PF2VF header

2022-04-13 Thread Paul Menzel
[Removed unintended paste in second line] On 13.04.22 at 09:03, Paul Menzel wrote: Dear Bokun, Thank you for rerolling the patch. Please add the iteration/version in the subject next time `[PATCH v2 1/2]` or so. On 12.04.22 at 23:31, Bokun Zhang wrote: - Add proper indentation in the hea

Re: [PATCH 1/2] drm/amd/amdgpu: Update PF2VF header

2022-04-13 Thread Paul Menzel
Dear Bokun, drm/amd/amdgpu: Update PF2VF header Thank you for rerolling the patch. Please add the iteration/version in the subject next time `[PATCH v2 1/2]` or so. On 12.04.22 at 23:31, Bokun Zhang wrote: - Add proper indentation in the header file Please use that as the commit message s