Re: [PATCH 2/2] drm/amdgpu: rename amdgpu_vm_bo_rmv to _del

2022-02-02 Thread Christian König
On 01.02.22 at 17:27, Daniel Vetter wrote: On Tue, Feb 1, 2022 at 4:28 PM Christian König wrote: Some people complained about the name and this matches much more Linux naming conventions for object functions. Signed-off-by: Christian König "some people" sounds mightily ominous :-) That's

[PATCH] drm/amdgpu: skipping SDMA IP suspend for S0ix.

2022-02-02 Thread Rajib Mahapatra
[Why] amdgpu error observed if suspend is aborted during S0i3 resume. [How] If suspend is aborted for some reason during S0i3 resume cycle, it follows amdgpu errors in resume. Skipping SDMA ip in suspend solves the issue on RENOIR (green sardine apu) chip. This time, the system is able to resume g

RE: [PATCH] drm/amdgpu: skipping SDMA IP suspend for S0ix.

2022-02-02 Thread Limonciello, Mario
[Public] > -----Original Message----- > From: Mahapatra, Rajib > Sent: Wednesday, February 2, 2022 03:07 > To: Liang, Prike; Limonciello, Mario; Deucher, Alexander > Cc: amd-gfx@lists.freedesktop.org; S, Shirish; Mahapatra, Rajib > Subject: [PATCH] drm/amdgpu: skipping SDMA IP suspen

Re: [PATCH] drm/amdgpu: skipping SDMA IP suspend for S0ix.

2022-02-02 Thread Alex Deucher
On Wed, Feb 2, 2022 at 4:07 AM Rajib Mahapatra wrote: > > [Why] > amdgpu error observed if suspend is aborted during S0i3 > resume. > > [How] > If suspend is aborted for some reason during S0i3 resume > cycle, it follows amdgpu errors in resume. > Skipping SDMA ip in suspend solves the issue on RE

Re: [PATCH v4 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

2022-02-02 Thread Christoph Hellwig
On Thu, Jan 27, 2022 at 02:32:58PM -0800, Andrew Morton wrote: > On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra wrote: > > > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory > > owned by a device that can be mapped into CPU page tables like > > MEMORY_DEVICE_GENERIC and can a

Re: [PATCH] drm/amdgpu: skipping SDMA IP suspend for S0ix.

2022-02-02 Thread Limonciello, Mario
On 2/2/2022 08:16, Alex Deucher wrote: On Wed, Feb 2, 2022 at 4:07 AM Rajib Mahapatra wrote: [Why] amdgpu error observed if suspend is aborted during S0i3 resume. [How] If suspend is aborted for some reason during S0i3 resume cycle, it follows amdgpu errors in resume. Skipping SDMA ip in susp

Re: [PATCH] drm/amdgpu: Handle the GPU recovery failure in SRIOV environment.

2022-02-02 Thread Andrey Grodzovsky
On 2022-02-01 16:47, Surbhi Kakarya wrote: This patch handles the GPU recovery failure in sriov environment by retrying the reset if the first reset fails. To determine the condition of retry, a new function amdgpu_is_retry_sriov_reset() is added which returns true if failure is due to ETIMED

Re: [PATCH v4 00/10] Add MEMORY_DEVICE_COHERENT for coherent device memory mapping

2022-02-02 Thread Jason Gunthorpe
On Wed, Feb 02, 2022 at 03:57:50PM +0100, Christoph Hellwig wrote: > On Thu, Jan 27, 2022 at 02:32:58PM -0800, Andrew Morton wrote: > > On Wed, 26 Jan 2022 21:09:39 -0600 Alex Sierra wrote: > > > > > This patch series introduces MEMORY_DEVICE_COHERENT, a type of memory > > > owned by a device tha

Re: [PATCH] drm/amdgpu: skipping SDMA IP suspend for S0ix.

2022-02-02 Thread Alex Deucher
On Wed, Feb 2, 2022 at 10:29 AM Limonciello, Mario wrote: > > On 2/2/2022 08:16, Alex Deucher wrote: > > On Wed, Feb 2, 2022 at 4:07 AM Rajib Mahapatra > > wrote: > >> > >> [Why] > >> amdgpu error observed if suspend is aborted during S0i3 > >> resume. > >> > >> [How] > >> If suspend is aborted

Re: [PATCH] drm/amdgpu: Handle the GPU recovery failure in SRIOV environment.

2022-02-02 Thread Felix Kuehling
On 2022-02-01 at 16:47, Surbhi Kakarya wrote: This patch handles the GPU recovery failure in sriov environment by retrying the reset if the first reset fails. To determine the condition of retry, a new function amdgpu_is_retry_sriov_reset() is added which returns true if failure is due to ETIM
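
The retry-on-failure idea in the quoted patch description can be sketched in plain userspace C. This is only an illustration, not the amdgpu code: the names `is_retryable_sketch`, `reset_with_retry_sketch`, `fail_first_call_sketch` and the limit `MAX_RESET_RETRIES_SKETCH` are hypothetical, and treating only `-ETIMEDOUT` as retryable is an assumption, since the snippet is cut off before the full condition is given.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical retry limit; the snippet does not state one. */
#define MAX_RESET_RETRIES_SKETCH 2

/* Assumption: only a timeout counts as retryable in this sketch. */
static bool is_retryable_sketch(int err)
{
    return err == -ETIMEDOUT;
}

typedef int (*reset_fn_sketch)(void *ctx);

/* Runs the reset once, then retries while the failure is retryable. */
static int reset_with_retry_sketch(reset_fn_sketch do_reset, void *ctx)
{
    int attempts = 0;
    int ret;

    assert(do_reset != NULL);
    do {
        ret = do_reset(ctx);
        attempts++;
    } while (ret != 0 && is_retryable_sketch(ret) &&
             attempts <= MAX_RESET_RETRIES_SKETCH);

    return ret;
}

/* Demo reset: times out on the first call, succeeds afterwards.
 * ctx points to an int that counts the calls. */
static int fail_first_call_sketch(void *ctx)
{
    int *calls = ctx;

    (*calls)++;
    return (*calls == 1) ? -ETIMEDOUT : 0;
}
```

Calling `reset_with_retry_sketch(fail_first_call_sketch, &calls)` with `calls` starting at 0 runs the reset twice and returns 0. A non-retryable error is returned after the first attempt without looping.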

[RFC v4] drm/amdgpu: Rework reset domain to be refcounted.

2022-02-02 Thread Andrey Grodzovsky
The reset domain contains register access semaphore now and so needs to be present as long as each device in a hive needs it and so it cannot be bound to XGMI hive life cycle. Address this by making reset domain refcounted and pointed to by each member of the hive and the hive itself. v4: Fix crash on
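
As a rough illustration of what "refcounted" means here, the following userspace C sketch keeps an object alive until its last holder drops it. It is not the amdgpu implementation (kernel code would typically use its own reference-counting helpers); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in; not the amdgpu reset domain structure. */
struct reset_domain_sketch {
    atomic_int refcount;
};

/* Creates a domain holding one reference, owned by the caller. */
static struct reset_domain_sketch *domain_create_sketch(void)
{
    struct reset_domain_sketch *d = malloc(sizeof(*d));

    if (d)
        atomic_init(&d->refcount, 1);
    return d;
}

/* Takes an extra reference, e.g. one per device that points at the domain. */
static void domain_get_sketch(struct reset_domain_sketch *d)
{
    assert(d != NULL);
    atomic_fetch_add(&d->refcount, 1);
}

/* Drops one reference. Returns 1 if this was the last one and the
 * domain was freed, 0 otherwise. */
static int domain_put_sketch(struct reset_domain_sketch *d)
{
    assert(d != NULL);
    if (atomic_fetch_sub(&d->refcount, 1) == 1) {
        free(d);
        return 1;
    }
    return 0;
}
```

With this shape, the object's lifetime follows its users rather than any single owner: whichever holder calls `domain_put_sketch` last frees it.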

[bug report] drm/amd/display: refactor destructive verify link cap sequence

2022-02-02 Thread Dan Carpenter
Hello Wenjing Liu, The patch 1a206273c322: "drm/amd/display: refactor destructive verify link cap sequence" from Jan 28, 2022, leads to the following Smatch static checker warning: drivers/gpu/drm/amd/amdgpu/../display/dc/core/dc_link_dp.c:3248 dp_verify_link_cap() error: uniniti

[PATCH] drm/amd/display: Handle removed connector in early_unregister

2022-02-02 Thread Fangzhi Zuo
From: Wayne Lin [Why] commit "drm/amd/display: turn DPMS off on connector unplug" and commit "drm/amd/display: Clear dc remote sinks on MST disconnect" were trying to resolve the resource problem when connectors get disconnected under MST scenarios. However, these patches don't really clean up

Re: [RFC v3 00/12] Define and use reset domain for GPU recovery in amdgpu

2022-02-02 Thread Andrey Grodzovsky
Just another ping, with Shyun's help I was able to do some smoke testing on XGMI SRIOV system (booting and triggering hive reset) and for now looks good. Andrey On 2022-01-28 14:36, Andrey Grodzovsky wrote: Just a gentle ping if people have more comments on this patch set ? Especially last 5 p

[PATCH] drm/amd/display: Use NULL pointer instead of plain integer

2022-02-02 Thread Magali Lemes
Assigning 0L to a pointer variable caused the following warning: drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dsc/rc_calc_fpu.c:71:40: warning: Using plain integer as NULL pointer In order to remove this warning, this commit assigns a NULL pointer to the pointer variable that caused this issue.
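
A minimal sketch of the kind of change described, in standalone C. The type `qp_entry_sketch` and the function `lookup_entry_sketch` are hypothetical and are not taken from rc_calc_fpu.c; the point is only the initializer on the pointer variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in type; not the structure used in rc_calc_fpu.c. */
struct qp_entry_sketch {
    int value;
};

/* Returns the entry at idx, or NULL when idx is out of range. */
static const struct qp_entry_sketch *
lookup_entry_sketch(const struct qp_entry_sketch *table, size_t len, size_t idx)
{
    /* Writing "= 0L" here is valid C but triggers the
     * "plain integer as NULL pointer" warning; NULL states the intent. */
    const struct qp_entry_sketch *found = NULL;

    assert(table != NULL);
    if (idx < len)
        found = &table[idx];
    return found;
}
```

Both spellings compile to the same null pointer, so the change affects only readability and checker output, not behavior.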

Re: [PATCH] drm/amd/display: Use NULL pointer instead of plain integer

2022-02-02 Thread Alex Deucher
Applied. Thanks! Alex On Wed, Feb 2, 2022 at 5:20 PM Magali Lemes wrote: > > Assigning 0L to a pointer variable caused the following warning: > > drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dsc/rc_calc_fpu.c:71:40: > warning: Using plain integer as NULL pointer > > In order to remove this warn

[PATCH] drm/amd/pm: add missing prototypes to amdgpu_dpm_internal

2022-02-02 Thread Maíra Canal
Include the header with the prototype to silence the following clang warnings: drivers/gpu/drm/amd/amdgpu/../pm/amdgpu_dpm_internal.c:29:6: warning: no previous prototype for function 'amdgpu_dpm_get_active_displays' [-Wmissing-prototypes] void amdgpu_dpm_get_active_displays(struct amdgpu_device *

[pull] amdgpu drm-fixes-5.17

2022-02-02 Thread Alex Deucher
Hi Dave, Daniel, Fixes for 5.17. The following changes since commit 26291c54e111ff6ba87a164d85d4a4e134b7315c: Linux 5.17-rc2 (2022-01-30 15:37:07 +0200) are available in the Git repository at: https://gitlab.freedesktop.org/agd5f/linux.git tags/amd-drm-fixes-5.17-2022-02-02 for you to fe

[PATCH -next] drm/amdkfd: Fix resource_size.cocci warning

2022-02-02 Thread Yang Li
Use resource_size function on resource object instead of explicit computation. Eliminate the following coccicheck warning: ./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:978:11-14: ERROR: Missing resource_size with res Reported-by: Abaci Robot Signed-off-by: Yang Li --- drivers/gpu/drm/amd/amdkfd/
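
For readers unfamiliar with the helper, here is a userspace sketch of the computation that `resource_size()` wraps. The struct and function below are stand-ins with hypothetical names, assuming an inclusive `end` address as in the kernel's `struct resource`; this is not the kfd_migrate.c code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for a resource range; start and end are inclusive. */
struct resource_sketch {
    uint64_t start;
    uint64_t end;
};

/* Size of an inclusive range: the "+ 1" is the part that is easy to
 * forget when the computation is written out by hand at each call site. */
static uint64_t resource_size_sketch(const struct resource_sketch *res)
{
    assert(res != NULL);
    return res->end - res->start + 1;
}
```

For a range from 0x1000 to 0x1fff this yields 0x1000 bytes. Using one helper instead of open-coded arithmetic keeps the off-by-one handling in a single place.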