[PATCH] drm/amdgpu: Update RAS XGMI Error Query

2021-08-24 Thread Clements, John
[AMD Official Use Only] Submitting a patch to resolve the RAS XGMI error query issue. Thank you, John Clements. Attachment: 0001-drm-amdgpu-Update-RAS-XGMI-Error-Query.patch

Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly

2021-08-24 Thread Lazar, Lijo
On 8/24/2021 12:19 PM, Greathouse, Joseph wrote: -----Original Message----- From: Lazar, Lijo Sent: Monday, August 23, 2021 11:37 PM To: Kuehling, Felix ; Greathouse, Joseph ; amd-g...@lists.freedesktop.org Subject: Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly On 8/23/2021

RE: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-24 Thread Liu, Monk
[AMD Official Use Only] Hi Andrey Sorry that it is really hard for me to get any particular or solid potential bugs from your reply, can you be more specific, e.g.: what kind of race issue is introduced by this "kthread_stop/start" approach. To your another question/concern: >> . In a constan

RE: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly

2021-08-24 Thread Greathouse, Joseph
> -----Original Message----- > From: Lazar, Lijo > Sent: Tuesday, August 24, 2021 2:24 AM > To: Greathouse, Joseph ; Kuehling, Felix ; amd-g...@lists.freedesktop.org > Subject: Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly > > > > On 8/24/2021 12:19 PM, Greathouse, Joseph

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Sharma, Shashank
Hi Christian, I am a bit curious here. I thought it would be a good idea to add a new SW priority level, so that any other driver can also utilize this SW infrastructure. So it could be like, if you have a HW which matches with SW priority levels, directly map your HW queue to the SW priority

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Christian König
Nope that are two completely different things. The DRM_SCHED_PRIORITY_* exposes a functionality of the software scheduler. E.g. we try to serve kernel queues first and if those are empty we use high priority etc But that functionality is completely independent from the hardware priority
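A minimal C sketch of the software-scheduler behaviour Christian describes: the scheduler always serves the highest non-empty run queue, kernel first, then high, and so on. The enum and `pick_next_prio()` are hypothetical illustrations, not the drm_sched API; note that nothing in it refers to hardware queue priority, which is exactly the distinction being drawn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical SW priority levels, lowest to highest. They only mirror the
 * idea behind DRM_SCHED_PRIORITY_* and are not the kernel's definitions. */
enum sw_prio { PRIO_LOW, PRIO_NORMAL, PRIO_HIGH, PRIO_KERNEL, PRIO_COUNT };

/* pending[p] is the number of jobs queued at software priority p.
 * Returns the priority whose queue should be served next, or -1 if idle. */
static int pick_next_prio(const unsigned pending[PRIO_COUNT])
{
	assert(pending != NULL);
	/* Walk from the highest level (kernel) downwards and take the first
	 * non-empty queue: kernel first, then high, then normal, then low. */
	for (int p = PRIO_COUNT - 1; p >= 0; p--)
		if (pending[p] > 0)
			return p;
	return -1;
}
```

Which hardware queue a job finally lands on is a separate decision that this ordering does not touch.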

RE: [PATCH] amd/amdkfd: add ras page retirement handling for sq/sdma interrupt

2021-08-24 Thread Zhang, Hawking
[AMD Official Use Only] Hi Tao, This will break mode 2 reset solution, right? But we have to keep mode 2 reset solution as the default one for now. I think we need a new interface to allow KFD switch between unmap_queue and mode 2 reset solution Regards, Hawking -----Original Message----- Fro

RE: [PATCH] amd/amdkfd: add ras page retirement handling for sq/sdma interrupt

2021-08-24 Thread Zhang, Hawking
[AMD Official Use Only] How about we add a new member in ras context (amdgpu_ras) to indicate the poison consumption handling mode/approach? In such way, we can initialize that member per ASIC. Regards, Hawking -----Original Message----- From: amd-gfx On Behalf Of Zhang, Hawking Sent: Tuesda
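One way to read Hawking's suggestion, as a sketch: a per-ASIC field in the RAS context selects how poison consumption is handled, so the interrupt path consults it instead of hard-coding one approach. `enum poison_mode`, `struct ras_ctx` and both helpers are hypothetical names, not members of the real `amdgpu_ras`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical poison-consumption handling modes (names invented). */
enum poison_mode {
	POISON_MODE_GPU_RESET,   /* current default: mode 2 GPU reset */
	POISON_MODE_UNMAP_QUEUE, /* unmap only the queue that consumed poison */
};

/* Hypothetical subset of a RAS context; not the real struct amdgpu_ras. */
struct ras_ctx {
	enum poison_mode poison_mode;
};

/* Called once per device; the caller decides per ASIC whether the
 * unmap_queue path is usable yet. */
static void ras_ctx_init(struct ras_ctx *ras, bool asic_supports_unmap)
{
	assert(ras != NULL);
	ras->poison_mode = asic_supports_unmap ? POISON_MODE_UNMAP_QUEUE
					       : POISON_MODE_GPU_RESET;
}

/* The interrupt handler asks the context which path to take. */
static bool poison_needs_gpu_reset(const struct ras_ctx *ras)
{
	return ras->poison_mode == POISON_MODE_GPU_RESET;
}
```

Keeping mode 2 reset as the zero value of the enum makes it the default for any ASIC that does not opt in.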

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Sharma, Shashank
On 8/24/2021 2:25 PM, Christian König wrote: Nope that are two completely different things. The DRM_SCHED_PRIORITY_* exposes a functionality of the software scheduler. E.g. we try to serve kernel queues first and if those are empty we use high priority etc But that functionality is co

[PATCH] drm/sched: fix the bug of time out calculation

2021-08-24 Thread Monk Liu
the original logic is wrong that the timeout will not be retriggered after the previous job signaled, and that lead to the scenario that all jobs in the same scheduler shares the same timeout timer from the very beginning job in this scheduler which is wrong. we should modify the timer every time a p
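A simplified single-timer model of the behaviour this patch asks for, assuming times in milliseconds: the timer is armed by the first job and, when a job signals while others are still pending, re-armed so the next job gets its own full budget. `struct tdr_timer` and its helpers are hypothetical; this is not the drm_sched implementation.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical single-timer model of a scheduler's job timeout (ms). */
struct tdr_timer {
	long timeout;  /* per-job timeout budget */
	long deadline; /* absolute expiry time; valid only while armed */
	bool armed;
};

/* A job reaches the hardware: arm the timer only if it is idle. */
static void job_started(struct tdr_timer *t, long now)
{
	if (!t->armed) {
		t->deadline = now + t->timeout;
		t->armed = true;
	}
}

/* The oldest job signaled: give the next pending job a fresh budget
 * instead of letting it inherit the first job's deadline. */
static void job_signaled(struct tdr_timer *t, long now, bool more_pending)
{
	assert(t->armed);
	if (more_pending)
		t->deadline = now + t->timeout;
	else
		t->armed = false;
}
```

For example, with a 100 ms timeout, a job submitted at t=0 that signals at t=90 re-arms the timer to t=190 for the next pending job; without the re-arm that job would be judged against the original deadline of t=100.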

[PATCH] amd/amdkfd: add ras page retirement handling for sq/sdma interrupt

2021-08-24 Thread Tao Zhou
In ras poison mode, page retirement will be handled by the irq handler of the module which consumes corrupted data. Signed-off-by: Tao Zhou --- .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c| 13 - drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 -- drivers/gpu

Re: [PATCH] amd/amdkfd: add ras page retirement handling for sq/sdma interrupt

2021-08-24 Thread Zhou1, Tao
[AMD Official Use Only] Hi Hawking, GPU reset will also be called in dev->kfd2kgd->ras_process_cb, this patch is to add page retirement handling before gpu reset. unmap_queue mode (reset or preemption) is another story, I'll write a new patch after unmap_queue reset mode becomes functional. I

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Christian König
Am 24.08.21 um 11:45 schrieb Sharma, Shashank: On 8/24/2021 2:25 PM, Christian König wrote: Nope that are two completely different things. The DRM_SCHED_PRIORITY_* exposes a functionality of the software scheduler. E.g. we try to serve kernel queues first and if those are empty we use high pr

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Das, Nirmoy
Hi Christian, On 8/24/2021 8:10 AM, Christian König wrote: I haven't followed the previous discussion, but that looks like this change is based on a misunderstanding. In previous discussion I sort of suggested to have new DRM prio as I didn't see any other way to map priority provided by the

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Christian König
Am 24.08.21 um 13:57 schrieb Das, Nirmoy: Hi Christian, On 8/24/2021 8:10 AM, Christian König wrote: I haven't followed the previous discussion, but that looks like this change is based on a misunderstanding. In previous discussion I sort of suggested to have new DRM prio as I didn't see an

[PATCH linux-next] drm:dcn31: fix boolreturn.cocci warnings

2021-08-24 Thread CGEL
From: Jing Yangyang ./drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c:112:9-10:WARNING: return of 0/1 in function 'dcn31_is_panel_backlight_on' with return type bool ./drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c:122:9-10:WARNING: return of 0/1 in function 'dcn31_is_panel_powe

Re: [PATCH v2 03/12] x86/sev: Add an x86 version of prot_guest_has()

2021-08-24 Thread Christoph Hellwig
On Thu, Aug 19, 2021 at 01:33:09PM -0500, Tom Lendacky wrote: > I did it as inline originally because the presence of the function will be > decided based on the ARCH_HAS_PROTECTED_GUEST config. For now, that is > only selected by the AMD memory encryption support, so if I went out of > line I coul

[PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
From: Borislav Petkov Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enabled with CONFIG_SUSPEND. The ifdef guards in amdgpu
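As Borislav notes later in the thread, `pm_suspend_target_state` is defined only when CONFIG_SUSPEND is enabled, so any reference to it has to be guarded by that same option. A minimal sketch of the pattern; the `#define` stand-ins at the top and the helper `target_is_s2idle()` are assumptions added so it compiles outside the kernel, and this is not the applied patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins assumed for this sketch; in the kernel they come from Kconfig
 * and <linux/suspend.h>. */
#define CONFIG_SUSPEND 1
typedef int suspend_state_t;
#define PM_SUSPEND_TO_IDLE 1

#ifdef CONFIG_SUSPEND
/* Defined (and, in the kernel, exported) only when suspend is built in.
 * Initialized to s2idle here purely for the example. */
suspend_state_t pm_suspend_target_state = PM_SUSPEND_TO_IDLE;
#endif

/* Hypothetical helper: guard the reference on the same option that guards
 * the definition, so modpost never sees an undefined symbol. */
static bool target_is_s2idle(void)
{
#ifdef CONFIG_SUSPEND
	return pm_suspend_target_state == PM_SUSPEND_TO_IDLE;
#else
	return false;
#endif
}
```

Guarding on a broader option such as CONFIG_PM_SLEEP instead would still compile the reference in configurations where the symbol's definition is absent, which is the failure mode the modpost error shows.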

[PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Tom St Denis
This new debugfs interface uses an IOCTL interface in order to pass along state information like SRBM and GRBM bank switching. This new interface also allows a full 32-bit MMIO address range which the previous didn't. With this new design we have room to grow the flexibility of the file as need b

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Christian König
Am 24.08.21 um 14:16 schrieb Tom St Denis: This new debugfs interface uses an IOCTL interface in order to pass along state information like SRBM and GRBM bank switching. This new interface also allows a full 32-bit MMIO address range which the previous didn't. With this new design we have ro

Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly

2021-08-24 Thread Lazar, Lijo
On 8/24/2021 1:26 PM, Greathouse, Joseph wrote: -----Original Message----- From: Lazar, Lijo Sent: Tuesday, August 24, 2021 2:24 AM To: Greathouse, Joseph ; Kuehling, Felix ; amd-g...@lists.freedesktop.org Subject: Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly On 8/24/202

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Christian König
Am 24.08.21 um 14:27 schrieb StDenis, Tom: [AMD Official Use Only] What do you mean a "shared header?" How would they be shared between kernel and user? Somewhere in the include/uapi/drm/ folder I think. Either add that to amdgpu_drm.h or maybe amdgpu_debugfs.h? Or just keep it as a struc

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Das, Nirmoy
On 8/24/2021 2:07 PM, Christian König wrote: Am 24.08.21 um 13:57 schrieb Das, Nirmoy: Hi Christian, On 8/24/2021 8:10 AM, Christian König wrote: I haven't followed the previous discussion, but that looks like this change is based on a misunderstanding. In previous discussion I sort of su

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Tom St Denis
The IOCTL data is in the debugfs header as it is. I could move that to the amdgpu_drm.h and include it from amdgpu_debugfs.h. I'll re-write the STATE IOCTL to use a struct and then test against what I have in umr. Refactoring the read/write is trivial and I'll do that no problem (with style fixe

RE: [PATCH v2] drm/amdkfd: Account for SH/SE count when setting up cu masks.

2021-08-24 Thread Russell, Kent
[AMD Official Use Only] Minor comment inline > -----Original Message----- > From: amd-gfx On Behalf Of Sean Keely > Sent: Monday, August 23, 2021 8:37 PM > To: amd-gfx@lists.freedesktop.org > Cc: Keely, Sean > Subject: [PATCH v2] drm/amdkfd: Account for SH/SE count when setting up cu masks.

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread StDenis, Tom
[AMD Official Use Only] What do you mean a "shared header?" How would they be shared between kernel and user? As for why not read/write. Just wanted to keep it simple. It's not really performance bound. umr never does reads/writes larger than 32-bits anyways. It doesn't have to be this way

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Lazar, Lijo
On 8/24/2021 3:12 PM, Borislav Petkov wrote: From: Borislav Petkov Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enable

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Christian König
Am 24.08.21 um 14:39 schrieb Das, Nirmoy: On 8/24/2021 2:07 PM, Christian König wrote: Am 24.08.21 um 13:57 schrieb Das, Nirmoy: Hi Christian, On 8/24/2021 8:10 AM, Christian König wrote: I haven't followed the previous discussion, but that looks like this change is based on a misunderstandi

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Christian König
Am 24.08.21 um 14:42 schrieb Tom St Denis: The IOCTL data is in the debugfs header as it is. I could move that to the amdgpu_drm.h and include it from amdgpu_debugfs.h. Na, keep it like that and just add a comment. On second thought I don't want to raise any discussion on the mailing list if

Re: [PATCH 1/5] drm/sched:add new priority level

2021-08-24 Thread Das, Nirmoy
On 8/24/2021 3:18 PM, Christian König wrote: Am 24.08.21 um 14:39 schrieb Das, Nirmoy: On 8/24/2021 2:07 PM, Christian König wrote: Am 24.08.21 um 13:57 schrieb Das, Nirmoy: Hi Christian, On 8/24/2021 8:10 AM, Christian König wrote: I haven't followed the previous discussion, but that look

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Tom St Denis
hehehe I just moved it to uapi... No worries, you're the maintainer, I'll move it back before posting v2. Cheers, Tom On Tue, Aug 24, 2021 at 9:22 AM Christian König < ckoenig.leichtzumer...@gmail.com> wrote: > Am 24.08.21 um 14:42 schrieb Tom St Denis: > > The IOCTL data is in the debugfs heade

[PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers (v2)

2021-08-24 Thread Tom St Denis
This new debugfs interface uses an IOCTL interface in order to pass along state information like SRBM and GRBM bank switching. This new interface also allows a full 32-bit MMIO address range which the previous didn't. With this new design we have room to grow the flexibility of the file as need b

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote: > Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND Can you even create such a .config? > I remember giving a reviewed-by for this one, looks like it never got in. > https://www.spinics.net/lists/amd-gfx/msg66166.html A better version of

Re: [PATCH v2] drm/amd/display: Fix two cursor duplication when using overlay

2021-08-24 Thread Simon Ser
Hi Rodrigo! Thanks a lot for your reply! Comments below, please bear with me: I'm a bit familiar with the cursor issues, but my knowledge of AMD hw is still severely lacking. On Wednesday, August 18th, 2021 at 15:18, Rodrigo Siqueira wrote: > On 08/18, Simon Ser wrote: > > Hm. This patch cause

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Lazar, Lijo
On 8/24/2021 7:10 PM, Borislav Petkov wrote: On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote: Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND Can you even create such a .config? The description of "(drm/amdgpu: fix checking pmops when PM_SLEEP is not enabled)" says - 'pm_

Re: [PATCH v2] Revert "drm/scheduler: Avoid accessing freed bad job."

2021-08-24 Thread Andrey Grodzovsky
On 2021-08-24 3:24 a.m., Liu, Monk wrote: [AMD Official Use Only] Hi Andrey Sorry that it is really hard for me to get any particular or solid potential bugs from your reply, can you be more specific, e.g.: what kind of race issue is introduced by this "kthread_stop/start" approach. Hey,

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Borislav Petkov
On Tue, Aug 24, 2021 at 07:22:46PM +0530, Lazar, Lijo wrote: > 'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP > is set/enabled. pm_suspend_target_state is available only when CONFIG_SUSPEND is enabled. The extern thing is only a forward declaration. > OTOH, when both SUSPEND and

Re: [PATCH] drm/sched: fix the bug of time out calculation

2021-08-24 Thread Andrey Grodzovsky
On 2021-08-24 10:46 a.m., Andrey Grodzovsky wrote: On 2021-08-24 5:51 a.m., Monk Liu wrote: the original logic is wrong that the timeout will not be retriggered after the previous job signaled, and that lead to the scenario that all jobs in the same scheduler shares the same timeout timer from

Re: [PATCH v2] drm/amd/display: Fix two cursor duplication when using overlay

2021-08-24 Thread Kazlauskas, Nicholas
On 2021-08-24 9:59 a.m., Simon Ser wrote: Hi Rodrigo! Thanks a lot for your reply! Comments below, please bear with me: I'm a bit familiar with the cursor issues, but my knowledge of AMD hw is still severely lacking. On Wednesday, August 18th, 2021 at 15:18, Rodrigo Siqueira wrote: On 08/18

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Lazar, Lijo
On 8/24/2021 3:12 PM, Borislav Petkov wrote: From: Borislav Petkov Building a randconfig here triggered: ERROR: modpost: "pm_suspend_target_state" [drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined! because the module export of that symbol happens in kernel/power/suspend.c which is enabl

Re: [PATCH] drm/amdgpu: Fix build with missing pm_suspend_target_state module export

2021-08-24 Thread Alex Deucher
Applied. Thanks! Alex On Tue, Aug 24, 2021 at 11:16 AM Lazar, Lijo wrote: > > > > On 8/24/2021 3:12 PM, Borislav Petkov wrote: > > From: Borislav Petkov > > > > Building a randconfig here triggered: > > > >ERROR: modpost: "pm_suspend_target_state" > > [drivers/gpu/drm/amd/amdgpu/amdgpu.ko

Re: [PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers

2021-08-24 Thread Alex Deucher
Also, please use C style comments. Alex On Tue, Aug 24, 2021 at 9:28 AM Tom St Denis wrote: > > hehehe I just moved it to uapi... No worries, you're the maintainer, I'll > move it back before posting v2. > > Cheers, > Tom > > On Tue, Aug 24, 2021 at 9:22 AM Christian König > wrote: >> >> Am 2

Re: [PATCH] drm/sched: fix the bug of time out calculation

2021-08-24 Thread Andrey Grodzovsky
On 2021-08-24 5:51 a.m., Monk Liu wrote: the original logic is wrong that the timeout will not be retriggered after the previous job signaled, and that lead to the scenario that all jobs in the same scheduler shares the same timeout timer from the very beginning job in this scheduler which is wro

[PATCH] drm/amd/amdgpu: New debugfs interface for MMIO registers (v3)

2021-08-24 Thread Tom St Denis
This new debugfs interface uses an IOCTL interface in order to pass along state information like SRBM and GRBM bank switching. This new interface also allows a full 32-bit MMIO address range which the previous didn't. With this new design we have room to grow the flexibility of the file as need b

Re: [PATCH v2] drm/amd/display: Fix two cursor duplication when using overlay

2021-08-24 Thread Harry Wentland
On 2021-08-24 10:56 a.m., Kazlauskas, Nicholas wrote: > On 2021-08-24 9:59 a.m., Simon Ser wrote: >> Hi Rodrigo! >> >> Thanks a lot for your reply! Comments below, please bear with me: I'm >> a bit familiar with the cursor issues, but my knowledge of AMD hw is >> still severely lacking. >> >> On

[PATCH 5/8] drm/amdgpu/display: fix mixed declarations and code in dc_link_dp.c

2021-08-24 Thread Alex Deucher
Trivial. Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/dc/core/dc_link_dp.c | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c b/dr

[PATCH 1/8] drm/amdgpu/display: fix mixed declarations and code in dp_set_hw_test_pattern

2021-08-24 Thread Alex Deucher
Trivial. Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c b/drivers/gpu/

[PATCH 7/8] drm/amdgpu/display: remove unused variables in dcn31_hpo_dp_link_enc_update_stream_allocation_table

2021-08-24 Thread Alex Deucher
Trivial. Fixes: dfed73a863df ("drm/amd/display: Add DP 2.0 HPO Link Encoder") Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c | 4 1 file changed, 4 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c b/d

[PATCH 8/8] drm/amdgpu/display: remove unused variable in dcn31_hpo_dp_stream_enc_mute_control

2021-08-24 Thread Alex Deucher
Trivial. Fixes: c0c9c87bcc5f ("drm/amd/display: Add DP 2.0 HPO Stream Encoder") Signed-off-by: Alex Deucher --- .../gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c

[PATCH 3/8] drm/amdgpu/display: handle all cases in decide_cr_training_pattern

2021-08-24 Thread Alex Deucher
We need a default case to handle the additional enum values. While here drop the need for a local variable. Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 11 +++ 1 file changed, 3 ins
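The pattern behind this fix, sketched with hypothetical enums rather than the real DC types: return directly from each case instead of through a local variable, and add a `default` so enum values that are not listed (including ones introduced later) are still handled. The fallback chosen in `default` is an assumption of this sketch.

```c
#include <assert.h>

/* Hypothetical stand-ins; the real DC enums have different names. */
enum link_encoding { ENC_8B_10B, ENC_128B_132B, ENC_UNKNOWN };
enum cr_pattern { PAT_TPS1, PAT_128B_132B_TPS1 };

static enum cr_pattern decide_cr_pattern(enum link_encoding enc)
{
	switch (enc) {
	case ENC_8B_10B:
		return PAT_TPS1;
	case ENC_128B_132B:
		return PAT_128B_132B_TPS1;
	default:
		/* Any value not listed above; falling back to TPS1 is an
		 * assumption made for this sketch. */
		return PAT_TPS1;
	}
}
```

With the `default` present, the compiler no longer warns about unhandled enumeration values, and no local variable is needed to carry the result out of the switch.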

[PATCH 6/8] drm/amdgpu/display: remove unused function dp2_update_mst_stream_alloc_table

2021-08-24 Thread Alex Deucher
Trivial. Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc_link.c | 52 --- 1 file changed, 52 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c b/drivers/gpu/drm/amd/d

Re: [PATCH 8/8] drm/amdgpu/display: remove unused variable in dcn31_hpo_dp_stream_enc_mute_control

2021-08-24 Thread Harry Wentland
Patches 1, 3, 5-8 are Reviewed-by: Harry Wentland For some reason I didn't seem to get patches 2 and 4. Harry On 2021-08-24 12:51 p.m., Alex Deucher wrote: > Trivial. > > Fixes: c0c9c87bcc5f ("drm/amd/display: Add DP 2.0 HPO Stream Encoder") > Signed-off-by: Alex Deucher > --- > .../gpu/drm/

[PATCH 2/8] drm/amdgpu/display: fix unhandled cases in get_phyd32clk_src()

2021-08-24 Thread Alex Deucher
Fixes an unhandled cases warning and defaults to a more logical return value. Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --g

[PATCH 4/8] drm/amdgpu/display: drop unused variable.

2021-08-24 Thread Alex Deucher
Trivial. Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/core/dc.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c b/drivers/gpu/drm/amd/display/dc/core/dc.c index bd

Re: [PATCH 4/8] drm/amdgpu/display: drop unused variable.

2021-08-24 Thread Harry Wentland
Patches 2 and 4 are Reviewed-by: Harry Wentland Harry On 2021-08-24 1:36 p.m., Alex Deucher wrote: > Trivial. > > Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support") > Signed-off-by: Alex Deucher > --- > drivers/gpu/drm/amd/display/dc/core/dc.c | 3 --- > 1 file changed, 3 dele

[PATCH v2 00/18] CHECKPOINT RESTORE WITH ROCm

2021-08-24 Thread David Yat Sin
CRIU is a user space tool which is very popular for container live migration in datacentres. It can checkpoint a running application, save its complete state, memory contents and all system resources to images on disk which can be migrated to another machine and restored later. More information

[PATCH v2 02/18] x86/configs: CRIU update debug rock defconfig

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj - Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file

[PATCH v2 16/18] drm/amdkfd: CRIU implement gpu_id remapping

2021-08-24 Thread David Yat Sin
When doing a restore on a different node, the gpu_ids on the restore node may be different, but the user space application will still use the original gpu_ids in its ioctl calls. Add code to create a gpu_id mapping so that KFD can determine the actual gpu_id during the user's ioctls. Signed-
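A minimal sketch of a gpu_id translation table of the kind this patch describes: user space keeps passing the gpu_ids recorded at checkpoint time, and the driver maps each one to the device found on the restore node. The table layout, `MAX_GPUS`, and the use of 0 as a not-found sentinel are assumptions for illustration, not the KFD data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_GPUS 8 /* hypothetical limit */

/* One entry per GPU: the id user space saw at checkpoint time and the id
 * of the matching device on the restore node. */
struct gpu_id_map {
	uint32_t user_gpu_id;
	uint32_t actual_gpu_id;
};

struct gpu_id_table {
	struct gpu_id_map map[MAX_GPUS];
	size_t n;
};

/* Record a mapping at restore time; returns -1 if the table is full. */
static int gpu_id_table_add(struct gpu_id_table *t, uint32_t user_id,
			    uint32_t actual_id)
{
	if (t->n == MAX_GPUS)
		return -1;
	t->map[t->n].user_gpu_id = user_id;
	t->map[t->n].actual_gpu_id = actual_id;
	t->n++;
	return 0;
}

/* Translate the gpu_id passed in an ioctl; 0 means "no such GPU". */
static uint32_t gpu_id_translate(const struct gpu_id_table *t, uint32_t user_id)
{
	assert(t != NULL);
	for (size_t i = 0; i < t->n; i++)
		if (t->map[i].user_gpu_id == user_id)
			return t->map[i].actual_gpu_id;
	return 0;
}
```

A linear scan is enough here because the number of GPUs per process is small; each ioctl handler would translate the incoming id before looking up the device.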

[PATCH v2 15/18] drm/amdkfd: CRIU dump and restore events

2021-08-24 Thread David Yat Sin
Add support to existing CRIU ioctls to save and restore events during CRIU checkpoint and restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 61 + drivers/gpu/drm/amd/amdkfd/kfd_events.c | 322 +-- drivers/gpu/drm/amd/amdkfd/kfd_priv.h

[PATCH v2 08/18] drm/amdkfd: CRIU Implement KFD pause ioctl

2021-08-24 Thread David Yat Sin
Introduce the pause IOCTL. The CRIU amdgpu plugin needs to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting the dump and AMDKFD_IOC_CRIU_PAUSE(pause = 0) when the dump is complete. This ensures that the queues are not modified between CRIU dump ioctls. Signed-off-by: David Yat Sin --- drivers/

[PATCH v2 04/18] drm/amdkfd: CRIU Implement KFD process_info ioctl

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj This IOCTL is expected to be called as a precursor to the actual Checkpoint operation. This does the basic discovery into the target process seized by CRIU and relays the information to the userspace that utilizes it to start the Checkpoint operation via another dedicated

[PATCH v2 12/18] drm/amdkfd: CRIU restore queue doorbell id

2021-08-24 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +-- 1 file changed, 41 insertions(+), 20 deletions(-) diff --git a/drivers/g

[PATCH v2 14/18] drm/amdkfd: CRIU dump/restore queue control stack

2021-08-24 Thread David Yat Sin
Dump contents of queue control stacks on CRIU dump and restore them during CRIU restore. (rajneesh: rebased to 5.11 and fixed merge conflict) Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_d

[PATCH v2 03/18] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj Checkpoint-Restore in userspace (CRIU) is a powerful tool that can snapshot a running process and later restore it on same or a remote machine but expects the processes that have a device file (e.g. GPU) associated with them, provide necessary driver support to assist CRIU

[PATCH v2 17/18] Revert "drm/amdgpu: Remove verify_access shortcut for KFD BOs"

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8. This is just a temporary work around and will be dropped later. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++ 1 file changed, 7 inser

[PATCH v2 01/18] x86/configs: CRIU update release defconfig

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj Update rock-rel_defconfig for monolithic kernel release that enables CRIU support with kfd. Signed-off-by: Rajneesh Bhardwaj (cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5) Signed-off-by: David Yat Sin --- arch/x86/configs/rock-rel_defconfig | 13 +

[PATCH v2 10/18] drm/amdkfd: CRIU restore queue ids

2021-08-24 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- driv

[PATCH v2 11/18] drm/amdkfd: CRIU restore sdma id for queues

2021-08-24 Thread David Yat Sin
When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +- .../amd/amdkfd/kfd_pro

[PATCH v2 06/18] drm/amdkfd: CRIU Implement KFD restore ioctl

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj This implements the KFD CRIU Restore ioctl that lays the basic foundation for the CRIU restore operation. It provides support to create the buffer objects corresponding to Non-Paged system memory mapped for GPU and/or CPU access and lays basic foundation for the userptrs b

[PATCH v2 07/18] drm/amdkfd: CRIU Implement KFD resume ioctl

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj This adds support to create userptr BOs on restore and introduces a new ioctl to restart memory notifiers for the restored userptr BOs. When doing CRIU restore MMU notifications can happen anytime after we call amdgpu_mn_register. Prevent MMU notifications until we reach s

[PATCH v2 05/18] drm/amdkfd: CRIU Implement KFD dumper ioctl

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj This adds support to discover the buffer objects that belong to a process being checkpointed. The data corresponding to these buffer objects is returned to user space plugin running under criu master context which then stores this info to recreate these buffer objects dur

[PATCH v2 13/18] drm/amdkfd: CRIU dump and restore queue mqds

2021-08-24 Thread David Yat Sin
Dump contents of queue MQDs on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 70 ++-- .../d

[PATCH v2 09/18] drm/amdkfd: CRIU add queues support

2021-08-24 Thread David Yat Sin
Add support to existing CRIU ioctls to save the number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 +- drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 25 +- ..

[PATCH v2 18/18] drm/amdkfd: CRIU export kfd bos as prime dmabuf objects

2021-08-24 Thread David Yat Sin
From: Rajneesh Bhardwaj KFD buffer objects do not associate a GEM handle with them so cannot directly be used with libdrm to initiate a system dma (sDMA) operation to speedup the checkpoint and restore operation so export them as dmabuf objects and use with libdrm helper (amdgpu_bo_import) to fur

[PATCH 0/4] Various fixes to pass libdrm hotunplug tests

2021-08-24 Thread Andrey Grodzovsky
A bunch of fixes so that the hotplug tests I previously added here [1] pass with the latest code. Once accepted, I will enable the tests on the libdrm side. [1] - https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/172 Andrey Grodzovsky (4): drm/amdgpu: Move flush VCE idle_work during HW fini drm/t

[PATCH 2/4] drm/ttm: Create pinned list

2021-08-24 Thread Andrey Grodzovsky
This list will be used to capture all non VRAM BOs not on LRU so when device is hot unplugged we can iterate the list and unmap DMA mappings before device is removed. Signed-off-by: Andrey Grodzovsky Suggested-by: Christian König --- drivers/gpu/drm/ttm/ttm_bo.c | 24 +

[PATCH 1/4] drm/amdgpu: Move flush VCE idle_work during HW fini

2021-08-24 Thread Andrey Grodzovsky
Attempts to powergate after the device is removed lead to a crash. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 1 - drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 4 drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 - drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 2 ++ 4

[PATCH 3/4] drm/amdgpu: drm/amdgpu: Handle IOMMU enabled case

2021-08-24 Thread Andrey Grodzovsky
Handle all DMA IOMMU group related dependencies before the group is removed and we try to access it after free. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 + drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 50 ++ drivers/gpu/drm/amd/amdg

[PATCH 4/4] drm/amdgpu: Add a UAPI flag for hot plug/unplug

2021-08-24 Thread Andrey Grodzovsky
To support libdrm tests. Signed-off-by: Andrey Grodzovsky --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c index 6400259a7c4b..c2fdf67ff551 100644 -

Re: [PATCH v2] drm/amdkfd: Account for SH/SE count when setting up cu masks.

2021-08-24 Thread Felix Kuehling
Am 2021-08-23 um 8:36 p.m. schrieb Sean Keely: > On systems with multiple SH per SE compute_static_thread_mgmt_se# > is split into independent masks, one for each SH, in the upper and > lower 16 bits. We need to detect this and apply cu masking to each > SH. The cu mask bits are assigned first to

RE: [PATCH 1/4] drm/amdgpu: Move flush VCE idle_work during HW fini

2021-08-24 Thread Quan, Evan
[AMD Official Use Only] Hi Andrey, I sent out a similar patch set to address the S3 issue, and I believe it should address the issue here too. https://lists.freedesktop.org/archives/amd-gfx/2021-August/067972.html https://lists.freedesktop.org/archives/amd-gfx/2021-August/067967.html BR

RE: [PATCH v2] drm/amdkfd: Account for SH/SE count when setting up cu masks.

2021-08-24 Thread Keely, Sean
[AMD Official Use Only] Right, sorry. These were two separate branches until checkpatch complained about the nesting level. Then I broke it. -Original Message- From: Kuehling, Felix Sent: Tuesday, August 24, 2021 7:54 PM To: amd-gfx@lists.freedesktop.org; Keely, Sean Subject: Re: [PA

[PATCH v3] drm/amdkfd: Account for SH/SE count when setting up cu masks.

2021-08-24 Thread Sean Keely
On systems with multiple SHs per SE, compute_static_thread_mgmt_se# is split into independent masks, one for each SH, in the upper and lower 16 bits. We need to detect this and apply CU masking to each SH. The CU mask bits are assigned first to each SE, then to alternate SHs, then finally to higher
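The bit-distribution order described above can be sketched as follows. This is a hypothetical helper, not the amdkfd code: it assumes at most two SHs per SE and at most 16 CUs per SH. Because the snippet is cut off after "higher", the sketch also assumes that the remaining bits go to successively higher CU indices within each SH.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not the amdkfd implementation.
 * Bit i of cu_mask enables the i-th CU in assignment order:
 * first across SEs, then across alternating SHs, then (assumed) to the
 * next-higher CU index. se_mask[se] receives one register value per SE,
 * with SH0 in bits 0..15 and SH1 in bits 16..31. */
static void sketch_split_cu_mask(uint64_t cu_mask, unsigned int cu_mask_bits,
                                 unsigned int num_se, unsigned int sh_per_se,
                                 unsigned int cu_per_sh, uint32_t *se_mask)
{
    unsigned int bit = 0;

    assert(cu_mask_bits <= 64);
    assert(sh_per_se == 1 || sh_per_se == 2);
    assert(cu_per_sh <= 16);

    for (unsigned int se = 0; se < num_se; se++)
        se_mask[se] = 0;

    for (unsigned int cu = 0; cu < cu_per_sh; cu++)          /* assumed: higher CUs last */
        for (unsigned int sh = 0; sh < sh_per_se; sh++)      /* then alternate SHs */
            for (unsigned int se = 0; se < num_se; se++) {   /* first each SE */
                if (bit >= cu_mask_bits)
                    return;
                if (cu_mask & (1ULL << bit))
                    se_mask[se] |= 1u << (sh * 16 + cu);
                bit++;
            }
}
```

For example, with two SEs, two SHs per SE and two CUs per SH, input bit 2 lands on SE0/SH1/CU0, which is bit 16 of se_mask[0].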

Re: [PATCH 1/4] drm/amdgpu: Move flush VCE idle_work during HW fini

2021-08-24 Thread Andrey Grodzovsky
Right, they will cover my use case. When are they landing? I rebased today and haven't seen them. Andrey On 2021-08-24 9:41 p.m., Quan, Evan wrote: [AMD Official Use Only] Hi Andrey, I sent out a similar patch set to address S3 issue. And I believe it should be able to address the issue he

RE: [PATCH 1/4] drm/amdgpu: Move flush VCE idle_work during HW fini

2021-08-24 Thread Quan, Evan
[AMD Official Use Only] Just landed. Thanks, Evan > -Original Message- > From: Grodzovsky, Andrey > Sent: Wednesday, August 25, 2021 11:20 AM > To: Quan, Evan ; dri-de...@lists.freedesktop.org; > amd-gfx@lists.freedesktop.org > Cc: ckoenig.leichtzumer...@gmail.com > Subject: Re: [PATCH 1

[PATCH] drm/amdgpu: reenable BACO support for 699F:C7 polaris12 SKU

2021-08-24 Thread Evan Quan
This reverts the commit below: "drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily", as the S3 hang issue has been fixed by another commit: "drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend". Change-Id: I5ea08a75eedd7fe32c7fa0b448f5bae1f390abe6 Signed-off-by: E

[PATCH v1 07/14] drm/amdkfd: public type as sys mem on migration to ram

2021-08-24 Thread Alex Sierra
On VRAM-to-RAM migration, public device type memory is accessed by the CPU much like system RAM. This flag sets the source from the sender, which in the public type case should be set to IOMEM. Signed-off-by: Alex Sierra Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |

[PATCH v1 05/14] drm/amdkfd: ref count init for device pages

2021-08-24 Thread Alex Sierra
The reference counter of device pages is initialized to zero during memmap zone init. The first time a new device page is allocated to migrate data into it, its reference counter needs to be initialized to one. Signed-off-by: Alex Sierra --- drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +- 1 file changed, 1 insertio
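The two stages of the reference counter's lifecycle can be shown with a minimal sketch. The types and functions (`struct sketch_dev_page`, `sketch_memmap_init`, `sketch_alloc_dev_page`) are hypothetical stand-ins, not `struct page` or the kfd_migrate.c code. The sketch also assumes that a refcount of zero marks a page as free for allocation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a device page; not struct page. */
struct sketch_dev_page {
    int refcount;
};

/* Mirrors memmap zone init leaving every device page at refcount 0. */
static void sketch_memmap_init(struct sketch_dev_page *pages, int n)
{
    assert(pages != NULL);
    for (int i = 0; i < n; i++)
        pages[i].refcount = 0;
}

/* When a free page is handed out as a migration target, its refcount must
 * start at 1; otherwise the first put would underflow it. */
static struct sketch_dev_page *sketch_alloc_dev_page(struct sketch_dev_page *pages, int n)
{
    assert(pages != NULL);
    for (int i = 0; i < n; i++) {
        if (pages[i].refcount == 0) { /* assumed meaning: free */
            pages[i].refcount = 1;
            return &pages[i];
        }
    }
    return NULL; /* no free device page */
}
```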

[PATCH v1 12/14] lib: add support for device public type in test_hmm

2021-08-24 Thread Alex Sierra
Device Public type uses device memory that is coherently accessible by the CPU. This could be shown as an SP (special purpose) memory range in the BIOS-e820 memory enumeration. If the system does not support SP memory, it could be faked by setting CONFIG_EFI_FAKE_MEMMAP. Currently, test_hmm only suppo

[PATCH v1 11/14] lib: test_hmm add module param for zone device type

2021-08-24 Thread Alex Sierra
To configure the device public type in test_hmm, two module parameters should be passed: spm_addr_dev0 and spm_addr_dev1, which correspond to the SP start address of each of the two devices. If no parameters are passed, the private device type is configured. Signed-off-by: Alex Sierra --- v5: Remove devmem->pag

[PATCH v1 03/14] mm: add iomem vma selection for memory migration

2021-08-24 Thread Alex Sierra
This is used to migrate pages from device memory back to system memory. This particular device memory type should be accessible by the CPU through IOMEM access. Typically, zone device public type memory falls into this category. Signed-off-by: Alex Sierra --- include/linux/migrat

[PATCH v1 08/14] mm: add public type support to migrate_vma helpers

2021-08-24 Thread Alex Sierra
Add device public type case to migrate_vma_insert_page, migrate_vma_pages and migrate_vma_check_page helpers. Signed-off-by: Alex Sierra --- mm/migrate.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/mm/migrate.c b/mm/migrate.c index d4ae2da99607..09817

[PATCH v1 01/14] ext4/xfs: add page refcount helper

2021-08-24 Thread Alex Sierra
From: Ralph Campbell There are several places where ZONE_DEVICE struct pages assume a reference count == 1 means the page is idle and free. Instead of open coding this, add a helper function to hide this detail. Signed-off-by: Ralph Campbell Signed-off-by: Alex Sierra Reviewed-by: Christoph He
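The reason for a helper is easy to see in miniature. In this sketch, `struct sketch_zdev_page`, `SKETCH_IDLE_REFCOUNT` and `sketch_zdev_page_idle` are hypothetical and are not the helper the patch adds. Callers ask whether a page is idle instead of open coding `refcount == 1`. If the extra ZONE_DEVICE reference is later removed, as patch 02/14 proposes, only the threshold in one place would need to change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a ZONE_DEVICE page; not struct page. */
struct sketch_zdev_page {
    int refcount;
};

/* Idle threshold while ZONE_DEVICE pages carry one extra reference. */
#define SKETCH_IDLE_REFCOUNT 1

/* Single place that encodes "idle and free", replacing open-coded checks. */
static bool sketch_zdev_page_idle(const struct sketch_zdev_page *page)
{
    assert(page != NULL);
    return page->refcount == SKETCH_IDLE_REFCOUNT;
}
```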

[PATCH v1 13/14] tools: update hmm-test to support device public type

2021-08-24 Thread Alex Sierra
Test cases such as migrate_fault and migrate_multiple were modified to explicitly migrate from device to system memory without the need for page faults when using the device public type. The snapshot test case was updated to read the memory device type first and, based on that, get the proper returned results migrate_

[PATCH v1 06/14] drm/amdkfd: add SPM support for SVM

2021-08-24 Thread Alex Sierra
When the CPU is connected through XGMI, it has coherent access to the VRAM resource. In this case, that resource is taken from a table in the device GMC aperture base. This resource is used along with the device type, which could be DEVICE_PRIVATE or DEVICE_PUBLIC, to create the device page map region. Signe

[PATCH v1 02/14] mm: remove extra ZONE_DEVICE struct page refcount

2021-08-24 Thread Alex Sierra
From: Ralph Campbell ZONE_DEVICE struct pages have an extra reference count that complicates the code for put_page() and several places in the kernel that need to check the reference count to see that a page is not being used (gup, compaction, migration, etc.). Clean up the code so the reference

[PATCH v1 04/14] mm: add zone device public type memory support

2021-08-24 Thread Alex Sierra
Device memory that is cache coherent from the device and CPU point of view. This is used on platforms that have an advanced system bus (like CAPI or CCIX). Any page of a process can be migrated to such memory. However, no one should be allowed to pin such memory, so that it can always be evicted. Signed-off

[PATCH v1 10/14] lib: test_hmm add ioctl to get zone device type

2021-08-24 Thread Alex Sierra
New ioctl cmd added to query the zone device type. This will be used once test_hmm adds the zone device public type. Signed-off-by: Alex Sierra --- lib/test_hmm.c | 15 ++- lib/test_hmm_uapi.h | 7 +++ 2 files changed, 21 insertions(+), 1 deletion(-) diff --git a/lib/test_hmm.
