[AMD Official Use Only]
Submitting patch to resolve RAS XGMI error query issue
Thank you,
John Clements
0001-drm-amdgpu-Update-RAS-XGMI-Error-Query.patch
Description: 0001-drm-amdgpu-Update-RAS-XGMI-Error-Query.patch
On 8/24/2021 12:19 PM, Greathouse, Joseph wrote:
-Original Message-
From: Lazar, Lijo
Sent: Monday, August 23, 2021 11:37 PM
To: Kuehling, Felix ; Greathouse, Joseph
; amd-
g...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly
On 8/23/2021
[AMD Official Use Only]
Hi Andrey
Sorry that it is really hard for me to get any particular or solid potential
bugs from your reply, can you be more specific, e.g.: what kind of race issue
is introduced by this "kthread_stop/start" approach.
To your another question/concern:
>> . In a constan
> -Original Message-
> From: Lazar, Lijo
> Sent: Tuesday, August 24, 2021 2:24 AM
> To: Greathouse, Joseph ; Kuehling, Felix
> ; amd-
> g...@lists.freedesktop.org
> Subject: Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly
>
>
>
> On 8/24/2021 12:19 PM, Greathouse, Joseph
Hi Christian,
I am a bit curious here.
I thought it would be a good idea to add a new SW priority level, so
that any other driver can also utilize this SW infrastructure.
So it could be like, if you have a HW which matches with SW priority
levels, directly map your HW queue to the SW priority
Nope, those are two completely different things.
The DRM_SCHED_PRIORITY_* exposes a functionality of the software
scheduler. E.g. we try to serve kernel queues first and if those are
empty we use high priority etc
But that functionality is completely independent from the hardware
priority
[AMD Official Use Only]
Hi Tao,
This will break mode 2 reset solution, right? But we have to keep mode 2 reset
solution as the default one for now. I think we need a new interface to allow
KFD switch between unmap_queue and mode 2 reset solution
Regards,
Hawking
-Original Message-
Fro
[AMD Official Use Only]
How about we add a new member in ras context (amdgpu_ras) to indicate the
poison consumption handling mode/approach? In such way, we can initialize that
member per ASIC.
Regards,
Hawking
-Original Message-
From: amd-gfx On Behalf Of Zhang,
Hawking
Sent: Tuesda
On 8/24/2021 2:25 PM, Christian König wrote:
Nope, those are two completely different things.
The DRM_SCHED_PRIORITY_* exposes a functionality of the software
scheduler. E.g. we try to serve kernel queues first and if those are
empty we use high priority etc
But that functionality is co
The original logic is wrong: the timeout is not retriggered
after the previous job has signaled, which leads to the scenario where all
jobs in the same scheduler share the same timeout timer from the very
beginning job in this scheduler, which is wrong.
we should modify the timer every time a p
In ras poison mode, page retirement will be handled by the irq handler of the
module which consumes corrupted data.
Signed-off-by: Tao Zhou
---
.../gpu/drm/amd/amdgpu/amdgpu_amdkfd_aldebaran.c| 13 -
drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 --
drivers/gpu
[AMD Official Use Only]
Hi Hawking,
GPU reset will also be called in dev->kfd2kgd->ras_process_cb, this patch is to
add page retirement handling before gpu reset.
unmap_queue mode (reset or preemption) is another story, I'll write a new patch
after unmap_queue reset mode becomes functional.
I
Am 24.08.21 um 11:45 schrieb Sharma, Shashank:
On 8/24/2021 2:25 PM, Christian König wrote:
Nope, those are two completely different things.
The DRM_SCHED_PRIORITY_* exposes a functionality of the software
scheduler. E.g. we try to serve kernel queues first and if those are
empty we use high pr
Hi Christian,
On 8/24/2021 8:10 AM, Christian König wrote:
I haven't followed the previous discussion, but that looks like this
change is based on a misunderstanding.
In previous discussion I sort of suggested to have new DRM prio as I
didn't see any other way to map priority provided by the
Am 24.08.21 um 13:57 schrieb Das, Nirmoy:
Hi Christian,
On 8/24/2021 8:10 AM, Christian König wrote:
I haven't followed the previous discussion, but that looks like this
change is based on a misunderstanding.
In previous discussion I sort of suggested to have new DRM prio as I
didn't see an
From: Jing Yangyang
./drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c:112:9-10:WARNING:
return of 0/1 in function 'dcn31_is_panel_backlight_on'
with return type bool
./drivers/gpu/drm/amd/display/dc/dcn31/dcn31_panel_cntl.c:122:9-10:WARNING:
return of 0/1 in function 'dcn31_is_panel_powe
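The warnings above flag functions declared `bool` that return the integer literals 0/1. A minimal sketch of that pattern and the usual fix follows; `panel_is_on_old()`, `panel_is_on()` and `PANEL_BLON_BIT` are hypothetical names for illustration, not the actual dcn31 code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register bit, for illustration only. */
#define PANEL_BLON_BIT (1u << 0)

/* Before: compiles, but static checkers warn about 0/1 in a bool function. */
static bool panel_is_on_old(uint32_t reg)
{
	if (reg & PANEL_BLON_BIT)
		return 1;
	return 0;
}

/* After: return a boolean expression (or true/false) directly. */
static bool panel_is_on(uint32_t reg)
{
	return (reg & PANEL_BLON_BIT) != 0;
}
```

Both versions behave the same; the second only makes the return type explicit in the code.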
On Thu, Aug 19, 2021 at 01:33:09PM -0500, Tom Lendacky wrote:
> I did it as inline originally because the presence of the function will be
> decided based on the ARCH_HAS_PROTECTED_GUEST config. For now, that is
> only selected by the AMD memory encryption support, so if I went out of
> line I coul
From: Borislav Petkov
Building a randconfig here triggered:
ERROR: modpost: "pm_suspend_target_state"
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
because the module export of that symbol happens in
kernel/power/suspend.c which is enabled with CONFIG_SUSPEND.
The ifdef guards in amdgpu
This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range which
the previous didn't. With this new design we have room to grow
the flexibility of the file as need b
Am 24.08.21 um 14:16 schrieb Tom St Denis:
This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range which
the previous didn't. With this new design we have ro
On 8/24/2021 1:26 PM, Greathouse, Joseph wrote:
-Original Message-
From: Lazar, Lijo
Sent: Tuesday, August 24, 2021 2:24 AM
To: Greathouse, Joseph ; Kuehling, Felix
; amd-
g...@lists.freedesktop.org
Subject: Re: [PATCH 1/3] drm/amdkfd: Allocate SDMA engines more fairly
On 8/24/202
Am 24.08.21 um 14:27 schrieb StDenis, Tom:
[AMD Official Use Only]
What do you mean a "shared header?" How would they be shared between kernel
and user?
Somewhere in the include/uapi/drm/ folder I think. Either add that to
amdgpu_drm.h or maybe amdgpu_debugfs.h?
Or just keep it as a struc
On 8/24/2021 2:07 PM, Christian König wrote:
Am 24.08.21 um 13:57 schrieb Das, Nirmoy:
Hi Christian,
On 8/24/2021 8:10 AM, Christian König wrote:
I haven't followed the previous discussion, but that looks like this
change is based on a misunderstanding.
In previous discussion I sort of su
The IOCTL data is in the debugfs header as it is. I could move that to the
amdgpu_drm.h and include it from amdgpu_debugfs.h.
I'll re-write the STATE IOCTL to use a struct and then test against what I
have in umr.
Refactoring the read/write is trivial and I'll do that no problem (with
style fixe
[AMD Official Use Only]
Minor comment inline
> -Original Message-
> From: amd-gfx On Behalf Of Sean Keely
> Sent: Monday, August 23, 2021 8:37 PM
> To: amd-gfx@lists.freedesktop.org
> Cc: Keely, Sean
> Subject: [PATCH v2] drm/amdkfd: Account for SH/SE count when setting up cu
> masks.
[AMD Official Use Only]
What do you mean a "shared header?" How would they be shared between kernel
and user?
As for why not read/write. Just wanted to keep it simple. It's not really
performance bound. umr never does reads/writes larger than 32-bits anyways.
It doesn't have to be this way
On 8/24/2021 3:12 PM, Borislav Petkov wrote:
From: Borislav Petkov
Building a randconfig here triggered:
ERROR: modpost: "pm_suspend_target_state"
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
because the module export of that symbol happens in
kernel/power/suspend.c which is enable
Am 24.08.21 um 14:39 schrieb Das, Nirmoy:
On 8/24/2021 2:07 PM, Christian König wrote:
Am 24.08.21 um 13:57 schrieb Das, Nirmoy:
Hi Christian,
On 8/24/2021 8:10 AM, Christian König wrote:
I haven't followed the previous discussion, but that looks like
this change is based on a misunderstandi
Am 24.08.21 um 14:42 schrieb Tom St Denis:
The IOCTL data is in the debugfs header as it is. I could move that to
the amdgpu_drm.h and include it from amdgpu_debugfs.h.
Na, keep it like that and just add a comment.
On second thought I don't want to raise any discussion on the mailing
list if
On 8/24/2021 3:18 PM, Christian König wrote:
Am 24.08.21 um 14:39 schrieb Das, Nirmoy:
On 8/24/2021 2:07 PM, Christian König wrote:
Am 24.08.21 um 13:57 schrieb Das, Nirmoy:
Hi Christian,
On 8/24/2021 8:10 AM, Christian König wrote:
I haven't followed the previous discussion, but that look
hehehe I just moved it to uapi... No worries, you're the maintainer, I'll
move it back before posting v2.
Cheers,
Tom
On Tue, Aug 24, 2021 at 9:22 AM Christian König <
ckoenig.leichtzumer...@gmail.com> wrote:
> Am 24.08.21 um 14:42 schrieb Tom St Denis:
>
> The IOCTL data is in the debugfs heade
This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range which
the previous didn't. With this new design we have room to grow
the flexibility of the file as need b
On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote:
> Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND
Can you even create such a .config?
> I remember giving a reviewed-by for this one, looks like it never got in.
> https://www.spinics.net/lists/amd-gfx/msg66166.html
A better version of
Hi Rodrigo!
Thanks a lot for your reply! Comments below, please bear with me: I'm
a bit familiar with the cursor issues, but my knowledge of AMD hw is
still severely lacking.
On Wednesday, August 18th, 2021 at 15:18, Rodrigo Siqueira
wrote:
> On 08/18, Simon Ser wrote:
> > Hm. This patch cause
On 8/24/2021 7:10 PM, Borislav Petkov wrote:
On Tue, Aug 24, 2021 at 06:38:41PM +0530, Lazar, Lijo wrote:
Without CONFIG_PM_SLEEP and with CONFIG_SUSPEND
Can you even create such a .config?
The description of "(drm/amdgpu: fix checking pmops when PM_SLEEP is
not enabled)" says -
'pm_
On 2021-08-24 3:24 a.m., Liu, Monk wrote:
[AMD Official Use Only]
Hi Andrey
Sorry that it is really hard for me to get any particular or solid potential bugs from
your reply, can you be more specific, e.g.: what kind of race issue is introduced by this
"kthread_stop/start" approach.
Hey,
On Tue, Aug 24, 2021 at 07:22:46PM +0530, Lazar, Lijo wrote:
> 'pm_suspend_target_state' is only available when CONFIG_PM_SLEEP
> is set/enabled.
pm_suspend_target_state is available only when CONFIG_SUSPEND is
enabled. The extern thing is only a forward declaration.
> OTOH, when both SUSPEND and
On 2021-08-24 10:46 a.m., Andrey Grodzovsky wrote:
On 2021-08-24 5:51 a.m., Monk Liu wrote:
The original logic is wrong: the timeout is not retriggered
after the previous job has signaled, which leads to the scenario where all
jobs in the same scheduler share the same timeout timer from
On 2021-08-24 9:59 a.m., Simon Ser wrote:
Hi Rodrigo!
Thanks a lot for your reply! Comments below, please bear with me: I'm
a bit familiar with the cursor issues, but my knowledge of AMD hw is
still severely lacking.
On Wednesday, August 18th, 2021 at 15:18, Rodrigo Siqueira
wrote:
On 08/18
On 8/24/2021 3:12 PM, Borislav Petkov wrote:
From: Borislav Petkov
Building a randconfig here triggered:
ERROR: modpost: "pm_suspend_target_state"
[drivers/gpu/drm/amd/amdgpu/amdgpu.ko] undefined!
because the module export of that symbol happens in
kernel/power/suspend.c which is enabl
Applied. Thanks!
Alex
On Tue, Aug 24, 2021 at 11:16 AM Lazar, Lijo wrote:
>
>
>
> On 8/24/2021 3:12 PM, Borislav Petkov wrote:
> > From: Borislav Petkov
> >
> > Building a randconfig here triggered:
> >
> >ERROR: modpost: "pm_suspend_target_state"
> > [drivers/gpu/drm/amd/amdgpu/amdgpu.ko
Also, please use C style comments.
Alex
On Tue, Aug 24, 2021 at 9:28 AM Tom St Denis wrote:
>
> hehehe I just moved it to uapi... No worries, you're the maintainer, I'll
> move it back before posting v2.
>
> Cheers,
> Tom
>
> On Tue, Aug 24, 2021 at 9:22 AM Christian König
> wrote:
>>
>> Am 2
On 2021-08-24 5:51 a.m., Monk Liu wrote:
The original logic is wrong: the timeout is not retriggered
after the previous job has signaled, which leads to the scenario where all
jobs in the same scheduler share the same timeout timer from the very
beginning job in this scheduler, which is wro
This new debugfs interface uses an IOCTL interface in order to pass
along state information like SRBM and GRBM bank switching. This
new interface also allows a full 32-bit MMIO address range which
the previous didn't. With this new design we have room to grow
the flexibility of the file as need b
On 2021-08-24 10:56 a.m., Kazlauskas, Nicholas wrote:
> On 2021-08-24 9:59 a.m., Simon Ser wrote:
>> Hi Rodrigo!
>>
>> Thanks a lot for your reply! Comments below, please bear with me: I'm
>> a bit familiar with the cursor issues, but my knowledge of AMD hw is
>> still severely lacking.
>>
>> On
Trivial.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
.../gpu/drm/amd/display/dc/core/dc_link_dp.c | 18 +-
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c
b/dr
Trivial.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c
b/drivers/gpu/
Trivial.
Fixes: dfed73a863df ("drm/amd/display: Add DP 2.0 HPO Link Encoder")
Signed-off-by: Alex Deucher
---
.../gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c | 4
1 file changed, 4 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_link_encoder.c
b/d
Trivial.
Fixes: c0c9c87bcc5f ("drm/amd/display: Add DP 2.0 HPO Stream Encoder")
Signed-off-by: Alex Deucher
---
.../gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c | 2 --
1 file changed, 2 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/dcn31/dcn31_hpo_dp_stream_encoder.c
We need a default case to handle the additional enum values. While
here drop the need for a local variable.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc_link_dp.c | 11 +++
1 file changed, 3 ins
Trivial.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc_link.c | 52 ---
1 file changed, 52 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc_link.c
b/drivers/gpu/drm/amd/d
Patches 1, 3, 5-8 are
Reviewed-by: Harry Wentland
For some reason I didn't seem to get patches 2 and 4.
Harry
On 2021-08-24 12:51 p.m., Alex Deucher wrote:
> Trivial.
>
> Fixes: c0c9c87bcc5f ("drm/amd/display: Add DP 2.0 HPO Stream Encoder")
> Signed-off-by: Alex Deucher
> ---
> .../gpu/drm/
Fixes an unhandled cases warning and defaults to a more
logical return value.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --g
Trivial.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index bd
Patches 2 and 4 are
Reviewed-by: Harry Wentland
Harry
On 2021-08-24 1:36 p.m., Alex Deucher wrote:
> Trivial.
>
> Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
> Signed-off-by: Alex Deucher
> ---
> drivers/gpu/drm/amd/display/dc/core/dc.c | 3 ---
> 1 file changed, 3 dele
Fixes an unhandled cases warning and defaults to a more
logical return value.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc_link_hwss.c | 5 ++---
1 file changed, 2 insertions(+), 3 deletions(-)
diff --g
Trivial.
Fixes: 808a662bb3a8 ("drm/amd/display: Add DP 2.0 SST DC Support")
Signed-off-by: Alex Deucher
---
drivers/gpu/drm/amd/display/dc/core/dc.c | 3 ---
1 file changed, 3 deletions(-)
diff --git a/drivers/gpu/drm/amd/display/dc/core/dc.c
b/drivers/gpu/drm/amd/display/dc/core/dc.c
index bd
CRIU is a user space tool which is very popular for container live migration in
datacentres. It can checkpoint a running application, save its complete state,
memory contents and all system resources to images on disk which can be
migrated to another machine and restored later. More information
From: Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support
- Also include necessary options for CR with docker containers.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
arch/x86/configs/rock-dbg_defconfig | 53 ++---
1 file
When doing a restore on a different node, the gpu_id's on the restore
node may be different. But the user space application will still
use the original gpu_id's in the ioctl calls. Add code to create a
gpu id mapping so that kfd can determine the actual gpu_id during the user
ioctl's.
Signed-
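The gpu_id mapping described in this commit message can be pictured as a small lookup table. The sketch below is an assumption-laden illustration, not the KFD implementation: `struct gpu_id_map` and `lookup_actual_gpu_id()` are hypothetical names, and a linear search stands in for whatever structure the driver actually uses.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: the id user space still uses (taken from the
 * checkpointed node) and the id actually assigned on the restore node. */
struct gpu_id_map {
	uint32_t user_gpu_id;
	uint32_t actual_gpu_id;
};

/* Returns 0 and stores the actual id on success, -1 if no mapping exists. */
static int lookup_actual_gpu_id(const struct gpu_id_map *map, size_t n,
				uint32_t user_gpu_id, uint32_t *actual)
{
	for (size_t i = 0; i < n; i++) {
		if (map[i].user_gpu_id == user_gpu_id) {
			*actual = map[i].actual_gpu_id;
			return 0;
		}
	}
	return -1;
}
```

An ioctl handler would translate the user-supplied id once at entry and reject the call if no mapping is found.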
Add support to existing CRIU ioctl's to save and restore events during
criu checkpoint and restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 61 +
drivers/gpu/drm/amd/amdkfd/kfd_events.c | 322 +--
drivers/gpu/drm/amd/amdkfd/kfd_priv.h
Introducing pause IOCTL. The CRIU amdgpu plugin needs
to call AMDKFD_IOC_CRIU_PAUSE(pause = 1) before starting dump and
AMDKFD_IOC_CRIU_PAUSE(pause = 0) when dump is complete. This ensures
that the queues are not modified between each CRIU dump ioctl.
Signed-off-by: David Yat Sin
---
drivers/
From: Rajneesh Bhardwaj
This IOCTL is expected to be called as a precursor to the actual
Checkpoint operation. This does the basic discovery into the target
process seized by CRIU and relays the information to the userspace that
utilizes it to start the Checkpoint operation via another dedicated
When re-creating queues during CRIU restore, restore the queue with the
same doorbell id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 61 +--
1 file changed, 41 insertions(+), 20 deletions(-)
diff --git a/drivers/g
Dump contents of queue control stacks on CRIU dump and restore them
during CRIU restore.
(rajneesh: rebased to 5.11 and fixed merge conflict)
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_d
From: Rajneesh Bhardwaj
Checkpoint-Restore in userspace (CRIU) is a powerful tool that can
snapshot a running process and later restore it on same or a remote
machine but expects the processes that have a device file (e.g. GPU)
associated with them, provide necessary driver support to assist CRIU
From: Rajneesh Bhardwaj
This reverts commit 12ebe2b9df192a2a8580cd9ee3e9940c116913c8.
This is just a temporary work around and will be dropped later.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 7 +++
1 file changed, 7 inser
From: Rajneesh Bhardwaj
Update rock-rel_defconfig for monolithic kernel release that enables
CRIU support with kfd.
Signed-off-by: Rajneesh Bhardwaj
(cherry picked from commit 4a6d309a82648a23a4fc0add83013ac6db6187d5)
Signed-off-by: David Yat Sin
---
arch/x86/configs/rock-rel_defconfig | 13 +
When re-creating queues during CRIU restore, restore the queue with the
same queue id value used during CRIU dump.
Signed-off-by: Rajneesh Bhardwaj
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +-
driv
When re-creating queues during CRIU restore, restore the queue with the
same sdma id value used during CRIU dump.
Signed-off-by: David Yat Sin
---
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++-
.../drm/amd/amdkfd/kfd_device_queue_manager.h | 3 +-
.../amd/amdkfd/kfd_pro
From: Rajneesh Bhardwaj
This implements the KFD CRIU Restore ioctl that lays the basic
foundation for the CRIU restore operation. It provides support to
create the buffer objects corresponding to Non-Paged system memory
mapped for GPU and/or CPU access and lays basic foundation for the
userptrs b
From: Rajneesh Bhardwaj
This adds support to create userptr BOs on restore and introduces a new
ioctl to restart memory notifiers for the restored userptr BOs.
When doing CRIU restore MMU notifications can happen anytime after we call
amdgpu_mn_register. Prevent MMU notifications until we reach s
From: Rajneesh Bhardwaj
This adds support to discover the buffer objects that belong to a
process being checkpointed. The data corresponding to these buffer
objects is returned to user space plugin running under criu master
context which then stores this info to recreate these buffer objects
dur
Dump contents of queue MQD's on CRIU dump and restore them during CRIU
restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +-
drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +-
.../drm/amd/amdkfd/kfd_device_queue_manager.c | 70 ++--
.../d
Add support to existing CRIU ioctl's to save number of queues and queue
properties for each queue during checkpoint and re-create queues on
restore.
Signed-off-by: David Yat Sin
---
drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 16 +-
drivers/gpu/drm/amd/amdkfd/kfd_priv.h | 25 +-
..
From: Rajneesh Bhardwaj
KFD buffer objects do not associate a GEM handle with them so cannot
directly be used with libdrm to initiate a system dma (sDMA) operation
to speedup the checkpoint and restore operation so export them as dmabuf
objects and use with libdrm helper (amdgpu_bo_import) to fur
Bunch of fixes to enable passing hotplug tests I previously added
here[1] with latest code.
Once accepted I will enable the tests on libdrm side.
[1] - https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/172
Andrey Grodzovsky (4):
drm/amdgpu: Move flush VCE idle_work during HW fini
drm/t
This list will be used to capture all non VRAM BOs not
on LRU so when device is hot unplugged we can iterate
the list and unmap DMA mappings before device is removed.
Signed-off-by: Andrey Grodzovsky
Suggested-by: Christian König
---
drivers/gpu/drm/ttm/ttm_bo.c | 24 +
Attempts to powergate after the device is removed lead to a crash.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_vce.c | 1 -
drivers/gpu/drm/amd/amdgpu/vce_v2_0.c | 4
drivers/gpu/drm/amd/amdgpu/vce_v3_0.c | 5 -
drivers/gpu/drm/amd/amdgpu/vce_v4_0.c | 2 ++
4
Handle all DMA IOMMU group related dependencies before the
group is removed and we try to access it after free.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +
drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c| 50 ++
drivers/gpu/drm/amd/amdg
To support libdrm tests.
Signed-off-by: Andrey Grodzovsky
---
drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index 6400259a7c4b..c2fdf67ff551 100644
-
Am 2021-08-23 um 8:36 p.m. schrieb Sean Keely:
> On systems with multiple SH per SE compute_static_thread_mgmt_se#
> is split into independent masks, one for each SH, in the upper and
> lower 16 bits. We need to detect this and apply cu masking to each
> SH. The cu mask bits are assigned first to
[AMD Official Use Only]
Hi Andrey,
I sent out a similar patch set to address S3 issue. And I believe it should be
able to address the issue here too.
https://lists.freedesktop.org/archives/amd-gfx/2021-August/067972.html
https://lists.freedesktop.org/archives/amd-gfx/2021-August/067967.html
BR
[AMD Official Use Only]
Right, sorry. These were two separate branches until checkpatch complained
about the nesting level. Then I broke it.
-Original Message-
From: Kuehling, Felix
Sent: Tuesday, August 24, 2021 7:54 PM
To: amd-gfx@lists.freedesktop.org; Keely, Sean
Subject: Re: [PA
On systems with multiple SH per SE compute_static_thread_mgmt_se#
is split into independent masks, one for each SH, in the upper and
lower 16 bits. We need to detect this and apply cu masking to each
SH. The cu mask bits are assigned first to each SE, then to
alternate SHs, then finally to higher
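The register layout described above (one 16-bit CU mask per SH, packed into the upper and lower halves of a per-SE register) can be sketched as follows. `pack_se_cu_mask()` is a hypothetical helper, and which half belongs to which SH is an assumption made here for illustration; it does not reproduce the patch's bit distribution order.

```c
#include <stdint.h>

/* Pack two per-SH CU masks into one per-SE register value.
 * Assumed layout: SH0 in bits 15:0, SH1 in bits 31:16. */
static uint32_t pack_se_cu_mask(uint16_t sh0_mask, uint16_t sh1_mask)
{
	return (uint32_t)sh0_mask | ((uint32_t)sh1_mask << 16);
}
```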
Right, they will cover my use case, when are they landing ? I rebased
today and haven't seen them.
Andrey
On 2021-08-24 9:41 p.m., Quan, Evan wrote:
[AMD Official Use Only]
Hi Andrey,
I sent out a similar patch set to address S3 issue. And I believe it should be
able to address the issue he
[AMD Official Use Only]
Just landed.
Thanks,
Evan
> -Original Message-
> From: Grodzovsky, Andrey
> Sent: Wednesday, August 25, 2021 11:20 AM
> To: Quan, Evan ; dri-de...@lists.freedesktop.org;
> amd-gfx@lists.freedesktop.org
> Cc: ckoenig.leichtzumer...@gmail.com
> Subject: Re: [PATCH 1
This reverts the commit below:
"drm/amdgpu: disable BACO support for 699F:C7 polaris12 SKU temporarily".
As the S3 hang issue has been fixed by another commit:
"drm/amdgpu: add missing cleanups for Polaris12 UVD/VCE on suspend".
Change-Id: I5ea08a75eedd7fe32c7fa0b448f5bae1f390abe6
Signed-off-by: E
On VRAM to RAM migration, public device type memory
has similar access as System RAM from the CPU. This flag sets
the source from the sender, which in the Public type case
should be set as IOMEM.
Signed-off-by: Alex Sierra
Reviewed-by: Felix Kuehling
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c |
The ref counter of device pages is initialized to zero during memmap init zone.
The first time a new device page is allocated to migrate data into it,
its ref counter needs to be initialized to one.
Signed-off-by: Alex Sierra
---
drivers/gpu/drm/amd/amdkfd/kfd_migrate.c | 2 +-
1 file changed, 1 insertio
Device Public type uses device memory that is coherently accessible by
the CPU. This could be shown as SP (special purpose) memory range
at the BIOS-e820 memory enumeration. If no SP memory is supported in
system, this could be faked by setting CONFIG_EFI_FAKE_MEMMAP.
Currently, test_hmm only suppo
In order to configure device public in test_hmm, two module parameters
should be passed, which correspond to the SP start address of each
device (2) spm_addr_dev0 & spm_addr_dev1. If no parameters are passed,
private device type is configured.
Signed-off-by: Alex Sierra
---
v5:
Remove devmem->pag
In this case, this is used to migrate pages from device memory, back to
system memory. This particular device memory type should be accessible
by the CPU, through IOMEM access. Typically, zone device public type
memory falls into this category.
Signed-off-by: Alex Sierra
---
include/linux/migrat
Add device public type case to migrate_vma_insert_page,
migrate_vma_pages and migrate_vma_check_page helpers.
Signed-off-by: Alex Sierra
---
mm/migrate.c | 19 ---
1 file changed, 12 insertions(+), 7 deletions(-)
diff --git a/mm/migrate.c b/mm/migrate.c
index d4ae2da99607..09817
From: Ralph Campbell
There are several places where ZONE_DEVICE struct pages assume a reference
count == 1 means the page is idle and free. Instead of open coding this,
add a helper function to hide this detail.
Signed-off-by: Ralph Campbell
Signed-off-by: Alex Sierra
Reviewed-by: Christoph He
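The idea of replacing open-coded `refcount == 1` checks with a helper can be sketched in plain C. This is a user-space illustration only: `struct fake_page` and `dev_page_is_idle()` are hypothetical stand-ins, not the kernel's `struct page` or the helper the patch adds.

```c
#include <stdbool.h>

/* Simplified stand-in for struct page; only the refcount is modelled. */
struct fake_page {
	int refcount;
};

/* Under the convention described above, a ZONE_DEVICE page whose
 * reference count equals 1 is idle and free. */
static bool dev_page_is_idle(const struct fake_page *page)
{
	return page->refcount == 1;
}
```

Callers then test `dev_page_is_idle(page)` instead of comparing the count themselves, so the convention lives in one place if it changes later.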
Test cases such as migrate_fault and migrate_multiple
were modified to explicitly migrate from device to sys memory
without the need of page faults, when using device public
type.
Snapshot test case updated to read memory device type
first and based on that, get the proper returned results
migrate_
When the CPU is connected through XGMI, it has coherent
access to VRAM resource. In this case that resource
is taken from a table in the device gmc aperture base.
This resource is used along with the device type, which could
be DEVICE_PRIVATE or DEVICE_PUBLIC to create the device
page map region.
Signe
From: Ralph Campbell
ZONE_DEVICE struct pages have an extra reference count that complicates the
code for put_page() and several places in the kernel that need to check the
reference count to see that a page is not being used (gup, compaction,
migration, etc.). Clean up the code so the reference
Device memory that is cache coherent from device and CPU point of view.
This is used on platforms that have an advanced system bus (like CAPI or
CCIX). Any page of a process can be migrated to such memory. However,
no one should be allowed to pin such memory so that it can always be
evicted.
Signed-off
New ioctl cmd added to query zone device type. This will be
used once the test_hmm adds zone device public type.
Signed-off-by: Alex Sierra
---
lib/test_hmm.c | 15 ++-
lib/test_hmm_uapi.h | 7 +++
2 files changed, 21 insertions(+), 1 deletion(-)
diff --git a/lib/test_hmm.