[Patch v5 00/24] CHECKPOINT RESTORE WITH ROCm

2022-02-03 Thread Rajneesh Bhardwaj
V5: Proposed IOCTL APIs for CRIU with consolidated feedback CRIU is a user space tool which is very popular for container live migration in datacentres. It can checkpoint a running application, save its complete state, memory contents and all system resources to images on disk which can be migrate

[Patch v5 01/24] x86/configs: CRIU update debug rock defconfig

2022-02-03 Thread Rajneesh Bhardwaj
- Update debug config for Checkpoint-Restore (CR) support - Also include necessary options for CR with docker containers. Reviewed-by: Felix Kuehling Signed-off-by: Rajneesh Bhardwaj --- arch/x86/configs/rock-dbg_defconfig | 53 ++--- 1 file changed, 34 insertions(+),

[Patch v5 03/24] drm/amdkfd: CRIU Implement KFD process_info ioctl

2022-02-03 Thread Rajneesh Bhardwaj
This IOCTL op is expected to be called as a precursor to the actual Checkpoint operation. This does the basic discovery into the target process seized by CRIU and relays the information to the userspace that utilizes it to start the Checkpoint operation via another dedicated IOCTL op. The process_

[Patch v5 04/24] drm/amdkfd: CRIU Implement KFD checkpoint ioctl

2022-02-03 Thread Rajneesh Bhardwaj
This adds support to discover the buffer objects that belong to a process being checkpointed. The data corresponding to these buffer objects is returned to user space plugin running under criu master context which then stores this info to recreate these buffer objects during a restore operation.

[Patch v5 06/24] drm/amdkfd: CRIU Implement KFD resume ioctl

2022-02-03 Thread Rajneesh Bhardwaj
This adds support to create userptr BOs on restore and introduces a new ioctl op to restart memory notifiers for the restored userptr BOs. When doing CRIU restore MMU notifications can happen anytime after we call amdgpu_mn_register. Prevent MMU notifications until we reach stage-4 of the restore p

[Patch v5 02/24] drm/amdkfd: CRIU Introduce Checkpoint-Restore APIs

2022-02-03 Thread Rajneesh Bhardwaj
Checkpoint-Restore in userspace (CRIU) is a powerful tool that can snapshot a running process and later restore it on the same or a remote machine, but it expects processes that have a device file (e.g. GPU) associated with them to provide the necessary driver support to assist CRIU and its extensible plugin

[Patch v5 09/24] drm/amdkfd: CRIU restore queue ids

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same queue id value used during CRIU dump. Signed-off-by: Rajneesh Bhardwaj Signed-off-by: David Yat Sin --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev

[Patch v5 05/24] drm/amdkfd: CRIU Implement KFD restore ioctl

2022-02-03 Thread Rajneesh Bhardwaj
This implements the KFD CRIU Restore ioctl that lays the basic foundation for the CRIU restore operation. It provides support to create the buffer objects corresponding to the checkpointed image. This ioctl creates various types of buffer objects such as VRAM, MMIO, Doorbell, GTT based on the date

[Patch v5 07/24] drm/amdkfd: CRIU Implement KFD unpause operation

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Introducing UNPAUSE op. After CRIU amdgpu plugin performs a PROCESS_INFO op the queues will stay in an evicted state. Once the plugin is done draining BO contents, it is safe to perform an UNPAUSE op for the queues to resume. Signed-off-by: David Yat Sin Signed-off-by: Ra

[Patch v5 10/24] drm/amdkfd: CRIU restore sdma id for queues

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same sdma id value used during CRIU dump. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 48 ++- .../drm/amd/amdkfd/kf

[Patch v5 11/24] drm/amdkfd: CRIU restore queue doorbell id

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When re-creating queues during CRIU restore, restore the queue with the same doorbell id value used during CRIU dump. Signed-off-by: David Yat Sin --- .../drm/amd/amdkfd/kfd_device_queue_manager.c | 60 +-- 1 file changed, 41 insertions(+), 19 deletions(-)

[Patch v5 08/24] drm/amdkfd: CRIU add queues support

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save number of queues and queue properties for each queue during checkpoint and re-create queues on restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 110 ++

[Patch v5 21/24] drm/amdkfd: CRIU Save Shared Virtual Memory ranges

2022-02-03 Thread Rajneesh Bhardwaj
During the checkpoint stage, save the shared virtual memory ranges and attributes for the target process. A process may contain a number of svm ranges and each range might contain a number of attributes. While not all attributes may be applicable for a given prange, during checkpoint we store all po

[Patch v5 17/24] drm/amdkfd: CRIU checkpoint and restore xnack mode

2022-02-03 Thread Rajneesh Bhardwaj
Recoverable page faults are represented by the xnack mode setting inside a kfd process and are used to represent the device page faults. For CR, we don't consider negative values which are typically used for querying the current xnack mode without modifying it. Signed-off-by: Rajneesh Bhardwaj --

[Patch v5 22/24] drm/amdkfd: CRIU prepare for svm resume

2022-02-03 Thread Rajneesh Bhardwaj
During CRIU restore phase, the VMAs for the virtual address ranges are not at their final location yet so in this stage, only cache the data required to successfully resume the svm ranges during an imminent CRIU resume phase. Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_ch

[Patch v5 12/24] drm/amdkfd: CRIU checkpoint and restore queue mqds

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue MQD's on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- .../drm/a

[Patch v5 24/24] drm/amdkfd: Bump up KFD API version for CRIU

2022-02-03 Thread Rajneesh Bhardwaj
- Change KFD minor version to 7 for CRIU Proposed userspace changes: https://github.com/RadeonOpenCompute/criu Signed-off-by: Rajneesh Bhardwaj --- include/uapi/linux/kfd_ioctl.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/include/uapi/linux/kfd_ioctl.h b/include/uapi

[Patch v5 16/24] drm/amdkfd: CRIU export BOs as prime dmabuf objects

2022-02-03 Thread Rajneesh Bhardwaj
KFD buffer objects do not associate a GEM handle with them so cannot directly be used with libdrm to initiate a system dma (sDMA) operation to speedup the checkpoint and restore operation so export them as dmabuf objects and use with libdrm helper (amdgpu_bo_import) to further process the sdma comm

[Patch v5 14/24] drm/amdkfd: CRIU checkpoint and restore events

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Add support to existing CRIU ioctl's to save and restore events during criu checkpoint and restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 70 +- drivers/gpu/drm/amd/amdkfd/kfd_events.c | 272 +

[Patch v5 19/24] drm/amdkfd: use user_gpu_id for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
Currently the SVM ranges use actual_gpu_id but with Checkpoint Restore support it's possible that the SVM ranges can be resumed on another node where the actual_gpu_id may not be the same as the original (user_gpu_id) gpu id. So modify svm code to use user_gpu_id. Signed-off-by: Rajneesh Bhardwaj ---

[Patch v5 15/24] drm/amdkfd: CRIU implement gpu_id remapping

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin When doing a restore on a different node, the gpu_id's on the restore node may be different. But the user space application will still use the original gpu_id's in the ioctl calls. Adding code to create a gpu id mapping so that kfd can determine actual gpu_id during the
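
The remapping idea in this snippet can be pictured as a small lookup table from the id the application remembers to the id of the device on the restore node. The following is only a user-space sketch under that assumption: the struct and function are hypothetical, the field names borrow user_gpu_id and actual_gpu_id from patch 19/24, and nothing here is taken from the patch itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical table entry; not the kernel's data structure. */
struct gpu_id_map {
    uint32_t user_gpu_id;   /* id the application saw at checkpoint time */
    uint32_t actual_gpu_id; /* id of the device on the restore node */
};

/* Translate a user-visible gpu_id into the actual one.
 * Returns 0 on success, -1 if the id is not in the table. */
int map_gpu_id(const struct gpu_id_map *map, size_t n,
               uint32_t user_gpu_id, uint32_t *actual_gpu_id)
{
    assert(map != NULL && actual_gpu_id != NULL);
    for (size_t i = 0; i < n; i++) {
        if (map[i].user_gpu_id == user_gpu_id) {
            *actual_gpu_id = map[i].actual_gpu_id;
            return 0;
        }
    }
    return -1;
}
```

With such a table, an ioctl handler would translate the caller's id once on entry and use the actual id from then on, so the application never needs to learn that the device changed.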

[Patch v5 20/24] drm/amdkfd: CRIU Discover svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
A KFD process may contain a number of virtual address ranges for shared virtual memory management and each such range can have many SVM attributes spanning across various nodes within the process boundary. This change reports the total number of such SVM ranges and their total private data size by

[Patch v5 13/24] drm/amdkfd: CRIU checkpoint and restore queue control stack

2022-02-03 Thread Rajneesh Bhardwaj
From: David Yat Sin Checkpoint contents of queue control stacks on CRIU dump and restore them during CRIU restore. Signed-off-by: David Yat Sin Signed-off-by: Rajneesh Bhardwaj --- drivers/gpu/drm/amd/amdkfd/kfd_chardev.c | 2 +- drivers/gpu/drm/amd/amdkfd/kfd_dbgdev.c | 2 +- ..

[Patch v5 23/24] drm/amdkfd: CRIU resume shared virtual memory ranges

2022-02-03 Thread Rajneesh Bhardwaj
In CRIU resume stage, resume all the shared virtual memory ranges from the data stored inside the resuming kfd process during CRIU restore phase. Also setup xnack mode and free up the resources. KFD_IOCTL_SVM_ATTR_CLR_FLAGS is not available for querying via get_attr interface but we must clear the

[Patch v5 18/24] drm/amdkfd: CRIU allow external mm for svm ranges

2022-02-03 Thread Rajneesh Bhardwaj
Both svm_range_get_attr and svm_range_set_attr helpers use mm struct from current but for a Checkpoint or Restore operation, the current->mm will fetch the mm for the CRIU master process. So modify these helpers to accept the task mm for a target kfd process to support Checkpoint Restore. Signed-o

Re: binary constants (was: Re: [PATCH v3] drm/dp: Add Additional DP2 Headers)

2022-02-03 Thread Daniel Vetter
On Thu, Feb 3, 2022 at 12:58 PM Jani Nikula wrote: > > On Mon, 27 Sep 2021, Fangzhi Zuo wrote: > > +/* DSC Extended Capability Branch Total DSC Resources */ > > +#define DP_DSC_SUPPORT_AND_DSC_DECODER_COUNT 0x2260 /* 2.0 */ > > +# define DP_DSC_DECODER_COUNT_MASK (0b111

[PATCH 1/7] drm/selftests: Move i915 buddy selftests into drm

2022-02-03 Thread Arunpravin
- move i915 buddy selftests into drm selftests folder - add Makefile and Kconfig support - add sanitycheck testcase Prerequisites - This series of selftest patches is created on top of the drm buddy series - Enable kselftests for DRM as a module in .config Signed-off-by: Arunpravin --- drivers

[PATCH 2/7] drm/selftests: add drm buddy alloc limit testcase

2022-02-03 Thread Arunpravin
add a test to check the maximum allocation limit Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c| 60 +++ 2 files changed, 61 insertions(+) diff --git a/drivers/gpu/drm/selftests/drm_buddy_selftes

[PATCH 4/7] drm/selftests: add drm buddy optimistic testcase

2022-02-03 Thread Arunpravin
create a mm with one block of each order available, and try to allocate them all. Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c| 82 +++ 2 files changed, 83 insertions(+) diff --git a/drivers/gp

[PATCH 3/7] drm/selftests: add drm buddy alloc range testcase

2022-02-03 Thread Arunpravin
- add a test to check the range allocation - export get_buddy() function in drm_buddy.c - export drm_prandom_u32_max_state() in lib/drm_random.c - include helper functions - include prime number header file Signed-off-by: Arunpravin --- drivers/gpu/drm/drm_buddy.c | 20 +- dri

[PATCH 5/7] drm/selftests: add drm buddy pessimistic testcase

2022-02-03 Thread Arunpravin
create a pot-sized mm, then allocate one of each possible order within. This should leave the mm with exactly one page left. Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c| 153 ++ 2 files change
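
The "exactly one page left" claim is plain arithmetic on powers of two. The sketch below assumes that "pot-sized" means the mm holds 2^max_order pages and that "each possible order" means orders 0 through max_order - 1; the function name is made up for illustration and is not part of the selftest.

```c
#include <assert.h>
#include <stdint.h>

/* Pages left in an mm of 2^max_order pages after allocating one block
 * of every order from 0 to max_order - 1. */
uint64_t pages_left_after_one_of_each_order(unsigned int max_order)
{
    assert(max_order < 64);
    uint64_t total = UINT64_C(1) << max_order;
    uint64_t used = 0;

    for (unsigned int order = 0; order < max_order; order++)
        used += UINT64_C(1) << order; /* a block of this order spans 2^order pages */

    return total - used;
}
```

Since 2^0 + 2^1 + ... + 2^(n-1) = 2^n - 1, the remainder is one page for any n under these assumptions.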

[PATCH 6/7] drm/selftests: add drm buddy smoke testcase

2022-02-03 Thread Arunpravin
- add a test to ascertain that the critical functionalities of the program are working fine - add a timeout helper function Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests.h | 1 + drivers/gpu/drm/selftests/test-drm_buddy.c| 143 ++ 2 files change

[PATCH 7/7] drm/selftests: add drm buddy pathological testcase

2022-02-03 Thread Arunpravin
create a pot-sized mm, then allocate one of each possible order within. This should leave the mm with exactly one page left. Free the largest block, then whittle down again. Eventually we will have a fully 50% fragmented mm. Signed-off-by: Arunpravin --- .../gpu/drm/selftests/drm_buddy_selftests

binary constants (was: Re: [PATCH v3] drm/dp: Add Additional DP2 Headers)

2022-02-03 Thread Jani Nikula
On Mon, 27 Sep 2021, Fangzhi Zuo wrote: > +/* DSC Extended Capability Branch Total DSC Resources */ > +#define DP_DSC_SUPPORT_AND_DSC_DECODER_COUNT 0x2260 /* 2.0 */ > +# define DP_DSC_DECODER_COUNT_MASK (0b111 << 5) > +# define DP_DSC_DECODER_COUNT_SHIFT
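
For readers unfamiliar with the notation under discussion: the 0b prefix is a compiler extension in C prior to C23, and the quoted mask can be spelled in hex with the same value. The sketch below shows that equivalence and a field extraction. The shift value of 5 is an assumption inferred from the mask, since the snippet is cut off before the shift is shown, and dsc_decoder_count() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Hex spelling of the quoted (0b111 << 5): three set bits starting at bit 5. */
#define DP_DSC_DECODER_COUNT_MASK  (0x7 << 5)
/* Assumed value: inferred from the mask's lowest set bit, not from the thread. */
#define DP_DSC_DECODER_COUNT_SHIFT 5

static_assert(DP_DSC_DECODER_COUNT_MASK == 0xE0, "mask covers bits 7:5");

/* Hypothetical helper: extract the decoder-count field from a register byte. */
unsigned int dsc_decoder_count(uint8_t reg)
{
    return (reg & DP_DSC_DECODER_COUNT_MASK) >> DP_DSC_DECODER_COUNT_SHIFT;
}
```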

RE: [Patch v5 15/24] drm/amdkfd: CRIU implement gpu_id remapping

2022-02-03 Thread Yat Sin, David
One nit pick. Regards, David @@ -673,15 +693,19 @@ static int kfd_ioctl_dbg_address_watch(struct file *filep, memset((void *) &aw_info, 0, sizeof(struct dbg_address_watch_info)); - dev = kfd_device_by_id(args->gpu_id); - if (!dev) + mutex_lock(&p->mutex); + pdd

Re: [PATCH 1/7] drm/selftests: Move i915 buddy selftests into drm

2022-02-03 Thread Christian König
On 03.02.22 at 14:32, Arunpravin wrote: - move i915 buddy selftests into drm selftests folder - add Makefile and Kconfig support - add sanitycheck testcase Prerequisites - These series of selftests patches are created on top of drm buddy series - Enable kselftests for DRM as a module in .con

Re: [PATCH] drm/amd/display: Handle removed connector in early_unregister

2022-02-03 Thread Harry Wentland
On 2022-02-02 13:49, Fangzhi Zuo wrote: > From: Wayne Lin > > [Why] > commit "drm/amd/display: turn DPMS off on connector unplug" and > commit "drm/amd/display: Clear dc remote sinks on MST disconnect" > were trying to resolve the resource problem when we connectors get > disconnected under MS

Re: [PATCH -next] drm/amdkfd: Fix resource_size.cocci warning

2022-02-03 Thread Felix Kuehling
On 2022-02-03 at 00:04, Yang Li wrote: Use resource_size function on resource object instead of explicit computation. Eliminate the following coccicheck warning: ./drivers/gpu/drm/amd/amdkfd/kfd_migrate.c:978:11-14: ERROR: Missing resource_size with res Reported-by: Abaci Robot Signed-off-b
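
The "explicit computation" that the coccicheck warning flags is the inclusive-range size, end - start + 1, which the kernel's resource_size() helper wraps. A user-space model of that calculation, with a stand-in struct rather than the kernel's struct resource, might look like this:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* User-space stand-in for the kernel's struct resource; only the two
 * fields needed here are modelled. */
struct resource_model {
    uint64_t start; /* first address, inclusive */
    uint64_t end;   /* last address, inclusive */
};

/* Same arithmetic as the kernel helper: both bounds are inclusive,
 * hence the + 1. */
uint64_t resource_size_model(const struct resource_model *res)
{
    assert(res != NULL && res->end >= res->start);
    return res->end - res->start + 1;
}
```

Centralising the + 1 in one helper is the point of the warning: open-coded versions tend to drop it.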

[PATCH] drm/amd/display: Handle removed connector in early_unregister

2022-02-03 Thread Fangzhi Zuo
From: Wayne Lin This patch lived in our internal branch since August but somehow missed the merge to upstream. Original Patch: (dc: Handle removed connector in early_unregister) Signed-off-by: Wayne Lin Signed-off-by: Fangzhi Zuo --- .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 7 .

[PATCH v2] drm/amd/display: Handle removed connector in early_unregister

2022-02-03 Thread Fangzhi Zuo
From: Wayne Lin This patch lived in our internal branch since August but somehow missed the merge to upstream. Original patch description: [Why] commit "drm/amd/display: turn DPMS off on connector unplug" and commit "drm/amd/display: Clear dc remote sinks on MST disconnect" were trying to resol

Re: [PATCH v2] drm/amd/display: Handle removed connector in early_unregister

2022-02-03 Thread Harry Wentland
On 2022-02-03 13:17, Fangzhi Zuo wrote: > From: Wayne Lin > > This patch lived in our internal branch since August > but somehow missed the merge to upstream. > > Original patch description: > > [Why] > commit "drm/amd/display: turn DPMS off on connector unplug" and > commit "drm/amd/display: C

[PATCH] drm/amdgpu/display: change pipe policy for DCN 2.0

2022-02-03 Thread Alex Deucher
Fixes hangs on driver load on DCN 2.0 parts. Bug: https://bugzilla.kernel.org/show_bug.cgi?id=215511 Fixes: ee2698cf79cc ("drm/amd/display: Changed pipe split policy to allow for multi-display pipe split") Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/dcn20/dcn20_resource.c | 2

[PATCH 1/3] drm/amdgpu: add missing license to dpcs_3_0_0 headers

2022-02-03 Thread Alex Deucher
MIT. Signed-off-by: Alex Deucher --- .../gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_0_offset.h | 7 +++ .../gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_0_sh_mask.h | 7 +++ 2 files changed, 14 insertions(+) diff --git a/drivers/gpu/drm/amd/include/asic_reg/dcn/dpcs_3_0_0_offset.h b/dri

[PATCH 3/3] drm/amdgpu: move dpcs_3_0_3 headers from dcn to dpcs

2022-02-03 Thread Alex Deucher
To align with other headers. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/dcn303/dcn303_resource.c | 4 ++-- .../amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_offset.h| 0 .../amd/include/asic_reg/{dcn => dpcs}/dpcs_3_0_3_sh_mask.h | 0 3 files changed, 2 insertions

[PATCH 2/3] drm/amdgpu: move dpcs_3_0_0 headers from dcn to dpcs

2022-02-03 Thread Alex Deucher
To align with other headers. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/display/dc/clk_mgr/dcn30/dcn30_clk_mgr.c | 4 ++-- drivers/gpu/drm/amd/display/dc/dcn30/dcn30_resource.c | 4 ++-- drivers/gpu/drm/amd/display/dc/dcn301/dcn301_resource.c | 4 ++-- drivers/gpu/drm/amd

[PATCH] drm/amdgpu: drop experimental flag on aldebaran

2022-02-03 Thread Alex Deucher
These have been at production level for a while. Drop the flag. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c i

Re: [PATCH] drm/amdgpu: drop experimental flag on aldebaran

2022-02-03 Thread Felix Kuehling
On 2022-02-03 at 14:09, Alex Deucher wrote: These have been at production level for a while. Drop the flag. Signed-off-by: Alex Deucher Reviewed-by: Felix Kuehling --- drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/d

[PATCH AUTOSEL 5.16 38/52] drm/amd/display: Correct MPC split policy for DCN301

2022-02-03 Thread Sasha Levin
From: Zhan Liu [ Upstream commit ac46d93235074a6c5d280d35771c23fd8620e7d9 ] [Why] DCN301 has seamless boot enabled. With MPC split enabled at the same time, system will hang. [How] Revert MPC split policy back to "MPC_SPLIT_AVOID". Since we have ODM combine enabled on DCN301, pipe split is not

[PATCH AUTOSEL 5.16 39/52] drm/amdgpu/display: adjust msleep limit in dp_wait_for_training_aux_rd_interval

2022-02-03 Thread Sasha Levin
From: Alex Deucher [ Upstream commit dc919d670c6fd1ac81ebf31625cd19579f7b3d4c ] Some architectures (e.g., ARM) have relatively low udelay limits. On most architectures, anything longer than 2000us is not recommended. Change the check to align with other similar checks in DC. Reviewed-by: Harry

[PATCH AUTOSEL 5.16 40/52] drm/amdgpu/display: use msleep rather than udelay for long delays

2022-02-03 Thread Sasha Levin
From: Alex Deucher [ Upstream commit 98fdcacb45f7cd2092151d6af2e60152811eb79c ] Some architectures (e.g., ARM) throw a compilation error if the udelay is too long. In general udelays of longer than 2000us are not recommended on any architecture. Switch to msleep in these cases. Reviewed-by:
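
The rule of thumb in this commit message can be modelled outside the kernel as a simple threshold decision. This is a sketch only: the enum, the function names and the rounding policy are illustrative assumptions, and only the 2000us figure comes from the message.

```c
#include <assert.h>
#include <limits.h>

enum delay_kind { DELAY_BUSY_WAIT, DELAY_SLEEP };

/* Threshold quoted in the commit message. */
#define MAX_BUSY_WAIT_US 2000u

/* Delays longer than the threshold should sleep instead of busy-waiting. */
enum delay_kind pick_delay(unsigned int usecs)
{
    return usecs > MAX_BUSY_WAIT_US ? DELAY_SLEEP : DELAY_BUSY_WAIT;
}

/* Convert to whole milliseconds, rounding up so that the sleep is never
 * shorter than the requested delay (an assumed policy). */
unsigned int usecs_to_msecs_round_up(unsigned int usecs)
{
    assert(usecs <= UINT_MAX - 999u);
    return (usecs + 999u) / 1000u;
}
```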

[PATCH AUTOSEL 5.15 35/41] drm/amd/display: Correct MPC split policy for DCN301

2022-02-03 Thread Sasha Levin
From: Zhan Liu [ Upstream commit ac46d93235074a6c5d280d35771c23fd8620e7d9 ] [Why] DCN301 has seamless boot enabled. With MPC split enabled at the same time, system will hang. [How] Revert MPC split policy back to "MPC_SPLIT_AVOID". Since we have ODM combine enabled on DCN301, pipe split is not

[PATCH] drm/amdgpu: Fix wait for RLCG command completion

2022-02-03 Thread Victor Skvortsov
if (!(tmp & flag)) condition will always evaluate to true when the flag is 0x0 (AMDGPU_RLCG_GC_WRITE). Instead check that address bits are cleared to determine whether the command is complete. Signed-off-by: Victor Skvortsov --- drivers/gpu/drm/amd/amdgpu/amdgpu_virt.c | 2 +- drivers/gpu/drm/am
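
The bug described here follows from a basic property of bitwise AND: anything ANDed with 0 is 0, so a completion test against a zero flag passes for every register value. The sketch below reproduces that in user space. EX_FLAG_WRITE stands in for AMDGPU_RLCG_GC_WRITE (0x0 according to the message), while EX_ADDR_MASK and both function names are hypothetical and not taken from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EX_FLAG_WRITE 0x0u        /* stand-in for AMDGPU_RLCG_GC_WRITE */
#define EX_ADDR_MASK  0x000FFFFFu /* hypothetical mask of the address bits */

static_assert(EX_FLAG_WRITE == 0x0u, "the write flag carries no bits");

/* Original-style check: with flag == 0, (tmp & flag) is 0 for every tmp,
 * so this reports completion regardless of the register contents. */
bool done_by_flag(uint32_t tmp, uint32_t flag)
{
    return !(tmp & flag);
}

/* Check in the spirit of the fix: the command counts as complete only
 * once the address bits have been cleared. */
bool done_by_addr(uint32_t tmp)
{
    return (tmp & EX_ADDR_MASK) == 0;
}
```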

[PATCH] drm/amdgpu: Print once if RAS unsupported

2022-02-03 Thread Luben Tuikov
MESA polls for errors every 2-3 seconds. Printing with dev_info() causes the dmesg log to fill up with the same message, e.g., [18028.206676] amdgpu :0b:00.0: amdgpu: df doesn't config ras function. Make it dev_info_once(), as it isn't something correctable during boot, so printing just once i

[PATCH] drm/amd/display: Cap pflip irqs per max otg number

2022-02-03 Thread Roman.Li
From: Roman Li [Why] pflip interrupt order are mapped 1 to 1 to otg id. e.g. if irq_src=26 corresponds to otg0 then 27->otg1, 28->otg2... Linux DM registers pflip interrupts per number of crtcs. In fused pipe case crtc numbers can be less than otg id. e.g. if one pipe out of 3(otg#0-2) is fused
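
The 1-to-1 mapping described in the [Why] section amounts to an offset from the first pflip interrupt source, bounded by the number of OTGs. A minimal sketch, assuming the base value 26 from the example in the message; the function and parameter names are hypothetical.

```c
#include <assert.h>
#include <limits.h>

/* Map a pflip interrupt source to its OTG index.
 * first_pflip_irq is the source that corresponds to otg0 (26 in the
 * message's example); max_otg caps how many sources are valid.
 * Returns the OTG index, or -1 if irq_src is outside the range. */
int pflip_irq_to_otg(unsigned int irq_src, unsigned int first_pflip_irq,
                     unsigned int max_otg)
{
    assert(max_otg <= INT_MAX); /* the index must fit the int return type */
    if (irq_src < first_pflip_irq || irq_src - first_pflip_irq >= max_otg)
        return -1;
    return (int)(irq_src - first_pflip_irq);
}
```

Bounding by the OTG count rather than the crtc count is what keeps a source such as otg2 reachable when a fused pipe leaves fewer crtcs than OTGs.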

Re: [PATCH] drm/amdgpu: Print once if RAS unsupported

2022-02-03 Thread Deucher, Alexander
[AMD Official Use Only] We can probably just make these dev_dbg(). The vast majority of cards are non-RAS. No need to print this at all in most cases. Alex From: Tuikov, Luben Sent: Thursday, February 3, 2022 5:14 PM To: amd-gfx@lists.freedesktop.org Cc: Tui

Re: [PATCH] drm/amd/display: Cap pflip irqs per max otg number

2022-02-03 Thread Kazlauskas, Nicholas
On 2/3/2022 5:14 PM, roman...@amd.com wrote: From: Roman Li [Why] pflip interrupt order are mapped 1 to 1 to otg id. e.g. if irq_src=26 corresponds to otg0 then 27->otg1, 28->otg2... Linux DM registers pflip interrupts per number of crtcs. In fused pipe case crtc numbers can be less than otg i

[PATCH v1] drm/amdgpu: Print once if RAS unsupported

2022-02-03 Thread Luben Tuikov
MESA polls for errors every 2-3 seconds. Printing with dev_info() causes the dmesg log to fill up with the same message, e.g., [18028.206676] amdgpu :0b:00.0: amdgpu: df doesn't config ras function. Make it dev_dbg_once(), as it isn't something correctable during boot or thereafter, so printin

Re: [PATCH v1] drm/amdgpu: Print once if RAS unsupported

2022-02-03 Thread Alex Deucher
On Thu, Feb 3, 2022 at 6:14 PM Luben Tuikov wrote: > > MESA polls for errors every 2-3 seconds. Printing with dev_info() causes > the dmesg log to fill up with the same message, e.g, > > [18028.206676] amdgpu :0b:00.0: amdgpu: df doesn't config ras function. > > Make it dev_dbg_once(), as it i

[PATCH 3/3] drm/amdgpu: Prevent random memory access in FRU code

2022-02-03 Thread Luben Tuikov
Prevent random memory access in the FRU EEPROM code by passing the size of the destination buffer to the reading routine, and reading no more than the size of the buffer. Cc: Kent Russell Cc: Alex Deucher Signed-off-by: Luben Tuikov --- .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 ++
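
The technique named in this message, passing the destination size to the reader and never copying more than that, can be sketched in user space as follows. bounded_read() is a hypothetical helper, not the driver's routine, and the in-memory source buffer stands in for the EEPROM.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy at most dst_size bytes from src into dst.
 * Returns the number of bytes actually copied, so the caller can tell
 * when the data was truncated. */
size_t bounded_read(unsigned char *dst, size_t dst_size,
                    const unsigned char *src, size_t src_len)
{
    assert(dst != NULL && src != NULL);
    size_t n = src_len < dst_size ? src_len : dst_size;

    memcpy(dst, src, n);
    return n;
}
```

Clamping inside the reader means a length field read from the device can no longer push the copy past the end of the caller's buffer.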

[PATCH 1/3] drm/amdgpu: Don't offset by 2 in FRU EEPROM

2022-02-03 Thread Luben Tuikov
Read buffers no longer expose the I2C address, and so we don't need to offset by two when we get the read data. Cc: Alex Deucher Cc: Kent Russell Cc: Andrey Grodzovsky Fixes: bd607166af7fe3 ("drm/amdgpu: Enable reading FRU chip via I2C v3") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/

[PATCH 2/3] drm/amdgpu: Nerf "buff" to "buf"

2022-02-03 Thread Luben Tuikov
Buffer is abbreviated "buf", not "buff", which means something entirely different. Cc: Kent Russell Cc: Alex Deucher Signed-off-by: Luben Tuikov --- .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/drivers/g

[PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-03 Thread Rajneesh Bhardwaj
Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.03] WARNING: possible recursive locking detected [ +0.03] 5.13.0-kfd-rajneesh #1030 Not tainted [ +0.04]

Re: [Patch v5 00/24] CHECKPOINT RESTORE WITH ROCm

2022-02-03 Thread Felix Kuehling
The series is Reviewed-by: Felix Kuehling On 2022-02-03 at 04:08, Rajneesh Bhardwaj wrote: V5: Proposed IOCTL APIs for CRIU with consolidated feedback CRIU is a user space tool which is very popular for container live migration in datacentres. It can checkpoint a running application, save i

RE: [Patch v5 00/24] CHECKPOINT RESTORE WITH ROCm

2022-02-03 Thread Bhardwaj, Rajneesh
[AMD Official Use Only] Thank you Felix for the review and your guidance. -Original Message- From: Kuehling, Felix Sent: Thursday, February 3, 2022 10:22 PM To: Bhardwaj, Rajneesh ; amd-gfx@lists.freedesktop.org Cc: Yat Sin, David ; Deucher, Alexander ; dri-de...@lists.freedesktop.org

[PATCH v1 0/3] AMDGPU FRU fixes

2022-02-03 Thread Luben Tuikov
Reordered the patches; fixed some bugs. Luben Tuikov (3): drm/amdgpu: Nerf "buff" to "buf" drm/amdgpu: Don't offset by 2 in FRU EEPROM drm/amdgpu: Prevent random memory access in FRU code Cc: Alex Deucher Cc: Kent Russell Cc: Andrey Grodzovsky .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c

[PATCH v1 1/3] drm/amdgpu: Nerf "buff" to "buf"

2022-02-03 Thread Luben Tuikov
Buffer is abbreviated "buf" (buf-fer), not "buff" (buff-er). This is consistent with the rest of the kernel code. Cc: Kent Russell Cc: Alex Deucher Signed-off-by: Luben Tuikov --- .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 28 +-- 1 file changed, 14 insertions(+), 14 delet

[PATCH v1 3/3] drm/amdgpu: Prevent random memory access in FRU code

2022-02-03 Thread Luben Tuikov
Prevent random memory access in the FRU EEPROM code by passing the size of the destination buffer to the reading routine, and reading no more than the size of the buffer. Cc: Kent Russell Cc: Alex Deucher Signed-off-by: Luben Tuikov --- .../gpu/drm/amd/amdgpu/amdgpu_fru_eeprom.c| 21 ++

[PATCH v1 2/3] drm/amdgpu: Don't offset by 2 in FRU EEPROM

2022-02-03 Thread Luben Tuikov
Read buffers no longer expose the I2C address, and so we don't need to offset by two when we get the read data. Cc: Alex Deucher Cc: Kent Russell Cc: Andrey Grodzovsky Fixes: bd607166af7fe3 ("drm/amdgpu: Enable reading FRU chip via I2C v3") Signed-off-by: Luben Tuikov --- drivers/gpu/drm/amd/

Re: [PATCH] drm/amdgpu: Fix recursive locking warning

2022-02-03 Thread Christian König
On 04.02.22 at 04:11, Rajneesh Bhardwaj wrote: Noticed the below warning while running a pytorch workload on vega10 GPUs. Change to trylock to avoid conflicts with already held reservation locks. [ +0.03] WARNING: possible recursive locking detected [ +0.03] 5.13.0-kfd-rajneesh #1030