XDC 2024: Call for Proposals deadline extended to August 19

2024-08-14 Thread Mark Filion
Hello! The CfP deadline for talks, workshops and demos at XDC 2024 has been extended to next Monday, 19 August 2024. You have one more week to submit, so don't wait! https://indico.freedesktop.org/event/6/abstracts/ While any serious proposal will be gratefully considered, topics of interest to X

Re: [PATCH] drm/amdgpu: Remove hidden double memset from amdgpu_vm_pt_clear()

2024-08-14 Thread Tvrtko Ursulin
On 13/08/2024 15:08, Tvrtko Ursulin wrote: From: Tvrtko Ursulin When CONFIG_INIT_STACK_ALL_ZERO is set, and so the -ftrivial-auto-var-init=zero compiler option is active, the compiler fails to notice that later in amdgpu_vm_pt_clear() there is a second memset to clear the same on-stack struct amdgpu_vm_

RE: [PATCH 2/2] drm/amd/pm: ensure the fw_info is not null before using it

2024-08-14 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - AMD Internal Distribution Only] This patch is Reviewed-by: Jesse Zhang -Original Message- From: Huang, Tim Sent: Friday, August 9, 2024 3:34 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Zhang, Jesse(Jie) ; Zhou, Bob ; Huang,

RE: [PATCH 1/2] drm/amdgpu: ensure the connector is not null before using it

2024-08-14 Thread Zhang, Jesse(Jie)
This patch is Reviewed-by: Jesse Zhang -Original Message- From: Huang, Tim Sent: Friday, August 9, 2024 3:34 PM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Koenig, Christian ; Zhang, Jesse(Jie) ; Zhou, Bob ; Huang

[PATCH v3] drm/amdgpu: Take IOMMU remapping into account for p2p checks

2024-08-14 Thread Rahul Jain
When trying to enable p2p, amdgpu_device_is_peer_accessible() hits the condition where address_mask overlaps aper_base and hence returns 0, which disables p2p on this platform. The IOMMU should remap the BAR addresses so the device can access them. Hence check if peer_adev is remap
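The check described in this patch can be pictured with a small stand-alone sketch. It is not the driver's code: `bar_reachable`, its parameters and the `iommu_remaps` flag are hypothetical names chosen for illustration. Only the idea comes from the description above: an aperture whose addresses fall outside the peer's DMA mask is rejected, unless the IOMMU remaps the BAR.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not an amdgpu function): returns true if a peer
 * whose DMA mask is `dma_mask` can address the whole BAR aperture
 * [aper_base, aper_base + aper_size - 1].
 *
 * Assumption for this sketch: when the IOMMU remaps the BAR
 * (iommu_remaps == true), the raw aperture address is not what the
 * peer sees, so the mask test is skipped.
 */
bool bar_reachable(uint64_t aper_base, uint64_t aper_size,
                   uint64_t dma_mask, bool iommu_remaps)
{
    uint64_t address_mask;
    uint64_t aper_limit;

    if (aper_size == 0)
        return false;
    if (iommu_remaps)
        return true;

    address_mask = ~dma_mask;                 /* bits the peer cannot drive */
    aper_limit = aper_base + aper_size - 1;   /* last byte of the aperture */

    /* Any overlap with address_mask means part of the BAR is out of reach. */
    return !((aper_base & address_mask) || (aper_limit & address_mask));
}
```

With a 32-bit DMA mask, an aperture placed above 4 GiB fails the mask test and is only reported as reachable when the remapping flag is set.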

[PATCH] drm/amdgpu/gfx11: return early in preempt_ib()

2024-08-14 Thread Alex Deucher
When MES is enabled KIQ is not available. Return an error when someone uses the debugfs preempt test interface in that case. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/gpu/drm/amd/amdgpu/gfx_v11_0.c b/dr

RE: [PATCH v1 00/18] VCN patches with updated register list

2024-08-14 Thread Liu, Leo
The series is: Acked-by: Leo Liu > -Original Message- > From: Sunil Khatri > Sent: Tuesday, August 13, 2024 7:30 AM > To: Deucher, Alexander ; Lazar, Lijo > ; Liu, Leo > Cc: amd-gfx@lists.freedesktop.org; Khatri, Sunil > Subject

Re: [PATCH] drm/buddy: fix issue that force_merge cannot free all roots

2024-08-14 Thread Matthew Auld
On 13/08/2024 10:44, Lin.Cao wrote: If the buddy manager has more than one root and each root has sub-blocks that need to be freed, then when drm_buddy_fini is called, the first loop of force_merge will merge and free all of the sub-blocks of the first root, whose offset is 0x0 and whose size is the biggest (more than half of t

RE: [PATCH 1/3] drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11

2024-08-14 Thread Kasiviswanathan, Harish
Do we need some checks for FW version for backward compatibility? Apart from that a minor typo in the commit message. "Support 'f'or GFX12" -Original Message- From: amd-gfx On Behalf Of Mukul Joshi Sent: Tuesday, August 13, 2024 2

[PATCH] drm/amdgpu/sdma5.2: limit wptr workaround to sdma 5.2.1

2024-08-14 Thread Alex Deucher
The workaround seems to cause stability issues on other SDMA 5.2.x IPs. Fixes: a03ebf116303 ("drm/amdgpu/sdma5.2: Update wptr registers as well as doorbell") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3556 Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/sdma_v5_2.c | 18

Re: [PATCH] drm/amdkfd: keep create queue success if cwsr save area doesn't match

2024-08-14 Thread Felix Kuehling
On 2024-08-14 2:35, Zhang, Yifan wrote: > [AMD Official Use Only - AMD Internal Distribution Only] > > AFAIK, for low level libraries, e.g. LLVM, ROCr, Hip/OpenCL runtimes, all > GPUs are supported. But for the mathlibs and frameworks, only limited GPUs > are supported. E.g. : > > https://gi

RE: [PATCH 2/3] drm/amdkfd: Update queue unmap after VM fault with MES

2024-08-14 Thread Kasiviswanathan, Harish
-Original Message- From: amd-gfx On Behalf Of Mukul Joshi Sent: Tuesday, August 13, 2024 2:57 PM To: amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix ; Deucher, Alexander ; Joshi, Mukul Subject: [PATCH 2/3] drm/amdkfd: Update que

Re: [PATCH v3] drm/amdgpu: Take IOMMU remapping into account for p2p checks

2024-08-14 Thread Alex Deucher
On Wed, Aug 14, 2024 at 5:15 AM Rahul Jain wrote: > > when trying to enable p2p the amdgpu_device_is_peer_accessible() > checks the condition where address_mask overlaps the aper_base > and hence returns 0, due to which the p2p disables for this platform > > IOMMU should remap the BAR addresses so

RE: [PATCH 2/3] drm/amdkfd: Update queue unmap after VM fault with MES

2024-08-14 Thread Joshi, Mukul
> -Original Message- > From: Kasiviswanathan, Harish > Sent: Wednesday, August 14, 2024 11:17 AM > To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org > Cc: Kuehling, Felix ; Deucher, Alexander > ; Joshi, Mukul > Subject: RE: [PATCH 2

RE: [PATCH 3/3] drm/amdkfd: Update BadOpcode Interrupt handling with MES

2024-08-14 Thread Kasiviswanathan, Harish
Acked-by: Harish Kasiviswanathan -Original Message- From: amd-gfx On Behalf Of Mukul Joshi Sent: Tuesday, August 13, 2024 2:57 PM To: amd-gfx@lists.freedesktop.org Cc: Kuehling, Felix ; Deucher, Alexander ; Joshi, Mukul Subjec

RE: [PATCH 1/3] drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11

2024-08-14 Thread Joshi, Mukul
> -Original Message- > From: Kasiviswanathan, Harish > Sent: Wednesday, August 14, 2024 10:37 AM > To: Joshi, Mukul ; amd-gfx@lists.freedesktop.org > Cc: Kuehling, Felix ; Deucher, Alexander > ; Joshi, Mukul > Subject: RE: [PATCH 1

RE: [PATCH] amd/gpu: drm/hisilicon: Remove unused declarations

2024-08-14 Thread Deucher, Alexander
[Public] > -Original Message- > From: Zhang Zekun > Sent: Monday, August 12, 2024 8:24 AM > To: Deucher, Alexander ; Koenig, Christian > ; Pan, Xinhui ; > airl...@gmail.com; dan...@ffwll.ch; amd-gfx@lists.freedesktop.org > Cc: zhangzeku...@huawei.com > Subject: [PATCH] amd/gpu: drm/hisili

RE: [PATCH v2] drm/radeon/evergreen_cs: fix int overflow errors in cs track offsets

2024-08-14 Thread Deucher, Alexander
[Public] > -Original Message- > From: Nikita Zhandarovich > Sent: Tuesday, August 6, 2024 1:19 PM > To: Deucher, Alexander ; Koenig, Christian > ; Pan, Xinhui ; David > Airlie ; Daniel Vetter > Cc: Nikita Zhandarovich ; Jerome Glisse > ; Dave Airlie ; amd- > g...@lists.freedesktop.org; d

Re: AMD drm patch workflow is broken for stable trees

2024-08-14 Thread Felix Kuehling
On 2024-08-12 11:00, Greg KH wrote: Hi all, As some of you have noticed, there's a TON of failure messages being sent out for AMD gpu driver commits that are tagged for stable backports. In short, you all are doing something really wrong with how you are tagging these. Hi Greg, I got notifica

Re: AMD drm patch workflow is broken for stable trees

2024-08-14 Thread Alex Deucher
On Wed, Aug 14, 2024 at 4:55 PM Felix Kuehling wrote: > > On 2024-08-12 11:00, Greg KH wrote: > > Hi all, > > > > As some of you have noticed, there's a TON of failure messages being > > sent out for AMD gpu driver commits that are tagged for stable > > backports. In short, you all are doing some

[pull] amdgpu drm-fixes-6.11

2024-08-14 Thread Alex Deucher
Hi Dave, Sima, Fixes for 6.11. The MES 12 updates are relatively large, but they are for GFX 12 which is new for 6.11. The following changes since commit 7c626ce4bae1ac14f60076d00eafe71af30450ba: Linux 6.11-rc3 (2024-08-11 14:27:14 -0700) are available in the Git repository at: https://gi

Re: [PATCH v3] drm/amdgpu: Take IOMMU remapping into account for p2p checks

2024-08-14 Thread Felix Kuehling
On 2024-08-14 11:17, Alex Deucher wrote: On Wed, Aug 14, 2024 at 5:15 AM Rahul Jain wrote: when trying to enable p2p the amdgpu_device_is_peer_accessible() checks the condition where address_mask overlaps the aper_base and hence returns 0, due to which the p2p disables for this platform IOMM

Re: [PATCH 1/2] drm/amdgpu: fix KFDMemoryTest.PtraceAccessInvisibleVram fail on SRIOV

2024-08-14 Thread Felix Kuehling
On 2024-08-12 02:59, Samuel Zhang wrote: A ptrace access to a VRAM BO will first try SDMA access in amdgpu_ttm_access_memory_sdma(); if that fails, it will fall back to MMIO access. Since ptrace only accesses 8 bytes at a time and amdgpu_ttm_access_memory_sdma() only allows PAGE_SIZE-byte accesses, it returns

Re: [PATCH 2/2] drm/amdgpu: fix incomplete access issue in amdgpu_ttm_access_memory_sdma()

2024-08-14 Thread Felix Kuehling
On 2024-08-12 02:59, Samuel Zhang wrote: The requested access range may span 2 adjacent buddy blocks of a BO. In this case, 2 SDMA copy commands need to be issued to fully access the data range. But the current implementation only issues 1 SDMA copy command, which results in incomplete access. The
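The problem described here, a range that straddles a block boundary and so needs one copy per block, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `split_range`, `struct copy_chunk` and the flat array of block sizes are hypothetical and are not the amdgpu or drm_buddy data structures.

```c
#include <stddef.h>
#include <stdint.h>

/* One contiguous piece of a transfer; all names here are illustrative. */
struct copy_chunk {
    size_t   block;   /* index of the block the piece lives in */
    uint64_t offset;  /* offset inside that block */
    uint64_t len;     /* bytes to copy from that block */
};

/*
 * Split the range [pos, pos + len) of a buffer backed by `nblocks`
 * blocks of sizes block_size[0..nblocks-1] into per-block chunks.
 * Returns the number of chunks written to `out` (at most `max_out`);
 * one copy command would be issued per chunk.
 */
size_t split_range(const uint64_t *block_size, size_t nblocks,
                   uint64_t pos, uint64_t len,
                   struct copy_chunk *out, size_t max_out)
{
    size_t n = 0;

    for (size_t i = 0; i < nblocks && len > 0 && n < max_out; i++) {
        uint64_t avail, take;

        if (pos >= block_size[i]) {
            pos -= block_size[i];   /* range starts in a later block */
            continue;
        }

        avail = block_size[i] - pos;
        take = len < avail ? len : avail;

        out[n].block = i;
        out[n].offset = pos;
        out[n].len = take;
        n++;

        len -= take;
        pos = 0;                    /* the next block is read from its start */
    }
    return n;
}
```

For two 4096-byte blocks, an 8-byte access at position 4092 yields two chunks of 4 bytes each, while the same access at position 0 yields a single chunk.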

Re: [PATCH 2/4] amdgpu: fix a race in kfd_mem_export_dmabuf()

2024-08-14 Thread Felix Kuehling
On 2024-08-12 02:59, Al Viro wrote: Using drm_gem_prime_handle_to_fd() to set dmabuf up and insert it into descriptor table, only to have it looked up by file descriptor and remove it from descriptor table is not just too convoluted - it's racy; another thread might have modified the descriptor

[PATCHv2 1/3] drm/amdgpu: Implement MES Suspend and Resume APIs for GFX11

2024-08-14 Thread Mukul Joshi
Add implementation for MES Suspend and Resume APIs to unmap/map all queues for GFX11. Support for GFX12 will be added when the corresponding firmware support is in place. Signed-off-by: Mukul Joshi --- v1->v2: - Add MES FW version check. - Update amdgpu_mes_suspend/amdgpu_mes_resume handling. d

[PATCHv2 2/3] drm/amdkfd: Update queue unmap after VM fault with MES

2024-08-14 Thread Mukul Joshi
MEC FW expects MES to unmap all queues when a VM fault is observed on a queue and then resumed once the affected process is terminated. Use the MES Suspend and Resume APIs to achieve this. Signed-off-by: Mukul Joshi --- v1->v2: - Add MES FW version check. - Separate out the kfd_dqm_evict_pasid in

[PATCHv2 3/3] drm/amdkfd: Update BadOpcode Interrupt handling with MES

2024-08-14 Thread Mukul Joshi
Based on the recommendation of MEC FW, update BadOpcode interrupt handling by unmapping all queues, removing the queue that got the interrupt from scheduling and remapping the rest of the queues back when using the MES scheduler. This is done to prevent the case where unmapping of the bad queue can fail th

[PATCH 02/17] drm/amdgpu: Add infrastructure for Cleaner Shader feature

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam The cleaner shader is used by the CP firmware to clean LDS and GPRs between processes on the CUs. This adds an internal API for GFX IP code to allocate and initialize the cleaner shader. Cc: Christian König Cc: Alex Deucher Signed-off-by: Alex Deucher Signed-off-by

[PATCH 06/17] drm/amdgpu: Add enforce_isolation sysfs attribute

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit adds a new sysfs attribute 'enforce_isolation' to control the 'enforce_isolation' setting per GPU. The attribute can be read and written, and accepts values 0 (disabled) and 1 (enabled). When 'enforce_isolation' is enabled, reserved VMIDs are allocated for
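The input rule stated above (only 0 and 1 are accepted) can be sketched in user-space C. This is an illustration under stated assumptions, not the patch's store handler: `parse_enforce_isolation` is a hypothetical name, and a kernel implementation would use the kernel's own string-parsing helpers rather than `strtol`.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical parser: accepts a decimal 0 or 1, optionally followed by
 * a single newline (as `echo` would send), and stores it in *enabled.
 * Returns 0 on success and -EINVAL for anything else.
 */
int parse_enforce_isolation(const char *buf, int *enabled)
{
    char *end;
    long val;

    errno = 0;
    val = strtol(buf, &end, 10);
    if (errno != 0 || end == buf)
        return -EINVAL;             /* overflow or no digits at all */

    if (*end == '\n')
        end++;
    if (*end != '\0')
        return -EINVAL;             /* trailing garbage */

    if (val != 0 && val != 1)
        return -EINVAL;             /* only "disabled" or "enabled" */

    *enabled = (int)val;
    return 0;
}
```

Rejecting out-of-range values instead of clamping them keeps a mistyped write from silently changing the setting.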

[PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub

2024-08-14 Thread Alex Deucher
Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c b/drivers/gpu/drm/amd/a

[PATCH 07/17] drm/amdgpu: Add sysfs interface for running cleaner shader

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This patch adds a new sysfs interface for running the cleaner shader on AMD GPUs. The cleaner shader is used to clear GPU memory before it's reused, which can help prevent data leakage between different processes. The new sysfs file is write-only and is named `run_clea

[PATCH 08/17] drm/amdgpu: Add PACKET3_RUN_CLEANER_SHADER for cleaner shader execution

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit adds the PACKET3_RUN_CLEANER_SHADER definition. This packet is a command packet used to instruct the GPU to execute the cleaner shader. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data S

[PATCH 05/17] drm/amdgpu: Enforce isolation as part of the job

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This patch adds a new parameter 'enforce_isolation' to the amdgpu_job structure. This parameter is used to determine whether shader isolation should be enforced for a job. The enforce_isolation parameter is then stored in the amdgpu_job structure and used when flushing

[PATCH 04/17] drm/amdgpu: Make enforce_isolation setting per GPU

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit makes the enforce_isolation setting per GPU and per partition by adding the enforce_isolation array to the adev structure. The adev variable is set based on the global enforce_isolation module parameter during device initialization. In amdgpu_ids.c, the a

[PATCH 10/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9.4.3 hardware

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam The patch modifies the gfx_v9_4_3_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_4_3_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffe

[PATCH 13/17] drm/amdkfd: APIs to stop/start KFD scheduling

2024-08-14 Thread Alex Deucher
From: Amber Lin Provide amdgpu_amdkfd_stop_sched() for amdgpu to stop KFD scheduling compute work on HIQ. amdgpu_amdkfd_start_sched() resumes the scheduling. When amdgpu_amdkfd_stop_sched is called, KFD will unmap queues from runlist. If users send ioctls to KFD to create queues, they'll be added

[PATCH 00/17] Process Isolation Support

2024-08-14 Thread Alex Deucher
This patch set enables process isolation mode which serializes access to the graphics block between processes. When this mode is active, a cleaner shader is run between processes to clear shader LDS (Local Data Store) and GPRs (General Purpose Registers). A sysfs interface is also available to man

[PATCH 15/17] drm/amdgpu/gfx9: Apply Isolation Enforcement to GFX & Compute rings

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_0 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its us

[PATCH 03/17] drm/amdgpu: Emit cleaner shader at end of IB submission

2024-08-14 Thread Alex Deucher
This commit introduces the emission of a cleaner shader at the end of the IB submission process. This is achieved by adding a new function pointer, `emit_cleaner_shader`, to the `amdgpu_ring_funcs` structure. If the `emit_cleaner_shader` function is set in the ring functions, it is called during th
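The optional-callback pattern described here can be shown with a small stand-alone sketch. The structure and function names below (`ring_funcs`, `ring_submit`, the `demo_*` callbacks and the counters) are hypothetical stand-ins and are not the amdgpu definitions; only the idea of an `emit_cleaner_shader` pointer that is called when set comes from the description.

```c
#include <stddef.h>

struct ring;

/* Per-ring callback table; only the fields needed for this sketch. */
struct ring_funcs {
    void (*emit_ib)(struct ring *ring);
    void (*emit_cleaner_shader)(struct ring *ring);  /* optional, may be NULL */
};

struct ring {
    const struct ring_funcs *funcs;
    int ibs_emitted;     /* counters used only to observe the calls */
    int cleaner_runs;
};

/* Submit one IB, then emit the cleaner shader if this ring provides one. */
void ring_submit(struct ring *ring)
{
    ring->funcs->emit_ib(ring);

    if (ring->funcs->emit_cleaner_shader != NULL)
        ring->funcs->emit_cleaner_shader(ring);
}

/* Example callbacks that just count invocations. */
void demo_emit_ib(struct ring *ring)
{
    ring->ibs_emitted++;
}

void demo_emit_cleaner(struct ring *ring)
{
    ring->cleaner_runs++;
}
```

Leaving the pointer NULL for rings that lack the feature means the common submission path needs no per-hardware conditionals.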

[PATCH 12/17] drm/amdgpu/gfx9: Add cleaner shader support for GFX9.4.4 hardware

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit extends the cleaner shader feature to support GFX9.4.4 hardware. The cleaner shader feature is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Registers

[PATCH 16/17] drm/amdgpu/gfx_v9_4_3: Apply Isolation Enforcement to GFX & Compute rings

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit applies isolation enforcement to the GFX and Compute rings in the gfx_v9_4_3 module. The commit sets `amdgpu_gfx_enforce_isolation_ring_begin_use` and `amdgpu_gfx_enforce_isolation_ring_end_use` as the functions to be called when a ring begins and ends its

[PATCH 11/17] drm/amdgpu/gfx9: Add cleaner shader for GFX9.4.3

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit adds the cleaner shader microcode for GFX9.4.3 GPUs. The cleaner shader is a piece of GPU code that is used to clear or initialize certain GPU resources, such as Local Data Share (LDS), Vector General Purpose Registers (VGPRs), and Scalar General Purpose Reg

[PATCH 09/17] drm/amdgpu/gfx9: Implement cleaner shader support for GFX9 hardware

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam The patch modifies the gfx_v9_0_kiq_set_resources function to write the cleaner shader's memory controller address to the ring buffer. It also adds a new function, gfx_v9_0_ring_emit_cleaner_shader, which emits the PACKET3_RUN_CLEANER_SHADER packet to the ring buffer.

[PATCH 14/17] drm/amdgpu: Implement Enforce Isolation Handler for KGD/KFD serialization

2024-08-14 Thread Alex Deucher
From: Srinivasan Shanmugam This commit introduces the Enforce Isolation Handler designed to enforce shader isolation on AMD GPUs, which helps to prevent data leakage between different processes. The handler counts the number of emitted fences for each GFX and compute ring. If there are any fence

[PATCH 17/17] drm/amdkfd: Enable processes isolation on gfx9

2024-08-14 Thread Alex Deucher
From: Amber Lin When amdgpu enables enforce_isolation, KFD enables single-process mode in HWS and sets the exec_cleaner_shader bit in MAP_PROCESS. Signed-off-by: Amber Lin Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdkfd/kfd_packet_manager_v9.c | 14 +- drivers/gpu/drm/amd/am

Re: [PATCH 01/17] drm/amdgpu: handle enforce isolation on non-0 gfxhub

2024-08-14 Thread SRINIVASAN SHANMUGAM
On 8/15/2024 5:34 AM, Alex Deucher wrote: Some chips have more than one gfxhub so check if we are a gfxhub rather than just gfxhub 0. Signed-off-by: Alex Deucher --- drivers/gpu/drm/amd/amdgpu/amdgpu_ids.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/