Re: [PATCH 00/34] GC per queue reset

2024-07-19 Thread Alex Deucher
On Fri, Jul 19, 2024 at 9:39 AM Alex Deucher wrote: > > On Thu, Jul 18, 2024 at 1:00 PM Friedrich Vock wrote: > > > > Hi, > > > > On 18.07.24 16:06, Alex Deucher wrote: > > > This adds preliminary support for GC per queue reset. In this > > > case, only the jobs currently in the queue are lost.

[PATCH v2] drm/amdkfd: Change kfd/svm page fault drain handling

2024-07-19 Thread Xiaogang . Chen
From: Xiaogang Chen When an app unmaps VM ranges (munmap), kfd/svm starts draining pending page faults and does not handle any incoming page faults of this process until a deferred work item has been executed by the default system wq. The period of "not handling page faults" can be long and is unpredictable. That is ad

RE: [PATCH] drm/amdkfd: allow users to target recommended SDMA engines

2024-07-19 Thread Kim, Jonathan
[Public] > -Original Message- > From: Kuehling, Felix > Sent: Friday, July 19, 2024 2:34 PM > To: Kim, Jonathan ; amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH] drm/amdkfd: allow users to target recommended SDMA > engines > > On 2024-07-18 19:05, Jonathan Kim wrote: > > Certain GPUs

Re: [PATCH] drm/amdkfd: allow users to target recommended SDMA engines

2024-07-19 Thread Felix Kuehling
On 2024-07-18 19:05, Jonathan Kim wrote: Certain GPUs have better copy performance over xGMI on specific SDMA engines depending on the source and destination GPU. Allow users to create SDMA queues on these recommended engines. Close to 2x overall performance has been observed with this optimizati

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Matthew Brost
On Fri, Jul 19, 2024 at 05:18:05PM +0200, Christian König wrote: > On 19.07.24 at 15:02, Christian König wrote: > > On 19.07.24 at 11:47, Tvrtko Ursulin wrote: > > > From: Tvrtko Ursulin > > > > > > A long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity > > > creation") a change wa

[PATCH] drm/amd/display: Implement bounds check for stream encoder creation in DCN401

2024-07-19 Thread Srinivasan Shanmugam
'stream_enc_regs' array is an array of dcn10_stream_enc_registers structures. The array is initialized with four elements, corresponding to the four calls to stream_enc_regs() in the array initializer. This means that valid indices for this array are 0, 1, 2, and 3. The error message 'stream_enc_r
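A minimal sketch of the kind of bounds check described above, assuming a four-element table indexed by an engine id. The type `stream_enc_regs_sketch`, the table `regs_table` and the function `lookup_stream_enc_regs()` are hypothetical stand-ins for illustration, not the DCN401 code from the patch:

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical stand-in for a per-encoder register description. */
struct stream_enc_regs_sketch {
	int id;
};

/* Four entries, so the only valid indices are 0, 1, 2 and 3. */
static const struct stream_enc_regs_sketch regs_table[4] = {
	{ 0 }, { 1 }, { 2 }, { 3 },
};

/*
 * Return the table entry for eng_id, or NULL when eng_id falls outside
 * the table, so the caller never reads past the end of the array.
 */
const struct stream_enc_regs_sketch *lookup_stream_enc_regs(int eng_id)
{
	if (eng_id < 0 || (size_t)eng_id >= ARRAY_SIZE(regs_table))
		return NULL;

	return &regs_table[eng_id];
}
```

Deriving the limit from `ARRAY_SIZE()` rather than a literal 4 keeps the check correct if the table later grows.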

[PATCH] drm/amdgpu: harden the HW access lockdep check

2024-07-19 Thread Christian König
While Alex already fixed a bunch of them we still have tons of call paths which are accessing the hw without holding the reset lock to prevent concurrent GPU resets. Start pointing those out so that we can eventually fix them. Only point out the first misbehavior per driver load so that we won't o

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-19 Thread Christian König
On 19.07.24 at 11:16, Jack Xiao wrote: Wait until there is enough memory room before writing MES packets to avoid ring buffer overflow. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 18 ++ drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 18 ++ 2 file

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Christian König
On 19.07.24 at 15:02, Christian König wrote: On 19.07.24 at 11:47, Tvrtko Ursulin wrote: From: Tvrtko Ursulin A long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority changes for entities with only one assigned scheduler. Th

Re: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-19 Thread Alex Deucher
On Fri, Jul 19, 2024 at 5:35 AM Jack Xiao wrote: > > Wait until there is enough memory room before writing MES packets > to avoid ring buffer overflow. > > Signed-off-by: Jack Xiao Fixes: de3246254156 ("drm/amdgpu: cleanup MES11 command submission") Fixes: fffe347e1478 ("drm/amdgpu: cleanup MES12 command

Re: [PATCH 00/34] GC per queue reset

2024-07-19 Thread Alex Deucher
On Thu, Jul 18, 2024 at 1:00 PM Friedrich Vock wrote: > > Hi, > > On 18.07.24 16:06, Alex Deucher wrote: > > This adds preliminary support for GC per queue reset. In this > > case, only the jobs currently in the queue are lost. If this > > fails, we fall back to a full adapter reset. > > First o

[PATCH v3] drm/amdgpu: fix a possible null pointer dereference

2024-07-19 Thread Ma Ke
In amdgpu_connector_add_common_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid the NULL pointer dereference. Cc: sta...@vger.kernel.org Fixes: d38ceaf99ed0 ("drm/amdgpu: add core driver (v4)") Signed-off-by: M
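The pattern described above can be sketched in plain C. Here `make_mode_sketch()` is a hypothetical stand-in for an allocating helper such as drm_cvt_mode() that may return NULL, and `count_added_modes()` is an illustrative loop, not the driver function. Whether the actual patch skips the entry or returns early is not visible in the snippet; this sketch assumes it skips:

```c
#include <stddef.h>
#include <stdlib.h>

struct mode_sketch {
	int hdisplay;
	int vdisplay;
};

/* Hypothetical allocator: returns NULL on (simulated or real) failure. */
struct mode_sketch *make_mode_sketch(int hdisplay, int vdisplay,
				     int simulate_failure)
{
	struct mode_sketch *mode;

	if (simulate_failure)
		return NULL;

	mode = malloc(sizeof(*mode));
	if (!mode)
		return NULL;

	mode->hdisplay = hdisplay;
	mode->vdisplay = vdisplay;
	return mode;
}

/*
 * Walk a small table of common sizes and count the modes that could be
 * created. The entry at fail_index simulates an allocation failure;
 * pass -1 for no simulated failure.
 */
int count_added_modes(int fail_index)
{
	static const int sizes[][2] = {
		{ 640, 480 }, { 800, 600 }, { 1024, 768 },
	};
	int added = 0;
	size_t i;

	for (i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++) {
		struct mode_sketch *mode =
			make_mode_sketch(sizes[i][0], sizes[i][1],
					 (int)i == fail_index);

		/* Check before any use of mode to avoid a NULL dereference. */
		if (!mode)
			continue;

		added++;
		free(mode);
	}

	return added;
}
```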

[PATCH v2] drm/amd/amdgpu: Fix uninitialized variable warnings

2024-07-19 Thread Ma Ke
Return 0 to avoid returning an uninitialized variable r. Cc: sta...@vger.kernel.org Fixes: 230dd6bb6117 ("drm/amd/amdgpu: implement mode2 reset on smu_v13_0_10") Signed-off-by: Ma Ke --- Changes in v2: - added Cc stable line. --- drivers/gpu/drm/amd/amdgpu/smu_v13_0_10.c | 2 +- 1 file changed,

[PATCH v2] drm/radeon: fix null pointer dereference in radeon_add_common_modes

2024-07-19 Thread Ma Ke
In radeon_add_common_modes(), the return value of drm_cvt_mode() is assigned to mode, which will lead to a possible NULL pointer dereference on failure of drm_cvt_mode(). Add a check to avoid the NULL pointer dereference. Cc: sta...@vger.kernel.org Fixes: d50ba256b5f1 ("drm/kms: start adding command line interface using f

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Christian König
On 19.07.24 at 11:47, Tvrtko Ursulin wrote: From: Tvrtko Ursulin A long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority()

Re: [PATCH] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-19 Thread Christian König
On 19.07.24 at 11:36, Yin, ZhenGuo (Chris) wrote: [AMD Official Use Only - AMD Internal Distribution Only] Hi Christian, why would losing VRAM cause the VM entity to become invalid? I think only if a fence error has occurred (like a pending SDMA job that timed out or was cancelled), then

Re: [PATCH] drm/buddy: Add start address support to trim function

2024-07-19 Thread Matthew Auld
On 17/07/2024 16:02, Paneer Selvam, Arunpravin wrote: On 7/16/2024 3:34 PM, Matthew Auld wrote: On 16/07/2024 10:50, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/10/2024 6:20 PM, Matthew Auld wrote: On 10/07/2024 07:03, Paneer Selvam, Arunpravin wrote: Thanks Alex. Hi Matthew, Any co

[PATCH] drm/amd: Use a constant format string for amdgpu_ucode_request

2024-07-19 Thread Arnd Bergmann
From: Arnd Bergmann Multiple files in amdgpu call amdgpu_ucode_request() with a fw_name variable that the compiler cannot check for being a valid format string, as seen by enabling the (default-disabled) -Wformat-security option: drivers/gpu/drm/amd/amdgpu/amdgpu_mes.c: In function 'amdgpu_mes_
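To illustrate the class of warning described above: when a printf-style function receives a variable as its format argument, the compiler cannot check it, and any `%` in the string is interpreted as a conversion. Passing a constant format such as "%s" with the variable as an argument avoids this. The helper `request_fw_sketch()` below is a hypothetical stand-in for a printf-style firmware request function, the firmware names are made up, and the `format` attribute is a GCC/Clang extension:

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical printf-style helper: formats a firmware name into out
 * and returns the length vsnprintf() reports. The attribute lets GCC
 * and Clang type-check the format string against the arguments.
 */
__attribute__((format(printf, 3, 4)))
int request_fw_sketch(char *out, size_t len, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(out, len, fmt, ap);
	va_end(ap);

	return n;
}

/*
 * Unsafe shape (what -Wformat-security flags):
 *     request_fw_sketch(buf, sizeof(buf), fw_name);
 * Safe shape, with a constant format string:
 *     request_fw_sketch(buf, sizeof(buf), "%s", fw_name);
 */
```

With the constant "%s" format, a name containing `%` characters is copied literally instead of being parsed as a format.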

[PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-19 Thread Tvrtko Ursulin
From: Tvrtko Ursulin A long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority changes for entities with only one assigned scheduler. The commit reduced drm_sched_entity_set_priority() to simply update the entity's priority, but the

RE: [PATCH] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-19 Thread Yin, ZhenGuo (Chris)
[AMD Official Use Only - AMD Internal Distribution Only] Hi Christian, why would losing VRAM cause the VM entity to become invalid? I think only if a fence error has occurred (like a pending SDMA job that timed out or was cancelled), then the entity vm->delayed will be set as error. If a g

Re: [PATCH] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-19 Thread Christian König
On 19.07.24 at 11:19, ZhenGuo Yin wrote: [Why] The page table of a compute VM in VRAM will be lost after a GPU reset. VRAM won't be restored since the compute VM has no shadows. [How] Use the higher 32 bits of vm->generation to record a vram_lost_counter. Reset the VM state machine when the counter is not equa

[PATCH] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-19 Thread ZhenGuo Yin
[Why] The page table of a compute VM in VRAM will be lost after a GPU reset. VRAM won't be restored since the compute VM has no shadows. [How] Use the higher 32 bits of vm->generation to record a vram_lost_counter. Reset the VM state machine when the counter is not equal to the current vram_lost_counter of the device
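The [How] part can be sketched as bit packing on a 64-bit value: keep a counter in the upper 32 bits and compare it with the device's current counter. The function names below are hypothetical, and what the lower 32 bits hold in the real vm->generation is an assumption left opaque here:

```c
#include <stdbool.h>
#include <stdint.h>

/* Pack a VRAM-lost counter into the upper 32 bits of a generation. */
uint64_t pack_generation(uint32_t vram_lost_counter, uint32_t low_bits)
{
	return ((uint64_t)vram_lost_counter << 32) | low_bits;
}

/* Extract the VRAM-lost counter recorded in the generation. */
uint32_t generation_vram_lost(uint64_t generation)
{
	return (uint32_t)(generation >> 32);
}

/*
 * The VM state machine needs a reset when the counter recorded in the
 * generation no longer matches the device's current counter, that is,
 * VRAM was lost at least once since the generation was taken.
 */
bool vm_needs_reset(uint64_t generation, uint32_t device_vram_lost_counter)
{
	return generation_vram_lost(generation) != device_vram_lost_counter;
}
```

Comparing for inequality rather than "less than" keeps the check valid if the 32-bit counter wraps.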

[PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-19 Thread Jack Xiao
Wait until there is enough memory room before writing MES packets to avoid ring buffer overflow. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 18 ++ drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 18 ++ 2 files changed, 28 insertions(+), 8 deletions(-)
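A generic sketch of waiting for room in a ring buffer before writing, assuming a power-of-two ring measured in dwords with one slot kept empty to tell "full" from "empty". The struct and function names are hypothetical and this is not the MES code from the patch; a real driver would also delay between polls and use its own timeout error code:

```c
#include <stdint.h>

/* Hypothetical ring state; size must be a power of two (in dwords). */
struct ring_sketch {
	uint32_t size;
	volatile uint32_t rptr;	/* advanced by the consumer (hardware) */
	uint32_t wptr;		/* advanced by the producer (driver) */
};

/* Free dwords available to the producer; one slot is kept empty. */
uint32_t ring_free_dw(const struct ring_sketch *ring)
{
	return (ring->rptr - ring->wptr - 1u) & (ring->size - 1u);
}

/*
 * Poll until at least ndw dwords are free, giving up after max_polls
 * attempts. Returns 0 when there is room and -1 on timeout, so the
 * caller never writes a packet that would overrun unread entries.
 */
int ring_wait_for_room(const struct ring_sketch *ring, uint32_t ndw,
		       unsigned int max_polls)
{
	unsigned int i;

	for (i = 0; i < max_polls; i++) {
		if (ring_free_dw(ring) >= ndw)
			return 0;
		/* A real driver would sleep or delay here. */
	}

	return -1;
}
```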

Re: [PATCH v2 0/9] KFD user queue validation

2024-07-19 Thread Christian König
On 18.07.24 at 23:12, Felix Kuehling wrote: On 2024-07-18 17:05, Philip Yang wrote: This patch series does additional queue buffer validation in the queue creation IOCTLs and fails the queue creation if the buffers are not mapped on the GPU with the expected size. Ensure queue buffers residency by tracking

RE: [PATCH] drm/amdgpu: Mark amdgpu_bo as invalid after moved

2024-07-19 Thread YuanShang Mao (River)
[AMD Official Use Only - AMD Internal Distribution Only] Same issue on CPU page table update. -Original Message- From: Kuehling, Felix Sent: Thursday, July 18, 2024 12:28 AM To: Christian König ; YuanShang Mao (River) ; Huang, Trigger ; amd-gfx@lists.freedesktop.org; cao, lin Subject: