Re: [PATCH v5 2/2] drm/amd: Add Suspend/Hibernate notification callback support

2024-11-27 Thread Lazar, Lijo
On 11/28/2024 8:56 AM, Mario Limonciello wrote:
> From: Mario Limonciello
>
> As part of the suspend sequence VRAM needs to be evicted on dGPUs.
> In order to make suspend/resume more reliable we moved this into
> the pmops prepare() callback so that the suspend sequence would fail
> but the s

Re: [PATCH] drm/amdkfd: Use the correct wptr size

2024-11-27 Thread Lazar, Lijo
On 11/28/2024 5:43 AM, Felix Kuehling wrote:
>
> On 2024-11-18 00:34, Lijo Lazar wrote:
>> Write pointer could be 32-bit or 64-bit. Use the correct size during
>> initialization.
>>
>> Signed-off-by: Lijo Lazar
>> ---
>>   drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +-
>>   1 file change

[PATCH] drm/amdgpu: Add secure display v2 command

2024-11-27 Thread Jinzhou Su
Add secure display v2 command to support multiple ROI instances per display. Signed-off-by: Wayne Lin Signed-off-by: Jinzhou Su --- drivers/gpu/drm/amd/amdgpu/amdgpu_psp.c | 3 ++- .../gpu/drm/amd/amdgpu/ta_secureDisplay_if.h | 22 ++- 2 files changed, 23 insertions(+),

[PATCH v5 1/2] drm/amd: Invert APU check for amdgpu_device_evict_resources()

2024-11-27 Thread Mario Limonciello
From: Mario Limonciello Resource eviction isn't needed for s3 or s2idle on APUs, but should be run for S4. As amdgpu_device_evict_resources() will be called by prepare notifier adjust logic so that APUs only cover S4. -- v2: * New patch Suggested-by: Lijo Lazar Signed-off-by: Mario Limonciell
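The eviction policy described in this patch can be sketched as a small decision function. This is a minimal illustration, not the amdgpu code: the enum `sleep_target` and the function `needs_resource_eviction()` are hypothetical, and the sketch assumes, as the surrounding patches describe, that dGPUs always evict VRAM while APUs only need it for S4.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical target sleep states; not the kernel's actual enum. */
enum sleep_target {
	SLEEP_S2IDLE, /* suspend-to-idle */
	SLEEP_S3,     /* suspend-to-RAM */
	SLEEP_S4,     /* hibernate */
};

/*
 * Decide whether VRAM resources should be evicted before sleeping.
 * Per the patch description: dGPUs always evict, APUs only for S4.
 */
bool needs_resource_eviction(bool is_apu, enum sleep_target target)
{
	assert(target <= SLEEP_S4); /* reject out-of-range values */

	if (!is_apu)
		return true;
	return target == SLEEP_S4;
}
```

For example, `needs_resource_eviction(true, SLEEP_S3)` is false while `needs_resource_eviction(true, SLEEP_S4)` is true; any dGPU call returns true.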

[PATCH v5 2/2] drm/amd: Add Suspend/Hibernate notification callback support

2024-11-27 Thread Mario Limonciello
From: Mario Limonciello As part of the suspend sequence VRAM needs to be evicted on dGPUs. In order to make suspend/resume more reliable we moved this into the pmops prepare() callback so that the suspend sequence would fail but the system could remain operational under high memory usage suspend.

[PATCH] drm/amd: Sanity check the ACPI EDID

2024-11-27 Thread Mario Limonciello
From: Mario Limonciello An HP Pavilion Aero Laptop 13-be0xxx/8916 has an ACPI EDID, but using it is causing corruption. It's got illogical values of not specifying a digital interface. Sanity check the ACPI EDID to avoid tripping such problems. Suggested-by: Tobias Jakobi Reported-and-tested-by
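A sanity check of this kind might look like the following sketch. The patch preview only says the EDID fails to specify a digital interface; this example assumes the check inspects bit 7 of byte 20 (the Video Input Definition byte of an EDID 1.3/1.4 base block), and the header and checksum tests are additions for illustration. `edid_looks_sane()` is a hypothetical name, not the function in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EDID_BLOCK_LEN 128

/* Fixed 8-byte header that starts every EDID base block. */
static const uint8_t edid_header[8] = {
	0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00
};

/*
 * Hypothetical check: reject a base block whose header or checksum is
 * wrong, or whose Video Input Definition byte (offset 20) does not
 * declare a digital input.
 */
bool edid_looks_sane(const uint8_t *edid, size_t len)
{
	uint8_t sum = 0;

	if (edid == NULL || len < EDID_BLOCK_LEN)
		return false;

	for (size_t i = 0; i < sizeof(edid_header); i++)
		if (edid[i] != edid_header[i])
			return false;

	/* All 128 bytes of a valid block sum to 0 modulo 256. */
	for (size_t i = 0; i < EDID_BLOCK_LEN; i++)
		sum = (uint8_t)(sum + edid[i]);
	if (sum != 0)
		return false;

	/* Bit 7 of byte 20 set means a digital video input. */
	return (edid[20] & 0x80) != 0;
}
```

An EDID that passes the header and checksum tests but leaves bit 7 of byte 20 clear is rejected, which matches the "not specifying a digital interface" symptom in the report.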

[PATCH] drm/amd/display: Fix programming backlight on OLED panels

2024-11-27 Thread Mario Limonciello
From: Mario Limonciello commit 38077562e0594 ("drm/amd/display: Implement new backlight_level_params structure") adjusted DC core to require the backlight type to be programmed in the dc link when changing brightness. This isn't initialized in amdgpu_dm for OLED panels though which broke brightn

Re: [PATCH v6.1] drm/amdkfd: amdkfd_free_gtt_mem clear the correct pointer

2024-11-27 Thread Felix Kuehling
On 2024-11-13 07:13, Christian König wrote: On 13.11.24 at 13:10, Vamsi Krishna Brahmajosyula wrote: From: Philip Yang [ Upstream commit c86ad39140bbcb9dc75a10046c2221f657e8083b ] Pass pointer reference to amdgpu_bo_unref to clear the correct pointer, otherwise amdgpu_bo_unref clears the lo
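The pointer-reference fix can be illustrated with a simplified userspace sketch. `struct bo`, `bo_unref()`, `free_mem_wrong()` and `free_mem_right()` are hypothetical stand-ins for `amdgpu_bo_unref` and its caller; the point is only that a function taking `struct bo *` can clear just its own copy of the pointer, while one taking `struct bo **` can clear the caller's variable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical reference-counted buffer object; not amdgpu's struct. */
struct bo {
	int refcount;
};

/* Drop one reference and set the pointer it was given to NULL. */
void bo_unref(struct bo **bo)
{
	assert(bo != NULL);
	if (*bo == NULL)
		return;
	if (--(*bo)->refcount == 0)
		free(*bo);
	*bo = NULL;
}

/*
 * Buggy shape: 'mem' is a local copy, so only the copy is cleared.
 * The caller's pointer is left untouched (and dangles if freed).
 */
void free_mem_wrong(struct bo *mem)
{
	bo_unref(&mem);
}

/* Fixed shape: pass the caller's pointer by reference. */
void free_mem_right(struct bo **mem)
{
	bo_unref(mem);
}
```

After `free_mem_right(&p)` the caller's `p` is NULL; after `free_mem_wrong(p)` it still holds the old address.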

Re: [PATCH] drm/amdkfd: Use the correct wptr size

2024-11-27 Thread Felix Kuehling
On 2024-11-18 00:34, Lijo Lazar wrote: Write pointer could be 32-bit or 64-bit. Use the correct size during initialization. Signed-off-by: Lijo Lazar --- drivers/gpu/drm/amd/amdkfd/kfd_kernel_queue.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/amd/am
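The size issue can be sketched as follows, assuming a queue that records whether its hardware uses a 32-bit or 64-bit write pointer. `struct hw_queue`, `wptr_size()` and `init_wptr()` are hypothetical names for illustration; they are not the fields or helpers in `kfd_kernel_queue.c`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical queue descriptor; field names are illustrative only. */
struct hw_queue {
	bool wptr_is_64bit;    /* does the hardware use a 64-bit wptr? */
	uint64_t wptr_storage; /* backing memory large enough for either */
};

/* Size in bytes of the write pointer this queue actually uses. */
size_t wptr_size(const struct hw_queue *q)
{
	return q->wptr_is_64bit ? sizeof(uint64_t) : sizeof(uint32_t);
}

/*
 * Zero the write pointer using its real size. A fixed size would
 * either leave half of a 64-bit pointer stale or write past a
 * 32-bit one.
 */
void init_wptr(struct hw_queue *q)
{
	memset(&q->wptr_storage, 0, wptr_size(q));
}
```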

Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-27 Thread Felix Kuehling
On 2024-11-27 06:51, Christian König wrote: On 27.11.24 at 12:46, Mika Laitio wrote: AMD gfx1103 / M780 iGPU will crash eventually when used for pytorch ML/AI operations on rocm sdk stack. After kernel error the application exits on error and linux desktop can itself sometimes either freeze o

[PATCH] drm/amd/amdgpu: Add Annotations to Process Isolation functions

2024-11-27 Thread Srinivasan Shanmugam
This update adds explanations to key functions that manage how the Kernel Fusion Driver (KFD) and Kernel Graphics Driver (KGD) share the GPU. amdgpu_gfx_enforce_isolation_wait_for_kfd: Controls the waiting period for KFD to ensure it takes turns with KGD in using the GPU. It uses a mutex to safely

[PATCH] drm/amdkfd: hard-code cacheline for gc943,gc944

2024-11-27 Thread David Yat Sin
Cacheline size is not available in IP discovery for gc943,gc944. Signed-off-by: David Yat Sin Reviewed-by: Harish Kasiviswanathan --- drivers/gpu/drm/amd/amdkfd/kfd_crat.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_crat.c b/drivers/gpu/drm/amd/amdk

Re: [PATCH] SWDEV-476969 - dm/amdgpu: Fail dm_atomic_check if cursor overlay is required at MAX_SURFACES

2024-11-27 Thread Melissa Wen
On 18/11/2024 09:52, Melissa Wen wrote: On 14/11/2024 16:04, Mario Limonciello wrote: Although it's really useful information for AMD people, the Jira shouldn't be in the "title" of the commit message. "If" we want to get into the habit of including this information for display code we

Re: [PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-27 Thread Christian König
On 27.11.24 at 12:46, Mika Laitio wrote: AMD gfx1103 / M780 iGPU will crash eventually when used for pytorch ML/AI operations on rocm sdk stack. After kernel error the application exits on error and linux desktop can itself sometimes either freeze or reset back to login screen. Error will happe

[PATCH 1/1] amdgpu fix for gfx1103 queue evict/restore crash

2024-11-27 Thread Mika Laitio
AMD gfx1103 / M780 iGPU will crash eventually when used for pytorch ML/AI operations on rocm sdk stack. After kernel error the application exits on error and linux desktop can itself sometimes either freeze or reset back to login screen. Error will happen randomly when kernel calls evict_process_q

[PATCH 0/1] amdgpu fix for gfx1103 queue evict/restore crash v2

2024-11-27 Thread Mika Laitio
This is the corrected v2 version of the patch that was sent earlier. Fixes: - add cover letter - use "goto out_unlock" instead of "goto out" in restore_process_queues_cpsch method after the mutex has been acquired in the code. - fixed typo on patch subject line and improved patch description Pa
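The `goto out` versus `goto out_unlock` distinction follows the usual kernel error-unwinding pattern. The sketch below is hypothetical: `restore_queues()` only mimics the shape of `restore_process_queues_cpsch`, and `struct fake_lock` stands in for a real mutex so the example stays self-contained.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock standing in for a real mutex. */
struct fake_lock {
	bool held;
};

static void lock_acquire(struct fake_lock *l) { assert(!l->held); l->held = true; }
static void lock_release(struct fake_lock *l) { assert(l->held); l->held = false; }

/*
 * Errors found before the lock is taken jump to 'out'; errors found
 * after it is taken must jump to 'out_unlock' so it is released.
 */
int restore_queues(struct fake_lock *l, bool valid_args, bool restore_fails)
{
	int ret = 0;

	if (!valid_args) {
		ret = -1;
		goto out;        /* lock not held yet */
	}

	lock_acquire(l);
	if (restore_fails) {
		ret = -2;
		goto out_unlock; /* lock held: must release it */
	}
	/* ... restore work would happen here ... */

out_unlock:
	lock_release(l);
out:
	return ret;
}
```

Jumping to `out` from the `restore_fails` branch instead would return with `held` still true, which is the leaked-lock bug the v2 change avoids.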

[Bug Report] Warning from __flush_work() on next-20241126

2024-11-27 Thread Muhammad Usama Anjum
Hi, We are getting this warning on x86_64 and i386 targets: [8.677157] amdgpu :03:00.0: [drm:amdgpu_ib_ring_tests] *ERROR* IB test failed on sdma0 (-110). [8.698661] [ cut here ] [8.703310] WARNING: CPU: 1 PID: 49 at kernel/workqueue.c:4192 __flush_work+0