Re: [PATCH v3] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
Am 24.07.24 um 05:00 schrieb ZhenGuo Yin: [Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not e

[PATCH v3] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread ZhenGuo Yin
[Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not equal to the new generation token. v2: Check
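The generation-token idea described in this patch can be sketched in plain C. This is only an illustration under assumptions, not the patch itself: the helper names `make_generation` and `vm_needs_reset` are hypothetical, and what the lower 32 bits hold (here a generic per-VM counter) is an assumption, since the snippet only says the upper 32 bits record a vram_lost_counter.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): build a 64-bit generation
 * token with the VRAM-lost counter in the upper 32 bits. The meaning of
 * the lower 32 bits is an assumption made for this sketch.
 */
static inline uint64_t make_generation(uint32_t vram_lost_counter,
                                       uint32_t low_counter)
{
    return ((uint64_t)vram_lost_counter << 32) | low_counter;
}

/*
 * Hypothetical check: the VM state machine would be reset whenever the
 * stored token differs from the freshly computed one.
 */
static inline bool vm_needs_reset(uint64_t vm_generation,
                                  uint64_t new_generation)
{
    return vm_generation != new_generation;
}
```

With this layout, any increment of the VRAM-lost counter changes the token even when the lower half is unchanged, so a single 64-bit comparison is enough to detect that page tables in VRAM may be stale.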

Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-23 Thread Marek Olšák
The reason is that our DCC requires 768K alignment in some cases. I haven't read this patch series, but one way to do that is to align to 256K, overallocate by 512K, and then not use either 0, 256K, or 512K at the beginning to get to 768K alignment. Marek On Tue, Jul 23, 2024, 11:04 Matthew Auld
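The over-allocation trick Marek describes can be checked with a little arithmetic. The sketch below is an assumption-laden illustration, not code from the patch series: `skip_to_768k` is a hypothetical helper, and the size macros are defined locally. Given a base address that is already 256K-aligned, it returns how many bytes (0, 256K, or 512K) to leave unused at the start so that the usable address is 768K-aligned.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_256K (256u * 1024u)
#define SZ_768K (3u * SZ_256K)

/*
 * Hypothetical helper: 'base' must be 256K-aligned. Because 768K is
 * three times 256K, base % 768K can only be 0, 256K, or 512K, so the
 * number of bytes to skip is also one of 0, 256K, or 512K. That is why
 * over-allocating by 512K is always sufficient.
 */
static uint64_t skip_to_768k(uint64_t base)
{
    assert(base % SZ_256K == 0);
    return (SZ_768K - base % SZ_768K) % SZ_768K;
}
```

For example, a buffer placed at 256K would skip 512K and start its usable range at 768K, while a buffer placed at 512K would skip only 256K.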

Re: [PATCH v2] drm/amdkfd: Change kfd/svm page fault drain handling

2024-07-23 Thread Philip Yang
On 2024-07-19 18:17, Xiaogang.Chen wrote: From: Xiaogang Chen When an app unmaps vm ranges (munmap), kfd/svm starts draining pending page faults and does not handle any incoming page faults of this process until a deferred work item gets executed by the default system wq. The

[PATCH 1/2 V2] drm/amdgpu: properly handle vbios fake edid sizing

2024-07-23 Thread Alex Deucher
The comment in the vbios structure says: // = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128 This fake edid struct has not been used in a long time, so I'm not sure if there were actually any boards out there with a non-128 byte EDID, but align the code with
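The sizing rule quoted from the vbios comment can be written out as a small function. This is a sketch of the rule as stated in the comment only, not the driver change: `fake_edid_size` is a hypothetical name, and only the field name `ucFakeEDIDLength` comes from the message.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper implementing the quoted comment:
 *   - a value of 128 means the EDID is 128 bytes long;
 *   - any other value is a count of 128-byte blocks.
 */
static size_t fake_edid_size(uint8_t ucFakeEDIDLength)
{
    if (ucFakeEDIDLength == 128)
        return 128;
    return (size_t)ucFakeEDIDLength * 128;
}
```

Under this reading, a value of 1 and a value of 128 both describe a single 128-byte EDID block, while a value of 2 describes 256 bytes.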

[PATCH 2/2 V2] drm/radeon: properly handle vbios fake edid sizing

2024-07-23 Thread Alex Deucher
The comment in the vbios structure says: // = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128 This fake edid struct has not been used in a long time, so I'm not sure if there were actually any boards out there with a non-128 byte EDID, but align the code with

RE: [PATCH 1/2] drm/amdgpu: properly handle vbios fake edid sizing

2024-07-23 Thread Deucher, Alexander
[Public] > -----Original Message----- > From: Thomas Weißschuh > Sent: Tuesday, July 23, 2024 1:58 PM > To: Deucher, Alexander > Cc: amd-gfx@lists.freedesktop.org > Subject: Re: [PATCH 1/2] drm/amdgpu: properly handle vbios fake edid sizing > > On 2024-07-23 13:33:56+, Alex Deucher wrote: >

Re: [PATCH 1/2] drm/amdgpu: properly handle vbios fake edid sizing

2024-07-23 Thread Thomas Weißschuh
On 2024-07-23 13:33:56+, Alex Deucher wrote: > The comment in the vbios structure says: > // = 128 means EDID length is 128 bytes, otherwise the EDID length = > ucFakeEDIDLength*128 > > This fake edid struct has not been used in a long time, so I'm > not sure if there were actually any boards

[PATCH 2/2] drm/radeon: properly handle vbios fake edid sizing

2024-07-23 Thread Alex Deucher
The comment in the vbios structure says: // = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128 This fake edid struct has not been used in a long time, so I'm not sure if there were actually any boards out there with a non-128 byte EDID, but align the code with

[PATCH 1/2] drm/amdgpu: properly handle vbios fake edid sizing

2024-07-23 Thread Alex Deucher
The comment in the vbios structure says: // = 128 means EDID length is 128 bytes, otherwise the EDID length = ucFakeEDIDLength*128 This fake edid struct has not been used in a long time, so I'm not sure if there were actually any boards out there with a non-128 byte EDID, but align the code with

Re: [PATCH] drm/amdgpu: convert bios_hardcoded_edid to drm_edid

2024-07-23 Thread Alex Deucher
On Tue, Jul 23, 2024 at 12:49 PM Alex Deucher wrote: > > On Sun, Jun 16, 2024 at 2:32 PM Thomas Weißschuh wrote: > > > > On 2024-06-16 11:12:03+, Thomas Weißschuh wrote: > > > Instead of manually passing around 'struct edid *' and its size, > > > use 'struct drm_edid', which encapsulates a va

Re: [PATCH] drm/amdgpu: convert bios_hardcoded_edid to drm_edid

2024-07-23 Thread Alex Deucher
On Sun, Jun 16, 2024 at 2:32 PM Thomas Weißschuh wrote: > > On 2024-06-16 11:12:03+, Thomas Weißschuh wrote: > > Instead of manually passing around 'struct edid *' and its size, > > use 'struct drm_edid', which encapsulates a validated combination of > > both. > > > > As the drm_edid_ can hand

Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-23 Thread Matthew Auld
On 23/07/2024 14:43, Paneer Selvam, Arunpravin wrote: Hi Matthew, Can we push this version for now as we need to mainline the DCC changes ASAP, while we continue our discussion and proceed to implement the permanent solution for address alignment? Yeah, we can always merge now and circle ba

Re: [PATCH] drm/buddy: Add start address support to trim function

2024-07-23 Thread Matthew Auld
Hi, On 22/07/2024 12:41, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/19/2024 4:01 PM, Matthew Auld wrote: On 17/07/2024 16:02, Paneer Selvam, Arunpravin wrote: On 7/16/2024 3:34 PM, Matthew Auld wrote: On 16/07/2024 10:50, Paneer Selvam, Arunpravin wrote: Hi Matthew, On 7/10/2024 6

Re: [PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-23 Thread Paneer Selvam, Arunpravin
Hi Matthew, Can we push this version for now as we need to mainline the DCC changes ASAP, while we continue our discussion and proceed to implement the permanent solution for address alignment? Thanks, Arun. On 7/23/2024 6:55 PM, Arunpravin Paneer Selvam wrote: - Add a new start parameter i

[PATCH v7 2/2] drm/amdgpu: Add address alignment support to DCC buffers

2024-07-23 Thread Arunpravin Paneer Selvam
Add address alignment support to the DCC VRAM buffers. v2: - adjust size based on the max_texture_channel_caches values only for GFX12 DCC buffers. - used AMDGPU_GEM_CREATE_GFX12_DCC flag to apply change only for DCC buffers. - roundup non power of two DCC buffer adjusted size to nea

[PATCH v7 1/2] drm/buddy: Add start address support to trim function

2024-07-23 Thread Arunpravin Paneer Selvam
- Add a new start parameter in trim function to specify exact address from where to start the trimming. This would help us in situations like if drivers would like to do address alignment for specific requirements. - Add a new flag DRM_BUDDY_TRIM_DISABLE. Drivers can use this flag to disab

Reminder - The Call for Proposals is open for XDC 2024!

2024-07-23 Thread Mark Filion
Hello! Reminder - The CfP is now open for talks, workshops and demos at XDC 2024. The deadline for submissions is Monday, 12 August 2024. https://indico.freedesktop.org/event/6/abstracts/ While any serious proposal will be gratefully considered, topics of interest to X.Org and freedesktop.org de

amdgpu various gfx timeouts when running zoom on 6.10 kernel

2024-07-23 Thread Andrew Worsley
Twice while running zoom, connecting to a meeting crashed the graphics - the screen went black but recovered. I've attended other meetings fine - so perhaps this zoom meeting was triggering particular issues. Any suggestions on how to avoid / debug this? Is it a zoom fault or should the driver ha

Re: [PATCH v4 3/6] drm/i915: Make I2C terminology more inclusive

2024-07-23 Thread Easwar Hariharan
On 7/22/2024 5:50 AM, Andi Shyti wrote: > Hi Easwar, > > merged to drm-intel-next. Thanks! > > On Thu, Jul 11, 2024 at 05:27:31AM +, Easwar Hariharan wrote: >> I2C v7, SMBus 3.2, and I3C 1.1.1 specifications have replaced "master/slave" >> with more appropriate terms. Inspired by Wolfram's se

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-23 Thread Tvrtko Ursulin
On 22/07/2024 15:06, Christian König wrote: Am 22.07.24 um 15:52 schrieb Tvrtko Ursulin: On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Long time ago in commit b3ac17667f11 ("drm/

[PATCH] drm/amdgpu: fix OLAND card ip_init failed during kdump caputrue kernel boot

2024-07-23 Thread Lu Yao
[Why] When running kdump test on a machine with R7340 card, a hang is caused due to the failure of 'amdgpu_device_ip_init()', error message as follows: '[drm:amdgpu_device_ip_init [amdgpu]] *ERROR* hw_init of IP block failed -22' '[drm:uvd_v3_1_hw_init [amdgpu]] *ERROR* amdgpu: UVD Firmware

Re: [PATCH] drm/scheduler: Fix drm_sched_entity_set_priority()

2024-07-23 Thread Tvrtko Ursulin
On 19/07/2024 16:18, Christian König wrote: Am 19.07.24 um 15:02 schrieb Christian König: Am 19.07.24 um 11:47 schrieb Tvrtko Ursulin: From: Tvrtko Ursulin Long time ago in commit b3ac17667f11 ("drm/scheduler: rework entity creation") a change was made which prevented priority changes for

Re: [PATCH] drm/amdgpu: add missed harvest check for VCN IP v4/v5

2024-07-23 Thread Saleemkhan Jamadar
Hi Tim, Looks good to me. Reviewed-by: Saleemkhan Jamadar Thanks! -- Saleem On 23/07/24 16:37, Tim Huang wrote: To prevent below probe failure, add a check for models with VCN IP v4.0.6 where VCN1 may be harvested. v2: Apply the same check to VCN IP v4.0 and v5.0. [ 54.070117] RIP: 00

[PATCH] drm/amdgpu: add missed harvest check for VCN IP v4/v5

2024-07-23 Thread Tim Huang
To prevent below probe failure, add a check for models with VCN IP v4.0.6 where VCN1 may be harvested. v2: Apply the same check to VCN IP v4.0 and v5.0. [ 54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu] [ 54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43

RE: [PATCH] drm/amdgpu: fix potential probe issue for VCN IP v4.0.6

2024-07-23 Thread Huang, Tim
[Public] Please ignore this one, will send out a new one to apply the same check to more VCN versions. Thanks. > -----Original Message----- > From: Huang, Tim > Sent: Tuesday, July 23, 2024 5:59 PM > To: amd-gfx@lists.freedesktop.org > Cc: Deucher, Alexander ; Zhang, Yifan > ; Jamadar, Saleemkh

[PATCH] drm/amdgpu: fix potential probe issue for VCN IP v4.0.6

2024-07-23 Thread Tim Huang
To prevent below probe failure, add a check for models with VCN IP v4.0.6 where VCN1 may be harvested. [ 54.070117] RIP: 0010:vcn_v4_0_5_start_dpg_mode+0x9be/0x36b0 [amdgpu] [ 54.071055] Code: 80 fb ff 8d 82 00 80 fe ff 81 fe 00 06 00 00 0f 43 c2 49 69 d5 38 0d 00 00 48 8d 71 04 c1 e8 02 4c 01

Re: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
Am 23.07.24 um 05:05 schrieb ZhenGuo Yin: [Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not e

Re: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
Am 23.07.24 um 11:04 schrieb Yin, ZhenGuo (Chris): [AMD Official Use Only - AMD Internal Distribution Only] Hi Christian I prepared this patch because we met a page fault after gpu reset in SRIOV triggered by quark. After investigation, I found that the page table did not get restored after gp

Re: [BUG] HID: amd_sfh (drivers/hid/amd-sfh-hid/): memory/page corruption

2024-07-23 Thread Basavaraj Natikar
On 7/22/2024 10:00 PM, Benjamin Tissoires wrote: > On Jul 21 2024, Chris Hixon wrote: >> On 7/21/24 00:20, Basavaraj Natikar wrote: >> >>> On 7/17/2024 4:51 PM, Linux regression tracking (Thorsten Leemhuis) wrote: On 15.07.24 06:39, Chris Hixon wrote: > System: HP ENVY x360 Convertible 1

RE: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Yin, ZhenGuo (Chris)
[AMD Official Use Only - AMD Internal Distribution Only] Hi Christian I prepared this patch because we met a page fault after gpu reset in SRIOV triggered by quark. After investigation, I found that the page table did not get restored after gpu reset. I just tried to use vm_update_mode=0 to dis

Re: [PATCH 00/34] GC per queue reset

2024-07-23 Thread Christopher Snowhill
Alex Deucher writes: > On Thu, Jul 18, 2024 at 10:15 AM Alex Deucher > wrote: >> >> This adds preliminary support for GC per queue reset. In this >> case, only the jobs currently in the queue are lost. If this >> fails, we fall back to a full adapter reset. > > Also available here via git: >

RE: [PATCH] drm/amdgpu: skip kfd init if GFX is not ready.

2024-07-23 Thread Zhang, Jesse(Jie)
[AMD Official Use Only - AMD Internal Distribution Only] This patch looks good to me. Tested-and-Reviewed-by: Jesse Zhang -----Original Message----- From: amd-gfx On Behalf Of Yifan Zhang Sent: Monday, July 22, 2024 10:25 AM To: amd-gfx@lists.freedesktop.org Cc: Deucher, Alexander ; Zhang, Yif

Re: [PATCH] drm/amdgpu/mes: refine for maximum packet execution

2024-07-23 Thread Christian König
Am 23.07.24 um 10:27 schrieb Jack Xiao: Only allow API_NUMBER_OF_COMMAND_MAX packet in mes ring buffer, refine the code for maximum packet execution. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 ++ drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +- drivers/gpu/dr

[PATCH] drm/amdgpu/mes: refine for maximum packet execution

2024-07-23 Thread Jack Xiao
Only allow API_NUMBER_OF_COMMAND_MAX packet in mes ring buffer, refine the code for maximum packet execution. Signed-off-by: Jack Xiao --- drivers/gpu/drm/amd/amdgpu/amdgpu_ring.c | 2 ++ drivers/gpu/drm/amd/amdgpu/mes_v11_0.c | 2 +- drivers/gpu/drm/amd/amdgpu/mes_v12_0.c | 2 +- 3 files ch

RE: [PATCH] drm/amdgpu/mes: fix mes ring buffer overflow

2024-07-23 Thread Xiao, Jack
[AMD Official Use Only - AMD Internal Distribution Only] There is a max command number for mes queue in mes_api_def.h enum {API_NUMBER_OF_COMMAND_MAX = 32}; It should explain why the patch can’t fix the issue. I will send out another patch to refine code according to the firmware limitation.

Re: [PATCH v2] drm/amdgpu: reset vm state machine after gpu reset(vram lost)

2024-07-23 Thread Christian König
Am 23.07.24 um 05:05 schrieb ZhenGuo Yin: [Why] Page table of compute VM in the VRAM will be lost after gpu reset. VRAM won't be restored since compute VM has no shadows. [How] Use higher 32-bit of vm->generation to record a vram_lost_counter. Reset the VM state machine when vm->generation is not e