RE: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang

Regards,
Hawking

-----Original Message-----
From: amd-gfx On Behalf Of Tao Zhou
Sent: Thursday, June 13, 2024 14:57
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao
Subject: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption

Per FW requirement, replace mode2 with mode1.

Signed-off-by: Tao Zhou
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index e1c21d250611..78dde62fb04a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -164,7 +164,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
 	case SOC15_IH_CLIENTID_SE3SH:
 	case SOC15_IH_CLIENTID_UTCL2:
 		block = AMDGPU_RAS_BLOCK__GFX;
-		reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
 		break;
 	case SOC15_IH_CLIENTID_VMC:
 	case SOC15_IH_CLIENTID_VMC1:
@@ -177,7 +177,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
 	case SOC15_IH_CLIENTID_SDMA3:
 	case SOC15_IH_CLIENTID_SDMA4:
 		block = AMDGPU_RAS_BLOCK__SDMA;
-		reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
 		break;
 	default:
 		dev_warn(dev->adev->dev,
--
2.34.1
Re: "firmware/sysfb: Set firmware-framebuffer parent device" breaks lightdm on Ubuntu 22.04 using amdgpu
Hi

On 13.06.24 at 08:00, Marek Olšák wrote:
> +amd-gfx
>
> On Thu, Jun 13, 2024 at 1:59 AM Marek Olšák wrote:
>> Hi Thomas,
>>
>> Commit 9eac534db0013aff9b9124985dab114600df9081 as per the title
>> breaks (crashes?) lightdm (login screen) such that all I get is the
>> terminal. It's also reproducible with tag v6.9 where the commit is
>> present.
>>
>> Reverting the commit fixes lightdm. A workaround is to bypass lightdm
>> by triggering auto-login. This is a bug report.

I see. Do you know why it crashes? Or do you have any logs?

Best regards
Thomas

>> (For AMD folks: It's also reproducible with amd-staging-drm-next.)
>>
>> Marek

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
Re: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption
On 6/13/2024 12:27 PM, Tao Zhou wrote:
> Per FW requirement, replace mode2 with mode1.
>
> Signed-off-by: Tao Zhou
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index e1c21d250611..78dde62fb04a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -164,7 +164,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
>  	case SOC15_IH_CLIENTID_SE3SH:
>  	case SOC15_IH_CLIENTID_UTCL2:
>  		block = AMDGPU_RAS_BLOCK__GFX;
> -		reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> +		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
>  		break;
>  	case SOC15_IH_CLIENTID_VMC:
>  	case SOC15_IH_CLIENTID_VMC1:
> @@ -177,7 +177,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
>  	case SOC15_IH_CLIENTID_SDMA3:
>  	case SOC15_IH_CLIENTID_SDMA4:
>  		block = AMDGPU_RAS_BLOCK__SDMA;
> -		reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> +		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
>  		break;

Does this need 9.4.3 IP version check?

Thanks,
Lijo

>  	default:
>  		dev_warn(dev->adev->dev,
Re: [PATCH 1/5] drm/amdgpu: add condition check for waking up thread
On 6/13/2024 7:55 AM, YiPeng Chai wrote:
> 1. Cannot add messages to fifo in gpu reset mode.
> 2. Only when the message is successfully saved to the
>    fifo, the thread can be awakened.
>

I think fifo should still cache the poison requests while in reset. Page
retirement thread may try to acquire the read side of reset lock and wait
if any reset is in progress.

Thanks
Lijo

> Signed-off-by: YiPeng Chai
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 18 +++--
>  2 files changed, 21 insertions(+), 13 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d0dcd3d37e6d..ed260966363f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2093,12 +2093,16 @@ static void amdgpu_ras_interrupt_poison_creation_handler(struct ras_manager *obj
>  	if (amdgpu_ip_version(obj->adev, UMC_HWIP, 0) >= IP_VERSION(12, 0, 0)) {
>  		struct amdgpu_ras *con = amdgpu_ras_get_context(obj->adev);
>
> -		amdgpu_ras_put_poison_req(obj->adev,
> -			AMDGPU_RAS_BLOCK__UMC, 0, NULL, NULL, false);
> -
> -		atomic_inc(&con->page_retirement_req_cnt);
> -
> -		wake_up(&con->page_retirement_wq);
> +		if (!amdgpu_in_reset(obj->adev) && !atomic_read(&con->in_recovery)) {
> +			int ret;
> +
> +			ret = amdgpu_ras_put_poison_req(obj->adev,
> +				AMDGPU_RAS_BLOCK__UMC, 0, NULL, NULL, false);
> +			if (!ret) {
> +				atomic_inc(&con->page_retirement_req_cnt);
> +				wake_up(&con->page_retirement_wq);
> +			}
> +		}
>  	}
>  #endif
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 1dbe69eabb9a..94181ae85886 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -293,16 +293,20 @@ int amdgpu_umc_pasid_poison_handler(struct amdgpu_device *adev,
>
>  			amdgpu_ras_error_data_fini(&err_data);
>  		} else {
> -			struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
> -
>  #ifdef HAVE_KFIFO_PUT_NON_POINTER
> -			amdgpu_ras_put_poison_req(adev,
> -				block, pasid, pasid_fn, data, reset);
> -#endif
> +			struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>
> -			atomic_inc(&con->page_retirement_req_cnt);
> +			if (!amdgpu_in_reset(adev) && !atomic_read(&con->in_recovery)) {
> +				int ret;
>
> -			wake_up(&con->page_retirement_wq);
> +				ret = amdgpu_ras_put_poison_req(adev,
> +					block, pasid, pasid_fn, data, reset);
> +				if (!ret) {
> +					atomic_inc(&con->page_retirement_req_cnt);
> +					wake_up(&con->page_retirement_wq);
> +				}
> +			}
> +#endif
>  		}
>  	} else {
>  		if (adev->virt.ops && adev->virt.ops->ras_poison_handler)
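Point 2 of the commit message (wake the thread only when the message was actually saved to the fifo) can be illustrated with a small userspace sketch. Everything below is hypothetical and is not the amdgpu code: the sketch_ names, the fixed-size ring and the two plain counters are stand-ins for the kfifo, page_retirement_req_cnt and wake_up(). It deliberately takes no position on the reset-time behaviour questioned in the reply above.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FIFO_SIZE 4	/* hypothetical capacity, not the driver's */

/* Hypothetical stand-in for the poison fifo plus its bookkeeping. */
struct sketch_poison_fifo {
	int msgs[SKETCH_FIFO_SIZE];
	size_t head;		/* index of the oldest queued message */
	size_t count;		/* number of queued messages */
	int pending_reqs;	/* models page_retirement_req_cnt */
	int wakeups;		/* models calls to wake_up() */
};

/* Store one message; returns 0 on success, -1 if the fifo is full. */
int sketch_fifo_put(struct sketch_poison_fifo *f, int msg)
{
	if (f->count == SKETCH_FIFO_SIZE)
		return -1;
	f->msgs[(f->head + f->count) % SKETCH_FIFO_SIZE] = msg;
	f->count++;
	return 0;
}

/* Bump the request counter and "wake" the consumer only if the put worked. */
int sketch_queue_poison_req(struct sketch_poison_fifo *f, int msg)
{
	int ret = sketch_fifo_put(f, msg);

	if (ret)
		return ret;	/* nothing was queued, so nothing to wake up for */
	f->pending_reqs++;
	f->wakeups++;
	return 0;
}

/* Usage example: a failed put leaves both counters untouched. */
void sketch_queue_poison_req_check(void)
{
	struct sketch_poison_fifo f = {0};
	int i;

	for (i = 0; i < SKETCH_FIFO_SIZE; i++)
		assert(sketch_queue_poison_req(&f, i) == 0);
	assert(sketch_queue_poison_req(&f, 99) != 0);
	assert(f.pending_reqs == SKETCH_FIFO_SIZE);
	assert(f.wakeups == SKETCH_FIFO_SIZE);
}
```

Keeping the counter and the number of queued messages in step means the consumer never wakes up to find a request count that has no message behind it.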
RE: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Lazar, Lijo
> Sent: Thursday, June 13, 2024 4:07 PM
> To: Zhou1, Tao; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption
>
> On 6/13/2024 12:27 PM, Tao Zhou wrote:
> > Per FW requirement, replace mode2 with mode1.
> >
> > Signed-off-by: Tao Zhou
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > index e1c21d250611..78dde62fb04a 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > @@ -164,7 +164,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
> >  	case SOC15_IH_CLIENTID_SE3SH:
> >  	case SOC15_IH_CLIENTID_UTCL2:
> >  		block = AMDGPU_RAS_BLOCK__GFX;
> > -		reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> > +		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> >  		break;
> >  	case SOC15_IH_CLIENTID_VMC:
> >  	case SOC15_IH_CLIENTID_VMC1:
> > @@ -177,7 +177,7 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
> >  	case SOC15_IH_CLIENTID_SDMA3:
> >  	case SOC15_IH_CLIENTID_SDMA4:
> >  		block = AMDGPU_RAS_BLOCK__SDMA;
> > -		reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> > +		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> >  		break;
>
> Does this need 9.4.3 IP version check?

[Tao] It's applicable to all gfx9 ASICs.

>
> Thanks,
> Lijo
> >  	default:
> >  		dev_warn(dev->adev->dev,
[BUG] 6.10-rc3 [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off
Hi, all! Running the vanilla torvalds tree kernel 6.10-rc3, there occurred an error in boot with amdgpu. Here is the complete output: kernel: [8.704024] WARNING: CPU: 24 PID: 689 at drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1379 amdgpu_bo_release_notify (drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu kernel: [8.704146] Modules linked in: binfmt_misc amd_atl intel_rapl_msr intel_rapl_common edac_mce_amd kvm_amd kvm snd_hda_codec_realtek snd_hda_codec_generic amdgpu(+) crct10dif_pclmul nls_iso8859_1 snd_hda_scodec_component polyval_clmulni snd_hda_codec_hdmi polyval_generic ghash_clmulni_intel snd_hda_intel sha256_ssse3 sha1_ssse3 snd_intel_dspcfg snd_intel_sdw_acpi aesni_intel snd_hda_codec crypto_simd cryptd snd_seq_midi amdxcp snd_seq_midi_event snd_hda_core drm_exec gpu_sched joydev snd_rawmidi snd_hwdep rapl drm_buddy input_leds drm_suballoc_helper snd_seq drm_ttm_helper wmi_bmof snd_pcm snd_seq_device ttm k10temp ccp snd_timer drm_display_helper snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc ppdev lp parport efi_pstore drm ip_tables x_tables autofs4 btrfs blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid nvme crc32_pclmul ahci i2c_piix4 nvme_core r8169 xhci_pci libahci xhci_pci_renesas realtek video wmi gpio_amdpt kernel: [8.704200] CPU: 24 PID: 689 Comm: systemd-udevd Not tainted 6.10.0-rc1-next-20240528 #1 kernel: [8.704202] Hardware name: ASRock X670E PG Lightning/X670E PG Lightning, BIOS 1.21 04/26/2023 kernel: [8.704203] RIP: 0010:amdgpu_bo_release_notify (drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu kernel: [ 8.704324] Code: 0b e9 a3 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 f6 4c 89 ef e8 c2 c4 dc ee eb 99 e8 eb bc dc ee eb b2 0f 0b e9 3c fe ff ff <0f> 0b eb a7 be 03 00 00 00 e8 b4 4b a1 ee eb 9b e8 4d 5c 36 ef 66 All code 0: 0b e9 or %ecx,%ebp 2: a3 fe ff ff 48 ba ffmovabs %eax,0xffba48fe 9: ff ff b: ff (bad) c: ff (bad) d: ff (bad) e: ff (bad) f: 7f 31 
jg 0x42 11: f6 4c 89 ef e8 testb $0xe8,-0x11(%rcx,%rcx,4) 16: c2 c4 dcret$0xdcc4 19: ee out%al,(%dx) 1a: eb 99 jmp0xffb5 1c: e8 eb bc dc ee call 0xeedcbd0c 21: eb b2 jmp0xffd5 23: 0f 0b ud2 25: e9 3c fe ff ff jmp0xfe66 2a:* 0f 0b ud2 <-- trapping instruction 2c: eb a7 jmp0xffd5 2e: be 03 00 00 00 mov$0x3,%esi 33: e8 b4 4b a1 ee call 0xeea14bec 38: eb 9b jmp0xffd5 3a: e8 4d 5c 36 ef call 0xef365c8c 3f: 66 data16 Code starting with the faulting instruction === 0: 0f 0b ud2 2: eb a7 jmp0xffab 4: be 03 00 00 00 mov$0x3,%esi 9: e8 b4 4b a1 ee call 0xeea14bc2 e: eb 9b jmp0xffab 10: e8 4d 5c 36 ef call 0xef365c62 15: 66 data16 kernel: [8.704325] RSP: 0018:b74b014d3380 EFLAGS: 00010282 kernel: [8.704327] RAX: ffea RBX: 940781ec5c48 RCX: kernel: [8.704328] RDX: RSI: RDI: kernel: [8.704329] RBP: b74b014d33b8 R08: R09: kernel: [8.704330] R10: R11: R12: 9407dc80ef58 kernel: [8.704330] R13: 940781ec5c00 R14: R15: kernel: [8.704331] FS: 783805ca28c0() GS:94169860() knlGS: kernel: [8.704333] CS: 0010 DS: ES: CR0: 80050033 kernel: [8.704334] CR2: 7ec7be572000 CR3: 00010f7e4000 CR4: 00750ef0 kernel: [8.704335] PKRU: 5554 kernel: [8.704335] Call Trace: kernel: [8.704337] kernel: [8.704339] ? show_regs+0x71/0x90 kernel: [8.704344] ? __warn+0x88/0x140 kernel: [8.704347] ? amdgpu_bo_release_notify (drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu kernel: [8.704464] ? report_bug+0x1ab/0x1c0 kernel: [8.704468] ? handle_bug+0x46/0x90 kernel: [8.704471] ? exc_invalid_op+0x19/0x80 kernel: [8.704473] ? asm_exc_invalid_op+0x1b/0x20 kernel: [8.704478] ? amdgpu_bo_release_notify (drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu kernel: [8.704595] ttm_bo_release (drivers/gpu/d
Re: [bug report] drm/amdgpu: amdgpu crash on playing videos, linux 6.10-rc
On 06.06.24 05:06, Winston Ma wrote: > Hi I got the same problem on Linux Kernel 6.10-rc2. I got the problem by > following the procedure below: > > 1. Boot Linux Kernel 6.10-rc2 > 2. Open Firefox (Any browser should work) > 3. Open a Youtube Video > 4. On the playing video, toggle fullscreen quickly Then after 10-20 > times of fullscreen toggling, the screen would enter freeze mode. > This is the log that I captured using the above method. Hmm, seems nothing happened here for a while. Could you maybe try to bisect this (https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html )? @amd-gfx devs: Or is this unneeded, as the cause found or maybe even fixed meanwhile? Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) -- Everything you wanna know about Linux kernel regression tracking: https://linux-regtracking.leemhuis.info/about/#tldr If I did something stupid, please tell me, as explained on that page. #regzbot poke > This is the kernel log > > 2024-06-06T10:26:40.747576+08:00 kernel: gmc_v10_0_process_interrupt: 6 > callbacks suppressed > 2024-06-06T10:26:40.747618+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] > page fault (src_id:0 ring:8 vmid:2 pasid:32789) > 2024-06-06T10:26:40.747623+08:00 kernel: amdgpu :03:00.0: amdgpu: in > process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342 > 2024-06-06T10:26:40.747625+08:00 kernel: amdgpu :03:00.0: amdgpu: in > page starting at address 0x800106ffe000 from client 0x12 (VMC) > 2024-06-06T10:26:40.747628+08:00 kernel: amdgpu :03:00.0: amdgpu: > MMVM_L2_PROTECTION_FAULT_STATUS:0x00203811 > 2024-06-06T10:26:40.747629+08:00 kernel: amdgpu :03:00.0: amdgpu: > Faulty UTCL2 client ID: VCN (0x1c) > 2024-06-06T10:26:40.747631+08:00 kernel: amdgpu :03:00.0: amdgpu: > MORE_FAULTS: 0x1 > 2024-06-06T10:26:40.747651+08:00 kernel: amdgpu :03:00.0: amdgpu: > WALKER_ERROR: 0x0 > 2024-06-06T10:26:40.747653+08:00 kernel: amdgpu :03:00.0: amdgpu: > PERMISSION_FAULTS: 0x1 > 
2024-06-06T10:26:40.747655+08:00 kernel: amdgpu :03:00.0: amdgpu: > MAPPING_ERROR: 0x0 > 2024-06-06T10:26:40.747656+08:00 kernel: amdgpu :03:00.0: amdgpu: > RW: 0x0 > 2024-06-06T10:26:40.747658+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] > page fault (src_id:0 ring:8 vmid:2 pasid:32789) > 2024-06-06T10:26:40.747660+08:00 kernel: amdgpu :03:00.0: amdgpu: in > process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342 > 2024-06-06T10:26:40.747662+08:00 kernel: amdgpu :03:00.0: amdgpu: in > page starting at address 0x800106e0 from client 0x12 (VMC) > 2024-06-06T10:26:40.747663+08:00 kernel: amdgpu :03:00.0: amdgpu: > MMVM_L2_PROTECTION_FAULT_STATUS:0x > 2024-06-06T10:26:40.747664+08:00 kernel: amdgpu :03:00.0: amdgpu: > Faulty UTCL2 client ID: MP0 (0x0) > 2024-06-06T10:26:40.747666+08:00 kernel: amdgpu :03:00.0: amdgpu: > MORE_FAULTS: 0x0 > 2024-06-06T10:26:40.747667+08:00 kernel: amdgpu :03:00.0: amdgpu: > WALKER_ERROR: 0x0 > 2024-06-06T10:26:40.747668+08:00 kernel: amdgpu :03:00.0: amdgpu: > PERMISSION_FAULTS: 0x0 > 2024-06-06T10:26:40.747670+08:00 kernel: amdgpu :03:00.0: amdgpu: > MAPPING_ERROR: 0x0 > 2024-06-06T10:26:40.747671+08:00 kernel: amdgpu :03:00.0: amdgpu: > RW: 0x0 > 2024-06-06T10:26:40.747674+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] > page fault (src_id:0 ring:8 vmid:2 pasid:32789) > 2024-06-06T10:26:40.747677+08:00 kernel: amdgpu :03:00.0: amdgpu: in > process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342 > 2024-06-06T10:26:40.747680+08:00 kernel: amdgpu :03:00.0: amdgpu: in > page starting at address 0x800106e07000 from client 0x12 (VMC) > 2024-06-06T10:26:40.747683+08:00 kernel: amdgpu :03:00.0: amdgpu: > MMVM_L2_PROTECTION_FAULT_STATUS:0x > 2024-06-06T10:26:40.747686+08:00 kernel: amdgpu :03:00.0: amdgpu: > Faulty UTCL2 client ID: MP0 (0x0) > 2024-06-06T10:26:40.747688+08:00 kernel: amdgpu :03:00.0: amdgpu: > MORE_FAULTS: 0x0 > 2024-06-06T10:26:40.747691+08:00 kernel: amdgpu :03:00.0: amdgpu: > WALKER_ERROR: 0x0 > 
2024-06-06T10:26:40.747693+08:00 kernel: amdgpu :03:00.0: amdgpu: > PERMISSION_FAULTS: 0x0 > 2024-06-06T10:26:40.747696+08:00 kernel: amdgpu :03:00.0: amdgpu: > MAPPING_ERROR: 0x0 > 2024-06-06T10:26:40.747698+08:00 kernel: amdgpu :03:00.0: amdgpu: > RW: 0x0 > 2024-06-06T10:26:40.747700+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] > page fault (src_id:0 ring:8 vmid:2 pasid:32789) > 2024-06-06T10:26:40.747703+08:00 kernel: amdgpu :03:00.0: amdgpu: in > process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342 > 2024-06-06T10:26:40.747705+08:00 kernel: amdgpu :03:00.0: amdgpu: in > page starting at address 0x800107001000 from cli
Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"
On Wed, Jun 12, 2024 at 12:10:37PM +1200, Matthew Ruffell wrote:
> Hi Greg KH, Sasha,
>
> Please pick up this patch for 5.15 stable tree. I have built a test kernel and
> can confirm that it fixes affected users.
>
> Downstream bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Sorry for the delay, now picked up.

greg k-h
Patch "Revert "drm/amdgpu: init iommu after amdkfd device init"" has been added to the 5.15-stable tree
This is a note to let you know that I've just added the patch titled

    Revert "drm/amdgpu: init iommu after amdkfd device init"

to the 5.15-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
     revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


>From w_ar...@gmx.de Wed Jun 12 14:43:21 2024
From: Armin Wolf
Date: Thu, 23 May 2024 19:30:31 +0200
Subject: Revert "drm/amdgpu: init iommu after amdkfd device init"
To: alexander.deuc...@amd.com, christian.koe...@amd.com, xinhui@amd.com, gre...@linuxfoundation.org, sas...@kernel.org
Cc: sta...@vger.kernel.org, bkau...@gmail.com, yifan1.zh...@amd.com, prike.li...@amd.com, dri-de...@lists.freedesktop.org, amd-gfx@lists.freedesktop.org
Message-ID: <20240523173031.4212-1-w_ar...@gmx.de>

From: Armin Wolf

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler
Signed-off-by: Armin Wolf
Link: https://lore.kernel.org/r/20240523173031.4212-1-w_ar...@gmx.de
Signed-off-by: Greg Kroah-Hartman
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 8
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
 	if (r)
 		goto init_failed;

+	r = amdgpu_amdkfd_resume_iommu(adev);
+	if (r)
+		goto init_failed;
+
 	r = amdgpu_device_ip_hw_init_phase1(adev);
 	if (r)
 		goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
 	if (!adev->gmc.xgmi.pending_reset)
 		amdgpu_amdkfd_device_init(adev);

-	r = amdgpu_amdkfd_resume_iommu(adev);
-	if (r)
-		goto init_failed;
-
 	amdgpu_fru_get_product_info(adev);

 init_failed:


Patches currently in stable-queue which might be from w_ar...@gmx.de are

queue-5.15/revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
Re: [bug report] drm/amdgpu: amdgpu crash on playing videos, linux 6.10-rc
On Wed, 2024-06-12 at 15:14 +0200, Linux regression tracking (Thorsten Leemhuis) wrote: > On 06.06.24 05:06, Winston Ma wrote: > > Hi I got the same problem on Linux Kernel 6.10-rc2. I got the problem by > > following the procedure below: > > > > 1. Boot Linux Kernel 6.10-rc2 > > 2. Open Firefox (Any browser should work) > > 3. Open a Youtube Video > > 4. On the playing video, toggle fullscreen quickly Then after 10-20 > > times of fullscreen toggling, the screen would enter freeze mode. > > This is the log that I captured using the above method. > > Hmm, seems nothing happened here for a while. Could you maybe try to > bisect this > (https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html > )? > > @amd-gfx devs: Or is this unneeded, as the cause found or maybe even > fixed meanwhile? > > Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat) > -- > Everything you wanna know about Linux kernel regression tracking: > https://linux-regtracking.leemhuis.info/about/#tldr > If I did something stupid, please tell me, as explained on that page. 
> > #regzbot poke > > > This is the kernel log > > > > 2024-06-06T10:26:40.747576+08:00 kernel: gmc_v10_0_process_interrupt: 6 > > callbacks suppressed > > 2024-06-06T10:26:40.747618+08:00 kernel: amdgpu :03:00.0: amdgpu: > > [mmhub] page fault (src_id:0 ring:8 vmid:2 > > pasid:32789) > > 2024-06-06T10:26:40.747623+08:00 kernel: amdgpu :03:00.0: amdgpu: in > > process RDD Process pid 39524 thread > > firefox-bi:cs0 pid 40342 > > 2024-06-06T10:26:40.747625+08:00 kernel: amdgpu :03:00.0: amdgpu: in > > page starting at address > > 0x800106ffe000 from client 0x12 (VMC) > > 2024-06-06T10:26:40.747628+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MMVM_L2_PROTECTION_FAULT_STATUS:0x00203811 > > 2024-06-06T10:26:40.747629+08:00 kernel: amdgpu :03:00.0: amdgpu: > > Faulty UTCL2 client ID: VCN (0x1c) > > 2024-06-06T10:26:40.747631+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MORE_FAULTS: 0x1 > > 2024-06-06T10:26:40.747651+08:00 kernel: amdgpu :03:00.0: amdgpu: > > WALKER_ERROR: 0x0 > > 2024-06-06T10:26:40.747653+08:00 kernel: amdgpu :03:00.0: amdgpu: > > PERMISSION_FAULTS: 0x1 > > 2024-06-06T10:26:40.747655+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MAPPING_ERROR: 0x0 > > 2024-06-06T10:26:40.747656+08:00 kernel: amdgpu :03:00.0: amdgpu: > > RW: 0x0 > > 2024-06-06T10:26:40.747658+08:00 kernel: amdgpu :03:00.0: amdgpu: > > [mmhub] page fault (src_id:0 ring:8 vmid:2 > > pasid:32789) > > 2024-06-06T10:26:40.747660+08:00 kernel: amdgpu :03:00.0: amdgpu: in > > process RDD Process pid 39524 thread > > firefox-bi:cs0 pid 40342 > > 2024-06-06T10:26:40.747662+08:00 kernel: amdgpu :03:00.0: amdgpu: in > > page starting at address > > 0x800106e0 from client 0x12 (VMC) > > 2024-06-06T10:26:40.747663+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MMVM_L2_PROTECTION_FAULT_STATUS:0x > > 2024-06-06T10:26:40.747664+08:00 kernel: amdgpu :03:00.0: amdgpu: > > Faulty UTCL2 client ID: MP0 (0x0) > > 2024-06-06T10:26:40.747666+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MORE_FAULTS: 0x0 > > 
2024-06-06T10:26:40.747667+08:00 kernel: amdgpu :03:00.0: amdgpu: > > WALKER_ERROR: 0x0 > > 2024-06-06T10:26:40.747668+08:00 kernel: amdgpu :03:00.0: amdgpu: > > PERMISSION_FAULTS: 0x0 > > 2024-06-06T10:26:40.747670+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MAPPING_ERROR: 0x0 > > 2024-06-06T10:26:40.747671+08:00 kernel: amdgpu :03:00.0: amdgpu: > > RW: 0x0 > > 2024-06-06T10:26:40.747674+08:00 kernel: amdgpu :03:00.0: amdgpu: > > [mmhub] page fault (src_id:0 ring:8 vmid:2 > > pasid:32789) > > 2024-06-06T10:26:40.747677+08:00 kernel: amdgpu :03:00.0: amdgpu: in > > process RDD Process pid 39524 thread > > firefox-bi:cs0 pid 40342 > > 2024-06-06T10:26:40.747680+08:00 kernel: amdgpu :03:00.0: amdgpu: in > > page starting at address > > 0x800106e07000 from client 0x12 (VMC) > > 2024-06-06T10:26:40.747683+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MMVM_L2_PROTECTION_FAULT_STATUS:0x > > 2024-06-06T10:26:40.747686+08:00 kernel: amdgpu :03:00.0: amdgpu: > > Faulty UTCL2 client ID: MP0 (0x0) > > 2024-06-06T10:26:40.747688+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MORE_FAULTS: 0x0 > > 2024-06-06T10:26:40.747691+08:00 kernel: amdgpu :03:00.0: amdgpu: > > WALKER_ERROR: 0x0 > > 2024-06-06T10:26:40.747693+08:00 kernel: amdgpu :03:00.0: amdgpu: > > PERMISSION_FAULTS: 0x0 > > 2024-06-06T10:26:40.747696+08:00 kernel: amdgpu :03:00.0: amdgpu: > > MAPPING_ERROR: 0x0 > > 2024-06-06T10:26:40.747698+08:00 kernel: amdgpu :03:00.0: amdgpu: > > RW: 0x0 > > 2024-06-06T10:26:40.747700+08:00 kernel: amdgpu :03:00.0: amdgpu: > > [mmhub] page fault (src_id:0 ring:8 vmid:2 > > pasid:3
[PATCH] drm/amdkfd: add ASIC version check for the reset selection of RAS poison
GFX v9.4.3 uses mode1 reset, other ASICs choose mode2.

Signed-off-by: Tao Zhou
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index 78dde62fb04a..816800555f7f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -164,7 +164,10 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
 	case SOC15_IH_CLIENTID_SE3SH:
 	case SOC15_IH_CLIENTID_UTCL2:
 		block = AMDGPU_RAS_BLOCK__GFX;
-		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+		if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3))
+			reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+		else
+			reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
 		break;
 	case SOC15_IH_CLIENTID_VMC:
 	case SOC15_IH_CLIENTID_VMC1:
@@ -177,7 +180,10 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
 	case SOC15_IH_CLIENTID_SDMA3:
 	case SOC15_IH_CLIENTID_SDMA4:
 		block = AMDGPU_RAS_BLOCK__SDMA;
-		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+		if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3))
+			reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+		else
+			reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
 		break;
 	default:
 		dev_warn(dev->adev->dev,
--
2.34.1
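The patch repeats the same version test in the GFX and SDMA branches. The selection rule on its own can be written as one small function; the following is a userspace sketch only, not driver code. SKETCH_IP_VERSION is a stand-in packing and is not claimed to match the kernel's IP_VERSION() encoding, and the enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Stand-in for the kernel's IP_VERSION() macro. The packing here is an
 * assumption made for this sketch; the driver's encoding may differ.
 */
#define SKETCH_IP_VERSION(mj, mn, rv) \
	(((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Hypothetical enum mirroring the two reset flags used in the patch. */
enum sketch_reset_mode {
	SKETCH_RESET_MODE1,
	SKETCH_RESET_MODE2,
};

/* GC 9.4.3 selects mode1; every other version keeps mode2. */
enum sketch_reset_mode sketch_poison_reset_mode(uint32_t gc_ip_version)
{
	if (gc_ip_version == SKETCH_IP_VERSION(9, 4, 3))
		return SKETCH_RESET_MODE1;
	return SKETCH_RESET_MODE2;
}

/* Usage example: only the exact 9.4.3 version takes the mode1 path. */
void sketch_poison_reset_mode_check(void)
{
	assert(sketch_poison_reset_mode(SKETCH_IP_VERSION(9, 4, 3)) == SKETCH_RESET_MODE1);
	assert(sketch_poison_reset_mode(SKETCH_IP_VERSION(9, 4, 2)) == SKETCH_RESET_MODE2);
	assert(sketch_poison_reset_mode(SKETCH_IP_VERSION(9, 0, 1)) == SKETCH_RESET_MODE2);
}
```

Whether a helper like this is worth having in the driver is a style decision for the maintainers; the sketch only shows that both branches apply the same rule.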
Re: [PATCH] drm/amdkfd: add ASIC version check for the reset selection of RAS poison
On 6/13/2024 4:43 PM, Tao Zhou wrote:
> GFX v9.4.3 uses mode1 reset, other ASICs choose mode2.
>
> Signed-off-by: Tao Zhou

Acked-by: Lijo Lazar

Thanks,
Lijo

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index 78dde62fb04a..816800555f7f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -164,7 +164,10 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
>  	case SOC15_IH_CLIENTID_SE3SH:
>  	case SOC15_IH_CLIENTID_UTCL2:
>  		block = AMDGPU_RAS_BLOCK__GFX;
> -		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> +		if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3))
> +			reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> +		else
> +			reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
>  		break;
>  	case SOC15_IH_CLIENTID_VMC:
>  	case SOC15_IH_CLIENTID_VMC1:
> @@ -177,7 +180,10 @@ static void event_interrupt_poison_consumption_v9(struct kfd_node *dev,
>  	case SOC15_IH_CLIENTID_SDMA3:
>  	case SOC15_IH_CLIENTID_SDMA4:
>  		block = AMDGPU_RAS_BLOCK__SDMA;
> -		reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> +		if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 4, 3))
> +			reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> +		else
> +			reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
>  		break;
>  	default:
>  		dev_warn(dev->adev->dev,
Re: [PATCH 4/5] drm/amdgpu: wait for gpu to complete reset
On 13.06.24 at 04:25, YiPeng Chai wrote:
> Add completion to wait for gpu to complete reset.
>
> Signed-off-by: YiPeng Chai
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h | 1 +
>  2 files changed, 13 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 7dfb2e548d70..341c9bd0d1a4 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -124,6 +124,8 @@ const char *get_ras_block_str(struct ras_common_if *ras_block)
>
>  #define AMDGPU_RAS_RETIRE_PAGE_INTERVAL 100 //ms
>
> +#define MAX_GPU_RESET_COMPLETION_TIME 12 //ms
> +
>  #define RAS_POISON_FIFO_MSG_PENDING_THRESHOLD (AMDGPU_RAS_POISON_FIFO_SIZE/4)
>
>  enum amdgpu_ras_retire_page_reservation {
> @@ -2526,6 +2528,8 @@ static void amdgpu_ras_do_recovery(struct work_struct *work)
>  		atomic_set(&hive->ras_recovery, 0);
>  		amdgpu_put_xgmi_hive(hive);
>  	}
> +
> +	complete(&ras->gpu_reset_completion);
>  }
>
>  /* alloc/realloc bps array */
> @@ -2946,7 +2950,14 @@ static int amdgpu_ras_poison_consumption_handler(struct amdgpu_device *adev,
>  			con->gpu_reset_flags |= reset;
>  		}
>
> +		reinit_completion(&con->gpu_reset_completion);
> +
>  		amdgpu_ras_reset_gpu(adev);
> +
> +		if (!wait_for_completion_timeout(&con->gpu_reset_completion,
> +				msecs_to_jiffies(MAX_GPU_RESET_COMPLETION_TIME)))
> +			dev_err(adev->dev, "Waiting for GPU to complete reset timeout! reset:0x%x\n",
> +				reset);

Are there any locks taken here which the GPU reset might need as well?

Regards,
Christian.

>  	}
>
>  	return 0;
> @@ -3072,6 +3083,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
>  		}
>  	}
>
> +	init_completion(&con->gpu_reset_completion);
>  	mutex_init(&con->page_rsv_lock);
>  	INIT_KFIFO(con->poison_fifo);
>  	mutex_init(&con->page_retirement_lock);
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> index 103436bb650e..d5ddd0ca5de1 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
> @@ -537,6 +537,7 @@ struct amdgpu_ras {
>  	DECLARE_KFIFO(poison_fifo, struct ras_poison_msg, AMDGPU_RAS_POISON_FIFO_SIZE);
>  	struct ras_ecc_log_info umc_ecc_log;
>  	struct delayed_work page_retirement_dwork;
> +	struct completion gpu_reset_completion;
>
>  	/* Fatal error detected flag */
>  	atomic_t fed;
Re: [PATCH 5/5] drm/amdgpu: add gpu reset check before page retirement thread runs
On 13.06.24 at 04:25, YiPeng Chai wrote:
> If gpu is recovering, clear all message reset flags in fifo
> and wait for gpu to complete recovery.
>
> Signed-off-by: YiPeng Chai
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12
>  1 file changed, 12 insertions(+)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index 341c9bd0d1a4..bf4f8d439ebe 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2982,6 +2982,18 @@ static int amdgpu_ras_page_retirement_thread(void *param)
>
>  		atomic_dec(&con->page_retirement_req_cnt);
>
> +		reinit_completion(&con->gpu_reset_completion);
> +
> +		if (amdgpu_in_reset(adev) || atomic_read(&con->in_recovery)) {

It's illegal to call amdgpu_in_reset() from outside of the hw specific
backends.

When you want to make the code mutually exclusive with GPU resets you need
to grab the reset lock.

Regards,
Christian.

> +			uint32_t reset;
> +
> +			amdgpu_ras_clear_poison_fifo_msg_reset_flag(adev, &reset);
> +
> +			if (!wait_for_completion_timeout(&con->gpu_reset_completion,
> +					msecs_to_jiffies(MAX_GPU_RESET_COMPLETION_TIME)))
> +				dev_err(adev->dev, "Waiting for GPU to complete reset timeout!\n");
> +		}
> +
>  #ifdef HAVE_KFIFO_PUT_NON_POINTER
>  		if (!amdgpu_ras_get_poison_req(adev, &poison_msg))
>  			continue;
Re: "firmware/sysfb: Set firmware-framebuffer parent device" breaks lightdm on Ubuntu 22.04 using amdgpu
On Thu, Jun 13, 2024 at 3:23 AM Thomas Zimmermann wrote:
>
> Hi
>
> On 13.06.24 at 08:00, Marek Olšák wrote:
> > +amd-gfx
> >
> > On Thu, Jun 13, 2024 at 1:59 AM Marek Olšák wrote:
> >> Hi Thomas,
> >>
> >> Commit 9eac534db0013aff9b9124985dab114600df9081 as per the title
> >> breaks (crashes?) lightdm (login screen) such that all I get is the
> >> terminal. It's also reproducible with tag v6.9 where the commit is
> >> present.
> >>
> >> Reverting the commit fixes lightdm. A workaround is to bypass lightdm
> >> by triggering auto-login. This is a bug report.
>
> I see. Do you know why it crashes? Or do you have any logs?

How to debug this? I only know it's run through systemctl somehow.

Marek
Re: "firmware/sysfb: Set firmware-framebuffer parent device" breaks lightdm on Ubuntu 22.04 using amdgpu
Hi

On 13.06.24 at 16:20, Marek Olšák wrote:
> On Thu, Jun 13, 2024 at 3:23 AM Thomas Zimmermann wrote:
>> Hi
>>
>> On 13.06.24 at 08:00, Marek Olšák wrote:
>>> +amd-gfx
>>>
>>> On Thu, Jun 13, 2024 at 1:59 AM Marek Olšák wrote:
>>>> Hi Thomas,
>>>>
>>>> Commit 9eac534db0013aff9b9124985dab114600df9081 as per the title
>>>> breaks (crashes?) lightdm (login screen) such that all I get is the
>>>> terminal. It's also reproducible with tag v6.9 where the commit is
>>>> present.
>>>>
>>>> Reverting the commit fixes lightdm. A workaround is to bypass lightdm
>>>> by triggering auto-login. This is a bug report.
>> I see. Do you know why it crashes? Or do you have any logs?
> How to debug this? I only know it's run through systemctl somehow.

IDK what Ubuntu supports, but 'systemctl status' or 'journalctl' might
turn up something.

https://unix.stackexchange.com/questions/225401/how-to-see-full-log-from-systemctl-status-service

From there, maybe with additional fprintf(stderr) output.

Best regards
Thomas

> Marek

--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)
[PATCH] drm/amd/display/dc: Remove dc code repetition
The same code is repeated in the functions optc1_enable_crtc
(dc/optc/dcn10/dcn10_optc.c) and optc2_enable_crtc
(dc/optc/dcn20/dcn20_optc.c). Remove the duplication by moving the shared
body into a macro, _optc1_enable_crtc, in dcn10_optc.h.

Signed-off-by: Joao Paulo Pereira da Silva
---
 .../amd/display/dc/optc/dcn10/dcn10_optc.c    | 29 ++-
 .../amd/display/dc/optc/dcn10/dcn10_optc.h    | 27 +
 .../amd/display/dc/optc/dcn20/dcn20_optc.c    | 29 ++-
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
index 5574bc628053..facdeeb41250 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
@@ -41,6 +41,8 @@
 #define STATIC_SCREEN_EVENT_MASK_RANGETIMING_DOUBLE_BUFFER_UPDATE_EN 0x100
 
+#define OPTC_SRC_SEL_FIELD OPTC_SRC_SEL
+
 /**
  * apply_front_porch_workaround() - This is a workaround for a bug that has
  *                                  existed since R5xx and has not been fixed
@@ -517,32 +519,7 @@ void optc1_enable_optc_clock(struct timing_generator *optc, bool enable)
  */
 static bool optc1_enable_crtc(struct timing_generator *optc)
 {
-	/* TODO FPGA wait for answer
-	 * OTG_MASTER_UPDATE_MODE != CRTC_MASTER_UPDATE_MODE
-	 * OTG_MASTER_UPDATE_LOCK != CRTC_MASTER_UPDATE_LOCK
-	 */
-	struct optc *optc1 = DCN10TG_FROM_TG(optc);
-
-	/* opp instance for OTG. For DCN1.0, ODM is remoed.
-	 * OPP and OPTC should 1:1 mapping
-	 */
-	REG_UPDATE(OPTC_DATA_SOURCE_SELECT,
-			OPTC_SRC_SEL, optc->inst);
-
-	/* VTG enable first is for HW workaround */
-	REG_UPDATE(CONTROL,
-			VTG0_ENABLE, 1);
-
-	REG_SEQ_START();
-
-	/* Enable CRTC */
-	REG_UPDATE_2(OTG_CONTROL,
-			OTG_DISABLE_POINT_CNTL, 3,
-			OTG_MASTER_EN, 1);
-
-	REG_SEQ_SUBMIT();
-	REG_SEQ_WAIT_DONE();
-
+	_optc1_enable_crtc(optc);
 	return true;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h
index 2f3bd7648ba7..aea80fa6fe91 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h
@@ -604,4 +604,31 @@ struct dcn_optc_mask {
 
 void dcn10_timing_generator_init(struct optc *optc);
 
+#define _optc1_enable_crtc(optc) \
+	do { \
+		/* TODO FPGA wait for answer */ \
+		/* OTG_MASTER_UPDATE_MODE != CRTC_MASTER_UPDATE_MODE */ \
+		/* OTG_MASTER_UPDATE_LOCK != CRTC_MASTER_UPDATE_LOCK */ \
+		struct optc *optc1 = DCN10TG_FROM_TG(optc); \
+		\
+		/* opp instance for OTG. For DCN1.0, ODM is remoed. */ \
+		/* OPP and OPTC should 1:1 mapping */ \
+		REG_UPDATE(OPTC_DATA_SOURCE_SELECT, \
+				OPTC_SRC_SEL_FIELD, optc->inst); \
+		\
+		/* VTG enable first is for HW workaround */ \
+		REG_UPDATE(CONTROL, \
+				VTG0_ENABLE, 1); \
+		\
+		REG_SEQ_START(); \
+		\
+		/* Enable CRTC */ \
+		REG_UPDATE_2(OTG_CONTROL, \
+				OTG_DISABLE_POINT_CNTL, 3, \
+				OTG_MASTER_EN, 1); \
+		\
+		REG_SEQ_SUBMIT(); \
+		REG_SEQ_WAIT_DONE(); \
+	} while (0)
+
 #endif /* __DC_TIMING_GENERATOR_DCN10_H__ */
diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c
index d6f095b4555d..012e0c52aeec 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c
@@ -37,6 +37,8 @@
 #define FN(reg_name, field_name) \
 	optc1->tg_shift->field_name, optc1->tg_mask->field_name
 
+#define OPTC_SRC_SEL_FIELD OPTC_SEG0_SRC_SEL
+
 /**
  * optc2_enable_crtc() - Enable CRTC - call ASIC Control Object to enable Timing generator.
  *
@@ -47,32 +49,7 @@
  */
 bo
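As a rough illustration of the approach in the patch above (not the kernel code itself): a shared macro body can refer to a name such as OPTC_SRC_SEL_FIELD that each user defines differently, because the name is resolved where the macro is expanded, not where it is written. Everything prefixed fake_/FAKE_ below is hypothetical, SRC_SEL_FIELD plays the role of OPTC_SRC_SEL_FIELD, and plain struct fields stand in for the REG_UPDATE register accessors.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the register block; not the real DCN layout. */
struct fake_regs {
	int src_sel;      /* plays the role of OPTC_SRC_SEL (DCN1.0) */
	int seg0_src_sel; /* plays the role of OPTC_SEG0_SRC_SEL (DCN2.0) */
	int vtg0_enable;
	int master_en;
};

/*
 * Shared body. SRC_SEL_FIELD is not defined at this point: it is looked up
 * when the macro is expanded, so each user can select its own field.
 * do { } while (0) makes the expansion behave like a single statement.
 */
#define FAKE_ENABLE_CRTC(regs, inst) \
	do { \
		(regs)->SRC_SEL_FIELD = (inst); \
		(regs)->vtg0_enable = 1; \
		(regs)->master_en = 1; \
	} while (0)

/* First user: behaves like a file that maps the field to src_sel. */
#define SRC_SEL_FIELD src_sel
bool fake_optc1_enable_crtc(struct fake_regs *regs, int inst)
{
	FAKE_ENABLE_CRTC(regs, inst);
	return true;
}
#undef SRC_SEL_FIELD

/* Second user: same body, different field. */
#define SRC_SEL_FIELD seg0_src_sel
bool fake_optc2_enable_crtc(struct fake_regs *regs, int inst)
{
	FAKE_ENABLE_CRTC(regs, inst);
	return true;
}
#undef SRC_SEL_FIELD
```

In ordinary code a static inline helper would be the usual choice; the macro form is kept in this sketch because the field name has to be substituted as a token, which a function parameter cannot do.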
Re: [PATCH v5 2/3] drm: Allow drivers to choose plane types to async flip
Hi Dmitry,

On 12/06/2024 17:45, Dmitry Baryshkov wrote:
> On Wed, Jun 12, 2024 at 04:37:12PM -0300, André Almeida wrote:
>> Different planes may have different capabilities of doing async flips,
>> so create a field to let drivers allow async flip per plane type.
>>
>> Signed-off-by: André Almeida
>> ---
>>  drivers/gpu/drm/drm_atomic_uapi.c | 4 ++--
>>  drivers/gpu/drm/drm_plane.c       | 3 +++
>>  include/drm/drm_plane.h           | 5 +
>>  3 files changed, 10 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
>> index 57662a1fd345..bbcec3940636 100644
>> --- a/drivers/gpu/drm/drm_plane.c
>> +++ b/drivers/gpu/drm/drm_plane.c
>> @@ -385,6 +385,9 @@ static int __drm_universal_plane_init(struct drm_device *dev,
>>  	drm_modeset_lock_init(&plane->mutex);
>> +	if (type == DRM_PLANE_TYPE_PRIMARY)
>> +		plane->async_flip = true;
>> +
> Why? Also note that the commit message writes about adding the field,
> not about enabling it for the primary planes.

This is not meant to introduce any functional change, just to enable per-plane configuration. Currently, any driver that supports async page flips in the atomic API supports flipping the primary plane. But as Ville pointed out, that belongs in driver code, so I'll move it there. I hope that makes it clearer.

>>  	plane->base.properties = &plane->properties;
>>  	plane->dev = dev;
>>  	plane->funcs = funcs;
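A minimal sketch of the split described in this reply, assuming the core only stores and checks a per-plane flag while the driver decides which planes opt in. All mock_ names are hypothetical stand-ins rather than the DRM API; only the async_flip field name comes from the patch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the DRM plane type enum and plane struct. */
enum mock_plane_type {
	MOCK_PLANE_TYPE_OVERLAY,
	MOCK_PLANE_TYPE_PRIMARY,
	MOCK_PLANE_TYPE_CURSOR,
};

struct mock_plane {
	enum mock_plane_type type;
	bool async_flip; /* field name taken from the patch */
};

/* "Core" init: records the type and leaves async flips disabled. */
void mock_plane_init(struct mock_plane *plane, enum mock_plane_type type)
{
	plane->type = type;
	plane->async_flip = false;
}

/* "Driver" code: opts in only for the plane types it can flip. */
void mock_driver_enable_async_flip(struct mock_plane *plane)
{
	if (plane->type == MOCK_PLANE_TYPE_PRIMARY)
		plane->async_flip = true;
}

/* "Core" check: reject an async flip on a plane that did not opt in. */
int mock_check_async_flip(const struct mock_plane *plane)
{
	return plane->async_flip ? 0 : -EINVAL;
}
```

With this split, a driver whose hardware can flip other plane types asynchronously would only need to change its own opt-in function.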
[linux-next:master] BUILD REGRESSION 6906a84c482f098d31486df8dc98cead21cce2d0
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 6906a84c482f098d31486df8dc98cead21cce2d0  Add linux-next specific files for 20240613

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202406131636.ccrcjztc-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/hwmon/pmbus/mp9941.c:60:33: error: call to undeclared function 'FIELD_PREP'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
drivers/hwmon/pmbus/mp9941.c:60:40: error: implicit declaration of function 'FIELD_PREP' [-Werror=implicit-function-declaration]
drivers/hwmon/pmbus/mp9941.c:84:13: error: implicit declaration of function 'FIELD_GET' [-Werror=implicit-function-declaration]
drivers/hwmon/pmbus/mp9941.c:84:6: error: call to undeclared function 'FIELD_GET'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
security/integrity/ima/ima_policy.c:430:10: error: too many arguments to function call, expected 4, have 5

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- arc-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- arc-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- arm64-randconfig-001-20240613
|   `-- drivers-pinctrl-pinctrl-keembay.c:error:struct-function_desc-has-no-member-named-name
|-- csky-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- csky-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- loongarch-defconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-hubbub-dcn401-dcn401_hubbub.o:warning:objtool:unexpected-relocation-symbol-type-in-.rela.discard.reachable
|   `-- drivers-thermal-thermal_trip.o:warning:objtool:unexpected-relocation-symbol-type-in-.rela.discard.reachable
|-- m68k-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- m68k-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- microblaze-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- microblaze-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- nios2-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- nios2-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- openrisc-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- parisc-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- parisc-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sh-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sh-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sparc-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sparc64-allmodco
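The FIELD_PREP/FIELD_GET errors in this report are the kind that appear when a file uses these helpers without pulling in <linux/bitfield.h>, where the kernel defines them; whether adding that include is the actual fix for mp9941.c is an assumption here. For readers unfamiliar with the helpers, a userspace sketch of what they compute follows. The my_ names are hypothetical, and the real macros additionally check the mask at compile time.

```c
#include <stdint.h>

/*
 * Lowest set bit of the mask, i.e. 1 << (bit position where the field
 * starts). The mask must be non-zero and contiguous.
 */
static inline uint32_t my_field_lsb(uint32_t mask)
{
	return mask & (~mask + 1u);
}

/* Like FIELD_PREP(): shift a value into the field and clip it to the mask. */
static inline uint32_t my_field_prep(uint32_t mask, uint32_t val)
{
	return (val * my_field_lsb(mask)) & mask;
}

/* Like FIELD_GET(): extract the field and shift it down to bit 0. */
static inline uint32_t my_field_get(uint32_t mask, uint32_t reg)
{
	return (reg & mask) / my_field_lsb(mask);
}
```

For a mask of 0x0F00, my_field_prep(0x0F00, 0xA) gives 0x0A00 and my_field_get(0x0F00, 0x1A34) gives 0xA.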
Re: [PATCH] drm/amd/display: Increase frame-larger-than warning limit
Hi Palmer (and AMD folks),

On Tue, Jun 04, 2024 at 09:04:23AM -0700, Palmer Dabbelt wrote:
> On Mon, 03 Jun 2024 15:29:48 PDT (-0700), nat...@kernel.org wrote:
> > On Thu, May 30, 2024 at 07:57:42AM -0700, Palmer Dabbelt wrote:
> > > From: Palmer Dabbelt
> > >
> > > I get a handful of build errors along the lines of
> > >
> > >
> > > linux/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:58:13:
> > > error: stack frame size (2352) exceeds limit (2048) in
> > > 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation'
> > > [-Werror,-Wframe-larger-than]
> > > static void
> > > DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation(
> > > ^
> > >
> > > linux/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1724:6:
> > > error: stack frame size (2096) exceeds limit (2048) in
> > > 'dml32_ModeSupportAndSystemConfigurationFull'
> > > [-Werror,-Wframe-larger-than]
> > > void dml32_ModeSupportAndSystemConfigurationFull(struct
> > > display_mode_lib *mode_lib)
> > > ^
> >
> > Judging from the message, this is clang/LLVM? What version?
>
> Yes, LLVM. Looks like I'm on 16.0.6. Probably time for an update, so I'll
> give it a shot.

FWIW, I can reproduce this with tip of tree, I was just curious in case that ended up mattering.

> > I assume
> > this showed up in 6.10-rc1 because of commit 77acc6b55ae4 ("riscv: add
> > support for kernel-mode FPU"), which allows this driver to be built for
> > RISC-V.
>
> Seems reasonable. This didn't show up until post-merge, not 100% sure why.
> I didn't really dig any farther.

Perhaps you fast forwarded your tree to include that commit?

> > Is this allmodconfig or some other configuration?
>
> IIRC both "allmodconfig" and "allyesconfig" show it, but I don't have a
> build tree sitting around to be 100% sure.

Yeah, allmodconfig triggers it.
I was able to come up with a "trivial" reproducer using cvise (attached to this mail if you are curious) in which clang's stack usage is worse than GCC's by roughly a factor of 2:

  $ clang --target=riscv64-linux-gnu -O2 -Wall -Wframe-larger-than=512 -c -o /dev/null display_mode_vba_32.i
  display_mode_vba_32.i:598:6: warning: stack frame size (1264) exceeds limit (512) in 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation' [-Wframe-larger-than]
    598 | void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation() {
        |      ^
  1 warning generated.

  $ riscv64-linux-gcc -O2 -Wall -Wframe-larger-than=512 -c -o /dev/null display_mode_vba_32.i
  display_mode_vba_32.i: In function 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation':
  display_mode_vba_32.i:1729:1: warning: the frame size of 528 bytes is larger than 512 bytes [-Wframe-larger-than=]
   1729 | }
        | ^

I have not done too much further investigation, but this is almost certainly the same issue that has come up before [1][2]: the AMD display code uses functions with a large number of parameters, such that they have to be passed on the stack, coupled with inlining (if I remember correctly, LLVM gives more of an inlining discount the less a function is used in a file). While clang does poorly with that code, I am not interested in continuing to fix this code for one new hardware revision after another.
We could just avoid this code like we do for arm64 for a similar reason:

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 5fcd4f778dc3..64df713df878 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
 	depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
 	select SND_HDA_COMPONENT if SND_HDA_CORE
 	# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-	select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
+	select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!(ARM64 || RISCV) || !CC_IS_CLANG)
 	help
 	  Choose this option if you want to use the new display engine
 	  support for AMDGPU. This adds required support for Vega and

[1]: https://lore.kernel.org/20231019205117.GA839902@dev-arch.thelio-3990X/
[2]: https://lore.kernel.org/20220830203409.3491379-1-nat...@kernel.org/

Cheers,
Nathan

enum { false, true };
enum output_encoder_class { dm_dp2p0 };
enum output_format_class { dm_420 };
enum source_format_class { dm_444_32 };
enum scan_direction_class { dm_vert };
enum dm_swizzle_mode { dm_sw_linear };
enum clock_change_support { dm_std_cvt };
enum odm_combine_mode { dm_odm_combine_mode_2to1dm_odm_combine_mode_4to1 };
enum immediate_flip_requirement { dm_immediate_flip_not_required };
enum unbounded_requesting_policy { dm_unbounded_requesting_disable };
enum dm_rotation_angle { dm_rotation_270m };
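To illustrate the parameter-passing point made in this mail (a sketch only: it is neither the dml32 code nor the change proposed here, which is the Kconfig diff), a function with enough scalar arguments has some of them passed on the stack, while bundling the values into a struct passed by pointer keeps each call to a single argument. All names below are hypothetical.

```c
/* Hypothetical bundle of inputs; the field names are illustrative only. */
struct wm_params {
	double dispclk, dppclk, dcfclk, socclk;
	double urgent_latency, extra_latency;
	double sr_enter, sr_exit;
	double writeback_latency, fclk_change_latency;
	unsigned int num_planes;
};

/*
 * Eleven scalar arguments. With enough arguments, some are passed on the
 * stack rather than in registers; how many fit depends on the ABI.
 */
double watermark_flat(double dispclk, double dppclk, double dcfclk,
		      double socclk, double urgent_latency,
		      double extra_latency, double sr_enter, double sr_exit,
		      double writeback_latency, double fclk_change_latency,
		      unsigned int num_planes)
{
	return (urgent_latency + extra_latency + sr_enter + sr_exit +
		writeback_latency + fclk_change_latency) * num_planes +
	       (dispclk + dppclk + dcfclk + socclk);
}

/* Same computation, but the caller passes one pointer. */
double watermark_bundled(const struct wm_params *p)
{
	return (p->urgent_latency + p->extra_latency + p->sr_enter +
		p->sr_exit + p->writeback_latency +
		p->fclk_change_latency) * p->num_planes +
	       (p->dispclk + p->dppclk + p->dcfclk + p->socclk);
}
```

The arithmetic is a placeholder; only the difference in how the inputs reach the function matters for the sketch.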