RE: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption

2024-06-13 Thread Zhang, Hawking
[AMD Official Use Only - AMD Internal Distribution Only]

Reviewed-by: Hawking Zhang 

Regards,
Hawking
-----Original Message-----
From: amd-gfx  On Behalf Of Tao Zhou
Sent: Thursday, June 13, 2024 14:57
To: amd-gfx@lists.freedesktop.org
Cc: Zhou1, Tao 
Subject: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption

Per FW requirement, replace mode2 with mode1.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index e1c21d250611..78dde62fb04a 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -164,7 +164,7 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,
case SOC15_IH_CLIENTID_SE3SH:
case SOC15_IH_CLIENTID_UTCL2:
block = AMDGPU_RAS_BLOCK__GFX;
-   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
break;
case SOC15_IH_CLIENTID_VMC:
case SOC15_IH_CLIENTID_VMC1:
@@ -177,7 +177,7 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,
case SOC15_IH_CLIENTID_SDMA3:
case SOC15_IH_CLIENTID_SDMA4:
block = AMDGPU_RAS_BLOCK__SDMA;
-   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
+   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
break;
default:
dev_warn(dev->adev->dev,
--
2.34.1



Re: "firmware/sysfb: Set firmware-framebuffer parent device" breaks lightdm on Ubuntu 22.04 using amdgpu

2024-06-13 Thread Thomas Zimmermann

Hi

On 13.06.24 at 08:00, Marek Olšák wrote:

+amd-gfx

On Thu, Jun 13, 2024 at 1:59 AM Marek Olšák  wrote:

Hi Thomas,

Commit 9eac534db0013aff9b9124985dab114600df9081 as per the title
breaks (crashes?) lightdm (login screen) such that all I get is the
terminal. It's also reproducible with tag v6.9 where the commit is
present.

Reverting the commit fixes lightdm. A workaround is to bypass lightdm
by triggering auto-login. This is a bug report.


I see. Do you know why it crashes? Or do you have any logs?

Best regards
Thomas



(For AMD folks: It's also reproducible with amd-staging-drm-next.)

Marek


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



Re: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption

2024-06-13 Thread Lazar, Lijo



On 6/13/2024 12:27 PM, Tao Zhou wrote:
> Per FW requirement, replace mode2 with mode1.
> 
> Signed-off-by: Tao Zhou 
> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index e1c21d250611..78dde62fb04a 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -164,7 +164,7 @@ static void event_interrupt_poison_consumption_v9(struct 
> kfd_node *dev,
>   case SOC15_IH_CLIENTID_SE3SH:
>   case SOC15_IH_CLIENTID_UTCL2:
>   block = AMDGPU_RAS_BLOCK__GFX;
> - reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> + reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
>   break;
>   case SOC15_IH_CLIENTID_VMC:
>   case SOC15_IH_CLIENTID_VMC1:
> @@ -177,7 +177,7 @@ static void event_interrupt_poison_consumption_v9(struct 
> kfd_node *dev,
>   case SOC15_IH_CLIENTID_SDMA3:
>   case SOC15_IH_CLIENTID_SDMA4:
>   block = AMDGPU_RAS_BLOCK__SDMA;
> - reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> + reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
>   break;

Does this need 9.4.3 IP version check?

Thanks,
Lijo
>   default:
>   dev_warn(dev->adev->dev,


Re: [PATCH 1/5] drm/amdgpu: add condition check for waking up thread

2024-06-13 Thread Lazar, Lijo



On 6/13/2024 7:55 AM, YiPeng Chai wrote:
> 1. Cannot add messages to fifo in gpu reset mode.
> 2. Only when the message is successfully saved to the
> fifo, the thread can be awakened.
> 

I think the fifo should still cache the poison requests while in reset. The
page retirement thread may try to acquire the read side of the reset lock and
wait if any reset is in progress.

Thanks
Lijo

> Signed-off-by: YiPeng Chai 
> ---
>  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 16 ++--
>  drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c | 18 +++---
>  2 files changed, 21 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> index d0dcd3d37e6d..ed260966363f 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
> @@ -2093,12 +2093,16 @@ static void 
> amdgpu_ras_interrupt_poison_creation_handler(struct ras_manager *obj
>   if (amdgpu_ip_version(obj->adev, UMC_HWIP, 0) >= IP_VERSION(12, 0, 0)) {
>   struct amdgpu_ras *con = amdgpu_ras_get_context(obj->adev);
>  
> - amdgpu_ras_put_poison_req(obj->adev,
> - AMDGPU_RAS_BLOCK__UMC, 0, NULL, NULL, false);
> -
> - atomic_inc(&con->page_retirement_req_cnt);
> -
> - wake_up(&con->page_retirement_wq);
> + if (!amdgpu_in_reset(obj->adev) && 
> !atomic_read(&con->in_recovery)) {
> + int ret;
> +
> + ret = amdgpu_ras_put_poison_req(obj->adev,
> + AMDGPU_RAS_BLOCK__UMC, 0, NULL, NULL, false);
> + if (!ret) {
> + atomic_inc(&con->page_retirement_req_cnt);
> + wake_up(&con->page_retirement_wq);
> + }
> + }
>   }
>  #endif
>  }
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c 
> b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> index 1dbe69eabb9a..94181ae85886 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_umc.c
> @@ -293,16 +293,20 @@ int amdgpu_umc_pasid_poison_handler(struct 
> amdgpu_device *adev,
>  
>   amdgpu_ras_error_data_fini(&err_data);
>   } else {
> - struct amdgpu_ras *con = 
> amdgpu_ras_get_context(adev);
> -
>  #ifdef HAVE_KFIFO_PUT_NON_POINTER
> - amdgpu_ras_put_poison_req(adev,
> - block, pasid, pasid_fn, data, reset);
> -#endif
> + struct amdgpu_ras *con = amdgpu_ras_get_context(adev);
>  
> - atomic_inc(&con->page_retirement_req_cnt);
> + if (!amdgpu_in_reset(adev) && 
> !atomic_read(&con->in_recovery)) {
> + int ret;
>  
> - wake_up(&con->page_retirement_wq);
> + ret = amdgpu_ras_put_poison_req(adev,
> + block, pasid, pasid_fn, data, reset);
> + if (!ret) {
> + 
> atomic_inc(&con->page_retirement_req_cnt);
> + wake_up(&con->page_retirement_wq);
> + }
> + }
> +#endif
>   }
>   } else {
>   if (adev->virt.ops && adev->virt.ops->ras_poison_handler)
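
The second point of the commit message (count the request and wake the thread only after the message was really queued) can be shown in plain userspace C. This is a minimal sketch, not the driver code: `struct poison_fifo`, `fifo_put()`, `queue_poison_req()` and `FIFO_SIZE` are hypothetical names that only mimic the shape of amdgpu_ras_put_poison_req() followed by atomic_inc() and wake_up().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FIFO_SIZE 4 /* hypothetical capacity, chosen small for the example */

/* Hypothetical stand-in for the poison message fifo and its bookkeeping. */
struct poison_fifo {
    unsigned int slots[FIFO_SIZE];
    size_t head;   /* total number of messages written */
    size_t tail;   /* total number of messages read */
    int pending;   /* stands in for page_retirement_req_cnt */
    int wakeups;   /* counts simulated wake_up() calls */
};

/* Returns 0 on success, -1 when the fifo is full. */
static int fifo_put(struct poison_fifo *f, unsigned int msg)
{
    assert(f != NULL);
    if (f->head - f->tail == FIFO_SIZE)
        return -1;
    f->slots[f->head % FIFO_SIZE] = msg;
    f->head++;
    return 0;
}

/* Count the request and "wake" the consumer only if it was queued. */
static bool queue_poison_req(struct poison_fifo *f, unsigned int msg)
{
    if (fifo_put(f, msg) != 0)
        return false; /* nothing queued: no count, no wakeup */
    f->pending++;
    f->wakeups++;
    return true;
}
```

With this shape the pending counter can never run ahead of the fifo contents, so the consumer is not woken for a message that does not exist. Whether the enqueue should additionally be skipped while a reset is in progress is the open question in this thread, and the sketch deliberately leaves it out.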


RE: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption

2024-06-13 Thread Zhou1, Tao
[AMD Official Use Only - AMD Internal Distribution Only]

> -----Original Message-----
> From: Lazar, Lijo 
> Sent: Thursday, June 13, 2024 4:07 PM
> To: Zhou1, Tao ; amd-gfx@lists.freedesktop.org
> Subject: Re: [PATCH] drm/amdkfd: use mode1 reset for RAS poison consumption
>
>
>
> On 6/13/2024 12:27 PM, Tao Zhou wrote:
> > Per FW requirement, replace mode2 with mode1.
> >
> > Signed-off-by: Tao Zhou 
> > ---
> >  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > index e1c21d250611..78dde62fb04a 100644
> > --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> > @@ -164,7 +164,7 @@ static void
> event_interrupt_poison_consumption_v9(struct kfd_node *dev,
> > case SOC15_IH_CLIENTID_SE3SH:
> > case SOC15_IH_CLIENTID_UTCL2:
> > block = AMDGPU_RAS_BLOCK__GFX;
> > -   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> > +   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> > break;
> > case SOC15_IH_CLIENTID_VMC:
> > case SOC15_IH_CLIENTID_VMC1:
> > @@ -177,7 +177,7 @@ static void
> event_interrupt_poison_consumption_v9(struct kfd_node *dev,
> > case SOC15_IH_CLIENTID_SDMA3:
> > case SOC15_IH_CLIENTID_SDMA4:
> > block = AMDGPU_RAS_BLOCK__SDMA;
> > -   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
> > +   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> > break;
>
> Does this need 9.4.3 IP version check?

[Tao] It's applicable to all gfx9 ASICs.

>
> Thanks,
> Lijo
> > default:
> > dev_warn(dev->adev->dev,


[BUG] 6.10-rc3 [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off

2024-06-13 Thread Mirsad Todorovac
Hi, all!

Running the vanilla torvalds tree kernel 6.10-rc3, an error occurred during
boot with amdgpu.

Here is the complete output:

kernel: [8.704024] WARNING: CPU: 24 PID: 689 at 
drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1379 amdgpu_bo_release_notify 
(drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu
kernel: [8.704146] Modules linked in: binfmt_misc amd_atl intel_rapl_msr 
intel_rapl_common edac_mce_amd kvm_amd kvm snd_hda_codec_realtek 
snd_hda_codec_generic amdgpu(+) crct10dif_pclmul nls_iso8859_1 
snd_hda_scodec_component polyval_clmulni snd_hda_codec_hdmi polyval_generic 
ghash_clmulni_intel snd_hda_intel sha256_ssse3 sha1_ssse3 snd_intel_dspcfg 
snd_intel_sdw_acpi aesni_intel snd_hda_codec crypto_simd cryptd snd_seq_midi 
amdxcp snd_seq_midi_event snd_hda_core drm_exec gpu_sched joydev snd_rawmidi 
snd_hwdep rapl drm_buddy input_leds drm_suballoc_helper snd_seq drm_ttm_helper 
wmi_bmof snd_pcm snd_seq_device ttm k10temp ccp snd_timer drm_display_helper 
snd drm_kms_helper i2c_algo_bit soundcore mac_hid tcp_bbr sch_fq msr parport_pc 
ppdev lp parport efi_pstore drm ip_tables x_tables autofs4 btrfs 
blake2b_generic xor raid6_pq libcrc32c hid_generic usbhid hid nvme crc32_pclmul 
ahci i2c_piix4 nvme_core r8169 xhci_pci libahci xhci_pci_renesas realtek video 
wmi gpio_amdpt
kernel: [8.704200] CPU: 24 PID: 689 Comm: systemd-udevd Not tainted 
6.10.0-rc1-next-20240528 #1
kernel: [8.704202] Hardware name: ASRock X670E PG Lightning/X670E PG 
Lightning, BIOS 1.21 04/26/2023
kernel: [8.704203] RIP: 0010:amdgpu_bo_release_notify 
(drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu
kernel: [ 8.704324] Code: 0b e9 a3 fe ff ff 48 ba ff ff ff ff ff ff ff 7f 31 f6 
4c 89 ef e8 c2 c4 dc ee eb 99 e8 eb bc dc ee eb b2 0f 0b e9 3c fe ff ff <0f> 0b 
eb a7 be 03 00 00 00 e8 b4 4b a1 ee eb 9b e8 4d 5c 36 ef 66
All code

   0:   0b e9                   or     %ecx,%ebp
   2:   a3 fe ff ff 48 ba ff    movabs %eax,0xffba48fe
   9:   ff ff
   b:   ff                      (bad)
   c:   ff                      (bad)
   d:   ff                      (bad)
   e:   ff                      (bad)
   f:   7f 31                   jg     0x42
  11:   f6 4c 89 ef e8          testb  $0xe8,-0x11(%rcx,%rcx,4)
  16:   c2 c4 dc                ret    $0xdcc4
  19:   ee                      out    %al,(%dx)
  1a:   eb 99                   jmp    0xffb5
  1c:   e8 eb bc dc ee          call   0xeedcbd0c
  21:   eb b2                   jmp    0xffd5
  23:   0f 0b                   ud2
  25:   e9 3c fe ff ff          jmp    0xfe66
  2a:*  0f 0b                   ud2    <-- trapping instruction
  2c:   eb a7                   jmp    0xffd5
  2e:   be 03 00 00 00          mov    $0x3,%esi
  33:   e8 b4 4b a1 ee          call   0xeea14bec
  38:   eb 9b                   jmp    0xffd5
  3a:   e8 4d 5c 36 ef          call   0xef365c8c
  3f:   66                      data16

Code starting with the faulting instruction
===
   0:   0f 0b                   ud2
   2:   eb a7                   jmp    0xffab
   4:   be 03 00 00 00          mov    $0x3,%esi
   9:   e8 b4 4b a1 ee          call   0xeea14bc2
   e:   eb 9b                   jmp    0xffab
  10:   e8 4d 5c 36 ef          call   0xef365c62
  15:   66                      data16
kernel: [8.704325] RSP: 0018:b74b014d3380 EFLAGS: 00010282
kernel: [8.704327] RAX: ffea RBX: 940781ec5c48 RCX: 

kernel: [8.704328] RDX:  RSI:  RDI: 

kernel: [8.704329] RBP: b74b014d33b8 R08:  R09: 

kernel: [8.704330] R10:  R11:  R12: 
9407dc80ef58
kernel: [8.704330] R13: 940781ec5c00 R14:  R15: 

kernel: [8.704331] FS:  783805ca28c0() GS:94169860() 
knlGS:
kernel: [8.704333] CS:  0010 DS:  ES:  CR0: 80050033
kernel: [8.704334] CR2: 7ec7be572000 CR3: 00010f7e4000 CR4: 
00750ef0
kernel: [8.704335] PKRU: 5554
kernel: [8.704335] Call Trace:
kernel: [8.704337]  
kernel: [8.704339] ? show_regs+0x71/0x90 
kernel: [8.704344] ? __warn+0x88/0x140 
kernel: [8.704347] ? amdgpu_bo_release_notify 
(drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu
kernel: [8.704464] ? report_bug+0x1ab/0x1c0 
kernel: [8.704468] ? handle_bug+0x46/0x90 
kernel: [8.704471] ? exc_invalid_op+0x19/0x80 
kernel: [8.704473] ? asm_exc_invalid_op+0x1b/0x20 
kernel: [8.704478] ? amdgpu_bo_release_notify 
(drivers/gpu/drm/amd/amdgpu/amdgpu_object.c:1382 (discriminator 1)) amdgpu
kernel: [8.704595] ttm_bo_release (drivers/gpu/d

Re: [bug report] drm/amdgpu: amdgpu crash on playing videos, linux 6.10-rc

2024-06-13 Thread Linux regression tracking (Thorsten Leemhuis)
On 06.06.24 05:06, Winston Ma wrote:
> Hi I got the same problem on Linux Kernel 6.10-rc2. I got the problem by
> following the procedure below:
> 
>  1. Boot Linux Kernel 6.10-rc2
>  2. Open Firefox (Any browser should work)
>  3. Open a Youtube Video
>  4. On the playing video, toggle fullscreen quickly. After 10-20
> rounds of fullscreen toggling, the screen freezes.
> This is the log that I captured using the above method.

Hmm, seems nothing happened here for a while. Could you maybe try to
bisect this
(https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
)?

@amd-gfx devs: Or is this unneeded, as the cause was found or maybe even
fixed in the meantime?

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

#regzbot poke

> This is the kernel log
> 
> 2024-06-06T10:26:40.747576+08:00 kernel: gmc_v10_0_process_interrupt: 6 
> callbacks suppressed
> 2024-06-06T10:26:40.747618+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] 
> page fault (src_id:0 ring:8 vmid:2 pasid:32789)
> 2024-06-06T10:26:40.747623+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342
> 2024-06-06T10:26:40.747625+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> page starting at address 0x800106ffe000 from client 0x12 (VMC)
> 2024-06-06T10:26:40.747628+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> MMVM_L2_PROTECTION_FAULT_STATUS:0x00203811
> 2024-06-06T10:26:40.747629+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  Faulty UTCL2 client ID: VCN (0x1c)
> 2024-06-06T10:26:40.747631+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  MORE_FAULTS: 0x1
> 2024-06-06T10:26:40.747651+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  WALKER_ERROR: 0x0
> 2024-06-06T10:26:40.747653+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  PERMISSION_FAULTS: 0x1
> 2024-06-06T10:26:40.747655+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  MAPPING_ERROR: 0x0
> 2024-06-06T10:26:40.747656+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  RW: 0x0
> 2024-06-06T10:26:40.747658+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] 
> page fault (src_id:0 ring:8 vmid:2 pasid:32789)
> 2024-06-06T10:26:40.747660+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342
> 2024-06-06T10:26:40.747662+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> page starting at address 0x800106e0 from client 0x12 (VMC)
> 2024-06-06T10:26:40.747663+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> MMVM_L2_PROTECTION_FAULT_STATUS:0x
> 2024-06-06T10:26:40.747664+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  Faulty UTCL2 client ID: MP0 (0x0)
> 2024-06-06T10:26:40.747666+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  MORE_FAULTS: 0x0
> 2024-06-06T10:26:40.747667+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  WALKER_ERROR: 0x0
> 2024-06-06T10:26:40.747668+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  PERMISSION_FAULTS: 0x0
> 2024-06-06T10:26:40.747670+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  MAPPING_ERROR: 0x0
> 2024-06-06T10:26:40.747671+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  RW: 0x0
> 2024-06-06T10:26:40.747674+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] 
> page fault (src_id:0 ring:8 vmid:2 pasid:32789)
> 2024-06-06T10:26:40.747677+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342
> 2024-06-06T10:26:40.747680+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> page starting at address 0x800106e07000 from client 0x12 (VMC)
> 2024-06-06T10:26:40.747683+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> MMVM_L2_PROTECTION_FAULT_STATUS:0x
> 2024-06-06T10:26:40.747686+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  Faulty UTCL2 client ID: MP0 (0x0)
> 2024-06-06T10:26:40.747688+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  MORE_FAULTS: 0x0
> 2024-06-06T10:26:40.747691+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  WALKER_ERROR: 0x0
> 2024-06-06T10:26:40.747693+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  PERMISSION_FAULTS: 0x0
> 2024-06-06T10:26:40.747696+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  MAPPING_ERROR: 0x0
> 2024-06-06T10:26:40.747698+08:00 kernel: amdgpu :03:00.0: amdgpu: 
>  RW: 0x0
> 2024-06-06T10:26:40.747700+08:00 kernel: amdgpu :03:00.0: amdgpu: [mmhub] 
> page fault (src_id:0 ring:8 vmid:2 pasid:32789)
> 2024-06-06T10:26:40.747703+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> process RDD Process pid 39524 thread firefox-bi:cs0 pid 40342
> 2024-06-06T10:26:40.747705+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> page starting at address 0x800107001000 from cli
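
The per-field lines in the log above are just bit fields of the MMVM_L2_PROTECTION_FAULT_STATUS value. A minimal sketch of that decoding follows. The bit positions are an assumption (they are not stated anywhere in this thread); they were chosen because they reproduce the values the log prints for 0x00203811. `struct l2_fault_status` and `decode_l2_fault_status()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the fields printed in the kernel log. */
struct l2_fault_status {
    unsigned int more_faults;
    unsigned int walker_error;
    unsigned int permission_faults;
    unsigned int mapping_error;
    unsigned int cid; /* "Faulty UTCL2 client ID" */
    unsigned int rw;
};

/*
 * Split a raw status word into its fields.
 * Assumed layout: bit 0 MORE_FAULTS, bits 1-3 WALKER_ERROR,
 * bits 4-7 PERMISSION_FAULTS, bit 8 MAPPING_ERROR, bits 9-17 CID,
 * bit 18 RW. Higher bits are ignored here.
 */
static void decode_l2_fault_status(uint32_t status, struct l2_fault_status *out)
{
    assert(out != NULL);
    out->more_faults       = status & 0x1;
    out->walker_error      = (status >> 1) & 0x7;
    out->permission_faults = (status >> 4) & 0xf;
    out->mapping_error     = (status >> 8) & 0x1;
    out->cid               = (status >> 9) & 0x1ff;
    out->rw                = (status >> 18) & 0x1;
}
```

Feeding it 0x00203811 gives MORE_FAULTS 0x1, PERMISSION_FAULTS 0x1 and client ID 0x1c (VCN), which matches the first fault in the log.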

Re: [PATCH] Revert "drm/amdgpu: init iommu after amdkfd device init"

2024-06-13 Thread Greg KH
On Wed, Jun 12, 2024 at 12:10:37PM +1200, Matthew Ruffell wrote:
> Hi Greg KH, Sasha,
> 
> Please pick up this patch for 5.15 stable tree. I have built a test kernel and
> can confirm that it fixes affected users.
> 
> Downstream bug:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2068738

Sorry for the delay, now picked up.

greg k-h


Patch "Revert "drm/amdgpu: init iommu after amdkfd device init"" has been added to the 5.15-stable tree

2024-06-13 Thread gregkh


This is a note to let you know that I've just added the patch titled

Revert "drm/amdgpu: init iommu after amdkfd device init"

to the 5.15-stable tree which can be found at:

http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The filename of the patch is:
 revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch
and it can be found in the queue-5.15 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let  know about it.


From w_ar...@gmx.de  Wed Jun 12 14:43:21 2024
From: Armin Wolf 
Date: Thu, 23 May 2024 19:30:31 +0200
Subject: Revert "drm/amdgpu: init iommu after amdkfd device init"
To: alexander.deuc...@amd.com, christian.koe...@amd.com, xinhui@amd.com, 
gre...@linuxfoundation.org, sas...@kernel.org
Cc: sta...@vger.kernel.org, bkau...@gmail.com, yifan1.zh...@amd.com, 
prike.li...@amd.com, dri-de...@lists.freedesktop.org, 
amd-gfx@lists.freedesktop.org
Message-ID: <20240523173031.4212-1-w_ar...@gmx.de>

From: Armin Wolf 

This reverts commit 56b522f4668167096a50c39446d6263c96219f5f.

A user reported that this commit breaks the integrated gpu of his
notebook, causing a black screen. He was able to bisect the problematic
commit and verified that by reverting it the notebook works again.
He also confirmed that kernel 6.8.1 also works on his device, so the
upstream commit itself seems to be ok.

An amdgpu developer (Alex Deucher) confirmed that this patch should
have never been ported to 5.15 in the first place, so revert this
commit from the 5.15 stable series.

Reported-by: Barry Kauler 
Signed-off-by: Armin Wolf 
Link: https://lore.kernel.org/r/20240523173031.4212-1-w_ar...@gmx.de
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c |8 
 1 file changed, 4 insertions(+), 4 deletions(-)

--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -2487,6 +2487,10 @@ static int amdgpu_device_ip_init(struct
if (r)
goto init_failed;
 
+   r = amdgpu_amdkfd_resume_iommu(adev);
+   if (r)
+   goto init_failed;
+
r = amdgpu_device_ip_hw_init_phase1(adev);
if (r)
goto init_failed;
@@ -2525,10 +2529,6 @@ static int amdgpu_device_ip_init(struct
if (!adev->gmc.xgmi.pending_reset)
amdgpu_amdkfd_device_init(adev);
 
-   r = amdgpu_amdkfd_resume_iommu(adev);
-   if (r)
-   goto init_failed;
-
amdgpu_fru_get_product_info(adev);
 
 init_failed:


Patches currently in stable-queue which might be from w_ar...@gmx.de are

queue-5.15/revert-drm-amdgpu-init-iommu-after-amdkfd-device-init.patch


Re: [bug report] drm/amdgpu: amdgpu crash on playing videos, linux 6.10-rc

2024-06-13 Thread Wang Yunchen
On Wed, 2024-06-12 at 15:14 +0200, Linux regression tracking (Thorsten 
Leemhuis) wrote:
> On 06.06.24 05:06, Winston Ma wrote:
> > Hi I got the same problem on Linux Kernel 6.10-rc2. I got the problem by
> > following the procedure below:
> > 
> >  1. Boot Linux Kernel 6.10-rc2
> >  2. Open Firefox (Any browser should work)
> >  3. Open a Youtube Video
> >  4. On the playing video, toggle fullscreen quickly Then after 10-20
> >     times of fullscreen toggling, the screen would enter freeze mode.
> >     This is the log that I captured using the above method.
> 
> Hmm, seems nothing happened here for a while. Could you maybe try to
> bisect this
> (https://docs.kernel.org/admin-guide/verify-bugs-and-bisect-regressions.html
> )?
> 
> @amd-gfx devs: Or is this unneeded, as the cause was found or maybe even
> fixed in the meantime?
> 
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
> 
> #regzbot poke
> 
> > This is the kernel log
> > 
> > 2024-06-06T10:26:40.747576+08:00 kernel: gmc_v10_0_process_interrupt: 6 
> > callbacks suppressed
> > 2024-06-06T10:26:40.747618+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > pasid:32789)
> > 2024-06-06T10:26:40.747623+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> > process RDD Process pid 39524 thread
> > firefox-bi:cs0 pid 40342
> > 2024-06-06T10:26:40.747625+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> > page starting at address
> > 0x800106ffe000 from client 0x12 (VMC)
> > 2024-06-06T10:26:40.747628+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > MMVM_L2_PROTECTION_FAULT_STATUS:0x00203811
> > 2024-06-06T10:26:40.747629+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  Faulty UTCL2 client ID: VCN (0x1c)
> > 2024-06-06T10:26:40.747631+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  MORE_FAULTS: 0x1
> > 2024-06-06T10:26:40.747651+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  WALKER_ERROR: 0x0
> > 2024-06-06T10:26:40.747653+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  PERMISSION_FAULTS: 0x1
> > 2024-06-06T10:26:40.747655+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  MAPPING_ERROR: 0x0
> > 2024-06-06T10:26:40.747656+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  RW: 0x0
> > 2024-06-06T10:26:40.747658+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > pasid:32789)
> > 2024-06-06T10:26:40.747660+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> > process RDD Process pid 39524 thread
> > firefox-bi:cs0 pid 40342
> > 2024-06-06T10:26:40.747662+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> > page starting at address
> > 0x800106e0 from client 0x12 (VMC)
> > 2024-06-06T10:26:40.747663+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > MMVM_L2_PROTECTION_FAULT_STATUS:0x
> > 2024-06-06T10:26:40.747664+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  Faulty UTCL2 client ID: MP0 (0x0)
> > 2024-06-06T10:26:40.747666+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  MORE_FAULTS: 0x0
> > 2024-06-06T10:26:40.747667+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  WALKER_ERROR: 0x0
> > 2024-06-06T10:26:40.747668+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  PERMISSION_FAULTS: 0x0
> > 2024-06-06T10:26:40.747670+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  MAPPING_ERROR: 0x0
> > 2024-06-06T10:26:40.747671+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  RW: 0x0
> > 2024-06-06T10:26:40.747674+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > pasid:32789)
> > 2024-06-06T10:26:40.747677+08:00 kernel: amdgpu :03:00.0: amdgpu:  in 
> > process RDD Process pid 39524 thread
> > firefox-bi:cs0 pid 40342
> > 2024-06-06T10:26:40.747680+08:00 kernel: amdgpu :03:00.0: amdgpu:   in 
> > page starting at address
> > 0x800106e07000 from client 0x12 (VMC)
> > 2024-06-06T10:26:40.747683+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > MMVM_L2_PROTECTION_FAULT_STATUS:0x
> > 2024-06-06T10:26:40.747686+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  Faulty UTCL2 client ID: MP0 (0x0)
> > 2024-06-06T10:26:40.747688+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  MORE_FAULTS: 0x0
> > 2024-06-06T10:26:40.747691+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  WALKER_ERROR: 0x0
> > 2024-06-06T10:26:40.747693+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  PERMISSION_FAULTS: 0x0
> > 2024-06-06T10:26:40.747696+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  MAPPING_ERROR: 0x0
> > 2024-06-06T10:26:40.747698+08:00 kernel: amdgpu :03:00.0: amdgpu:   
> >  RW: 0x0
> > 2024-06-06T10:26:40.747700+08:00 kernel: amdgpu :03:00.0: amdgpu: 
> > [mmhub] page fault (src_id:0 ring:8 vmid:2
> > pasid:3

[PATCH] drm/amdkfd: add ASIC version check for the reset selection of RAS poison

2024-06-13 Thread Tao Zhou
GFX v9.4.3 uses mode1 reset, other ASICs choose mode2.

Signed-off-by: Tao Zhou 
---
 drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
index 78dde62fb04a..816800555f7f 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
@@ -164,7 +164,10 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,
case SOC15_IH_CLIENTID_SE3SH:
case SOC15_IH_CLIENTID_UTCL2:
block = AMDGPU_RAS_BLOCK__GFX;
-   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+   if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 
4, 3))
+   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+   else
+   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
break;
case SOC15_IH_CLIENTID_VMC:
case SOC15_IH_CLIENTID_VMC1:
@@ -177,7 +180,10 @@ static void event_interrupt_poison_consumption_v9(struct 
kfd_node *dev,
case SOC15_IH_CLIENTID_SDMA3:
case SOC15_IH_CLIENTID_SDMA4:
block = AMDGPU_RAS_BLOCK__SDMA;
-   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+   if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 
4, 3))
+   reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
+   else
+   reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
break;
default:
dev_warn(dev->adev->dev,
-- 
2.34.1
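
The patch above repeats the same version check in the GFX and the SDMA case. A minimal userspace sketch of the selection rule as one helper follows, so the rule can be read in a single place. Everything in it is hypothetical: `HYPO_IP_VERSION()` uses an illustrative packing and is not the kernel's IP_VERSION() macro, and `enum hypo_reset_mode` and `pick_poison_reset()` do not exist in the driver.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical version packing, for illustration only. */
#define HYPO_IP_VERSION(mj, mn, rv) \
    (((uint32_t)(mj) << 16) | ((uint32_t)(mn) << 8) | (uint32_t)(rv))

/* Hypothetical stand-ins for the AMDGPU_RAS_GPU_RESET_MODE*_RESET flags. */
enum hypo_reset_mode {
    HYPO_RESET_MODE1,
    HYPO_RESET_MODE2,
};

/*
 * Rule from the commit message: GFX v9.4.3 uses mode1 reset,
 * every other ASIC handled by this code path keeps mode2.
 */
static enum hypo_reset_mode pick_poison_reset(uint32_t gc_version)
{
    /* 0 is treated as "version not discovered" in this sketch. */
    assert(gc_version != 0);

    if (gc_version == HYPO_IP_VERSION(9, 4, 3))
        return HYPO_RESET_MODE1;
    return HYPO_RESET_MODE2;
}
```

Both switch cases could then assign the result of one call, which keeps the two branches from drifting apart if the rule changes again.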



Re: [PATCH] drm/amdkfd: add ASIC version check for the reset selection of RAS poison

2024-06-13 Thread Lazar, Lijo



On 6/13/2024 4:43 PM, Tao Zhou wrote:
> GFX v9.4.3 uses mode1 reset, other ASICs choose mode2.
> 
> Signed-off-by: Tao Zhou 

Acked-by: Lijo Lazar 

Thanks,
Lijo

> ---
>  drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c 
> b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> index 78dde62fb04a..816800555f7f 100644
> --- a/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> +++ b/drivers/gpu/drm/amd/amdkfd/kfd_int_process_v9.c
> @@ -164,7 +164,10 @@ static void event_interrupt_poison_consumption_v9(struct 
> kfd_node *dev,
>   case SOC15_IH_CLIENTID_SE3SH:
>   case SOC15_IH_CLIENTID_UTCL2:
>   block = AMDGPU_RAS_BLOCK__GFX;
> - reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> + if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 
> 4, 3))
> + reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> + else
> + reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
>   break;
>   case SOC15_IH_CLIENTID_VMC:
>   case SOC15_IH_CLIENTID_VMC1:
> @@ -177,7 +180,10 @@ static void event_interrupt_poison_consumption_v9(struct 
> kfd_node *dev,
>   case SOC15_IH_CLIENTID_SDMA3:
>   case SOC15_IH_CLIENTID_SDMA4:
>   block = AMDGPU_RAS_BLOCK__SDMA;
> - reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> + if (amdgpu_ip_version(dev->adev, GC_HWIP, 0) == IP_VERSION(9, 
> 4, 3))
> + reset = AMDGPU_RAS_GPU_RESET_MODE1_RESET;
> + else
> + reset = AMDGPU_RAS_GPU_RESET_MODE2_RESET;
>   break;
>   default:
>   dev_warn(dev->adev->dev,


Re: [PATCH 4/5] drm/amdgpu: wait for gpu to complete reset

2024-06-13 Thread Christian König

On 13.06.24 at 04:25, YiPeng Chai wrote:

Add completion to wait for gpu to complete reset.

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12 
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h |  1 +
  2 files changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 7dfb2e548d70..341c9bd0d1a4 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -124,6 +124,8 @@ const char *get_ras_block_str(struct ras_common_if 
*ras_block)
  
  #define AMDGPU_RAS_RETIRE_PAGE_INTERVAL 100  //ms
  
+#define MAX_GPU_RESET_COMPLETION_TIME  12 //ms

+
  #define RAS_POISON_FIFO_MSG_PENDING_THRESHOLD  (AMDGPU_RAS_POISON_FIFO_SIZE/4)
  
  enum amdgpu_ras_retire_page_reservation {

@@ -2526,6 +2528,8 @@ static void amdgpu_ras_do_recovery(struct work_struct 
*work)
atomic_set(&hive->ras_recovery, 0);
amdgpu_put_xgmi_hive(hive);
}
+
+   complete(&ras->gpu_reset_completion);
  }
  
  /* alloc/realloc bps array */

@@ -2946,7 +2950,14 @@ static int amdgpu_ras_poison_consumption_handler(struct 
amdgpu_device *adev,
con->gpu_reset_flags |= reset;
}
  
+		reinit_completion(&con->gpu_reset_completion);

+
amdgpu_ras_reset_gpu(adev);
+
+   if (!wait_for_completion_timeout(&con->gpu_reset_completion,
+   
msecs_to_jiffies(MAX_GPU_RESET_COMPLETION_TIME)))
+   dev_err(adev->dev, "Waiting for GPU to complete reset 
timeout! reset:0x%x\n",
+   reset);


Are there any locks taken here which the GPU reset might need as well?

Regards,
Christian.


}
  
  	return 0;

@@ -3072,6 +3083,7 @@ int amdgpu_ras_recovery_init(struct amdgpu_device *adev)
}
}
  
+	init_completion(&con->gpu_reset_completion);

mutex_init(&con->page_rsv_lock);
INIT_KFIFO(con->poison_fifo);
mutex_init(&con->page_retirement_lock);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
index 103436bb650e..d5ddd0ca5de1 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.h
@@ -537,6 +537,7 @@ struct amdgpu_ras {
DECLARE_KFIFO(poison_fifo, struct ras_poison_msg, 
AMDGPU_RAS_POISON_FIFO_SIZE);
struct ras_ecc_log_info  umc_ecc_log;
struct delayed_work page_retirement_dwork;
+   struct completion gpu_reset_completion;
  
  	/* Fatal error detected flag */

atomic_t fed;




Re: [PATCH 5/5] drm/amdgpu: add gpu reset check before page retirement thread runs

2024-06-13 Thread Christian König




On 13.06.24 at 04:25, YiPeng Chai wrote:

If gpu is recovering, clear all message reset flags
in fifo and wait for gpu to complete recovery.

Signed-off-by: YiPeng Chai 
---
  drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c | 12 
  1 file changed, 12 insertions(+)

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c 
b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
index 341c9bd0d1a4..bf4f8d439ebe 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ras.c
@@ -2982,6 +2982,18 @@ static int amdgpu_ras_page_retirement_thread(void *param)
  
  		atomic_dec(&con->page_retirement_req_cnt);
  
+		reinit_completion(&con->gpu_reset_completion);

+
+   if (amdgpu_in_reset(adev) || atomic_read(&con->in_recovery)) {


It's illegal to call amdgpu_in_reset() from outside of the hw specific 
backends.


When you want to make the code mutually exclusive with GPU resets you need
to grab the reset lock.


Regards,
Christian.


+   uint32_t reset;
+
+   amdgpu_ras_clear_poison_fifo_msg_reset_flag(adev, &reset);
+
+   if (!wait_for_completion_timeout(&con->gpu_reset_completion,
+   msecs_to_jiffies(MAX_GPU_RESET_COMPLETION_TIME)))
+   dev_err(adev->dev, "Waiting for GPU to complete reset timeout!\n");
+   }
+
  #ifdef HAVE_KFIFO_PUT_NON_POINTER
if (!amdgpu_ras_get_poison_req(adev, &poison_msg))
continue;




Re: "firmware/sysfb: Set firmware-framebuffer parent device" breaks lightdm on Ubuntu 22.04 using amdgpu

2024-06-13 Thread Marek Olšák
On Thu, Jun 13, 2024 at 3:23 AM Thomas Zimmermann  wrote:
>
> Hi
>
> On 13.06.24 at 08:00, Marek Olšák wrote:
> > +amd-gfx
> >
> > On Thu, Jun 13, 2024 at 1:59 AM Marek Olšák  wrote:
> >> Hi Thomas,
> >>
> >> Commit 9eac534db0013aff9b9124985dab114600df9081 as per the title
> >> breaks (crashes?) lightdm (login screen) such that all I get is the
> >> terminal. It's also reproducible with tag v6.9 where the commit is
> >> present.
> >>
> >> Reverting the commit fixes lightdm. A workaround is to bypass lightdm
> >> by triggering auto-login. This is a bug report.
>
> I see. Do you know why it crashes? Or do you have any logs?

How to debug this? I only know it's run through systemctl somehow.

Marek


Re: "firmware/sysfb: Set firmware-framebuffer parent device" breaks lightdm on Ubuntu 22.04 using amdgpu

2024-06-13 Thread Thomas Zimmermann

Hi

On 13.06.24 at 16:20, Marek Olšák wrote:
> On Thu, Jun 13, 2024 at 3:23 AM Thomas Zimmermann  wrote:
>> Hi
>>
>> On 13.06.24 at 08:00, Marek Olšák wrote:
>>> +amd-gfx
>>>
>>> On Thu, Jun 13, 2024 at 1:59 AM Marek Olšák  wrote:
>>>> Hi Thomas,
>>>>
>>>> Commit 9eac534db0013aff9b9124985dab114600df9081 as per the title
>>>> breaks (crashes?) lightdm (login screen) such that all I get is the
>>>> terminal. It's also reproducible with tag v6.9 where the commit is
>>>> present.
>>>>
>>>> Reverting the commit fixes lightdm. A workaround is to bypass lightdm
>>>> by triggering auto-login. This is a bug report.
>>
>> I see. Do you know why it crashes? Or do you have any logs?
>
> How to debug this? I only know it's run through systemctl somehow.


IDK what Ubuntu supports, but 'systemctl status' or 'journalctl' might 
turn up something.


https://unix.stackexchange.com/questions/225401/how-to-see-full-log-from-systemctl-status-service

From there, maybe with additional fprintf(stderr) output.

Best regards
Thomas



> Marek


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Frankenstrasse 146, 90461 Nuernberg, Germany
GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman
HRB 36809 (AG Nuernberg)



[PATCH] drm/amd/display/dc: Remove dc code repetition

2024-06-13 Thread Joao Paulo Pereira da Silva
The same code is repeated in optc1_enable_crtc()
(dc/optc/dcn10/dcn10_optc.c) and optc2_enable_crtc()
(dc/optc/dcn20/dcn20_optc.c).

Remove the duplication by moving the shared code into a macro.

Signed-off-by: Joao Paulo Pereira da Silva 
---
 .../amd/display/dc/optc/dcn10/dcn10_optc.c| 29 ++-
 .../amd/display/dc/optc/dcn10/dcn10_optc.h| 27 +
 .../amd/display/dc/optc/dcn20/dcn20_optc.c| 29 ++-
 3 files changed, 33 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
index 5574bc628053..facdeeb41250 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.c
@@ -41,6 +41,8 @@
 
 #define STATIC_SCREEN_EVENT_MASK_RANGETIMING_DOUBLE_BUFFER_UPDATE_EN 0x100
 
+#define OPTC_SRC_SEL_FIELD OPTC_SRC_SEL
+
 /**
  * apply_front_porch_workaround() - This is a workaround for a bug that has
  *  existed since R5xx and has not been fixed
@@ -517,32 +519,7 @@ void optc1_enable_optc_clock(struct timing_generator *optc, bool enable)
  */
 static bool optc1_enable_crtc(struct timing_generator *optc)
 {
-   /* TODO FPGA wait for answer
-* OTG_MASTER_UPDATE_MODE != CRTC_MASTER_UPDATE_MODE
-* OTG_MASTER_UPDATE_LOCK != CRTC_MASTER_UPDATE_LOCK
-*/
-   struct optc *optc1 = DCN10TG_FROM_TG(optc);
-
-   /* opp instance for OTG. For DCN1.0, ODM is remoed.
-* OPP and OPTC should 1:1 mapping
-*/
-   REG_UPDATE(OPTC_DATA_SOURCE_SELECT,
-   OPTC_SRC_SEL, optc->inst);
-
-   /* VTG enable first is for HW workaround */
-   REG_UPDATE(CONTROL,
-   VTG0_ENABLE, 1);
-
-   REG_SEQ_START();
-
-   /* Enable CRTC */
-   REG_UPDATE_2(OTG_CONTROL,
-   OTG_DISABLE_POINT_CNTL, 3,
-   OTG_MASTER_EN, 1);
-
-   REG_SEQ_SUBMIT();
-   REG_SEQ_WAIT_DONE();
-
+   _optc1_enable_crtc(optc);
return true;
 }
 
diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h
index 2f3bd7648ba7..aea80fa6fe91 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn10/dcn10_optc.h
@@ -604,4 +604,31 @@ struct dcn_optc_mask {
 
 void dcn10_timing_generator_init(struct optc *optc);
 
+#define _optc1_enable_crtc(optc)   \
+   do {\
+   /* TODO FPGA wait for answer */ \
+   /* OTG_MASTER_UPDATE_MODE != CRTC_MASTER_UPDATE_MODE */ \
+   /* OTG_MASTER_UPDATE_LOCK != CRTC_MASTER_UPDATE_LOCK */ \
+   struct optc *optc1 = DCN10TG_FROM_TG(optc); \
+   \
+   /* opp instance for OTG. For DCN1.0, ODM is remoed. */  \
+   /* OPP and OPTC should 1:1 mapping */   \
+   REG_UPDATE(OPTC_DATA_SOURCE_SELECT, \
+   OPTC_SRC_SEL_FIELD, optc->inst);\
+   \
+   /* VTG enable first is for HW workaround */ \
+   REG_UPDATE(CONTROL, \
+   VTG0_ENABLE, 1);\
+   \
+   REG_SEQ_START();\
+   \
+   /* Enable CRTC */   \
+   REG_UPDATE_2(OTG_CONTROL,   \
+   OTG_DISABLE_POINT_CNTL, 3,  \
+   OTG_MASTER_EN, 1);  \
+   \
+   REG_SEQ_SUBMIT();   \
+   REG_SEQ_WAIT_DONE();\
+   } while (0)
+
 #endif /* __DC_TIMING_GENERATOR_DCN10_H__ */
diff --git a/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c b/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c
index d6f095b4555d..012e0c52aeec 100644
--- a/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c
+++ b/drivers/gpu/drm/amd/display/dc/optc/dcn20/dcn20_optc.c
@@ -37,6 +37,8 @@
 #define FN(reg_name, field_name) \
optc1->tg_shift->field_name, optc1->tg_mask->field_name
 
+#define OPTC_SRC_SEL_FIELD OPTC_SEG0_SRC_SEL
+
 /**
  * optc2_enable_crtc() - Enable CRTC - call ASIC Control Object to enable Timing generator.
  *
@@ -47,32 +49,7 @@
  */
 bo
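
For readers following the thread, the de-duplication idea in this patch can 
be sketched outside the kernel as below. This is a simplified, hypothetical 
model, not DC code: struct fake_regs, ENABLE_CRTC_COMMON, SRC_SEL_FIELD and 
the v1/v2 function names are assumptions made for illustration. The shared 
macro body refers to a field-name macro, and each variant supplies its own 
definition before using it, much as the patch defines OPTC_SRC_SEL_FIELD 
per source file.

```c
#include <stdbool.h>

/* Hypothetical register block; the two variants use different fields. */
struct fake_regs {
	int src_sel;		/* field used by the first variant */
	int seg0_src_sel;	/* field used by the second variant */
	int vtg0_enable;
	int otg_master_en;
};

/* Shared body; SRC_SEL_FIELD is resolved at each expansion site. */
#define ENABLE_CRTC_COMMON(regs, inst)			\
	do {						\
		(regs)->SRC_SEL_FIELD = (inst);		\
		(regs)->vtg0_enable = 1;		\
		(regs)->otg_master_en = 1;		\
	} while (0)

#define SRC_SEL_FIELD src_sel
static bool v1_enable_crtc(struct fake_regs *regs, int inst)
{
	ENABLE_CRTC_COMMON(regs, inst);
	return true;
}
#undef SRC_SEL_FIELD

#define SRC_SEL_FIELD seg0_src_sel
static bool v2_enable_crtc(struct fake_regs *regs, int inst)
{
	ENABLE_CRTC_COMMON(regs, inst);
	return true;
}
#undef SRC_SEL_FIELD
```

A shared static inline function that takes the differing part as a parameter 
would be an alternative design with type checking; whether that fits the 
REG_UPDATE() machinery is not something this sketch tries to answer.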

Re: [PATCH v5 2/3] drm: Allow drivers to choose plane types to async flip

2024-06-13 Thread André Almeida

Hi Dmitry,

On 12/06/2024 17:45, Dmitry Baryshkov wrote:

On Wed, Jun 12, 2024 at 04:37:12PM -0300, André Almeida wrote:

Different planes may have different capabilities of doing async flips,
so create a field to let drivers allow async flip per plane type.

Signed-off-by: André Almeida 
---
  drivers/gpu/drm/drm_atomic_uapi.c | 4 ++--
  drivers/gpu/drm/drm_plane.c   | 3 +++
  include/drm/drm_plane.h   | 5 +
  3 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c
index 57662a1fd345..bbcec3940636 100644
--- a/drivers/gpu/drm/drm_plane.c
+++ b/drivers/gpu/drm/drm_plane.c
@@ -385,6 +385,9 @@ static int __drm_universal_plane_init(struct drm_device *dev,
  
  	drm_modeset_lock_init(&plane->mutex);
  
+	if (type == DRM_PLANE_TYPE_PRIMARY)
+   plane->async_flip = true;
+


Why? Also note that the commit message writes about adding the field,
not about enabling it for the primary planes.



This is not actually meant to introduce any functional change, just to 
enable per-plane configuration. Currently, any driver that supports async 
page flips in the atomic API supports flipping the primary plane.


But as Ville pointed out, that belongs in driver code, so I'll move it 
there. I hope that makes it clearer.



plane->base.properties = &plane->properties;
plane->dev = dev;
plane->funcs = funcs;
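
To make the per-plane idea concrete, here is a small standalone mock (not 
DRM code; mock_plane, mock_driver_plane_init and mock_check_async_flip are 
hypothetical names). The driver, rather than the core, opts a plane into 
async flips, and the commit path checks that flag.

```c
#include <errno.h>
#include <stdbool.h>

enum mock_plane_type {
	MOCK_PLANE_OVERLAY,
	MOCK_PLANE_PRIMARY,
	MOCK_PLANE_CURSOR,
};

struct mock_plane {
	enum mock_plane_type type;
	bool async_flip;	/* set by the driver, per plane */
};

/* Driver-side init: this mock driver only allows async flips on primary planes. */
static void mock_driver_plane_init(struct mock_plane *plane,
				   enum mock_plane_type type)
{
	plane->type = type;
	plane->async_flip = (type == MOCK_PLANE_PRIMARY);
}

/* Core-side check: reject an async flip on a plane that did not opt in. */
static int mock_check_async_flip(const struct mock_plane *plane,
				 bool async_requested)
{
	if (async_requested && !plane->async_flip)
		return -EINVAL;
	return 0;
}
```

With this split, a driver whose hardware can also flip overlay planes 
asynchronously would only need to change its own init function.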





[linux-next:master] BUILD REGRESSION 6906a84c482f098d31486df8dc98cead21cce2d0

2024-06-13 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 6906a84c482f098d31486df8dc98cead21cce2d0  Add linux-next specific files for 20240613

Error/Warning reports:

https://lore.kernel.org/oe-kbuild-all/202406131636.ccrcjztc-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

drivers/hwmon/pmbus/mp9941.c:60:33: error: call to undeclared function 'FIELD_PREP'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
drivers/hwmon/pmbus/mp9941.c:60:40: error: implicit declaration of function 'FIELD_PREP' [-Werror=implicit-function-declaration]
drivers/hwmon/pmbus/mp9941.c:84:13: error: implicit declaration of function 'FIELD_GET' [-Werror=implicit-function-declaration]
drivers/hwmon/pmbus/mp9941.c:84:6: error: call to undeclared function 'FIELD_GET'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
security/integrity/ima/ima_policy.c:430:10: error: too many arguments to function call, expected 4, have 5

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- arc-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- arc-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- arm64-randconfig-001-20240613
|   `-- drivers-pinctrl-pinctrl-keembay.c:error:struct-function_desc-has-no-member-named-name
|-- csky-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- csky-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- loongarch-defconfig
|   |-- drivers-gpu-drm-amd-amdgpu-..-display-dc-hubbub-dcn401-dcn401_hubbub.o:warning:objtool:unexpected-relocation-symbol-type-in-.rela.discard.reachable
|   `-- drivers-thermal-thermal_trip.o:warning:objtool:unexpected-relocation-symbol-type-in-.rela.discard.reachable
|-- m68k-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- m68k-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- microblaze-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- microblaze-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- nios2-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- nios2-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- openrisc-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- parisc-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- parisc-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sh-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sh-allyesconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sparc-allmodconfig
|   |-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_GET
|   `-- drivers-hwmon-pmbus-mp9941.c:error:implicit-declaration-of-function-FIELD_PREP
|-- sparc64-allmodco

Re: [PATCH] drm/amd/display: Increase frame-larger-than warning limit

2024-06-13 Thread Nathan Chancellor
Hi Palmer (and AMD folks),

On Tue, Jun 04, 2024 at 09:04:23AM -0700, Palmer Dabbelt wrote:
> On Mon, 03 Jun 2024 15:29:48 PDT (-0700), nat...@kernel.org wrote:
> > On Thu, May 30, 2024 at 07:57:42AM -0700, Palmer Dabbelt wrote:
> > > From: Palmer Dabbelt 
> > > 
> > > I get a handful of build errors along the lines of
> > > 
> > > 
> > > linux/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:58:13: error: stack frame size (2352) exceeds limit (2048) in 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation' [-Werror,-Wframe-larger-than]
> > > static void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation(
> > > ^
> > > 
> > > linux/drivers/gpu/drm/amd/amdgpu/../display/dc/dml/dcn32/display_mode_vba_32.c:1724:6: error: stack frame size (2096) exceeds limit (2048) in 'dml32_ModeSupportAndSystemConfigurationFull' [-Werror,-Wframe-larger-than]
> > > void dml32_ModeSupportAndSystemConfigurationFull(struct display_mode_lib *mode_lib)
> > >  ^
> > 
> > Judging from the message, this is clang/LLVM? What version?
> 
> Yes, LLVM.  Looks like I'm on 16.0.6.  Probably time for an update, so I'll
> give it a shot.

FWIW, I can reproduce this with tip of tree, I was just curious in case
that ended up mattering.

> > I assume
> > this showed up in 6.10-rc1 because of commit 77acc6b55ae4 ("riscv: add
> > support for kernel-mode FPU"), which allows this driver to be built for
> > RISC-V.
> 
> Seems reasonable.  This didn't show up until post-merge, not 100% sure why.
> I didn't really dig any farther.

Perhaps you fast forwarded your tree to include that commit?

> > Is this allmodconfig or some other configuration?
> 
> IIRC both "allmodconfig" and "allyesconfig" show it, but I don't have a
> build tree sitting around to be 100% sure.

Yeah, allmodconfig triggers it.

I was able to come up with a "trivial" reproducer using cvise (attached
to this mail if you are curious) in which clang's stack usage is worse
than GCC's by a rough factor of 2:

  $ clang --target=riscv64-linux-gnu -O2 -Wall -Wframe-larger-than=512 -c -o /dev/null display_mode_vba_32.i
  display_mode_vba_32.i:598:6: warning: stack frame size (1264) exceeds limit (512) in 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation' [-Wframe-larger-than]
    598 | void DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation() {
        |      ^
  1 warning generated.

  $ riscv64-linux-gcc -O2 -Wall -Wframe-larger-than=512 -c -o /dev/null display_mode_vba_32.i
  display_mode_vba_32.i: In function 'DISPCLKDPPCLKDCFCLKDeepSleepPrefetchParametersWatermarksAndPerformanceCalculation':
  display_mode_vba_32.i:1729:1: warning: the frame size of 528 bytes is larger than 512 bytes [-Wframe-larger-than=]
   1729 | }
        | ^

I have not done much further investigation, but this is almost
certainly the same issue that has come up before [1][2]: the AMD
display code uses functions with a large number of parameters, such
that they have to be passed on the stack, coupled with inlining (if I
remember correctly, LLVM gives more of an inlining discount the less a
function is used in a file).
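
As a generic illustration of that pattern (not the DML code; all names
below are made up), compare a function taking many scalar arguments with
one taking a pointer to a parameter struct. Arguments beyond what the
calling convention passes in registers are placed on the stack by the
caller, and inlining several such calls into one function adds up. How
many arguments fit in registers depends on the ABI; on x86-64 SysV, for
example, only the first eight floating-point arguments do.

```c
/* Hypothetical parameter bundle standing in for a long argument list. */
struct calc_params {
	double a, b, c, d, e, f, g, h, i, j, k, l;
};

/* Many scalar arguments: depending on the ABI, some go on the stack. */
static double calc_many_args(double a, double b, double c, double d,
			     double e, double f, double g, double h,
			     double i, double j, double k, double l)
{
	return a + b + c + d + e + f + g + h + i + j + k + l;
}

/* One pointer argument: the data stays wherever the caller keeps it. */
static double calc_with_struct(const struct calc_params *p)
{
	return p->a + p->b + p->c + p->d + p->e + p->f +
	       p->g + p->h + p->i + p->j + p->k + p->l;
}
```

Both functions compute the same sum; only the way the inputs reach the
callee differs, which is what affects the caller's frame size.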

While clang does poorly with that code, I am not interested in
continuing to fix this code new hardware revision after new hardware
revision. We could just avoid this code like we do for arm64 for a
similar reason:

diff --git a/drivers/gpu/drm/amd/display/Kconfig b/drivers/gpu/drm/amd/display/Kconfig
index 5fcd4f778dc3..64df713df878 100644
--- a/drivers/gpu/drm/amd/display/Kconfig
+++ b/drivers/gpu/drm/amd/display/Kconfig
@@ -8,7 +8,7 @@ config DRM_AMD_DC
depends on BROKEN || !CC_IS_CLANG || ARM64 || RISCV || SPARC64 || X86_64
select SND_HDA_COMPONENT if SND_HDA_CORE
# !CC_IS_CLANG: https://github.com/ClangBuiltLinux/linux/issues/1752
-   select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!ARM64 || !CC_IS_CLANG)
+   select DRM_AMD_DC_FP if ARCH_HAS_KERNEL_FPU_SUPPORT && (!(ARM64 || RISCV) || !CC_IS_CLANG)
help
  Choose this option if you want to use the new display engine
  support for AMDGPU. This adds required support for Vega and

[1]: https://lore.kernel.org/20231019205117.GA839902@dev-arch.thelio-3990X/
[2]: https://lore.kernel.org/20220830203409.3491379-1-nat...@kernel.org/

Cheers,
Nathan
enum { false, true };
enum output_encoder_class { dm_dp2p0 };
enum output_format_class { dm_420 };
enum source_format_class { dm_444_32 };
enum scan_direction_class { dm_vert };
enum dm_swizzle_mode { dm_sw_linear };
enum clock_change_support { dm_std_cvt };
enum odm_combine_mode { dm_odm_combine_mode_2to1dm_odm_combine_mode_4to1 };
enum immediate_flip_requirement { dm_immediate_flip_not_required };
enum unbounded_requesting_policy { dm_unbounded_requesting_disable };
enum dm_rotation_angle { dm_rotation_270m };