Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-29 Thread Salvatore Bonaccorso
Hi

In Debian (https://bugs.debian.org/1061449) we got the following
quoted report:

On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
> Package: src:linux
> Version: 6.7.1-1~exp1
> Severity: normal
> 
> Dear Maintainer,
> 
> Giving a try to 6.7, here is a message extracted from dmesg:
> 
> [4.177226] [ cut here ]
> [4.177227] WARNING: CPU: 6 PID: 248 at
> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
> construct_phy+0xb26/0xd60 [amdgpu]
> [4.177658] Modules linked in: amdgpu(+) i915(+) sd_mod drm_exec amdxcp
> gpu_sched drm_buddy nvme i2c_algo_bit drm_suballoc_helper drm_display_helper
> ahci nvme_core hid_generic crc32_pclmul libahci crc32c_intel t10_pi cec libata
> crc64_rocksoft_generic ghash_clmulni_intel rc_core drm_ttm_helper
> crc64_rocksoft sha512_ssse3 i2c_hid_acpi ttm rtsx_pci_sdmmc i2c_hid xhci_pci
> crc_t10dif sha512_generic mmc_core scsi_mod xhci_hcd drm_kms_helper video hid
> crct10dif_generic intel_lpss_pci crct10dif_pclmul i2c_i801 sha256_ssse3
> intel_lpss crc64 thunderbolt drm e1000e usbcore sha1_ssse3 rtsx_pci i2c_smbus
> scsi_common crct10dif_common idma64 usb_common battery wmi button aesni_intel
> crypto_simd cryptd
> [4.177689] CPU: 6 PID: 248 Comm: (udev-worker) Not tainted 6.7-amd64 #1
> Debian 6.7.1-1~exp1
> [4.177691] Hardware name: Dell Inc. Precision 7540/0T2FXT, BIOS 1.29.0
> 11/03/2023
> [4.177692] RIP: 0010:construct_phy+0xb26/0xd60 [amdgpu]
> [4.178050] Code: b9 01 00 00 00 83 fe 01 74 40 48 8b 82 f8 03 00 00 89 f2
> 48 c7 c6 00 35 a7 c1 48 8b 40 10 48 8b 00 48 8b 78 08 e8 ba b7 5b fb <0f> 0b
> 49 8b 87 d0 01 00 00 b9 0f 00 00 00 48 8b 80 e8 04 00 00 48
> [4.178052] RSP: 0018:aad300857408 EFLAGS: 00010246
> [4.178053] RAX:  RBX: 96df636a1700 RCX:
> c000efff
> [4.178054] RDX:  RSI: efff RDI:
> 0001
> [4.178055] RBP: 96df4d379c00 R08:  R09:
> aad3008571d0
> [4.178056] R10: 0003 R11: bded2428 R12:
> aad300857474
> [4.178057] R13: c1933140 R14: aad3008577d0 R15:
> 96df43e82000
> [4.178058] FS:  7fcd5d9648c0() GS:96e2cc38()
> knlGS:
> [4.178060] CS:  0010 DS:  ES:  CR0: 80050033
> [4.178061] CR2: 7fcd5d932a6d CR3: 000103e9a004 CR4:
> 003706f0
> [4.178062] DR0:  DR1:  DR2:
> 
> [4.178063] DR3:  DR6: fffe0ff0 DR7:
> 0400
> [4.178063] Call Trace:
> [4.178066]  
> [4.178067]  ? construct_phy+0xb26/0xd60 [amdgpu]
> [4.178422]  ? __warn+0x81/0x130
> [4.178426]  ? construct_phy+0xb26/0xd60 [amdgpu]
> [4.178784]  ? report_bug+0x171/0x1a0
> [4.178787]  ? handle_bug+0x3c/0x80
> [4.178789]  ? exc_invalid_op+0x17/0x70
> [4.178790]  ? asm_exc_invalid_op+0x1a/0x20
> [4.178793]  ? construct_phy+0xb26/0xd60 [amdgpu]
> [4.179149]  ? construct_phy+0xb26/0xd60 [amdgpu]
> [4.179507]  link_create+0x1b2/0x200 [amdgpu]
> [4.179865]  create_links+0x135/0x420 [amdgpu]
> [4.180196]  dc_create+0x321/0x640 [amdgpu]
> [4.180529]  amdgpu_dm_init.isra.0+0x2a0/0x1ed0 [amdgpu]
> [4.180881]  ? sysvec_apic_timer_interrupt+0xe/0x90
> [4.180883]  ? asm_sysvec_apic_timer_interrupt+0x1a/0x20
> [4.180885]  ? delay_tsc+0x37/0xa0
> [4.180889]  dm_hw_init+0x12/0x30 [amdgpu]
> [4.181240]  amdgpu_device_init+0x1e42/0x24a0 [amdgpu]
> [4.181517]  amdgpu_driver_load_kms+0x19/0x190 [amdgpu]
> [4.181793]  amdgpu_pci_probe+0x165/0x4c0 [amdgpu]
> [4.182067]  local_pci_probe+0x42/0xa0
> [4.182070]  pci_device_probe+0xc7/0x240
> [4.182072]  really_probe+0x19b/0x3e0
> [4.182075]  ? __pfx___driver_attach+0x10/0x10
> [4.182076]  __driver_probe_device+0x78/0x160
> [4.182078]  driver_probe_device+0x1f/0x90
> [4.182079]  __driver_attach+0xd2/0x1c0
> [4.182081]  bus_for_each_dev+0x85/0xd0
> [4.182083]  bus_add_driver+0x116/0x220
> [4.182085]  driver_register+0x59/0x100
> [4.182087]  ? __pfx_amdgpu_init+0x10/0x10 [amdgpu]
> [4.182356]  do_one_initcall+0x58/0x320
> [4.182359]  do_init_module+0x60/0x240
> [4.182361]  init_module_from_file+0x89/0xe0
> [4.182364]  idempotent_init_module+0x120/0x2b0
> [4.182366]  __x64_sys_finit_module+0x5e/0xb0
> [4.182367]  do_syscall_64+0x61/0x120
> [4.182370]  ? do_syscall_64+0x70/0x120
> [4.182372]  entry_SYSCALL_64_after_hwframe+0x6e/0x76
> [4.182375] RIP: 0033:0x7fcd5e130f19
> [4.182376] Code: 08 89 e8 5b 5d c3 66 2e 0f 1f 84 00 00 00 00 00 90 48 89
> f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d
> 01 f0 ff ff 73 01 c3 48 8b 0d cf 1e 0d 00 f7 d8 64 89 01 48
> [4.182378] RSP: 002b:7ffd314afa38 EFLAGS: 0246 ORIG_RAX:
> 0139
> [4.182379] RAX: ffda RBX:

Re: Bug#1061449: linux-image-6.7-amd64: a boot message from amdgpu

2024-01-30 Thread Salvatore Bonaccorso
Hi,

[For this reply I am dropping the Debian bug report from the recipients, so
that later follow-ups do not send the ack to the mailing list and add noise.]

On Sun, Jan 28, 2024 at 11:44:59AM +0100, Linux regression tracking
(Thorsten Leemhuis) wrote:
> On 27.01.24 14:14, Salvatore Bonaccorso wrote:
> >
> > In Debian (https://bugs.debian.org/1061449) we got the following
> > quoted report:
> > 
> > On Wed, Jan 24, 2024 at 07:38:16PM +0100, Patrice Duroux wrote:
> >>
> >> Giving a try to 6.7, here is a message extracted from dmesg:
> >> [4.177226] [ cut here ]
> >> [4.177227] WARNING: CPU: 6 PID: 248 at
> >> drivers/gpu/drm/amd/amdgpu/../display/dc/link/link_factory.c:387
> >> construct_phy+0xb26/0xd60 [amdgpu]
> > [...]
> 
> Not my area of expertise, but looks a lot like a duplicate of
> https://gitlab.freedesktop.org/drm/amd/-/issues/3122#note_2252835
> 
> Mario (now CCed) already prepared a patch for that issue that seems to work.

#regzbot link: https://gitlab.freedesktop.org/drm/amd/-/issues/3122

Thanks. Indeed the reporter confirmed in
https://bugs.debian.org/1061449#55 that the patch fixes the issue.

So a duplicate of the above.

Regards,
Salvatore


Regression from 3c196f056666 ("drm/amdgpu: always reset the asic in suspend (v2)") on suspend?

2022-02-14 Thread Salvatore Bonaccorso
Hi Alex, hi all

In Debian we got a regression report from Dominique Dumont (CC'ed) in
https://bugs.debian.org/1005005: after an update to a 5.15.15-based
kernel, his machine no longer suspends correctly. The screen goes
black as usual, but then it comes back. The Debian bug above contains a trace.

Dominique confirmed that this issue persisted after updating to 5.16.7.
Furthermore, he bisected the issue and found

3c196f0510912645c7c5d9107706003f67c3 is the first bad commit
commit 3c196f0510912645c7c5d9107706003f67c3
Author: Alex Deucher 
Date:   Fri Nov 12 11:25:30 2021 -0500

drm/amdgpu: always reset the asic in suspend (v2)

[ Upstream commit daf8de0874ab5b74b38a38726fdd3d07ef98a7ee ]

If the platform suspend happens to fail and the power rail
is not turned off, the GPU will be in an unknown state on
resume, so reset the asic so that it will be in a known
good state on resume even if the platform suspend failed.

v2: handle s0ix

Acked-by: Luben Tuikov 
Acked-by: Evan Quan 
Signed-off-by: Alex Deucher 
Signed-off-by: Sasha Levin 

 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

to be the first bad commit, see https://bugs.debian.org/1005005#34 .

Does this ring a bell? Any idea what the problem might be?

Regards,
Salvatore