Hi

On Fri, Nov 22, 2024 at 01:46:17AM +0100, Benjamin Drung wrote:
> On Thu, 2024-11-21 at 22:03 +0100, Salvatore Bonaccorso wrote:
> > Control: tags -1 + moreinfo
> > 
> > Hi Benjamin,
> > 
> > On Wed, Nov 20, 2024 at 02:22:42AM +0100, Benjamin Drung wrote:
> > > Package: linux
> > > Version: 6.11.9-1
> > > Severity: normal
> > > X-Debbugs-Cc: bdr...@debian.org
> > > 
> > > Dear Maintainer,
> > > 
> > > Running the dracut test TEST-60-NFS on Debian unstable with
> > > linux-image-6.11.9-amd64 fails with the following kernel crash:
> > > 
> > > ```
> > > [   15.600535] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state 
> > > recovery directory
> > > [   15.602863] NFSD: Using legacy client tracking operations.
> > > [   15.603059] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state 
> > > recovery directory
> > > [   15.603569] ------------[ cut here ]------------
> > > [   15.603706] kernel BUG at fs/nfsd/nfs4recover.c:534!
> > > [   15.604360] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> > > [   15.604743] CPU: 0 UID: 0 PID: 471 Comm: rpc.nfsd Not tainted 
> > > 6.11.9-amd64 #1  Debian 6.11.9-1
> > > [   15.605019] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
> > > 1.16.3-debian-1.16.3-2 04/01/2014
> > > [   15.605337] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > [   15.606083] Code: 19 48 89 de 48 c7 c7 10 90 9c c0 e8 6d fb ff ff 89 
> > > c5 85 c0 0f 85 30 60 00 00 48 c7 c7 c0 af a3 c0 31 ed e8 25 b0 ca d2 eb 
> > > 07 <0f> 0b bd f4 ff ff ff 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75
> > > [   15.606343] RSP: 0018:ff345c4e803fbb60 EFLAGS: 00010286
> > > [   15.606343] RAX: 0000000000000049 RBX: ff2fd43447182000 RCX: 
> > > 0000000000000003
> > > [   15.606343] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
> > > 0000000000000001
> > > [   15.606343] RBP: ffffffff9525dd40 R08: 0000000000000000 R09: 
> > > ff345c4e803fb9f0
> > > [   15.606343] R10: ffffffff946b41e8 R11: 0000000000000003 R12: 
> > > ff2fd43447182000
> > > [   15.606343] R13: ff2fd43447182000 R14: ff2fd43469336c00 R15: 
> > > ff2fd43447182000
> > > [   15.606343] FS:  00007fe05a5e9740(0000) GS:ff2fd4347ce00000(0000) 
> > > knlGS:0000000000000000
> > > [   15.606343] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   15.606343] CR2: 0000559addf39db0 CR3: 000000002836e000 CR4: 
> > > 0000000000751ef0
> > > [   15.606343] PKRU: 55555554
> > > [   15.606343] Call Trace:
> > > [   15.606343]  <TASK>
> > > [   15.606343]  ? __die_body.cold+0x19/0x27
> > > [   15.606343]  ? die+0x2e/0x50
> > > [   15.606343]  ? do_trap+0xca/0x110
> > > [   15.606343]  ? do_error_trap+0x6a/0x90
> > > [   15.606343]  ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > [   15.606343]  ? exc_invalid_op+0x50/0x70
> > > [   15.606343]  ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > [   15.606343]  ? asm_exc_invalid_op+0x1a/0x20
> > > [   15.606343]  ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > [   15.606343]  nfsd4_client_tracking_init+0x57/0x1b0 [nfsd]
> > > [   15.606343]  nfs4_state_start_net+0x2f9/0x3a0 [nfsd]
> > > [   15.606343]  nfsd_svc+0x1b9/0x340 [nfsd]
> > > [   15.606343]  write_threads+0xfc/0x1c0 [nfsd]
> > > [   15.606343]  ? __pfx_write_threads+0x10/0x10 [nfsd]
> > > [   15.606343]  nfsctl_transaction_write+0x4d/0x80 [nfsd]
> > > [   15.606343]  vfs_write+0xfe/0x460
> > > [   15.606343]  ksys_write+0x6d/0xf0
> > > [   15.606343]  do_syscall_64+0x82/0x190
> > > [   15.606343]  ? syscall_exit_to_user_mode+0x4d/0x210
> > > [   15.606343]  ? do_syscall_64+0x8e/0x190
> > > [   15.606343]  ? __x64_sys_getdents64+0xfa/0x130
> > > [   15.606343]  ? __pfx_filldir64+0x10/0x10
> > > [   15.606343]  ? syscall_exit_to_user_mode+0x4d/0x210
> > > [   15.606343]  ? do_syscall_64+0x8e/0x190
> > > [   15.606343]  ? __count_memcg_events+0x58/0xf0
> > > [   15.606343]  ? count_memcg_events.constprop.0+0x1a/0x30
> > > [   15.606343]  ? handle_mm_fault+0x1bb/0x2c0
> > > [   15.606343]  ? do_user_addr_fault+0x36c/0x620
> > > [   15.606343]  ? exc_page_fault+0x7e/0x180
> > > [   15.606343]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > [   15.606343] RIP: 0033:0x7fe05a6f0210
> > > [   15.606343] Code: 2c 0e 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 66 
> > > 2e 0f 1f 84 00 00 00 00 00 80 3d 59 ae 0e 00 00 74 17 b8 01 00 00 00 0f 
> > > 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
> > > [   15.606343] RSP: 002b:00007fff649d2b08 EFLAGS: 00000202 ORIG_RAX: 
> > > 0000000000000001
> > > [   15.606343] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 
> > > 00007fe05a6f0210
> > > [   15.606343] RDX: 0000000000000002 RSI: 000056540dbbb340 RDI: 
> > > 0000000000000003
> > > [   15.606343] RBP: 000056540dbbb340 R08: 0000000000000064 R09: 
> > > 00000000ffffffff
> > > [   15.606343] R10: 0000000000000000 R11: 0000000000000202 R12: 
> > > 0000000000020000
> > > [   15.606343] R13: 000056540dbb7116 R14: 000056543353a2a0 R15: 
> > > 0000000000000000
> > > [   15.606343]  </TASK>
> > > [   15.606343] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace 
> > > ext4 crc16 mbcache jbd2 crc32c_generic sd_mod ahci libahci libata 
> > > virtio_scsi scsi_mod crc32_pclmul crc32c_intel scsi_common virtio_net 
> > > net_failover failover i6300esb watchdog sunrpc qemu_fw_cfg virtio_rng 
> > > autofs4
> > > [   15.618032] ---[ end trace 0000000000000000 ]---
> > > [   15.618166] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> > > [   15.618718] Code: 19 48 89 de 48 c7 c7 10 90 9c c0 e8 6d fb ff ff 89 
> > > c5 85 c0 0f 85 30 60 00 00 48 c7 c7 c0 af a3 c0 31 ed e8 25 b0 ca d2 eb 
> > > 07 <0f> 0b bd f4 ff ff ff 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75
> > > [   15.619086] RSP: 0018:ff345c4e803fbb60 EFLAGS: 00010286
> > > [   15.619198] RAX: 0000000000000049 RBX: ff2fd43447182000 RCX: 
> > > 0000000000000003
> > > [   15.619336] RDX: 0000000000000000 RSI: 0000000000000003 RDI: 
> > > 0000000000000001
> > > [   15.619472] RBP: ffffffff9525dd40 R08: 0000000000000000 R09: 
> > > ff345c4e803fb9f0
> > > [   15.619609] R10: ffffffff946b41e8 R11: 0000000000000003 R12: 
> > > ff2fd43447182000
> > > [   15.619746] R13: ff2fd43447182000 R14: ff2fd43469336c00 R15: 
> > > ff2fd43447182000
> > > [   15.619888] FS:  00007fe05a5e9740(0000) GS:ff2fd4347ce00000(0000) 
> > > knlGS:0000000000000000
> > > [   15.620045] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [   15.620158] CR2: 0000559addf39db0 CR3: 000000002836e000 CR4: 
> > > 0000000000751ef0
> > > [   15.620296] PKRU: 55555554
> > > [   15.620469] Kernel panic - not syncing: Fatal exception
> > > [   15.621342] Kernel Offset: 0x11a00000 from 0xffffffff81000000 
> > > (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> > > ```
> > > 
> > > This crash is 100% reproducible and I can easily test different kernels.
> > > TEST-60-NFS works fine on Ubuntu oracular.
> > > linux-image-6.12-rc6-amd64 6.12~rc6-1~exp1 from experimental is affected
> > > as well.
> > 
> > Just to be clear, is this something you newly hit with these versions,
> > or was the problem present before? If you have a last good version,
> > would you be able to bisect the changes to identify the commit that
> > introduced the issue?
> 
> I hit this bug when I tried to introduce the nfs autopkgtest. I don't
> know of a good version in Debian. I pushed the
> upstream-dracut-network-nfs autopkgtest for dracut to the debian-nfs branch:
> https://salsa.debian.org/debian/dracut/-/commits/debian-nfs?ref_type=heads
> Test:
> https://salsa.debian.org/debian/dracut/-/commit/a5b1da9ff33d412cc886408c3e6cafec265d6e29
> So you should be able to reproduce it.
> 
> The same test case upstream-dracut-network-nfs works on Ubuntu with
> linux 6.11.0-8.8:
> https://autopkgtest.ubuntu.com/results/autopkgtest-plucky/plucky/amd64/d/dracut/20241121_232300_a5f72@/log.gz
> 
> > So far I have not found a recent, already known regression report
> > specific to this, but there is a report from August, which we found at
> > https://lore.kernel.org/all/23faefd973c63f9b0ec8a735acb1ff1409776163.ca...@linuxfoundation.org/
> 
> Yes, that looks similar.
> 
> > In any case since you can reliably reproduce the issue, can you please
> > report it to upstream (linux-nfs list and relevant maintainers)?
> 
> I can do that.

So far I have not been able to reproduce it, but only because my
autopkgtest run already fails before reaching that point.

Ideally we can reduce the trigger to something simpler, which would be
easier for me to forward to the linux-nfs list for further debugging.
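
For condensing the report before forwarding it, a minimal sketch in Python
that pulls the BUG location and the faulting symbol out of a captured console
log might look like the following. This is only an illustration, not part of
the dracut test or of any kernel tooling: the function name summarize_oops and
the SAMPLE variable are hypothetical, and the two sample lines are copied from
the log quoted above.

```python
import re

# The two lines that identify the crash site in a kernel oops.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
# "0010:" is the kernel code segment, so the user-space RIP line is skipped.
RIP_RE = re.compile(
    r"RIP: 0010:(?P<symbol>[\w.]+)\+(?P<offset>0x[0-9a-f]+)/0x[0-9a-f]+"
    r"(?: \[(?P<module>\w+)\])?"
)


def summarize_oops(log_text):
    """Return the BUG location and faulting symbol found in log_text."""
    summary = {}
    bug = BUG_RE.search(log_text)
    if bug:
        summary["file"] = bug.group("file")
        summary["line"] = int(bug.group("line"))
    rip = RIP_RE.search(log_text)
    if rip:
        summary["symbol"] = rip.group("symbol")
        summary["offset"] = rip.group("offset")
        summary["module"] = rip.group("module")  # None for built-in code
    return summary


# Two lines taken from the crash log in the original report.
SAMPLE = """\
[   15.603706] kernel BUG at fs/nfsd/nfs4recover.c:534!
[   15.605337] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
"""

if __name__ == "__main__":
    print(summarize_oops(SAMPLE))
```

Only the first match of each pattern is used, which is enough here because
the oops repeats the same RIP line after the end-of-trace marker.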

I will continue investigating it.

Regards,
Salvatore
