Control: tags -1 + moreinfo

Hi Benjamin,

On Wed, Nov 20, 2024 at 02:22:42AM +0100, Benjamin Drung wrote:
> Package: linux
> Version: 6.11.9-1
> Severity: normal
> X-Debbugs-Cc: bdr...@debian.org
>
> Dear Maintainer,
>
> Running the dracut test TEST-60-NFS on Debian unstable with
> linux-image-6.11.9-amd64 fails with following kernel crash:
>
> ```
> [ 15.600535] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
> recovery directory
> [ 15.602863] NFSD: Using legacy client tracking operations.
> [ 15.603059] NFSD: Using /var/lib/nfs/v4recovery as the NFSv4 state
> recovery directory
> [ 15.603569] ------------[ cut here ]------------
> [ 15.603706] kernel BUG at fs/nfsd/nfs4recover.c:534!
> [ 15.604360] Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> [ 15.604743] CPU: 0 UID: 0 PID: 471 Comm: rpc.nfsd Not tainted 6.11.9-amd64
> #1 Debian 6.11.9-1
> [ 15.605019] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.3-debian-1.16.3-2 04/01/2014
> [ 15.605337] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> [ 15.606083] Code: 19 48 89 de 48 c7 c7 10 90 9c c0 e8 6d fb ff ff 89 c5 85
> c0 0f 85 30 60 00 00 48 c7 c7 c0 af a3 c0 31 ed e8 25 b0 ca d2 eb 07 <0f> 0b
> bd f4 ff ff ff 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75
> [ 15.606343] RSP: 0018:ff345c4e803fbb60 EFLAGS: 00010286
> [ 15.606343] RAX: 0000000000000049 RBX: ff2fd43447182000 RCX:
> 0000000000000003
> [ 15.606343] RDX: 0000000000000000 RSI: 0000000000000003 RDI:
> 0000000000000001
> [ 15.606343] RBP: ffffffff9525dd40 R08: 0000000000000000 R09:
> ff345c4e803fb9f0
> [ 15.606343] R10: ffffffff946b41e8 R11: 0000000000000003 R12:
> ff2fd43447182000
> [ 15.606343] R13: ff2fd43447182000 R14: ff2fd43469336c00 R15:
> ff2fd43447182000
> [ 15.606343] FS: 00007fe05a5e9740(0000) GS:ff2fd4347ce00000(0000)
> knlGS:0000000000000000
> [ 15.606343] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 15.606343] CR2: 0000559addf39db0 CR3: 000000002836e000 CR4:
> 0000000000751ef0
> [ 15.606343] PKRU: 55555554
> [ 15.606343] Call Trace:
> [ 15.606343] <TASK>
> [ 15.606343] ? __die_body.cold+0x19/0x27
> [ 15.606343] ? die+0x2e/0x50
> [ 15.606343] ? do_trap+0xca/0x110
> [ 15.606343] ? do_error_trap+0x6a/0x90
> [ 15.606343] ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> [ 15.606343] ? exc_invalid_op+0x50/0x70
> [ 15.606343] ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> [ 15.606343] ? asm_exc_invalid_op+0x1a/0x20
> [ 15.606343] ? nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> [ 15.606343] nfsd4_client_tracking_init+0x57/0x1b0 [nfsd]
> [ 15.606343] nfs4_state_start_net+0x2f9/0x3a0 [nfsd]
> [ 15.606343] nfsd_svc+0x1b9/0x340 [nfsd]
> [ 15.606343] write_threads+0xfc/0x1c0 [nfsd]
> [ 15.606343] ? __pfx_write_threads+0x10/0x10 [nfsd]
> [ 15.606343] nfsctl_transaction_write+0x4d/0x80 [nfsd]
> [ 15.606343] vfs_write+0xfe/0x460
> [ 15.606343] ksys_write+0x6d/0xf0
> [ 15.606343] do_syscall_64+0x82/0x190
> [ 15.606343] ? syscall_exit_to_user_mode+0x4d/0x210
> [ 15.606343] ? do_syscall_64+0x8e/0x190
> [ 15.606343] ? __x64_sys_getdents64+0xfa/0x130
> [ 15.606343] ? __pfx_filldir64+0x10/0x10
> [ 15.606343] ? syscall_exit_to_user_mode+0x4d/0x210
> [ 15.606343] ? do_syscall_64+0x8e/0x190
> [ 15.606343] ? __count_memcg_events+0x58/0xf0
> [ 15.606343] ? count_memcg_events.constprop.0+0x1a/0x30
> [ 15.606343] ? handle_mm_fault+0x1bb/0x2c0
> [ 15.606343] ? do_user_addr_fault+0x36c/0x620
> [ 15.606343] ? exc_page_fault+0x7e/0x180
> [ 15.606343] entry_SYSCALL_64_after_hwframe+0x76/0x7e
> [ 15.606343] RIP: 0033:0x7fe05a6f0210
> [ 15.606343] Code: 2c 0e 00 64 c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f
> 1f 84 00 00 00 00 00 80 3d 59 ae 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d
> 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
> [ 15.606343] RSP: 002b:00007fff649d2b08 EFLAGS: 00000202 ORIG_RAX:
> 0000000000000001
> [ 15.606343] RAX: ffffffffffffffda RBX: 0000000000000003 RCX:
> 00007fe05a6f0210
> [ 15.606343] RDX: 0000000000000002 RSI: 000056540dbbb340 RDI:
> 0000000000000003
> [ 15.606343] RBP: 000056540dbbb340 R08: 0000000000000064 R09:
> 00000000ffffffff
> [ 15.606343] R10: 0000000000000000 R11: 0000000000000202 R12:
> 0000000000020000
> [ 15.606343] R13: 000056540dbb7116 R14: 000056543353a2a0 R15:
> 0000000000000000
> [ 15.606343] </TASK>
> [ 15.606343] Modules linked in: nfsd auth_rpcgss nfs_acl lockd grace ext4
> crc16 mbcache jbd2 crc32c_generic sd_mod ahci libahci libata virtio_scsi
> scsi_mod crc32_pclmul crc32c_intel scsi_common virtio_net net_failover
> failover i6300esb watchdog sunrpc qemu_fw_cfg virtio_rng autofs4
> [ 15.618032] ---[ end trace 0000000000000000 ]---
> [ 15.618166] RIP: 0010:nfsd4_legacy_tracking_init+0x17d/0x1b0 [nfsd]
> [ 15.618718] Code: 19 48 89 de 48 c7 c7 10 90 9c c0 e8 6d fb ff ff 89 c5 85
> c0 0f 85 30 60 00 00 48 c7 c7 c0 af a3 c0 31 ed e8 25 b0 ca d2 eb 07 <0f> 0b
> bd f4 ff ff ff 48 8b 44 24 08 65 48 2b 04 25 28 00 00 00 75
> [ 15.619086] RSP: 0018:ff345c4e803fbb60 EFLAGS: 00010286
> [ 15.619198] RAX: 0000000000000049 RBX: ff2fd43447182000 RCX:
> 0000000000000003
> [ 15.619336] RDX: 0000000000000000 RSI: 0000000000000003 RDI:
> 0000000000000001
> [ 15.619472] RBP: ffffffff9525dd40 R08: 0000000000000000 R09:
> ff345c4e803fb9f0
> [ 15.619609] R10: ffffffff946b41e8 R11: 0000000000000003 R12:
> ff2fd43447182000
> [ 15.619746] R13: ff2fd43447182000 R14: ff2fd43469336c00 R15:
> ff2fd43447182000
> [ 15.619888] FS: 00007fe05a5e9740(0000) GS:ff2fd4347ce00000(0000)
> knlGS:0000000000000000
> [ 15.620045] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 15.620158] CR2: 0000559addf39db0 CR3: 000000002836e000 CR4:
> 0000000000751ef0
> [ 15.620296] PKRU: 55555554
> [ 15.620469] Kernel panic - not syncing: Fatal exception
> [ 15.621342] Kernel Offset: 0x11a00000 from 0xffffffff81000000 (relocation
> range: 0xffffffff80000000-0xffffffffbfffffff)
> ```
>
> This crash is 100% reproducible and I can easily test different kernels.
> The TEST-60-NFS works fine on Ubuntu oracular.
> linux-image-6.12-rc6-amd64 6.12~rc6-1~exp1 from experimental is affected
> as well.

Just to be clear, is this something you newly hit with these versions, or
was the problem present before? If you have a last good version, would
you be able to bisect the changes to identify the commit that introduced
the issue?

So far I have not found a known recent regression report specific to
this, but there is a report from back in August:

https://lore.kernel.org/all/23faefd973c63f9b0ec8a735acb1ff1409776163.ca...@linuxfoundation.org/

In any case, since you can reliably reproduce the issue, can you please
report it upstream (linux-nfs list and relevant maintainers)?

Regards,
Salvatore