Control: tags -1 + moreinfo

Hello Arne,
On Tue, Jul 12, 2022 at 08:14:22AM +0200, Arne Nordmark wrote: > > Package: src:linux > Version: 5.10.127-1 > Severity: normal > > Dear Maintainer, > > The new kernel in Debian 11.4 seems unstable and crashes when serving NFS. > On two different computers, these lockups happens within minutes, typically > when a client runs firefox on an NFS-mounted home directory. Typically the > servers lock up without any printout, but on one occasion, the following was > logged: > > jul 10 08:35:13 ano4 kernel: general protection fault, probably for > non-canonical address 0x2f48514544455145: 0000 [#1] SMP PTI > jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1244 Comm: nfsd Not tainted > 5.10.0-16-amd64 #1 Debian 5.10.127-1 > jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System > Product Name/P5Q DELUXE, BIOS 2201 05/21/2009 > jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570 > jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48 > 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85 > c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89 > jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901fa3bc8 EFLAGS: 00010202 > jul 10 08:35:13 ano4 kernel: RAX: 00000000bab6aebe RBX: 0000000000000001 > RCX: 0000000000000004 > jul 10 08:35:13 ano4 kernel: RDX: 0000000000035a00 RSI: 0000000000000001 > RDI: 2f48514544455145 > jul 10 08:35:13 ano4 kernel: RBP: ffffabe901fa3c20 R08: 0000000000000001 > R09: 0000000000000002 > jul 10 08:35:13 ano4 kernel: R10: 0000000000000002 R11: 0000000000000002 > R12: 0000000000000002 > jul 10 08:35:13 ano4 kernel: R13: 0000000045495141 R14: 00000000424d6757 > R15: 2f48514544455145 > jul 10 08:35:13 ano4 kernel: FS: 0000000000000000(0000) > GS:ffff939527d00000(0000) knlGS:0000000000000000 > jul 10 08:35:13 ano4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000 > CR4: 00000000000406e0 > jul 
10 08:35:13 ano4 kernel: Call Trace: > jul 10 08:35:13 ano4 kernel: __fsnotify_parent+0xe7/0x2d0 > jul 10 08:35:13 ano4 kernel: ? ext4_buffered_write_iter+0xce/0x160 [ext4] > jul 10 08:35:13 ano4 kernel: ? do_iter_readv_writev+0x152/0x1b0 > jul 10 08:35:13 ano4 kernel: do_iter_write+0xc8/0x1b0 > jul 10 08:35:13 ano4 kernel: nfsd_vfs_write+0x175/0x510 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd4_write+0x135/0x1b0 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd4_proc_compound+0x40d/0x680 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd_dispatch+0xd3/0x180 [nfsd] > jul 10 08:35:13 ano4 kernel: svc_process_common+0x3d4/0x6d0 [sunrpc] > jul 10 08:35:13 ano4 kernel: ? nfsd_svc+0x320/0x320 [nfsd] > jul 10 08:35:13 ano4 kernel: svc_process+0xb7/0xf0 [sunrpc] > jul 10 08:35:13 ano4 kernel: nfsd+0xe8/0x140 [nfsd] > jul 10 08:35:13 ano4 kernel: ? nfsd_destroy+0x60/0x60 [nfsd] > jul 10 08:35:13 ano4 kernel: kthread+0x11b/0x140 > jul 10 08:35:13 ano4 kernel: ? __kthread_bind_mask+0x60/0x60 > jul 10 08:35:13 ano4 kernel: ret_from_fork+0x22/0x30 > jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun > cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace > aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5 > sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacl > e_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core > snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa > tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887 > tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner > soundwire_generic_allocation snd_soc_core snd > _compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core > snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm > ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw > evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr > watchdog sg acpi_ > cpufreq asus_atk0110 button 
nft_chain_nat nf_nat nft_reject_inet > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct > jul 10 08:35:13 ano4 kernel: nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 > coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev > nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs > ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid4 > 56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq > libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod > hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st > crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci > firewire_core aic7xxx > crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801 > sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore > scsi_mod usb_common floppy > jul 10 08:35:13 ano4 kernel: ---[ end trace 159cb95f57d30ea4 ]--- > jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570 > jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48 > 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85 > c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89 > jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901fa3bc8 EFLAGS: 00010202 > jul 10 08:35:13 ano4 kernel: RAX: 00000000bab6aebe RBX: 0000000000000001 > RCX: 0000000000000004 > jul 10 08:35:13 ano4 kernel: RDX: 0000000000035a00 RSI: 0000000000000001 > RDI: 2f48514544455145 > jul 10 08:35:13 ano4 kernel: RBP: ffffabe901fa3c20 R08: 0000000000000001 > R09: 0000000000000002 > jul 10 08:35:13 ano4 kernel: R10: 0000000000000002 R11: 0000000000000002 > R12: 0000000000000002 > jul 10 08:35:13 ano4 kernel: R13: 0000000045495141 R14: 00000000424d6757 > R15: 2f48514544455145 > jul 10 08:35:13 ano4 kernel: FS: 0000000000000000(0000) > GS:ffff939527d00000(0000) knlGS:0000000000000000 > jul 10 08:35:13 ano4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 
0000000080050033 > jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000 > CR4: 00000000000406e0 > jul 10 08:35:13 ano4 kernel: list_del corruption. next->prev should be > ffff939408b1d6a0, but was 4141514142455142 > jul 10 08:35:13 ano4 kernel: ------------[ cut here ]------------ > jul 10 08:35:13 ano4 kernel: kernel BUG at lib/list_debug.c:54! > jul 10 08:35:13 ano4 kernel: invalid opcode: 0000 [#2] SMP PTI > jul 10 08:35:13 ano4 kernel: CPU: 2 PID: 1242 Comm: nfsd Tainted: G D > 5.10.0-16-amd64 #1 Debian 5.10.127-1 > jul 10 08:35:13 ano4 kernel: Hardware name: System manufacturer System > Product Name/P5Q DELUXE, BIOS 2201 05/21/2009 > jul 10 08:35:13 ano4 kernel: RIP: 0010:__list_del_entry_valid.cold+0x1d/0x47 > jul 10 08:35:13 ano4 kernel: Code: c7 c7 b8 1e d2 8e e8 1a 14 ff ff 0f 0b 48 > 89 fe 48 c7 c7 48 1f d2 8e e8 09 14 ff ff 0f 0b 48 c7 c7 f8 1f d2 8e e8 fb > 13 ff ff <0f> 0b 48 89 f2 48 89 fe 48 c7 c7 b8 1f d2 8e e8 e7 13 ff ff 0f 0b > jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901f93cf8 EFLAGS: 00010246 > jul 10 08:35:13 ano4 kernel: RAX: 0000000000000054 RBX: ffff93940f10b800 > RCX: 0000000000000000 > jul 10 08:35:13 ano4 kernel: RDX: 0000000000000000 RSI: ffff939527d1ca00 > RDI: ffff939527d1ca00 > jul 10 08:35:13 ano4 kernel: RBP: ffff939408b1d690 R08: 0000000000000000 > R09: ffffabe901f93b20 > jul 10 08:35:13 ano4 kernel: R10: ffffabe901f93b18 R11: ffffffff8f2cb448 > R12: ffff939408b1d6b0 > jul 10 08:35:13 ano4 kernel: R13: ffff939408b1d6a0 R14: dead000000000100 > R15: 0000000000000000 > jul 10 08:35:13 ano4 kernel: FS: 0000000000000000(0000) > GS:ffff939527d00000(0000) knlGS:0000000000000000 > jul 10 08:35:13 ano4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000 > CR4: 00000000000406e0 > jul 10 08:35:13 ano4 kernel: Call Trace: > jul 10 08:35:13 ano4 kernel: fsnotify_detach_mark+0x44/0x90 > jul 10 08:35:13 ano4 kernel: 
fsnotify_destroy_mark+0x1f/0x40 > jul 10 08:35:13 ano4 kernel: nfsd_file_free+0xb7/0xe0 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd_file_close_inode_sync+0xfb/0x150 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd_unlink+0x244/0x250 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd4_remove+0x4c/0x130 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd4_proc_compound+0x40d/0x680 [nfsd] > jul 10 08:35:13 ano4 kernel: nfsd_dispatch+0xd3/0x180 [nfsd] > jul 10 08:35:13 ano4 kernel: svc_process_common+0x3d4/0x6d0 [sunrpc] > jul 10 08:35:13 ano4 kernel: ? nfsd_svc+0x320/0x320 [nfsd] > jul 10 08:35:13 ano4 kernel: svc_process+0xb7/0xf0 [sunrpc] > jul 10 08:35:13 ano4 kernel: nfsd+0xe8/0x140 [nfsd] > jul 10 08:35:13 ano4 kernel: ? nfsd_destroy+0x60/0x60 [nfsd] > jul 10 08:35:13 ano4 kernel: kthread+0x11b/0x140 > jul 10 08:35:13 ano4 kernel: ? __kthread_bind_mask+0x60/0x60 > jul 10 08:35:13 ano4 kernel: ret_from_fork+0x22/0x30 > jul 10 08:35:13 ano4 kernel: Modules linked in: dm_snapshot dm_bufio tun > cpufreq_ondemand cpufreq_powersave cpufreq_conservative cpufreq_userspace > aes_generic libaes crypto_simd cryptd glue_helper cbc cts rpcsec_gss_krb5 > sit tunnel4 ip_tunnel nft_nat sch_fq_codel rc_pinnacl > e_pctv_hd em28xx_rc rc_core si2157 si2168 i2c_mux em28xx_dvb dvb_core > snd_hda_codec_analog snd_hda_codec_generic ledtrig_audio ivtv_alsa > tuner_simple tuner_types snd_hda_codec_hdmi wm8775 snd_hda_intel tda9887 > tda8290 snd_intel_dspcfg tea5767 soundwire_intel tuner > soundwire_generic_allocation snd_soc_core snd > _compress soundwire_cadence cx25840 snd_hda_codec ivtv snd_hda_core > snd_hwdep soundwire_bus em28xx kvm_intel radeon tveeprom snd_pcm cx2341x kvm > ttm videodev snd_timer snd irqbypass soundcore drm_kms_helper mc serio_raw > evdev cec i2c_algo_bit iTCO_wdt intel_pmc_bxt iTCO_vendor_support pcspkr > watchdog sg acpi_ > cpufreq asus_atk0110 button nft_chain_nat nf_nat nft_reject_inet > nf_reject_ipv4 nf_reject_ipv6 nft_reject nft_counter nft_ct > jul 10 08:35:13 ano4 kernel: 
nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 > coretemp firewire_sbp2 nf_tables nfnetlink loop nfsd parport_pc ppdev > nfs_acl lockd lp auth_rpcgss parport grace drm fuse sunrpc configfs > ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 raid10 raid4 > 56 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq > libcrc32c crc32c_generic raid0 multipath linear dm_mod raid1 md_mod sd_mod > hid_generic t10_pi ata_generic crc_t10dif crct10dif_generic st > crct10dif_common usbhid pata_marvell hid ahci libahci mpt3sas firewire_ohci > firewire_core aic7xxx > crc_itu_t libata skge ehci_pci uhci_hcd scsi_transport_spi lpc_ich i2c_i801 > sky2 ehci_hcd psmouse i2c_smbus raid_class scsi_transport_sas usbcore > scsi_mod usb_common floppy > jul 10 08:35:13 ano4 kernel: ---[ end trace 159cb95f57d30ea5 ]--- > jul 10 08:35:13 ano4 kernel: RIP: 0010:fsnotify+0x2d9/0x570 > jul 10 08:35:13 ano4 kernel: Code: 78 08 44 0b 30 44 0b 68 40 48 83 c1 01 48 > 83 f9 04 75 d9 66 66 66 66 90 44 8b 4c 24 1c 44 89 e8 f7 d0 45 21 f1 41 85 > c1 74 4f <49> 8b 3f 48 8b 07 48 85 c0 0f 84 0a 01 00 00 48 8d 7c 24 38 44 89 > jul 10 08:35:13 ano4 kernel: RSP: 0018:ffffabe901fa3bc8 EFLAGS: 00010202 > jul 10 08:35:13 ano4 kernel: RAX: 00000000bab6aebe RBX: 0000000000000001 > RCX: 0000000000000004 > jul 10 08:35:13 ano4 kernel: RDX: 0000000000035a00 RSI: 0000000000000001 > RDI: 2f48514544455145 > jul 10 08:35:13 ano4 kernel: RBP: ffffabe901fa3c20 R08: 0000000000000001 > R09: 0000000000000002 > jul 10 08:35:13 ano4 kernel: R10: 0000000000000002 R11: 0000000000000002 > R12: 0000000000000002 > jul 10 08:35:13 ano4 kernel: R13: 0000000045495141 R14: 00000000424d6757 > R15: 2f48514544455145 > jul 10 08:35:13 ano4 kernel: FS: 0000000000000000(0000) > GS:ffff939527d00000(0000) knlGS:0000000000000000 > jul 10 08:35:13 ano4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: > 0000000080050033 > jul 10 08:35:13 ano4 kernel: CR2: 0000560b8cee4000 CR3: 00000001034da000 > CR4: 00000000000406e0 > jul 10 08:35:21 ano4 
kernel: general protection fault, probably for > non-canonical address 0xb1c8a36300fbcf32: 0000 [#3] SMP PTI > jul 10 08:35:21 ano4 kernel: CPU: 1 PID: 1239 Comm: nfsd Tainted: G D > 5.10.0-16-amd64 #1 Debian 5.10.127-1 > jul 10 08:35:21 ano4 kernel: Hardware name: System manufacturer System > Product Name/P5Q DELUXE, BIOS 2201 05/21/2009 > jul 10 08:35:21 ano4 kernel: RIP: 0010:kmem_cache_alloc+0x89/0x1f0 > jul 10 08:35:21 ano4 kernel: Code: 1e 18 72 49 8b 00 49 83 78 10 00 48 89 04 > 24 0f 84 42 01 00 00 48 85 c0 0f 84 39 01 00 00 41 8b 4c 24 28 49 8b 3c 24 > 48 01 c1 <48> 8b 19 48 89 ce 49 33 9c 24 b8 00 00 00 48 8d 4a 01 48 0f ce 48 > jul 10 08:35:21 ano4 kernel: RSP: 0018:ffffabe900f3fd50 EFLAGS: 00010282 > jul 10 08:35:21 ano4 kernel: RAX: b1c8a36300fbcee2 RBX: ffff939403b58070 > RCX: b1c8a36300fbcf32 > > After reverting to boot the servers on kernel linux-image-5.10.0-15-amd64 > 5.10.120-1 (but still using linux-image-5.10.0-16-amd64 on the clients) the > servers are stable again. > > From client mount output: type nfs4 > (rw,nosuid,nodev,relatime,vers=4.2,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp6,timeo=600,retrans=2,sec=krb5p,local_lock=none

As you seem to be able to reproduce the issue reliably, do you have the possibility (on a non-production instance) to bisect the problem?

In addition to the bisect, on a test instance where the issue is reproducible, can you run a self-compiled upstream 5.10.130 to see if the problem is still present?

Regards,
Salvatore