No worries Guilherme, that explains it. I've now built my own kernel from the Ubuntu-5.4.0-38.42 Git tag, which I've verified includes the fix. I'm running our workload again with this kernel and will know within a few hours whether it looks good, although the job as a whole will take the full weekend to finish. If it works, we can run this custom kernel until 5.4.0-40 comes out, expected around next week.
I'm both testing and trying to get actual work done here (getting our job to run to completion). Your fast support was much appreciated.

-- 
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1885010

Title:
  NULL deference in nfs4_get_valid_delegation

Status in linux package in Ubuntu:
  Fix Released
Status in linux source package in Eoan:
  Fix Committed
Status in linux source package in Focal:
  Fix Committed

Bug description:
  We are getting the following on an NFSv4 client running focal (kernel 5.4.0-33.37):

  [296787.347971] BUG: unable to handle page fault for address: ffffffffffffffb0
  [296787.350255] #PF: supervisor read access in kernel mode
  [296787.352315] #PF: error_code(0x0000) - not-present page
  [296787.354137] PGD 15bf00e067 P4D 15bf00e067 PUD 15bf010067 PMD 0
  [296787.355798] Oops: 0000 [#2] SMP NOPTI
  [296787.357271] CPU: 49 PID: 605315 Comm: kworker/u131:3 Tainted: P D OE 5.4.0-33-generic #37-Ubuntu
  [296787.358756] Hardware name: GIGABYTE G291-Z20-00/MZ21-G20-00, BIOS F06 10/04/2019
  [296787.360274] Workqueue: rpciod rpc_async_schedule [sunrpc]
  [296787.361790] RIP: 0010:nfs4_get_valid_delegation+0xd/0x30 [nfsv4]
  [296787.363281] Code: 89 ef e8 06 c0 f9 ff e9 ec fd ff ff 90 0f 1f 44 00 00 55 48 89 e5 f0 80 4f 48 08 5d c3 0f 1f 44 00 00 55 31 f6 48 89 e5 41 54 <4c> 8b 67 b0 4c 89 e7 e8 07 f9 ff ff 84 c0 b8 00 00 00 00 4c 0f 44
  [296787.366780] RSP: 0018:ffffb7b1634a7d98 EFLAGS: 00010246
  [296787.368740] RAX: ffff9ef2958e9b00 RBX: ffff9ef59f910000 RCX: 0000000000000000
  [296787.370648] RDX: 0000000000008000 RSI: 0000000000000000 RDI: 0000000000000000
  [296787.372559] RBP: ffffb7b1634a7da0 R08: 0000000000000000 R09: 8080808080808080
  [296787.374441] R10: ffff9ef731e9d26c R11: 0000000000000018 R12: ffff9ef781f22600
  [296787.376330] R13: 0000000000000000 R14: ffff9efe1db4bc00 R15: ffffffffc0cc2950
  [296787.378220] FS: 0000000000000000(0000) GS:ffff9ef78fc40000(0000) knlGS:0000000000000000
  [296787.380165] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [296787.382076] CR2: ffffffffffffffb0 CR3: 0000001a2799c000 CR4: 00000000003406e0
  [296787.384031] Call Trace:
  [296787.385985] nfs4_open_prepare+0x89/0x1e0 [nfsv4]
  [296787.387973] rpc_prepare_task+0x1f/0x30 [sunrpc]
  [296787.389971] __rpc_execute+0x8c/0x3a0 [sunrpc]
  [296787.391903] rpc_async_schedule+0x30/0x50 [sunrpc]
  [296787.393787] process_one_work+0x1eb/0x3b0
  [296787.395617] worker_thread+0x4d/0x400
  [296787.397431] kthread+0x104/0x140
  [296787.399166] ? process_one_work+0x3b0/0x3b0
  [296787.400868] ? kthread_park+0x90/0x90
  [296787.402518] ret_from_fork+0x1f/0x40
  [296787.404158] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 nfs lockd grace sunrpc xt_nat xt_tcpudp veth xt_conntrack xt_MASQUERADE nf_conntrack_netlink nfnetlink xfrm_user xt_addrtype iptable_filter iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c bpfilter br_netfilter bridge stp llc aufs overlay md4 cmac nls_utf8 cifs libarc4 fscache libdes binfmt_misc snd_hda_codec_hdmi amd64_edac_mod edac_mce_amd ipmi_ssif nls_iso8859_1 kvm_amd kvm snd_hda_intel snd_intel_dspcfg snd_hda_codec snd_seq_midi snd_seq_midi_event snd_hda_core snd_rawmidi snd_hwdep snd_pcm snd_seq snd_seq_device snd_timer ucsi_ccg snd typec_ucsi typec soundcore k10temp ccp ipmi_si mac_hid nvidia_uvm(OE) sch_fq_codel parport_pc ppdev lp parport ip_tables x_tables autofs4 mlx5_ib nvidia_drm(POE) nvidia_modeset(POE) ib_uverbs ib_core nvidia(POE) crct10dif_pclmul crc32_pclmul ghash_clmulni_intel ast drm_vram_helper aesni_intel i2c_algo_bit crypto_simd ixgbe cryptd ttm glue_helper xfrm_algo
  [296787.404232] drm_kms_helper mlx5_core dca mdio syscopyarea sysfillrect sysimgblt fb_sys_fops nvme pci_hyperv_intf drm tls nvme_core mlxfw ahci ipmi_devintf i2c_piix4 libahci ipmi_msghandler i2c_nvidia_gpu
  [296787.421858] CR2: ffffffffffffffb0
  [296787.423680] ---[ end trace 2cf3edda87955a36 ]---
  [296787.425547] RIP: 0010:nfs4_get_valid_delegation+0xd/0x30 [nfsv4]
  [296787.427389] Code: 89 ef e8 06 c0 f9 ff e9 ec fd ff ff 90 0f 1f 44 00 00 55 48 89 e5 f0 80 4f 48 08 5d c3 0f 1f 44 00 00 55 31 f6 48 89 e5 41 54 <4c> 8b 67 b0 4c 89 e7 e8 07 f9 ff ff 84 c0 b8 00 00 00 00 4c 0f 44
  [296787.431172] RSP: 0018:ffffb7b1615e3d98 EFLAGS: 00010246
  [296787.433050] RAX: ffff9ee9faf45ec0 RBX: ffff9ef16c5dd000 RCX: 0000000000000000
  [296787.434922] RDX: 0000000000008000 RSI: 0000000000000000 RDI: 0000000000000000
  [296787.436810] RBP: ffffb7b1615e3da0 R08: 0000000000000000 R09: 8080808080808080
  [296787.438673] R10: ffff9ef26a0b8c6c R11: 0000000000000018 R12: ffff9ef7817cfa00
  [296787.440539] R13: 0000000000000004 R14: ffff9ef8bdeb0400 R15: ffffffffc0cc2950
  [296787.442400] FS: 0000000000000000(0000) GS:ffff9ef78fc40000(0000) knlGS:0000000000000000
  [296787.444289] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  [296787.446126] CR2: ffffffffffffffb0 CR3: 0000001a2799c000 CR4: 00000000003406e0

  The problem is a known issue which has been fixed upstream:
  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=29fe839976266bc7c55b927360a1daae57477723

  The patch is a simple 2-line fix. It would be great if you could do an SRU (Stable Release Update) that includes that upstream patch.
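The fault address in the oops is consistent with the NULL dereference named in the bug title. The Python sketch below is an illustrative check and is not part of the original report. It assumes that the four bytes at the <4c> marker of the "Code:" line encode a single 64-bit MOV load with an 8-bit displacement, decodes only that form, and adds the displacement to RDI from the register dump.

The result is mov -0x50(%rdi),%r12. Computed in 64 bits, 0 - 0x50 wraps to ffffffffffffffb0, which is exactly the value reported in CR2. Under the x86-64 System V calling convention, RDI carries a function's first argument. The instructions before the fault (push %rbp, xor %esi,%esi, mov %rsp,%rbp, push %r12) leave RDI untouched, so the pointer passed to nfs4_get_valid_delegation() appears to have been NULL. The helper name decode_mov_load is hypothetical.

```python
# Illustrative check only: values copied from the first oops in the bug report.
RDI = 0x0000000000000000          # RDI from the register dump
CR2 = 0xFFFFFFFFFFFFFFB0          # CR2: the address the CPU faulted on
CODE = bytes.fromhex("4c8b67b0")  # bytes starting at the <4c> marker in "Code:"

MASK64 = (1 << 64) - 1
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]


def decode_mov_load(code: bytes) -> tuple[str, str, int]:
    """Decode REX.W + 8B /r with ModRM mod=01, i.e. mov disp8(%base),%dst.

    Hypothetical helper that handles only this single encoding form;
    it is not a general x86-64 decoder.
    """
    rex, opcode, modrm, disp = code[0], code[1], code[2], code[3]
    if (rex & 0xF0) != 0x40 or not (rex & 0x08):
        raise ValueError("expected a REX prefix with W=1")
    if opcode != 0x8B:
        raise ValueError("expected MOV r64, r/m64 (opcode 0x8B)")
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if mod != 0b01 or rm == 0b100:
        raise ValueError("expected [base + disp8] addressing without a SIB byte")
    reg += 8 if rex & 0x04 else 0  # REX.R extends the destination register
    rm += 8 if rex & 0x01 else 0   # REX.B extends the base register
    disp8 = disp - 0x100 if disp & 0x80 else disp  # sign-extend the 8-bit displacement
    return REG64[reg], REG64[rm], disp8


dst, base, disp8 = decode_mov_load(CODE)
addr = (RDI + disp8) & MASK64  # wrap to 64 bits, as the CPU's address arithmetic does
print(f"mov {disp8:#x}(%{base}),%{dst} -> address {addr:#018x}")
print("matches CR2:", addr == CR2)
```

Running it prints "mov -0x50(%rdi),%r12 -> address 0xffffffffffffffb0" followed by "matches CR2: True". The decoder deliberately covers just this one encoding. For anything broader, a real disassembler such as objdump is the appropriate tool.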