This is the trace up to the LBUG:

[5533797.889690] Lustre: Skipped 341 previous similar messages
[5533958.749284] LustreError: 105499:0:(tgt_grant.c:571:tgt_grant_incoming()) lustre19-OST002c: cli 901dcd33-cf45-dad4-a0c7-89b9a1fb91b6/ffff99656aa5a800 dirty 0 pend 0 grant -1310720
[5533958.754365] LustreError: 105499:0:(tgt_grant.c:573:tgt_grant_incoming()) LBUG
[5533958.756929] Pid: 105499, comm: ll_ost_io01_071 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[5533958.756931] Call Trace:
[5533958.756948] [<ffffffffc0bf57cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
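
For anyone checking their own OSS logs for the same signature, the following is a minimal Python sketch that pulls the target, client, and the dirty/pend/grant counters out of a tgt_grant_incoming() message and flags negative values, as seen in the line just before the LBUG above. The regular expression is an assumption based only on the single message quoted in this thread; the names GRANT_RE and parse_grant_line are hypothetical, and other Lustre versions may format the message differently.

```python
import re

# Assumption: pattern derived from the one tgt_grant_incoming() line quoted
# in this thread; it is not taken from the Lustre source.
GRANT_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+LustreError:.*?tgt_grant_incoming\(\)\)\s+"
    r"(?P<target>\S+): cli (?P<client>[0-9a-f-]+)/\S+\s+"
    r"dirty (?P<dirty>-?\d+) pend (?P<pend>-?\d+) grant (?P<grant>-?\d+)"
)

COUNTERS = ("dirty", "pend", "grant")


def parse_grant_line(line):
    """Return a dict of fields from a grant message, or None if no match."""
    match = GRANT_RE.search(line)
    if match is None:
        return None
    record = match.groupdict()
    for key in COUNTERS:
        record[key] = int(record[key])
    # Flag the record when any counter is below zero.
    record["negative"] = any(record[key] < 0 for key in COUNTERS)
    return record


sample = (
    "[5533958.749284] LustreError: 105499:0:"
    "(tgt_grant.c:571:tgt_grant_incoming()) lustre19-OST002c: "
    "cli 901dcd33-cf45-dad4-a0c7-89b9a1fb91b6/ffff99656aa5a800 "
    "dirty 0 pend 0 grant -1310720"
)

if __name__ == "__main__":
    print(parse_grant_line(sample))
```

Feeding it the output of dmesg or /var/log/messages line by line and keeping only records where "negative" is true would list which clients and OSTs produced the message.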
________________________________
From: Alex Zarochentsev <[email protected]>
Sent: Wednesday, July 15, 2020 11:20 AM
To: Kurt Strosahl <[email protected]>
Cc: [email protected]
Subject: [EXTERNAL] Re: [lustre-discuss] oss servers crashing

Hello!

On Wed, Jul 15, 2020 at 5:28 PM Kurt Strosahl <[email protected]> wrote:

Good Morning,

Yesterday one of our lustre file servers rebooted several times. The crash dump showed:

Can you please provide the failed lustre assert message just above the kernel panic message?

Thanks,
Zam.

[14333982.153989] Pid: 381367, comm: ll_ost_io01_076 3.10.0-957.10.1.el7_lustre.x86_64 #1 SMP Tue Apr 30 22:18:15 UTC 2019
[14333982.153989] Kernel panic - not syncing: LBUG
[14333982.153990] Call Trace:
[14333982.153993] CPU: 4 PID: 380760 Comm: ll_ost_io01_072 Kdump: loaded Tainted: P OE ------------ 3.10.0-957.10.1.el7_lustre.x86_64 #1
[14333982.153994] Hardware name: Supermicro Super Server/X11DPL-i, BIOS 3.1 05/21/2019
[14333982.153995] Call Trace:
[14333982.154002] [<ffffffffbaf62e41>] dump_stack+0x19/0x1b
[14333982.154006] [<ffffffffbaf5c550>] panic+0xe8/0x21f
[14333982.154018] [<ffffffffc0ab87cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
[14333982.154026] [<ffffffffc0ab88cb>] lbug_with_loc+0x9b/0xa0 [libcfs]
[14333982.154036] [<ffffffffc0ab887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
[14333982.154096] [<ffffffffc12dfae0>] tgt_grant_incoming.isra.6+0x570/0x570 [ptlrpc]
[14333982.154174] [<ffffffffc12dfae0>] tgt_grant_prepare_read+0x0/0x3b0 [ptlrpc]
[14333982.154232] [<ffffffffc12dfbeb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
[14333982.154297] [<ffffffffc12dfbeb>] tgt_grant_prepare_read+0x10b/0x3b0 [ptlrpc]
[14333982.154306] [<ffffffffc15e1ad0>] ofd_preprw+0x450/0x1160 [ofd]

lustre versions:
lustre-resource-agents-2.12.1-1.el7.x86_64
lustre-2.12.1-1.el7.x86_64
kernel-devel-3.10.0-957.10.1.el7_lustre.x86_64
lustre-osd-zfs-mount-2.12.1-1.el7.x86_64
kernel-headers-3.10.0-957.10.1.el7_lustre.x86_64
kernel-3.10.0-957.10.1.el7_lustre.x86_64
lustre-zfs-dkms-2.12.1-1.el7.noarch

Could this be: https://jira.whamcloud.com/browse/LU-12120

w/r,
Kurt J. Strosahl
System Administrator: Lustre, HPC
Scientific Computing Group, Thomas Jefferson National Accelerator Facility

_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
