Thank you for your help. I've created an issue in the community JIRA for this: LU-13168.
Kind Regards,
Christopher.

On Mon, Jan 20, 2020 at 05:22:58PM +0000, Peter Jones wrote:
> Christopher
>
> Apologies for the confusing message about requesting an account for JIRA -
> I'll see if we can remove that message but I think that it might be
> system-generated. We've had to disable self-registration because of repeated
> hacking attempts via that mechanism. The message on the left "For questions
> or login request, send email to Jira administrators" works - the link there
> sends an email to [email protected] and several requests come through per
> week via that channel - but I can see why the message on the right would draw
> your eye...
>
> Peter
>
> On 2020-01-20, 8:15 AM, "lustre-discuss on behalf of Christopher Mountford"
> <[email protected] on behalf of [email protected]> wrote:
>
> We've seen 3 lustre client panics in the last few hours when using the
> b2_12 branch (we're using it on client nodes as it patches a data on MDT bug
> in 2.12.3. Still using 2.12.3 on MDS/OSS). This looks similar to
> LU-12581, which we had seen on our system before but was fixed in 2.12.3.
> Could this have been re-introduced in the b2_12 branch?
>
> I've included the dmesg from one of the panics below. Unfortunately we
> have not yet found a way to reproduce the problem. Has anyone seen anything
> similar to this?
>
> Is this mailing list a suitable place to ask for help on this sort of
> bug? I've been looking at the Whamcloud Community Jira, but the link to
> request an account returns "Your Jira administrator has not yet configured
> this contact form."
>
> dmesg from failed client:
>
> [542909.741793] =============================================================================
> [542909.741800] BUG kmalloc-8 (Tainted: G OE ------------ ): Freechain corrupt
> [542909.741802] -----------------------------------------------------------------------------
>
> [542909.741805] Disabling lock debugging due to kernel taint
> [542909.741809] INFO: Slab 0xffffe0933440b3c0 objects=102 used=75 fp=0xffff9bb6902cf558 flags=0x6fffff00000081
> [542909.741812] INFO: Object 0xffff9bb6902cfad0 @offset=2768 fp=0x7fff9bb6902cfdf0
>
> [542909.741816] Redzone ffff9bb6902cfac8: bb 3b 3b 3b 3b bb bb bb  .;;;;...
> [542909.741818] Object ffff9bb6902cfad0: 6b 6b 6b 6b 6b 6b 6b a5  kkkkkkk.
> [542909.741821] Redzone ffff9bb6902cfad8: bb bb bb 3b bb bb bb bb  ...;....
> [542909.741823] Padding ffff9bb6902cfae8: 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZ
> [542909.741828] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
> [542909.741830] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
> [542909.741832] Call Trace:
> [542909.741846] [<ffffffffa277ac23>] dump_stack+0x19/0x1b
> [542909.741852] [<ffffffffa2221561>] print_trailer+0x161/0x280
> [542909.741856] [<ffffffffa2221ebf>] on_freelist+0xff/0x270
> [542909.741860] [<ffffffffa27774cc>] free_debug_processing+0x18d/0x270
> [542909.741867] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
> [542909.741870] [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
> [542909.741878] [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
> [542909.741883] [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
> [542909.741889] [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
> [542909.741892] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
> [542909.741895] [<ffffffffa2223db6>] kfree+0x106/0x140
> [542909.741899] [<ffffffffa21ddcb5>] kvfree+0x35/0x40
> [542909.741902] [<ffffffffa227399b>] setxattr+0x15b/0x1e0
> [542909.741909] [<ffffffffa225c3ed>] ? putname+0x3d/0x60
> [542909.741914] [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
> [542909.741920] [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
> [542909.741926] [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
> [542909.741930] [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
> [542909.741937] [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
> [542909.741940] =============================================================================
> [542909.741942] BUG kmalloc-8 (Tainted: G B OE ------------ ): Wrong object count. Counter is 75 but counted were 95
> [542909.741944] -----------------------------------------------------------------------------
>
> [542909.741947] INFO: Slab 0xffffe0933440b3c0 objects=102 used=75 fp=0xffff9bb6902cf558 flags=0x6fffff00000081
> [542909.741951] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
> [542909.741953] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
> [542909.741954] Call Trace:
> [542909.741958] [<ffffffffa277ac23>] dump_stack+0x19/0x1b
> [542909.741961] [<ffffffffa2221b54>] slab_err+0xb4/0xe0
> [542909.741969] [<ffffffffa2030a1e>] ? show_stack+0x4e/0x60
> [542909.741972] [<ffffffffa2221561>] ? print_trailer+0x161/0x280
> [542909.741975] [<ffffffffa2221f85>] on_freelist+0x1c5/0x270
> [542909.742227] [<ffffffffa27774cc>] free_debug_processing+0x18d/0x270
> [542909.742479] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
> [542909.742483] [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
> [542909.742488] [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
> [542909.742491] [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
> [542909.742495] [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
> [542909.742498] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
> [542909.742501] [<ffffffffa2223db6>] kfree+0x106/0x140
> [542909.742504] [<ffffffffa21ddcb5>] kvfree+0x35/0x40
> [542909.742508] [<ffffffffa227399b>] setxattr+0x15b/0x1e0
> [542909.742511] [<ffffffffa225c3ed>] ? putname+0x3d/0x60
> [542909.742515] [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
> [542909.742519] [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
> [542909.742523] [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
> [542909.742527] [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
> [542909.742530] [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
> [542909.742533] FIX kmalloc-8: Object count adjusted.
> [542909.742536] =============================================================================
> [542909.742538] BUG kmalloc-8 (Tainted: G B OE ------------ ): Redzone overwritten
> [542909.742539] -----------------------------------------------------------------------------
>
> [542909.742543] INFO: 0xffff9bb6902cf858-0xffff9bb6902cf85f. First byte 0x4c instead of 0xcc
> [542909.742545] INFO: Slab 0xffffe0933440b3c0 objects=102 used=95 fp=0xffff9bb6902cf558 flags=0x6fffff00000081
> [542909.742547] INFO: Object 0xffff9bb6902cf850 @offset=2128 fp=0x7f7f1b36102c7c10
>
> [542909.742550] Redzone ffff9bb6902cf848: cc cc cc cc cc cc cc cc  ........
> [542909.742552] Object ffff9bb6902cf850: d0 0b d6 0b 88 01 00 25  .......%
> [542909.742555] Redzone ffff9bb6902cf858: 4c 4c 4c 4c 4c 4c 4c 4c  LLLLLLLL
> [542909.742557] Padding ffff9bb6902cf868: 5a 5a 5a 5a 5a 5a 5a 5a  ZZZZZZZZ
> [542909.742560] CPU: 25 PID: 50461 Comm: pool Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
> [542909.742562] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
> [542909.742563] Call Trace:
> [542909.742567] [<ffffffffa277ac23>] dump_stack+0x19/0x1b
> [542909.742570] [<ffffffffa2221561>] print_trailer+0x161/0x280
> [542909.742573] [<ffffffffa22217ef>] check_bytes_and_report+0xcf/0x110
> [542909.742576] [<ffffffffa222237d>] check_object+0x1dd/0x2a0
> [542909.742580] [<ffffffffa27773cc>] free_debug_processing+0x8d/0x270
> [542909.742583] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
> [542909.742586] [<ffffffffa2223bee>] __slab_free+0x1ce/0x290
> [542909.742590] [<ffffffffa2272e58>] ? generic_setxattr+0x68/0x80
> [542909.742593] [<ffffffffa2273635>] ? __vfs_setxattr_noperm+0x65/0x1b0
> [542909.742596] [<ffffffffa232b7ae>] ? evm_inode_setxattr+0xe/0x10
> [542909.742599] [<ffffffffa21ddcb5>] ? kvfree+0x35/0x40
> [542909.742602] [<ffffffffa2223db6>] kfree+0x106/0x140
> [542909.742606] [<ffffffffa21ddcb5>] kvfree+0x35/0x40
> [542909.742609] [<ffffffffa227399b>] setxattr+0x15b/0x1e0
> [542909.742613] [<ffffffffa225c3ed>] ? putname+0x3d/0x60
> [542909.742617] [<ffffffffa225d602>] ? user_path_at_empty+0x72/0xc0
> [542909.742621] [<ffffffffa224d828>] ? __sb_start_write+0x58/0x120
> [542909.742624] [<ffffffffa22802f1>] ? do_utimes+0xf1/0x180
> [542909.742628] [<ffffffffa2273c87>] SyS_setxattr+0xb7/0x100
> [542909.742631] [<ffffffffa278dede>] system_call_fastpath+0x25/0x2a
> [542909.742635] FIX kmalloc-8: Restoring 0xffff9bb6902cf858-0xffff9bb6902cf85f=0xcc
>
> [542909.742648] FIX kmalloc-8: Object at 0xffff9bb6902cf850 not freed
> [542909.763926] general protection fault: 0000 [#1] SMP
> [542909.792826] Modules linked in: tcp_diag inet_diag fuse nfsd mgc(OE) lustre(OE) lmv(OE) mdc(OE) fid(OE) osc(OE) lov(OE) fld(OE) ko2iblnd(OE) ptlrpc(OE) obdclass(OE) cts lnet(OE) rpcsec_gss_krb5 nfsv4 dns_resolver libcfs(OE) rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_umad(OE) mlx5_fpga_tools(OE) mlx5_ib(OE) mlx5_core(OE) mlxfw(OE) mlx4_en(OE) ipt_REJECT nf_reject_ipv4 nf_log_ipv4 nf_log_common xt_LOG nf_conntrack_ipv4 nf_defrag_ipv4 xt_multiport xt_recent xt_conntrack nf_conntrack iptable_filter mlx4_ib(OE) dm_mirror dm_region_hash dm_log dm_mod ib_uverbs(OE) ib_core(OE) sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi kvm_intel mgag200 mlx4_core(OE) iTCO_wdt iTCO_vendor_support ttm kvm drm_kms_helper irqbypass syscopyarea sysfillrect crc32_pclmul sysimgblt crc32c_intel
> [542910.218156] fb_sys_fops mlx_compat(OE) ghash_clmulni_intel drm aesni_intel lrw gf128mul glue_helper ses ablk_helper devlink enclosure cryptd drm_panel_orientation_quirks hpwdt i2c_i801 pcspkr pcc_cpufreq wmi ioatdma ipmi_si acpi_power_meter ipmi_devintf ipmi_msghandler lpc_ich knem(OE) binfmt_misc auth_rpcgss ip_tables smartpqi bridge stp llc xfs isci libsas qla3xxx e1000e igb i2c_algo_bit megaraid_sas aacraid aic79xx ata_piix mpt2sas raid_class mptspi scsi_transport_spi mptsas mptscsih mptbase arcmsr ahci libahci sata_nv sata_svw bnx2x libcrc32c bnx2 ext4 mbcache jbd2 sata_sil libata tg3 e1000 nfsv3 nfs_acl nfs lockd grace sunrpc fscache tun sd_mod crc_t10dif crct10dif_generic sg ixgbe crct10dif_pclmul crct10dif_common hpsa dca mdio hpilo ptp scsi_transport_sas pps_core [last unloaded: ipmi_msghandler]
> [542910.624054]
> [542910.625230] CPU: 27 PID: 25861 Comm: gdbus Kdump: loaded Tainted: G B OE ------------ 3.10.0-1062.9.1.el7.x86_64 #1
> [542910.685731] Hardware name: HP ProLiant BL460c Gen9, BIOS I36 10/21/2019
> [542910.724144] task: ffff9ba5b5bc1070 ti: ffff9ba6067c0000 task.ti: ffff9ba6067c0000
> [542910.768155] RIP: 0010:[<ffffffffa21f711b>] [<ffffffffa21f711b>] find_vma+0x3b/0x60
> [542910.810986] RSP: 0000:ffff9ba6067c3ea8 EFLAGS: 00010202
> [542910.840760] RAX: ffff9bb72066f1b8 RBX: 0000000000000004 RCX: ffff9ba6067c3fd8
> [542910.880983] RDX: 7fff9bb7c2fec608 RSI: 0000000000682888 RDI: ffff9ba002a34b00
> [542910.919946] RBP: ffff9ba6067c3ea8 R08: 0000000000000001 R09: 0000000000000000
> [542910.958846] R10: 000000000000001c R11: 00002aaaae480b40 R12: 00000000000000a8
> [542910.998593] R13: 0000000000682888 R14: ffff9ba6067c3f58 R15: ffff9ba002a34b00
> [542911.038992] FS: 00002aaabc395700(0000) GS:ffff9bb97f140000(0000) knlGS:0000000000000000
> [542911.095715] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [542911.155694] CR2: 0000000000682888 CR3: 0000003214b00000 CR4: 00000000003607e0
> [542911.202949] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [542911.265589] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [542911.315387] Call Trace:
> [542911.355844] [<ffffffffa278857d>] __do_page_fault+0x13d/0x500
> [542911.413348] [<ffffffffa2788975>] do_page_fault+0x35/0x90
> [542911.455443] [<ffffffffa2784778>] page_fault+0x28/0x30
> [542911.495307] Code: 74 06 48 39 70 08 77 40 48 8b 57 08 31 c0 48 85 d2 75 18 eb 2e 0f 1f 00 48 3b 72 e0 48 8d 42 e0 73 1d 48 8b 52 10 48 85 d2 74 0f <48> 3b 72 e8 72 e7 48 8b 52 08 48 85 d2 75 f1 48 85 c0 74 04 48
> [542911.665436] RIP [<ffffffffa21f711b>] find_vma+0x3b/0x60
> [542911.695917] RSP <ffff9ba6067c3ea8>
>
> --
> --
> # Dr. Christopher Mountford
> # System specialist - Research Computing/HPC
> #
> # IT services,
> # University of Leicester, University Road,
> # Leicester, LE1 7RH, UK
> #
> # t: 0116 252 3471
> # e: [email protected]
>
> _______________________________________________
> lustre-discuss mailing list
> [email protected]
> http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
>

--
--
# Dr. Christopher Mountford
# System specialist - Research Computing/HPC
#
# IT services,
# University of Leicester, University Road,
# Leicester, LE1 7RH, UK
#
# t: 0116 252 3471
# e: [email protected]
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
