Date: Thu, 20 Jan 2022 12:07:40 +0000
From: Christopher Mountford <[email protected]>
To: [email protected]
Subject: Client crashes
User-Agent: NeoMutt/20170306 (1.8.0)

Hi All,

We've started getting fairly regular client panics on our Lustre 2.12.7 filesystem. Looking at the stack trace, I think we are hitting this bug:

https://jira.whamcloud.com/browse/LU-12752

I note that a fix is in 2.15.0. Is this likely to be patched in a 2.12 release?

We're still trying to isolate the job that is causing the crash, but once we have, we should be able to reproduce this reliably.

Kind Regards,
Christopher.

Log entry:

Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_cache.c:2519:osc_teardown_async_page()) extent ffff937e2756e4d0@{[0 -> 255/255], [2|0|-|cache|wi|ffff92fdd1dd8b40], [1703936|1|+|-|ffff932384f1e880|256| (null)]} trunc at 42.
+Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_cache.c:2519:osc_teardown_async_page()) ### extent: ffff937e2756e4d0 ns: alice3-OST001f-osc-ffff938e6a743000 lock: ffff932384f1e880/0x6024b6d908313ce7 lrc: 2/0,0 mode: PW/PW res:
+[0x7c0000400:0x5c888a:0x0].0x0 rrc: 2 type: EXT [0->18446744073709551615] (req 65536->172031) flags: 0x800020000000000 nid: local remote: 0x345e4fe1c451a182 expref: -99 pid: 955 timeout: 0 lvb_type: 1
+Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) page@ffff933651225e00[2 ffff93228480b2f0 4 1 (null)]
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) vvp-page@ffff933651225e50(0:0) vm@ffffeaeada357d80 6fffff00000879 3:0 ffff933651225e00 42 lru
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) lov-page@ffff933651225e90, comp index: 10000, gen: 6
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) osc-page@ffff933651225ec8 42: 1< 0x845fed 2 0 + - > 2< 172032 0 4096 0x0 0x420 | (null) ffff938e52a7d738 ffff92fdd1dd8b40 > 3< 0 0 0 > 4< 0 0 8 1703936 - | - - + - >
+5< - - + - | 0 - | 1 - ->
+Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) end page@ffff933651225e00
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:192:osc_page_delete()) Trying to teardown failed: -16
Jan 20 10:23:39 lmem006 kernel: LustreError: 4661:0:(osc_page.c:193:osc_page_delete()) ASSERTION( 0 ) failed:
Jan 20 10:23:40 lmem006 kernel: LustreError: 4661:0:(osc_page.c:193:osc_page_delete()) LBUG
Jan 20 10:23:40 lmem006 kernel: Pid: 4661, comm: diamond 3.10.0-1160.49.1.el7.x86_64 #1 SMP Tue Nov 30 15:51:32 UTC 2021
Jan 20 10:23:40 lmem006 kernel: Call Trace:
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc0f087cc>] libcfs_call_trace+0x8c/0xc0 [libcfs]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc0f0887c>] lbug_with_loc+0x4c/0xa0 [libcfs]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc145fe7f>] osc_page_delete+0x48f/0x500 [osc]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc107b2d0>] cl_page_delete0+0x80/0x220 [obdclass]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc107b4a3>] cl_page_delete+0x33/0x110 [obdclass]
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc156f27f>] ll_invalidatepage+0x7f/0x170 [lustre]
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcefed>] do_invalidatepage_range+0x7d/0x90
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcf097>] truncate_inode_page+0x77/0x80
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcf2ca>] truncate_inode_pages_range+0x1ea/0x750
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93bcf89f>] truncate_inode_pages_final+0x4f/0x60
Jan 20 10:23:40 lmem006 kernel: [<ffffffffc1554c1f>] ll_delete_inode+0x4f/0x230 [lustre]
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c6c934>] evict+0xb4/0x180
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c6cd6c>] iput+0xfc/0x190
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c676f8>] __dentry_kill+0x158/0x1d0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c67d95>] dput+0xb5/0x1a0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c5092d>] __fput+0x18d/0x230
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93c50abe>] ____fput+0xe/0x10
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93ac299b>] task_work_run+0xbb/0xe0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff93a2cc65>] do_notify_resume+0xa5/0xc0
Jan 20 10:23:40 lmem006 kernel: [<ffffffff941962ef>] int_signal+0x12/0x17
Jan 20 10:23:40 lmem006 kernel: [<ffffffffffffffff>] 0xffffffffffffffff
Jan 20 10:23:40 lmem006 kernel: Kernel panic - not syncing: LBUG
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
