Occasionally, across a 200-machine cluster, I get an LBUG on the clients. It seems to be triggered by specific access patterns (most jobs do not trigger it) and looks quite similar to:

  https://jira.whamcloud.com/browse/LU-16637
  http://lists.lustre.org/pipermail/lustre-devel-lustre.org/2023-April/011016.html
  https://git.whamcloud.com/?p=fs/lustre-release.git;a=commit;h=7bb1e211d217d5a82ac2d5e4edad5ae018090761

Since the LBUG is fatal, all I get is the backtrace from the crash dump:

  lbug_with_loc.cxold.8+0x18
  ll_truncate_inode_pages_final+0xab
  vvp_prune+0x181
  cl_object_prune+0x58
  lov_layout_change.isra.49+0x1ba
  lov_conf_set+0x391
  cl_conf_set+0x60
  ll_layout_conf+0x14b
  ? _ptlrpc_req_finished+0x54d
  ll_layout_lock_set+0x3df
  ? ll_take_md_lock+0x148
  ll_layout_refresh+0x1cc
  vvp_io_init+0x22e
  cl_io_init0.isra.14+0x86
  ll_file_io_generic+0x388
  ? file_update_time+0x62
  ? srso_return_thunk+0x5
  ? __generic_file_write_iter+0x102
  ll_file_write_iter+0x558
  ? kmem_cache_freee+0x116
  new_sync_write+0x112
  vfs_write+0x5a

If this is a manifestation of LU-16637, there is a fix. However, I have checked the changelogs: LU-16637 is listed as applied to 2.16.0, but it does not seem to be listed in any of the 2.15.[1-6] changelogs.
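
The changelog check above could also be cross-checked against the git history, since a fix backported to a maintenance branch normally gets a new commit hash and is easier to find by ticket ID than by hash. The following is only a rough sketch under assumptions: it assumes a local clone of lustre-release, that commit subjects carry the ticket ID (e.g. "LU-16637 ..."), and that the refs passed in (release tags or branch names) exist in that clone. The function names are made up for illustration.

```python
import re
import subprocess

# Ticket IDs as they usually appear in commit subjects, e.g. "LU-16637".
TICKET_RE = re.compile(r"\bLU-\d+\b")


def tickets_in_log(log_text):
    """Return the set of LU-nnnn ticket IDs mentioned in git log output."""
    return set(TICKET_RE.findall(log_text))


def ticket_in_ref(repo_dir, ref, ticket):
    """Return True if a commit reachable from `ref` mentions `ticket`.

    `repo_dir` is a local clone; `ref` is any tag or branch name that
    exists in it (the caller supplies these, nothing is assumed here).
    """
    out = subprocess.run(
        ["git", "-C", repo_dir, "log", "--format=%s", ref],
        check=True, capture_output=True, text=True,
    ).stdout
    return ticket in tickets_in_log(out)


# Hypothetical usage, with a clone path and ref names of your choosing:
#   for ref in ("<2.15.x tag>", "<2.16.0 tag>"):
#       print(ref, ticket_in_ref("/path/to/lustre-release", ref, "LU-16637"))
```

If the ticket shows up for the 2.16.0 ref but for none of the 2.15.x refs, that would agree with what the changelogs suggest.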