Another tidbit: the two OST nodes showing problems have an LFSCK running, and I cannot stop it:
[root@elfsa2o1 ~]# grep status /proc/fs/lustre/osd-zfs/lfsarc02-OST*/oi_scrub
/proc/fs/lustre/osd-zfs/lfsarc02-OST0000/oi_scrub:status: completed
/proc/fs/lustre/osd-zfs/lfsarc02-OST0002/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST0004/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST0006/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST0008/oi_scrub:status: scanning
/proc/fs/lustre/osd-zfs/lfsarc02-OST000a/oi_scrub:status: scanning

An LFSCK on the MDT hangs because orphaned inodes cannot be deleted:

[ 9568.345851] LustreError: 6592:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) lfsarc02-OST0006-osc-MDT0000: cannot cleanup orphans: rc = -22
[ 9568.364339] LustreError: 6592:0:(osp_precreate.c:970:osp_precreate_cleanup_orphans()) Skipped 6590 previous similar messages

Is there any way to stop the scans on the OSTs?

From: Hebenstreit, Michael
Sent: Tuesday, June 23, 2020 11:19
To: [email protected]
Subject: problem after upgrading 2.10.4 to 2.12.4

On our archive Lustre file system (ZFS based, 4 OST servers with 6 OST pools each) we experienced the very same issues as described here: https://jira.whamcloud.com/browse/LU-13392

Certain directories cannot be accessed, and the OSTs show thousands of "Can't find FID Sequence" errors. Unfortunately, I cannot even start the recommended file system check on the OST devices, for example:

[root@elfsa2o1 ~]# lctl lfsck_start -o -M lfsarc02-OST0002
Fail to start LFSCK: Operation not permitted
[root@elfsa2o1 ~]# lctl lfsck_start -M lfsarc02-OST0002
Fail to start LFSCK: Operation not supported

On a similar system that was first installed as 2.10.4, then upgraded to 2.10.8, and is now also running 2.12.4, at least the second command starts:

# lctl lfsck_start -M lfsarc01-OST0002

The commands are issued on the server that actually hosts the running ZFS pools.

Questions:
Is there any way to force the file system checks?
Has anyone found a workaround for the FID sequence errors?
Can I downgrade from 2.12.4 to 2.10.8 without destroying the file system?
Has the error described in https://jira.whamcloud.com/browse/LU-13392 been fixed in 2.12.5?

Thanks,
Michael

------------------------------------------------------------------------
Michael Hebenstreit
Senior Cluster Architect
TSACG
Intel Corporation, MS: RR1-105/H14
1600 Rio Rancho Blvd SE
Rio Rancho, NM 87124
UNITED STATES
Tel.: +1 505-794-3144
E-mail: [email protected]
_______________________________________________
lustre-discuss mailing list
[email protected]
http://lists.lustre.org/listinfo.cgi/lustre-discuss-lustre.org
