Hi!

Yes, resetting journals is exactly what we did, quite a while ago, when the MDS 
ran out of memory because a journal entry contained an absurdly large number (I 
think it may have been an inode number). We probably also reset the inode table 
later, which I have since learned resets a data structure on disk, and which 
probably started us overwriting inodes, dentries, or both.
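
(For reference, I believe the resets we ran were something like the standard 
disaster-recovery invocations below; I'm reconstructing from memory, and the 
rank and filesystem name are assumptions:)

    # reconstructed from memory; rank 0 and the default fs name are assumed
    cephfs-journal-tool --rank=cephfs:0 journal reset
    cephfs-table-tool all reset inode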


So I take it (we are learning about filesystems very quickly over here) that 
Ceph is reusing inode numbers. Re-scanning dentries will somehow figure out 
which dentry is most recent and remove the older (now wrong) one. And somehow 
it can handle hard links, possibly (we don't have many, or any, of these).
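
(Once the filesystem is mountable again I'll double-check that hard-link claim 
with something like the following; the mount point is hypothetical:)

    # count regular files with more than one hard link (assumed mount point)
    find /mnt/cephfs -type f -links +1 | wc -l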


Thanks very much for your help. This has been fascinating.


Neale




________________________________
From: Patrick Donnelly <pdonn...@redhat.com>
Sent: Monday, October 28, 2019 12:52
To: Pickett, Neale T
Cc: ceph-users
Subject: Re: [ceph-users] Problematic inode preventing ceph-mds from starting

On Fri, Oct 25, 2019 at 12:11 PM Pickett, Neale T <ne...@lanl.gov> wrote:
> In the last week we have made a few changes to the down filesystem in an 
> attempt to fix what we thought was an inode problem:
>
>
> cephfs-data-scan scan_extents   # about 1 day with 64 processes
>
> cephfs-data-scan scan_inodes   # about 1 day with 64 processes
>
> cephfs-data-scan scan_links   # about 1 day

Did you reset the journals or perform any other disaster recovery
commands? This process likely introduced the duplicate inodes.

> After these three, we tried to start an MDS and it stayed up. We then ran:
>
> ceph tell mds.a scrub start / recursive repair
>
>
> The repair ran about 3 days, spewing logs to `ceph -w` about duplicated 
> inodes, until it stopped. All looked well until we began bringing production 
> services back online, at which point many error messages appeared, the mds 
> went back into damaged, and the fs back to degraded. At this point I removed 
> the objects you suggested, which brought everything back briefly.
>
> The latest crash is:
>
>     -1> 2019-10-25 18:47:50.731 7fc1f3b56700 -1 
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
>  In function 'void MDCache::add_inode(CInode*)' thread 7fc1f3b56700 time 
> 2019-1...
>
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.2/rpm/el7/BUILD/ceph-14.2.2/src/mds/MDCache.cc:
>  258: FAILED ceph_assert(!p)

This error indicates a duplicate inode loaded into cache. Fixing this
probably requires significant intervention and (meta)data loss for
recent changes:

- Stop/unmount all clients. (Probably already the case if the rank is damaged!)

- Reset the MDS journal [1] and optionally recover any dentries first.
(This will hopefully resolve the ESubtreeMap errors you pasted.) Note
that some metadata may be lost through this command.

- `cephfs-data-scan scan_links` again. This should repair any
duplicate inodes (by dropping the older dentries). See the command
sketch after this list.

- Then you can try marking the rank as repaired.
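
Roughly, the whole sequence would look like this (rank 0 and the default
filesystem name are assumptions here; adjust for your cluster, and keep in
mind that the journal reset is destructive):

    # 1. Optionally salvage dentries from the journal before resetting it
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
    # 2. Reset the journal (destructive; recent metadata updates are lost)
    cephfs-journal-tool --rank=cephfs:0 journal reset
    # 3. Re-run the link scan to drop the older duplicate dentries
    cephfs-data-scan scan_links
    # 4. Mark the rank repaired so the MDS can be restarted
    ceph mds repaired cephfs:0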

Good luck!

[1] 
https://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#journal-truncation


--
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
