As Paul said, the MDS is loading "duplicate inodes" and that's very bad. If
you've already gone through some of the disaster recovery steps, that's
likely the cause. But you'll need to provide a *lot* more information about
what you've already done to the cluster for people to be sure.

The backwards scan referred to is the scan_extents/scan_inodes work
described in
http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/#recovery-from-missing-metadata-objects
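
For reference, the core of that procedure is the cephfs-data-scan pass
(sketch only; "<data pool>" is a placeholder for your data pool name, and
you should work through the linked doc in order, with all MDS daemons
stopped, before running any of it):

    cephfs-data-scan init
    cephfs-data-scan scan_extents <data pool>
    cephfs-data-scan scan_inodes <data pool>
    cephfs-data-scan scan_links

The same page also covers resetting the journal and the session/inode
tables beforehand; skipping or half-finishing those steps is one way to
end up with the duplicate inodes you're seeing now.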

Be advised that there is limited user experience with *any* of these tools
and that you have stumbled into some dark corners. I'm rather surprised
that a newish deployment could have needed to make use of any of this
repair functionality — if you are deliberately breaking things to see how
it recovers, you should probably spend some more time understanding
plausible failure cases. This generally only comes up in the case of
genuine data loss due to multiple simultaneous hardware failures.
-Greg

On Fri, Aug 10, 2018 at 9:05 AM Amit Handa <amit.ha...@gmail.com> wrote:

> Thanks a lot, Paul.
> We did (hopefully) follow through with the disaster recovery.
> However, please guide me on how to get the cluster back up!
>
> Thanks,
>
>
> On Fri, Aug 10, 2018 at 9:32 PM Paul Emmerich <paul.emmer...@croit.io>
> wrote:
>
>> Looks like you got some duplicate inodes due to corrupted metadata. You
>> likely attempted a disaster recovery and didn't follow through with it
>> completely, or you hit some bug in Ceph.
>>
>> The solution here is probably a full recovery of the metadata (the full
>> backwards scan) after resetting the inodes. I've recovered a cluster from
>> something similar just a few weeks ago. Annoying, but recoverable.
>>
>> Paul
>>
>> 2018-08-10 13:26 GMT+02:00 Amit Handa <amit.ha...@gmail.com>:
>>
>>> We are facing constant crashes from the Ceph MDS. We have installed Mimic
>>> (v13.2.1).
>>>
>>> mds: cephfs-1/1/1 up {0=node2=up:active(laggy or crashed)}
>>>
>>> mds logs: https://pastebin.com/AWGMLRm0
>>>
>>> we have followed the DR steps listed at
>>>
>>> http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
>>>
>>> please help in resolving the errors :(
>>>
>>> mds crash stacktrace:
>>>
>>>  ceph version 13.2.1 (5533ecdc0fda920179d7ad84e0aa65a127b20d77) mimic (stable)
>>>  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0xff) [0x7f984fc3ee1f]
>>>  2: (()+0x284fe7) [0x7f984fc3efe7]
>>>  3: (()+0x2087fe) [0x5563e88537fe]
>>>  4: (Server::prepare_new_inode(boost::intrusive_ptr<MDRequestImpl>&, CDir*, inodeno_t, unsigned int, file_layout_t*)+0xf37) [0x5563e87ce777]
>>>  5: (Server::handle_client_openc(boost::intrusive_ptr<MDRequestImpl>&)+0xdb0) [0x5563e87d0bd0]
>>>  6: (Server::handle_client_request(MClientRequest*)+0x49e) [0x5563e87d3c0e]
>>>  7: (Server::dispatch(Message*)+0x2db) [0x5563e87d789b]
>>>  8: (MDSRank::handle_deferrable_message(Message*)+0x434) [0x5563e87514b4]
>>>  9: (MDSRank::_dispatch(Message*, bool)+0x63b) [0x5563e875db5b]
>>>  10: (MDSRank::retry_dispatch(Message*)+0x12) [0x5563e875e302]
>>>  11: (MDSInternalContextBase::complete(int)+0x67) [0x5563e89afb57]
>>>  12: (MDSRank::_advance_queues()+0xd1) [0x5563e875cd51]
>>>  13: (MDSRank::ProgressThread::entry()+0x43) [0x5563e875d3e3]
>>>  14: (()+0x7e25) [0x7f984d869e25]
>>>  15: (clone()+0x6d) [0x7f984c949bad]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>>
>>>
>>>
>>
>>
>> --
>> Paul Emmerich
>>
>> Looking for help with your Ceph cluster? Contact us at https://croit.io
>>
>> croit GmbH
>> Freseniusstr. 31h
>> 81247 München
>> www.croit.io
>> Tel: +49 89 1896585 90
>>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
