Any chance of a fix soon? In 14.2.5?

On Thu, Sep 19, 2019 at 8:44 PM Yan, Zheng <uker...@gmail.com> wrote:

> On Thu, Sep 19, 2019 at 11:37 PM Dan van der Ster <d...@vanderster.com>
> wrote:
> >
> > You were running v14.2.2 before?
> >
> > It seems that the ceph_assert you're hitting was indeed added
> > between v14.2.2 and v14.2.3 in this commit
> >
> https://github.com/ceph/ceph/commit/12f8b813b0118b13e0cdac15b19ba8a7e127730b
> >
> > There's a comment in the tracker for that commit which says the
> > original fix was incomplete
> > (https://tracker.ceph.com/issues/39987#note-5)
> >
> > So perhaps nautilus needs
> >
> https://github.com/ceph/ceph/pull/28459/commits/0a1e92abf1cfc8bddf526cbf5bceea7b854dcfe8
> > ??
> >
>
> You are right. Sorry for the bug. For now, please go back to 14.2.2
> (just the MDS) or compile ceph-mds from source.
>
> Yan, Zheng
>
> > Did you already try going back to v14.2.2 (on the MDSs only)?
> >
> > -- dan
> >
> > On Thu, Sep 19, 2019 at 4:59 PM Kenneth Waegeman
> > <kenneth.waege...@ugent.be> wrote:
> > >
> > > Hi all,
> > >
> > > I updated our ceph cluster to 14.2.3 yesterday, and today the mds are
> crashing one after another. I'm using two active mds.
> > >
> > > I've made a tracker ticket, but I was wondering whether anyone else
> has seen this issue yet?
> > >
> > >    -27> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8887 lookup
> #0x100166004d4/WindowsPhone-MSVC-CXX.cmake 2019-09-19 15:42:00.203132
> caller_uid=0, caller_gid=0{0,}) v4
> > >    -26> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865372:5815 lookup
> #0x20005a6eb3a/selectable.cpython-37.pyc 2019-09-19 15:42:00.204970
> caller_uid=0, caller_gid=0{0,}) v4
> > >    -25> 2019-09-19 15:42:00.196 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8888 lookup
> #0x100166004d4/WindowsPhone.cmake 2019-09-19 15:42:00.206381 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -24> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8889 lookup
> #0x100166004d4/WindowsStore-MSVC-C.cmake 2019-09-19 15:42:00.209703
> caller_uid=0, caller_gid=0{0,}) v4
> > >    -23> 2019-09-19 15:42:00.206 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8890 lookup
> #0x100166004d4/WindowsStore-MSVC-CXX.cmake 2019-09-19 15:42:00.213200
> caller_uid=0, caller_gid=0{0,}) v4
> > >    -22> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8891 lookup
> #0x100166004d4/WindowsStore.cmake 2019-09-19 15:42:00.216577 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -21> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8892 lookup
> #0x100166004d4/Xenix.cmake 2019-09-19 15:42:00.220230 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -20> 2019-09-19 15:42:00.216 7f0369aeb700  2 mds.1.cache Memory
> usage:  total 4603496, rss 4167920, heap 323836, baseline 323836, 501 /
> 1162471 inodes have caps, 506 caps, 0.00043528 caps per inode
> > >    -19> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log
> _submit_thread 30520209420029~9062 : EUpdate scatter_writebehind [metablob
> 0x1000bd8ac7b, 2 dirs]
> > >    -18> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log
> _submit_thread 30520209429111~10579 : EUpdate scatter_writebehind [metablob
> 0x1000bf26309, 9 dirs]
> > >    -17> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log
> _submit_thread 30520209439710~2305 : EUpdate scatter_writebehind [metablob
> 0x1000bf2745b.001*, 2 dirs]
> > >    -16> 2019-09-19 15:42:00.216 7f03652e2700  5 mds.1.log
> _submit_thread 30520209442035~1845 : EUpdate scatter_writebehind [metablob
> 0x1000c233753, 2 dirs]
> > >    -15> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8893 lookup
> #0x100166004d4/eCos.cmake 2019-09-19 15:42:00.223360 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -14> 2019-09-19 15:42:00.216 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865319:2381 lookup
> #0x1001172f39d/microsoft-cp1251 2019-09-19 15:42:00.224940 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -13> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8894 lookup
> #0x100166004d4/gas.cmake 2019-09-19 15:42:00.226624 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -12> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865319:2382 readdir
> #0x1001172f3d7 2019-09-19 15:42:00.228673 caller_uid=0, caller_gid=0{0,}) v4
> > >    -11> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8895 lookup
> #0x100166004d4/kFreeBSD.cmake 2019-09-19 15:42:00.229668 caller_uid=0,
> caller_gid=0{0,}) v4
> > >    -10> 2019-09-19 15:42:00.226 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8896 lookup
> #0x100166004d4/syllable.cmake 2019-09-19 15:42:00.232746 caller_uid=0,
> caller_gid=0{0,}) v4
> > >     -9> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865333:8897 readdir
> #0x10016601379 2019-09-19 15:42:00.240672 caller_uid=0, caller_gid=0{0,}) v4
> > >     -8> 2019-09-19 15:42:00.236 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865356:3604574 readdir
> #0x2000090d630 2019-09-19 15:42:00.241832 caller_uid=0, caller_gid=0{0,}) v4
> > >     -7> 2019-09-19 15:42:00.266 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865356:3604575 readdir
> #0x2000090d631 2019-09-19 15:42:00.272158 caller_uid=0, caller_gid=0{0,}) v4
> > >     -6> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log
> _submit_thread 30520209443900~3089 : EUpdate scatter_writebehind [metablob
> 0x20005af5c63, 3 dirs]
> > >     -5> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log
> _submit_thread 30520209447009~10579 : EUpdate scatter_writebehind [metablob
> 0x1000bf26309, 9 dirs]
> > >     -4> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log
> _submit_thread 30520209457608~2305 : EUpdate scatter_writebehind [metablob
> 0x1000bf2745b.001*, 2 dirs]
> > >     -3> 2019-09-19 15:42:00.326 7f03652e2700  5 mds.1.log
> _submit_thread 30520209459933~1030 : EUpdate check_inode_max_size [metablob
> 0x20005af5c74, 1 dirs]
> > >     -2> 2019-09-19 15:42:00.326 7f036c2f0700  4 mds.1.server
> handle_client_request client_request(client.37865372:5816 setattr size=1138
> #0x20005a6eb67 2019-09-19 15:42:00.333015 caller_uid=0, caller_gid=0{0,}) v4
> > >     -1> 2019-09-19 15:42:00.336 7f036c2f0700 -1
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mds/MDCache.cc:
> In function 'CInode* MDCache::cow_inode(CInode*, snapid_t)' thread
> 7f036c2f0700 time 2019-09-19 15:42:00.333567
> > >
> /home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/14.2.3/rpm/el7/BUILD/ceph-14.2.3/src/mds/MDCache.cc:
> 1498: FAILED ceph_assert(!lock->get_num_wrlocks())
> > >
> > >  ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39)
> nautilus (stable)
> > >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x14a) [0x7f0375773ac2]
> > >  2: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
> const*, char const*, ...)+0) [0x7f0375773c90]
> > >  3: (()+0x1ee48d) [0x55a7e4ccf48d]
> > >  4: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*,
> snapid_t, CInode**, CDentry::linkage_t*)+0x823) [0x55a7e4ccfcb3]
> > >  5: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*,
> snapid_t)+0xbc) [0x55a7e4cd042c]
> > >  6: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t,
> boost::intrusive_ptr<MClientCaps const> const&,
> boost::intrusive_ptr<MClientCaps> const&, bool*)+0xfb6) [0x55a7e4d957e6]
> > >  7: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps
> const> const&)+0x2059) [0x55a7e4d9c8e9]
> > >  8: (Locker::dispatch(boost::intrusive_ptr<Message const>
> const&)+0xe7) [0x55a7e4daaf97]
> > >  9: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message
> const> const&)+0x304) [0x55a7e4c089e4]
> > >  10: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&,
> bool)+0x6eb) [0x55a7e4c0b1bb]
> > >  11: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message
> const> const&)+0x40) [0x55a7e4c0b8d0]
> > >  12: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message>
> const&)+0xfc) [0x55a7e4bf82ac]
> > >  13: (DispatchQueue::entry()+0x12a9) [0x7f0375968dd9]
> > >  14: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f0375a183dd]
> > >  15: (()+0x7dd5) [0x7f0373642dd5]
> > >  16: (clone()+0x6d) [0x7f03722f302d]
> > >
> > >      0> 2019-09-19 15:42:00.336 7f036c2f0700 -1 *** Caught signal
> (Aborted) **
> > >  in thread 7f036c2f0700 thread_name:ms_dispatch
> > >
> > >  ceph version 14.2.3 (0f776cf838a1ae3130b2b73dc26be9c95c6ccc39)
> nautilus (stable)
> > >  1: (()+0xf5d0) [0x7f037364a5d0]
> > >  2: (gsignal()+0x37) [0x7f037222b2c7]
> > >  3: (abort()+0x148) [0x7f037222c9b8]
> > >  4: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> const*)+0x199) [0x7f0375773b11]
> > >  5: (ceph::__ceph_assertf_fail(char const*, char const*, int, char
> const*, char const*, ...)+0) [0x7f0375773c90]
> > >  6: (()+0x1ee48d) [0x55a7e4ccf48d]
> > >  7: (MDCache::journal_cow_dentry(MutationImpl*, EMetaBlob*, CDentry*,
> snapid_t, CInode**, CDentry::linkage_t*)+0x823) [0x55a7e4ccfcb3]
> > >  8: (MDCache::journal_dirty_inode(MutationImpl*, EMetaBlob*, CInode*,
> snapid_t)+0xbc) [0x55a7e4cd042c]
> > >  9: (Locker::_do_cap_update(CInode*, Capability*, int, snapid_t,
> boost::intrusive_ptr<MClientCaps const> const&,
> boost::intrusive_ptr<MClientCaps> const&, bool*)+0xfb6) [0x55a7e4d957e6]
> > >  10: (Locker::handle_client_caps(boost::intrusive_ptr<MClientCaps
> const> const&)+0x2059) [0x55a7e4d9c8e9]
> > >  11: (Locker::dispatch(boost::intrusive_ptr<Message const>
> const&)+0xe7) [0x55a7e4daaf97]
> > >  12: (MDSRank::handle_deferrable_message(boost::intrusive_ptr<Message
> const> const&)+0x304) [0x55a7e4c089e4]
> > >  13: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&,
> bool)+0x6eb) [0x55a7e4c0b1bb]
> > >  14: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message
> const> const&)+0x40) [0x55a7e4c0b8d0]
> > >  15: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message>
> const&)+0xfc) [0x55a7e4bf82ac]
> > >  16: (DispatchQueue::entry()+0x12a9) [0x7f0375968dd9]
> > >  17: (DispatchQueue::DispatchThread::entry()+0xd) [0x7f0375a183dd]
> > >  18: (()+0x7dd5) [0x7f0373642dd5]
> > >  19: (clone()+0x6d) [0x7f03722f302d]
> > >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> needed to interpret this.
> > >
> > >
> > > Thanks!
> > >
> > > K
> > >
> > > _______________________________________________
> > > ceph-users mailing list
> > > ceph-us...@lists.ceph.com
> > > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
