Dear all,

We have started using CephFS more intensively for some WLCG-related workloads.
We run 3 active MDS instances spread across 3 servers, with mds_cache_memory_limit=12G;
most of the other settings are at their defaults.
One of the MDS daemons crashed last night, leaving the log below.
Do you have any hints on what the cause could be and how to avoid it?
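In case it helps reproduce our setup, this is roughly how the cache limit was applied (a sketch from memory; the `mds` config section as the target is an assumption about our cephadm deployment, adjust for yours):

```shell
# mds_cache_memory_limit takes a byte count; 12G = 12 * 1024^3 bytes
echo $((12 * 1024 * 1024 * 1024))   # 12884901888

# Apply cluster-wide for all MDS daemons via the mon config store
ceph config set mds mds_cache_memory_limit 12884901888

# Verify the value the daemons will pick up
ceph config get mds mds_cache_memory_limit
```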

Regards,

Giuseppe

[root@naret-monitor03 ~]# journalctl -u ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service
...
Jan 19 04:49:40 naret-monitor03 ceph-63334166-d991-11eb-99de-40a6b72108d0-mds-cephfs-naret-monitor03-lqppte[4397]:
  ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific >
  1: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
  2: abort()
  3: /lib64/libstdc++.so.6(+0x987ba) [0x7fe2912567ba]
  4: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
  5: /lib64/libstdc++.so.6(+0x95559) [0x7fe291253559]
  6: __gxx_personality_v0()
  7: /lib64/libgcc_s.so.1(+0x10b03) [0x7fe290c34b03]
  8: _Unwind_Resume()
  9: /usr/bin/ceph-mds(+0x18c104) [0x5638351e7104]
  10: /lib64/libpthread.so.0(+0x12ce0) [0x7fe291e4fce0]
  11: gsignal()
  12: abort()
  13: /lib64/libstdc++.so.6(+0x9009b) [0x7fe29124e09b]
  14: /lib64/libstdc++.so.6(+0x9653c) [0x7fe29125453c]
  15: /lib64/libstdc++.so.6(+0x96597) [0x7fe291254597]
  16: /lib64/libstdc++.so.6(+0x967f8) [0x7fe2912547f8]
  17: /lib64/libtcmalloc.so.4(+0x19fa4) [0x7fe29bae6fa4]
  18: (tcmalloc::ThreadCache::FetchFromCentralCache(unsigned int, int, vo>
  19: (std::shared_ptr<inode_t<mempool::mds_co::pool_allocator> > InodeSt>
  20: (CInode::_decode_base(ceph::buffer::v15_2_0::list::iterator_impl<tr>
  21: (CInode::decode_import(ceph::buffer::v15_2_0::list::iterator_impl<t>
  22: (Migrator::decode_import_inode(CDentry*, ceph::buffer::v15_2_0::lis>
  23: (Migrator::decode_import_dir(ceph::buffer::v15_2_0::list::iterator_>
  24: (Migrator::handle_export_dir(boost::intrusive_ptr<MExportDir const>>
  25: (Migrator::dispatch(boost::intrusive_ptr<Message const> const&)+0x1>
  26: (MDSRank::handle_message(boost::intrusive_ptr<Message const> const&>
  27: (MDSRank::_dispatch(boost::intrusive_ptr<Message const> const&, boo>
  28: (MDSRankDispatcher::ms_dispatch(boost::intrusive_ptr<Message const>>
  29: (MDSDaemon::ms_dispatch2(boost::intrusive_ptr<Message> const&)+0x10>
  30: (DispatchQueue::entry()+0x126a) [0x7fe2930a5aba]
  31: (DispatchQueue::DispatchThread::entry()+0x11) [0x7fe2931575d1]
  32: /lib64/libpthread.so.0(+0x81cf) [0x7fe291e451cf]
  33: clone()
  NOTE: a copy of the executable, or `objdump -rdS <executable>` is neede>
 --- begin dump of recent events ---
 terminate called recursively
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Main process exited, code=exited, status=127/n/a
Jan 19 04:49:43 naret-monitor03 systemd[1]: ceph-63334166-d991-11eb-99de-40a6b72108d0@mds.cephfs.naret-monitor03.lqppte.service: Failed with result 'exit-code'.
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
