Hi all,

Just an update on this: I started the MDS daemon on a new server and
rebooted a box that had a hung CephFS mount (from the first crash), and
the problem seems to have gone away.

I'm still not sure why the MDS was shutting down with "Caught signal
(Aborted)", though.
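
In case it helps anyone digging through a similar dump, here is a rough
sketch of how the "-N>" entries in the log quoted below [1] could be pulled
apart in Python. The field layout (index, timestamp, thread id, debug level,
message) is only my reading of those lines, not a documented format, and
parse_dump / crash_context are made-up helper names, not part of Ceph.

```python
import re

# One entry of the recent-events dump, e.g.
#   "   -7> 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker ..."
# Assumed layout: index, timestamp, thread id (hex), debug level, message.
ENTRY = re.compile(
    r"^\s*(?P<idx>-?\d+)>\s+"
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<thread>[0-9a-f]+)\s+"
    r"(?P<level>-?\d+)\s*"
    r"(?P<msg>.*)$"
)


def parse_dump(text):
    """Split a (possibly email-quoted, line-wrapped) dump into entries."""
    entries = []
    in_entry = False
    for raw in text.splitlines():
        # Drop a leading "> " email quote marker, if there is one.
        line = re.sub(r"^>\s?", "", raw)
        m = ENTRY.match(line)
        if m:
            entries.append({
                "idx": int(m.group("idx")),
                "ts": m.group("ts"),
                "thread": m.group("thread"),
                "level": int(m.group("level")),
                "msg": m.group("msg").strip(),
            })
            in_entry = True
        elif not line.strip():
            # A blank line ends the current entry (e.g. before the backtrace).
            in_entry = False
        elif in_entry:
            # Wrapped continuation line: glue it onto the previous message.
            last = entries[-1]
            last["msg"] = (last["msg"] + " " + line.strip()).strip()
    return entries


def crash_context(entries, before=3):
    """Return the 'Caught signal' entry plus up to `before` entries ahead of it."""
    for i, entry in enumerate(entries):
        if "Caught signal" in entry["msg"]:
            return entries[max(0, i - before):i + 1]
    return []
```

Feeding it the pasted log and printing crash_context(parse_dump(text)) should
show what the MDS was working on right before it aborted. It won't explain
the abort itself.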

Cheers,
Lincoln

On Nov 13, 2014, at 11:01 AM, Lincoln Bryant wrote:

> Hi Cephers,
> 
> Overnight, our MDS crashed and failed over to the standby, which also
> crashed! Upon trying to restart them this morning, I find that they no
> longer start and always seem to crash on the same file in the logs. I've
> pasted part of the log, captured after running
> "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'", below [1].
> 
> Can anyone help me interpret this error? 
> 
> Thanks for your time,
> Lincoln Bryant
> 
> [1]
>    -7> 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  
> on (ifile sync->mix) on [inode 1000258c3c8 [2,head] 
> /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) 
> (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} 
> caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
> ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
>    -6> 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start 
> waiting on (ifile sync->mix) on [inode 1000258c3c8 [2,head] 
> /stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) 
> (ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} 
> caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
> ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
>    -5> 2014-11-13 10:52:15.064805 7fc49d8ab700 10 
> mds.0.cache.ino(1000258c3c8) add_waiter tag 40000000 0xbf71920 !ambig 1 
> !frozen 1 !freezing 1
>    -4> 2014-11-13 10:52:15.064808 7fc49d8ab700 15 
> mds.0.cache.ino(1000258c3c8) taking waiter here
>    -3> 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log 
> (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile 
> auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion 
> lock) cr={374559=0-4194304@1} 
> caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
> ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
>    -2> 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 
> 192.170.227.116:6800/6489 <== osd.104 192.170.227.122:6812/1084 911 ==== 
> osd_op_reply(82611 100022a4e3a.00000000 [tmapget 0~0] v0'0 uv78780 ondisk = 
> 0) v6 ==== 187+0+1410 (1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
>    -1> 2014-11-13 10:52:15.064843 7fc49d8ab700 10 
> mds.0.cache.dir(100022a4e3a) _tmap_fetched 1410 bytes for [dir 100022a4e3a 
> /stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ 
> [2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | 
> waiter=1 authpin=1 0x3b0a040] want_dn=
>     0> 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) 
> **
> in thread 7fc49d8ab700
> 
> ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
> 1: /usr/bin/ceph-mds() [0x82f741]
> 2: /lib64/libpthread.so.0() [0x371c40f710]
> 3: (gsignal()+0x35) [0x371bc32635]
> 4: (abort()+0x175) [0x371bc33e15]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to 
> interpret this.
> 
> 
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

