Hi Cephers,

Overnight, our MDS crashed and failed over to the standby, which also crashed!
When I tried to restart them this morning, neither would start, and according
to the logs they always seem to crash on the same file. I've pasted part of the
log, produced with "ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'",
below [1].

Can anyone help me interpret this error? 

Thanks for your time,
Lincoln Bryant

[1]
    -7> 2014-11-13 10:52:15.064784 7fc49d8ab700  7 mds.0.locker rdlock_start  
on (ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile 
auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion 
lock) cr={374559=0-4194304@1} 
caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -6> 2014-11-13 10:52:15.064794 7fc49d8ab700  7 mds.0.locker rdlock_start 
waiting on (ifile sync->mix) on [inode 1000258c3c8 [2,head] 
/stash/sys/etc/grid-mapfile auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) 
(ifile sync->mix) (iversion lock) cr={374559=0-4194304@1} 
caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -5> 2014-11-13 10:52:15.064805 7fc49d8ab700 10 mds.0.cache.ino(1000258c3c8) 
add_waiter tag 40000000 0xbf71920 !ambig 1 !frozen 1 !freezing 1
    -4> 2014-11-13 10:52:15.064808 7fc49d8ab700 15 mds.0.cache.ino(1000258c3c8) 
taking waiter here
    -3> 2014-11-13 10:52:15.064810 7fc49d8ab700 10 mds.0.locker nudge_log 
(ifile sync->mix) on [inode 1000258c3c8 [2,head] /stash/sys/etc/grid-mapfile 
auth v754009 ap=27+0 s=17384 n(v0 b17384 1=1+0) (ifile sync->mix) (iversion 
lock) cr={374559=0-4194304@1} 
caps={374511=pAsLsXsFr/pAsLsXsFscr/pFscr@5,374559=pAsLsXsFr/pAsxXsxFxwb@5} | 
ptrwaiter=0 request=26 lock=1 caps=1 dirty=1 waiter=1 authpin=1 0x5438900]
    -2> 2014-11-13 10:52:15.064827 7fc49d8ab700  1 -- 192.170.227.116:6800/6489 
<== osd.104 192.170.227.122:6812/1084 911 ==== osd_op_reply(82611 
100022a4e3a.00000000 [tmapget 0~0] v0'0 uv78780 ondisk = 0) v6 ==== 187+0+1410 
(1370366691 0 1858920835) 0x298ffd00 con 0x5b606e0
    -1> 2014-11-13 10:52:15.064843 7fc49d8ab700 10 mds.0.cache.dir(100022a4e3a) 
_tmap_fetched 1410 bytes for [dir 100022a4e3a 
/stash/user/daveminh/data/DUD/ampc/AlGDock/dock/DUDE.decoy.CHB-1l2sA.0-0/ 
[2,head] auth v=0 cv=0/0 ap=1+0+0 state=1073741952 f() n() hs=0+0,ss=0+0 | 
waiter=1 authpin=1 0x3b0a040] want_dn=
     0> 2014-11-13 10:52:15.066789 7fc49d8ab700 -1 *** Caught signal (Aborted) **
 in thread 7fc49d8ab700

 ceph version 0.80.7 (6c0127fcb58008793d3c8b62d925bc91963672a3)
 1: /usr/bin/ceph-mds() [0x82f741]
 2: /lib64/libpthread.so.0() [0x371c40f710]
 3: (gsignal()+0x35) [0x371bc32635]
 4: (abort()+0x175) [0x371bc33e15]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x12d) [0x371e0bea5d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

