Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-28 Thread Gregory Farnum
On Thu, May 28, 2015 at 1:04 AM, Kenneth Waegeman wrote: > > > On 05/27/2015 10:30 PM, Gregory Farnum wrote: >> >> On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman >> wrote: >>> >>> We are also running a full backup sync to cephfs, using multiple >>> distributed >>> rsync streams (with zkrsync),

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-28 Thread Kenneth Waegeman
On 05/27/2015 10:30 PM, Gregory Farnum wrote: On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman wrote: We are also running a full backup sync to cephfs, using multiple distributed rsync streams (with zkrsync), and also ran into this issue today on Hammer 0.94.1. After setting the beacon higher

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-27 Thread Gregory Farnum
On Wed, May 27, 2015 at 6:49 AM, Kenneth Waegeman wrote: > We are also running a full backup sync to cephfs, using multiple distributed > rsync streams (with zkrsync), and also ran into this issue today on Hammer > 0.94.1 . > After setting the beacon higher, and eventually clearing the journal, it >

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-27 Thread Kenneth Waegeman
We are also running a full backup sync to cephfs, using multiple distributed rsync streams (with zkrsync), and also ran into this issue today on Hammer 0.94.1. After setting the beacon higher, and eventually clearing the journal, it stabilized again. We were using ceph-fuse to mount the cephfs,

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-25 Thread Yan, Zheng
the kernel client bug should be fixed by https://github.com/ceph/ceph-client/commit/72f22efb658e6f9e126b2b0fcb065f66ffd02239 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread John Spray
On 22/05/2015 20:06, Gregory Farnum wrote: Ugh. We appear to be trying to allocate too much memory for this event in the journal dump; we'll need to fix this. :( It's not even per-event, it tries to load the entire journal into memory in one go. This is a hangover from the old Dumper/Resetter

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
Alright, bumping that up 10 worked. The MDS server came up and "recovered". Took about 1 minute. Thanks again, guys. -- Adam On Fri, May 22, 2015 at 2:50 PM, Gregory Farnum wrote: > On Fri, May 22, 2015 at 12:45 PM, Adam Tygart wrote: >> Fair enough. Anyway, is it safe to now increase the '

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Gregory Farnum
On Fri, May 22, 2015 at 12:45 PM, Adam Tygart wrote: > Fair enough. Anyway, is it safe to now increase the 'mds beacon grace' > to try and get the mds server functional again? Yep! Let us know how it goes... > > I realize there is nothing simple about the things that are being > accomplished her

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
Fair enough. Anyway, is it safe to now increase the 'mds beacon grace' to try and get the mds server functional again? I realize there is nothing simple about the things that are being accomplished here, and thank everyone for their hard work on making this stuff work as well as it does. -- Adam
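For readers following the thread: the 'mds beacon grace' knob being discussed can also be raised persistently in ceph.conf rather than injected at runtime. A minimal sketch; the value 600 is an illustrative choice, not one taken from this thread, and since the option is read by the monitors as well as the MDS, [global] is the safe placement:

```ini
[global]
# Seconds the monitors wait for an MDS beacon before declaring the
# daemon laggy and forcing a respawn/failover (the default is much
# lower, 15 in hammer). 600 is only an illustrative value.
mds beacon grace = 600
```

Remember to lower it back once the MDS has finished replaying, or genuinely hung daemons will take that long to fail over.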

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Gregory Farnum
On Fri, May 22, 2015 at 12:34 PM, Adam Tygart wrote: > I believe I grabbed all of theses files: > > for x in $(rados -p metadata ls | grep -E '^200\.'); do rados -p > metadata get ${x} /tmp/metadata/${x}; done > tar czSf journal.tar.gz /tmp/metadata > > https://drive.google.com/file/d/0B4XF1RWjuGh

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
I believe I grabbed all of these files:

for x in $(rados -p metadata ls | grep -E '^200\.'); do rados -p metadata get ${x} /tmp/metadata/${x}; done
tar czSf journal.tar.gz /tmp/metadata

https://drive.google.com/file/d/0B4XF1RWjuGh5MVFqVFZfNmpfQWc/view?usp=sharing

When this crash occurred, the r

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Gregory Farnum
On Fri, May 22, 2015 at 11:34 AM, Adam Tygart wrote: > On Fri, May 22, 2015 at 11:47 AM, John Spray wrote: >> >> >> On 22/05/2015 15:33, Adam Tygart wrote: >>> >>> Hello all, >>> >>> The ceph-mds servers in our cluster are performing a constant >>> boot->replay->crash in our systems. >>> >>> I ha

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
On Fri, May 22, 2015 at 11:47 AM, John Spray wrote: > > > On 22/05/2015 15:33, Adam Tygart wrote: >> >> Hello all, >> >> The ceph-mds servers in our cluster are performing a constant >> boot->replay->crash in our systems. >> >> I have enabled debug logging for the mds for a restart cycle on one of

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Lincoln Bryant
I notice in both logs, the last entry before the MDS restart/failover is when the mds is replaying the journal and gets to /homes/gundimed/IPD/10kb/1e-500d/DisplayLog/:

2015-05-22 09:59:19.116231 7f9d930c1700 10 mds.0.journal EMetaBlob.replay for [2,head] had [inode 13f8e31 [...2,head] /hom

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread John Spray
On 22/05/2015 15:33, Adam Tygart wrote: Hello all, The ceph-mds servers in our cluster are performing a constant boot->replay->crash in our systems. I have enabled debug logging for the mds for a restart cycle on one of the nodes[1]. You found a bug, or more correctly you probably found mult

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
I knew I forgot to include something with my initial e-mail. Single active with failover.

dumped mdsmap epoch 30608
epoch 30608
flags 0
created 2015-04-02 16:15:55.209894
modified 2015-05-22 11:39:15.992774
tableserver 0
root 0
session_timeout 60
session_autoclose 300
max_

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Lincoln Bryant
I've experienced MDS issues in the past, but nothing sticks out to me in your logs. Are you using a single active MDS with failover, or multiple active MDS? --Lincoln On May 22, 2015, at 10:10 AM, Adam Tygart wrote: > Thanks for the quick response. > > I had 'debug mds = 20' in the first log

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
Thanks for the quick response. I had 'debug mds = 20' in the first log; I added 'debug ms = 1' for this one:

https://drive.google.com/file/d/0B4XF1RWjuGh5bXFnRzE1SHF6blE/view?usp=sharing

Based on these logs, it looks like heartbeat_map is_healthy 'MDS' just times out and then the mds gets respawn
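The two debug options used above can also be set persistently so they survive the restart cycle, instead of being injected at runtime. A sketch of the equivalent ceph.conf fragment (the [mds] section placement follows standard Ceph configuration conventions; values are the ones quoted in this thread):

```ini
[mds]
# verbose MDS-internal logging, as used earlier in this thread
debug mds = 20
# messenger-level logging: one line per message sent/received
debug ms = 1
```

These levels are very chatty; drop them back to defaults once the crash has been captured.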

Re: [ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Lincoln Bryant
Hi Adam, You can get the MDS to spit out more debug information like so:

# ceph mds tell 0 injectargs '--debug-mds 20 --debug-ms 1'

At least then you can see where it's at when it crashes. --Lincoln On May 22, 2015, at 9:33 AM, Adam Tygart wrote: > Hello all, > > The ceph-mds servers

[ceph-users] Ceph MDS continually respawning (hammer)

2015-05-22 Thread Adam Tygart
Hello all, The ceph-mds servers in our cluster are performing a constant boot->replay->crash in our systems. I have enabled debug logging for the mds for a restart cycle on one of the nodes[1]. Kernel debug from cephfs client during reconnection attempts: [732586.352173] ceph: mdsc delayed_work