On Thu, Aug 23, 2018 at 3:01 PM William Lawton <william.law...@irdeto.com> wrote:
>
> Hi John.
>
> Just picking up this thread again after coming back from leave. Our ceph
> storage project has progressed and we are now making sure that the active MON
> and MDS are kept on separate nodes, which has helped reduce the incidence of
> delayed client reconnects on ceph node failure. We've also disabled client
> blacklisting, which has prevented late clients from being permanently
> disconnected. However, we still have occasional slow client reconnects if we
> lose the active MON and MDS nodes at the same time (i.e. an AWS AZ failure
> scenario). Ideally, we would love to eradicate these slow reconnects entirely.
> One other thing we've noticed with our resiliency tests is that when we bring
> down a MON node, a MON re-election is always triggered, even if the stopped
> MON node was not the leader. Do you know if there is a way to configure ceph
> so that a MON re-election only happens if the current MON leader is lost?
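(For reference, the client blacklisting behaviour described above is normally controlled by a pair of MDS options. A minimal ceph.conf sketch, assuming a Luminous/Mimic-era release -- the option names are worth double-checking against your version with "ceph daemon mds.<id> config show | grep blacklist":

    [mds]
    # don't blacklist client sessions that time out during MDS reconnect
    mds_session_blacklist_on_timeout = false
    # don't blacklist clients that are explicitly evicted
    mds_session_blacklist_on_evict = false

The same settings can be injected at runtime with "ceph tell mds.* injectargs", but putting them in ceph.conf is the safer way to make them persist across restarts.)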
Hmm, I'm not sure exactly what the bounds are meant to be on how long the mon
cluster takes to recover from a peon failure. However, if the elections are
taking an unreasonably long time, that would certainly be a viable explanation
for the strange reconnect behaviour -- if the FSMap is being updated, and most
clients see it, but a few don't see it until after an election, perhaps.

John

>
> Thanks
>
> William Lawton
>
> -----Original Message-----
> From: William Lawton
> Sent: Wednesday, August 01, 2018 2:05 PM
> To: 'John Spray' <jsp...@redhat.com>
> Cc: ceph-users@lists.ceph.com; Mark Standley <mark.stand...@irdeto.com>
> Subject: RE: [ceph-users] Intermittent client reconnect delay following node fail
>
> I didn't lose any clients this time around; all clients reconnected within at most 21 seconds. We think the very long client disconnections occurred when both the mgr and mds were active on the failed node, which was not the case for any of my recent 10 tests. We have noticed entries in the client logs like the following:
>
> Aug 1 10:39:06 dub-ditv-sim-goldenimage kernel: libceph: mon0 10.18.49.35:6789 session lost, hunting for new mon
>
> We're currently exploring whether keeping the mds and mon daemons on separate servers has less impact on the client when either one is lost.
>
> William Lawton
>
> -----Original Message-----
> From: John Spray <jsp...@redhat.com>
> Sent: Wednesday, August 01, 2018 1:14 PM
> To: William Lawton <william.law...@irdeto.com>
> Cc: ceph-users@lists.ceph.com; Mark Standley <mark.stand...@irdeto.com>
> Subject: Re: [ceph-users] Intermittent client reconnect delay following node fail
>
> On Wed, Aug 1, 2018 at 12:09 PM William Lawton <william.law...@irdeto.com> wrote:
> >
> > Thanks for the advice John.
> >
> > Our CentOS 7 clients use linux kernel v3.10, so I upgraded one of them to use v4.17 and have run 10 more node fail tests. Unfortunately, the kernel upgrade on the client hasn't resolved the issue.
> >
> > With each test I took down the active MDS node and monitored how long the two v3.10 clients and the v4.17 client lost the ceph mount for. There wasn't much difference between them, i.e. the v3.10 clients lost the mount for between 0 and 21 seconds and the v4.17 client for between 0 and 16 seconds. Sometimes each node lost the mount at a different time, i.e. seconds apart. Other times, two nodes would lose and recover the mount at exactly the same time and the third node would lose/recover it some time later.
> >
> > We are novices with Ceph so are not really sure what we should expect from it regarding resilience, i.e. is it normal for clients to lose the mount point for a period of time and, if so, how long should we consider an abnormal period?
>
> So with the more recent kernel you're finding the clients do reliably reconnect, there's just some variation in the time it takes? Or are you still losing some clients entirely?
>
> John
>
> >
> > William Lawton
> >
> > -----Original Message-----
> > From: John Spray <jsp...@redhat.com>
> > Sent: Tuesday, July 31, 2018 11:17 AM
> > To: William Lawton <william.law...@irdeto.com>
> > Cc: ceph-users@lists.ceph.com; Mark Standley <mark.stand...@irdeto.com>
> > Subject: Re: [ceph-users] Intermittent client reconnect delay following node fail
> >
> > On Tue, Jul 31, 2018 at 12:33 AM William Lawton <william.law...@irdeto.com> wrote:
> > >
> > > Hi.
> > >
> > > We have recently setup our first ceph cluster (4 nodes), but our node failure tests have revealed an intermittent problem. When we take down a node (i.e. by powering it off), most of the time all clients reconnect to the cluster within milliseconds, but occasionally it can take them 30 seconds or more. All clients are CentOS 7 instances and have the ceph cluster mount point configured in /etc/fstab as follows:
> >
> > The first thing I'd do is make sure you've got recent client code -- there are backports in RHEL but I'm unclear on how much of that (if any) makes it into CentOS. You may find it simpler to just install a recent 4.x kernel from ELRepo. Even if you don't want to use that in production, it would be useful to try and isolate any CephFS client issues you're encountering.
> >
> > John
> >
> > >
> > > 10.18.49.35:6789,10.18.49.204:6789,10.18.49.101:6789,10.18.49.183:6789:/ /mnt/ceph ceph name=admin,secretfile=/etc/ceph_key,noatime,_netdev 0 2
> > >
> > > On rare occasions, using the ls command, we can see that a failover has left a client's /mnt/ceph directory with the following state: “??????????? ? ? ? ? ? ceph”. When this occurs, we think that the client has failed to reconnect within 45 seconds (the mds_reconnect_timeout period), so the client has been evicted. We can reproduce this circumstance by reducing the mds reconnect timeout down to 1 second.
> > >
> > > We'd like to know why our clients sometimes struggle to reconnect after a cluster node failure and how to prevent this, i.e. how can we ensure that all clients consistently reconnect to the cluster quickly following a node failure?
> > >
> > > We are using the default configuration options.
> > >
> > > Ceph Status:
> > >
> > >   cluster:
> > >     id:     ea2d9095-3deb-4482-bf6c-23229c594da4
> > >     health: HEALTH_OK
> > >
> > >   services:
> > >     mon: 4 daemons, quorum dub-ceph-01,dub-ceph-03,dub-ceph-04,dub-ceph-02
> > >     mgr: dub-ceph-02(active), standbys: dub-ceph-04.ott.local, dub-ceph-01, dub-ceph-03
> > >     mds: cephfs-1/1/1 up {0=dub-ceph-03=up:active}, 3 up:standby
> > >     osd: 4 osds: 4 up, 4 in
> > >
> > >   data:
> > >     pools:   2 pools, 200 pgs
> > >     objects: 2.36 k objects, 8.9 GiB
> > >     usage:   31 GiB used, 1.9 TiB / 2.0 TiB avail
> > >     pgs:     200 active+clean
> > >
> > > Thanks
> > > William Lawton
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
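(On the ELRepo suggestion above: a minimal sketch of installing a mainline 4.x kernel on a CentOS 7 client, assuming the standard ELRepo repository layout -- the exact release RPM version changes over time, so check elrepo.org rather than copying the URL verbatim:

    # import the ELRepo signing key and enable the repository
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

    # install the mainline kernel and make it the default boot entry
    yum --enablerepo=elrepo-kernel install kernel-ml
    grub2-set-default 0
    reboot

After the reboot, "uname -r" on the client should confirm the new kernel is in use.)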
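(On the "???????????" mount state above: that is consistent with the client having been evicted and blacklisted by the MDS, as suspected. A quick way to check that theory from the cluster side, using the active MDS name from the status output (dub-ceph-03) -- these commands assume an admin keyring and are run on the node hosting that MDS daemon:

    # list any currently blacklisted client addresses
    ceph osd blacklist ls

    # check the reconnect window the MDS is actually using (default 45s)
    ceph daemon mds.dub-ceph-03 config get mds_reconnect_timeout

    # list the client sessions the MDS currently knows about
    ceph daemon mds.dub-ceph-03 session ls

If an affected client's address shows up in the blacklist, "ceph osd blacklist rm <addr>" followed by a lazy umount and remount on the client is usually enough to recover it until the underlying reconnect delay is resolved.)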