On Thu, Aug 23, 2018 at 3:01 PM William Lawton <william.law...@irdeto.com> wrote:
>
> Hi John.
>
> Just picking up this thread again after coming back from leave. Our ceph
> storage project has progressed and we are now making sure that the active MON
> and MDS are kept on separate nodes, which has helped reduce the incidence of
> delayed client reconnects on ceph node failure. We've also disabled client
> blacklisting, which has prevented late clients from being permanently
> disconnected. However, we still have occasional slow client reconnects if we
> lose the active MON and MDS nodes at the same time (i.e. an AWS AZ failure
> scenario). Ideally, we would love to eradicate these slow reconnects entirely.
> One other thing we've noticed with our resiliency tests is that when we bring
> down a MON node, a MON re-election is always triggered, even if the stopped
> MON node was not the leader. Do you know if there is a way to configure ceph
> so that a MON re-election only happens if the current MON leader is lost?
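(For reference, the client blacklisting behaviour described above is normally controlled by a pair of MDS options. A minimal ceph.conf sketch, assuming a Luminous/Mimic-era release -- the option names are worth double-checking against your version with "ceph daemon mds.<id> config show | grep blacklist":

    [mds]
    # don't blacklist client sessions that time out during MDS reconnect
    mds_session_blacklist_on_timeout = false
    # don't blacklist clients that are explicitly evicted
    mds_session_blacklist_on_evict = false

The same settings can be injected at runtime with "ceph tell mds.* injectargs", but putting them in ceph.conf is the safer way to make them persist across restarts.)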
Hmm, I'm not sure exactly what the bounds are meant to be on how long the mon
cluster takes to recover from a peon failure. However, if the elections are
taking an unreasonably long time, that would certainly be a viable explanation
for the strange reconnect behaviour -- if the FSMap is being updated, and most
clients see it, but a few don't see it until after an election, perhaps.

John

>
> Thanks
>
> William Lawton
>
> -----Original Message-----
> From: William Lawton
> Sent: Wednesday, August 01, 2018 2:05 PM
> To: 'John Spray' <jsp...@redhat.com>
> Cc: ceph-users@lists.ceph.com; Mark Standley <mark.stand...@irdeto.com>
> Subject: RE: [ceph-users] Intermittent client reconnect delay following node fail
>
> I didn't lose any clients this time around; all clients reconnected within at most 21 seconds. We think the very long client disconnections occurred when both the mgr and mds were active on the failed node, which was not the case for any of my recent 10 tests. We have noticed entries in the client logs like the following:
>
> Aug 1 10:39:06 dub-ditv-sim-goldenimage kernel: libceph: mon0 10.18.49.35:6789 session lost, hunting for new mon
>
> We're currently exploring whether keeping the mds and mon daemons on separate servers has less impact on the client when either one is lost.
>
> William Lawton
>
> -----Original Message-----
> From: John Spray <jsp...@redhat.com>
> Sent: Wednesday, August 01, 2018 1:14 PM
> To: William Lawton <william.law...@irdeto.com>
> Cc: ceph-users@lists.ceph.com; Mark Standley <mark.stand...@irdeto.com>
> Subject: Re: [ceph-users] Intermittent client reconnect delay following node fail
>
> On Wed, Aug 1, 2018 at 12:09 PM William Lawton <william.law...@irdeto.com> wrote:
> >
> > Thanks for the advice John.
> >
> > Our CentOS 7 clients use linux kernel v3.10, so I upgraded one of them to use v4.17 and have run 10 more node fail tests. Unfortunately, the kernel upgrade on the client hasn't resolved the issue.
> >
> > With each test I took down the active MDS node and monitored how long the two v3.10 clients and the v4.17 client lost the ceph mount for. There wasn't much difference between them, i.e. the v3.10 clients lost the mount for between 0 and 21 seconds and the v4.17 client for between 0 and 16 seconds. Sometimes each node lost the mount at a different time, i.e. seconds apart. Other times, two nodes would lose and recover the mount at exactly the same time and the third node would lose/recover it some time later.
> >
> > We are novices with Ceph so are not really sure what we should expect from it regarding resilience, i.e. is it normal for clients to lose the mount point for a period of time and, if so, how long should we consider an abnormal period?
>
> So with the more recent kernel you're finding the clients do reliably reconnect, there's just some variation in the time it takes? Or are you still losing some clients entirely?
>
> John
>
> >
> > William Lawton
> >
> > -----Original Message-----
> > From: John Spray <jsp...@redhat.com>
> > Sent: Tuesday, July 31, 2018 11:17 AM
> > To: William Lawton <william.law...@irdeto.com>
> > Cc: ceph-users@lists.ceph.com; Mark Standley <mark.stand...@irdeto.com>
> > Subject: Re: [ceph-users] Intermittent client reconnect delay following node fail
> >
> > On Tue, Jul 31, 2018 at 12:33 AM William Lawton <william.law...@irdeto.com> wrote:
> > >
> > > Hi.
> > >
> > > We have recently setup our first ceph cluster (4 nodes), but our node failure tests have revealed an intermittent problem. When we take down a node (i.e. by powering it off), most of the time all clients reconnect to the cluster within milliseconds, but occasionally it can take them 30 seconds or more. All clients are CentOS 7 instances and have the ceph cluster mount point configured in /etc/fstab as follows:
> >
> > The first thing I'd do is make sure you've got recent client code -- there are backports in RHEL but I'm unclear on how much of that (if any) makes it into CentOS. You may find it simpler to just install a recent 4.x kernel from ELRepo. Even if you don't want to use that in production, it would be useful to try and isolate any CephFS client issues you're encountering.
> >
> > John
> >
> > >
> > > 10.18.49.35:6789,10.18.49.204:6789,10.18.49.101:6789,10.18.49.183:6789:/ /mnt/ceph ceph name=admin,secretfile=/etc/ceph_key,noatime,_netdev 0 2
> > >
> > > On rare occasions, using the ls command, we can see that a failover has left a client's /mnt/ceph directory with the following state: “??????????? ? ? ? ? ? ceph”. When this occurs, we think that the client has failed to reconnect within 45 seconds (the mds_reconnect_timeout period), so the client has been evicted. We can reproduce this circumstance by reducing the mds reconnect timeout down to 1 second.
> > >
> > > We'd like to know why our clients sometimes struggle to reconnect after a cluster node failure and how to prevent this, i.e. how can we ensure that all clients consistently reconnect to the cluster quickly following a node failure?
> > >
> > > We are using the default configuration options.
> > >
> > > Ceph Status:
> > >
> > >   cluster:
> > >     id:     ea2d9095-3deb-4482-bf6c-23229c594da4
> > >     health: HEALTH_OK
> > >
> > >   services:
> > >     mon: 4 daemons, quorum dub-ceph-01,dub-ceph-03,dub-ceph-04,dub-ceph-02
> > >     mgr: dub-ceph-02(active), standbys: dub-ceph-04.ott.local, dub-ceph-01, dub-ceph-03
> > >     mds: cephfs-1/1/1 up {0=dub-ceph-03=up:active}, 3 up:standby
> > >     osd: 4 osds: 4 up, 4 in
> > >
> > >   data:
> > >     pools:   2 pools, 200 pgs
> > >     objects: 2.36 k objects, 8.9 GiB
> > >     usage:   31 GiB used, 1.9 TiB / 2.0 TiB avail
> > >     pgs:     200 active+clean
> > >
> > > Thanks
> > > William Lawton
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
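(On the ELRepo suggestion above: a minimal sketch of installing a mainline 4.x kernel on a CentOS 7 client, assuming the standard ELRepo repository layout -- the exact release RPM version changes over time, so check elrepo.org rather than copying the URL verbatim:

    # import the ELRepo signing key and enable the repository
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm

    # install the mainline kernel and make it the default boot entry
    yum --enablerepo=elrepo-kernel install kernel-ml
    grub2-set-default 0
    reboot

After the reboot, "uname -r" on the client should confirm the new kernel is in use.)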
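(On the "???????????" mount state above: that is consistent with the client having been evicted and blacklisted by the MDS, as suspected. A quick way to check that theory from the cluster side, using the active MDS name from the status output (dub-ceph-03) -- these commands assume an admin keyring and are run on the node hosting that MDS daemon:

    # list any currently blacklisted client addresses
    ceph osd blacklist ls

    # check the reconnect window the MDS is actually using (default 45s)
    ceph daemon mds.dub-ceph-03 config get mds_reconnect_timeout

    # list the client sessions the MDS currently knows about
    ceph daemon mds.dub-ceph-03 session ls

If an affected client's address shows up in the blacklist, "ceph osd blacklist rm <addr>" followed by a lazy umount and remount on the client is usually enough to recover it until the underlying reconnect delay is resolved.)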