On Tue, Jul 31, 2018 at 12:33 AM William Lawton <william.law...@irdeto.com> wrote:
>
> Hi.
>
> We have recently set up our first Ceph cluster (4 nodes), but our node
> failure tests have revealed an intermittent problem. When we take down a
> node (i.e. by powering it off), most of the time all clients reconnect to
> the cluster within milliseconds, but occasionally it can take them 30
> seconds or more. All clients are CentOS 7 instances and have the Ceph
> cluster mount point configured in /etc/fstab as follows:

The first thing I'd do is make sure you've got recent client code -- there
are backports in RHEL, but I'm unclear on how much of that (if any) makes it
into CentOS. You may find it simpler to just install a recent 4.x kernel
from ELRepo. Even if you don't want to use that in production, it would be
useful for isolating any CephFS client issues you're encountering.
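For what it's worth, the kernel upgrade itself is quick on CentOS 7 --
roughly something like the below (treat it as a sketch: the exact
elrepo-release package URL may have changed, and the last step assumes the
new kernel ends up as the first GRUB menu entry):

    # check which kernel (and hence which CephFS kernel client) a client runs
    uname -r

    # add the ELRepo repository and install a mainline (kernel-ml) 4.x kernel
    rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
    yum install https://www.elrepo.org/elrepo-release-7.0-3.el7.elrepo.noarch.rpm
    yum --enablerepo=elrepo-kernel install kernel-ml

    # boot into the new kernel (assumes it is the first GRUB menu entry)
    grub2-set-default 0
    reboot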
John

> 10.18.49.35:6789,10.18.49.204:6789,10.18.49.101:6789,10.18.49.183:6789:/ /mnt/ceph ceph name=admin,secretfile=/etc/ceph_key,noatime,_netdev 0 2
>
> On rare occasions, using the ls command, we can see that a failover has
> left a client's /mnt/ceph directory with the following state:
> "??????????? ? ? ? ? ? ceph". When this occurs, we think that the client
> has failed to reconnect within 45 seconds (the mds_reconnect_timeout
> period), so the client has been evicted. We can reproduce this circumstance
> by reducing the MDS reconnect timeout to 1 second.
>
> We'd like to know why our clients sometimes struggle to reconnect after a
> cluster node failure and how to prevent this, i.e. how can we ensure that
> all clients consistently reconnect to the cluster quickly following a node
> failure.
>
> We are using the default configuration options.
>
> Ceph status:
>
>   cluster:
>     id:     ea2d9095-3deb-4482-bf6c-23229c594da4
>     health: HEALTH_OK
>
>   services:
>     mon: 4 daemons, quorum dub-ceph-01,dub-ceph-03,dub-ceph-04,dub-ceph-02
>     mgr: dub-ceph-02(active), standbys: dub-ceph-04.ott.local, dub-ceph-01, dub-ceph-03
>     mds: cephfs-1/1/1 up {0=dub-ceph-03=up:active}, 3 up:standby
>     osd: 4 osds: 4 up, 4 in
>
>   data:
>     pools:   2 pools, 200 pgs
>     objects: 2.36 k objects, 8.9 GiB
>     usage:   31 GiB used, 1.9 TiB / 2.0 TiB avail
>     pgs:     200 active+clean
>
> Thanks
> William Lawton

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com