On Tue, Oct 2, 2018 at 12:18 PM Thomas Sumpter <thomas.sump...@irdeto.com> wrote:
> Hi Folks,
>
> I am looking for advice on how to troubleshoot some long operations in the
> MDS. Most of the time performance is fantastic, but occasionally, with no
> discernible pattern or trend, a getattr op takes up to ~30 seconds to
> complete in the MDS, stuck on "event": "failed to rdlock, waiting".
>
> E.g.
>
> "description": "client_request(client.84183:54794012 getattr pAsLsXsFs
> #0x10000038585 2018-10-02 07:56:27.554282 caller_uid=48, caller_gid=48{})",
> "duration": 28.987992,
> {
>     "time": "2018-09-25 07:56:27.552511",
>     "event": "failed to rdlock, waiting"
> },
> {
>     "time": "2018-09-25 07:56:56.529748",
>     "event": "failed to rdlock, waiting"
> },
> {
>     "time": "2018-09-25 07:56:56.540386",
>     "event": "acquired locks"
> }
>
> I can find no corresponding long op on any of the OSDs, and no other op in
> the MDS which this one could be waiting for.
>
> Nearly all configuration is at the default. We currently have a small
> amount of data which is constantly being updated: 1 data pool and 1
> metadata pool.
>
> How can I track down what is holding up this op, and stop it from
> happening?

This is a weakness in the MDS introspection right now, unfortunately. The
error message means exactly what it says: the op needs to take a read lock,
but it can't, so it's waiting. That might mean there's another MDS op in
progress, but it usually means there's a client holding "write capabilities"
on the inode in question, and the MDS has asked that client to drop those
capabilities and is waiting for it to do so. This might take a while because
of a buggy client, or because the client had a very large amount of buffered
writes it is now frantically trying to flush out to RADOS as fast as it can.
A few commands that may help narrow down the culprit are sketched after the
quoted text below.
-Greg

> # rados df
> …
> total_objects  191
> total_used     5.7 GiB
> total_avail    367 GiB
> total_space    373 GiB
>
> CephFS version 13.2.1 on CentOS 7.5
> Kernel: 3.10.0-862.11.6.el7.x86_64
> 1x active MDS, 1x standby-replay MDS
> 3x MON
> 4x OSD
> BlueStore OSDs
>
> Ceph kernel client on CentOS 7.4
> Kernel: 4.18.7-1.el7.elrepo.x86_64 (almost the latest, should be good?)
>
> Many Thanks!
> Tom
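As a starting point, the MDS admin socket can show which ops are blocked
and whether the MDS itself is waiting on RADOS, and the kernel client
exposes its in-flight requests through debugfs. A minimal sketch, assuming
the active MDS daemon is named "mds.a" (substitute your own daemon name):

# ceph daemon mds.a dump_blocked_ops
# ceph daemon mds.a dump_ops_in_flight
# ceph daemon mds.a session ls
# ceph daemon mds.a objecter_requests

"session ls" maps the client id from the op description (client.84183 here)
back to a host and mount, and "objecter_requests" shows any RADOS operations
the MDS has outstanding. On the suspect client host (kernel client, requires
debugfs to be mounted), the pending MDS requests and the writes still being
flushed to the OSDs are visible with:

# cat /sys/kernel/debug/ceph/*/mdsc
# cat /sys/kernel/debug/ceph/*/osdc

A large osdc backlog on the client while the op is stuck would point at the
buffered-writes case described above.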
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com