On Tue, Oct 2, 2018 at 12:18 PM Thomas Sumpter <thomas.sump...@irdeto.com>
wrote:

> Hi Folks,
>
>
>
> I am looking for advice on how to troubleshoot some long operations found
> in the MDS. Most of the time performance is fantastic, but occasionally,
> with no real pattern or trend, a getattr op will take up to ~30 seconds to
> complete in the MDS, stuck on "event": "failed to rdlock, waiting".
>
>
>
> E.g.
>
> "description": "client_request(client.84183:54794012 getattr pAsLsXsFs
> #0x10000038585 2018-10-02 07:56:27.554282 caller_uid=48, caller_gid=48{})",
>
> "duration": 28.987992,
>
> {
>
> "time": "2018-09-25 07:56:27.552511",
>
> "event": "failed to rdlock, waiting"
>
> },
>
> {
>
> "time": "2018-09-25 07:56:56.529748",
>
> "event": "failed to rdlock, waiting"
>
> },
>
> {
>
> "time": "2018-09-25 07:56:56.540386",
>
> "event": "acquired locks"
>
> }
>
>
>
> I can find no corresponding long op on any of the OSDs, and no other op
> in the MDS that this one could be waiting for.
>
> Nearly all configuration is at the defaults. We currently have a small
> amount of data which is constantly being updated; one data pool and one
> metadata pool.
>
> How can I track down what is holding up this op, and stop it from
> happening?
>

This is a weakness in the MDS introspection right now, unfortunately.

The error message means exactly what it says: the op needs to take a read
lock, but it can't, so it's waiting. This might mean there's another MDS
op in progress, but it usually means there's a client holding "write
capabilities" on the inode in question, and the MDS has asked that client
to drop those capabilities and is waiting for it to do so.
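
If you want to confirm that and identify the client, the admin socket is
the best tool we have right now. A rough sketch, untested and with
placeholder names (substitute your actual MDS daemon name; the inode
number comes from the op's "description" field):

    # list client sessions and how many caps each one holds
    ceph daemon mds.<name> session ls

    # dump the MDS cache to a file, then search it for the slow inode
    # (0x10000038585 in your example) to see which client holds caps on it
    ceph daemon mds.<name> dump cache /tmp/mds-cache.txt
    grep 0x10000038585 /tmp/mds-cache.txt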

This might take a while because of a buggy client, or because the client
has a very large amount of buffered writes that it is now frantically
trying to flush out to RADOS.
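
If it's the flush case, you can usually see it from the client side. For
the kernel client, debugfs exposes the in-flight state; something like
this should work, assuming debugfs is mounted at /sys/kernel/debug:

    # run as root on the client node
    cat /sys/kernel/debug/ceph/*/osdc   # in-flight OSD requests (data being flushed)
    cat /sys/kernel/debug/ceph/*/mdsc   # in-flight MDS requests
    cat /sys/kernel/debug/ceph/*/caps   # capabilities this client currently holds
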
-Greg


>
>
> # rados df
>
> …
>
> total_objects    191
>
> total_used       5.7 GiB
>
> total_avail      367 GiB
>
> total_space      373 GiB
>
>
>
>
>
> CephFS version 13.2.1 on CentOS 7.5
>
> Kernel: 3.10.0-862.11.6.el7.x86_64
>
> 1x active MDS, 1x standby-replay MDS
>
> 3x MON
>
> 4x OSD
>
> BlueStore OSDs
>
>
>
> Ceph kernel client on CentOS 7.4
>
> Kernel: 4.18.7-1.el7.elrepo.x86_64  (almost the latest, should be good?)
>
>
>
> Many Thanks!
>
> Tom
