Okay, here's what I've got: https://www.paste.ie/view/abe8c712
Of note, I've changed things up a little bit for the moment. I've
activated a second MDS to see whether a particular subtree is more prone
to issues... maybe EC vs. replica. The one that is currently slow has my
EC volume pinned to it.

--
Adam

On Mon, Oct 1, 2018 at 10:02 PM Gregory Farnum <gfar...@redhat.com> wrote:
>
> Can you grab the perf dump during this time, perhaps plus dumps of the
> ops in progress?
>
> This is weird, but given it's somewhat periodic it might be something
> like the MDS needing to catch up on log trimming (though I'm unclear
> why changing the cache size would impact this).
>
> On Sun, Sep 30, 2018 at 9:02 PM Adam Tygart <mo...@ksu.edu> wrote:
>>
>> Hello all,
>>
>> I've got a Ceph (12.2.8) cluster with 27 servers, 500 OSDs, and 1000
>> CephFS mounts (kernel client). We're currently only using 1 active
>> MDS.
>>
>> Performance is great about 80% of the time. MDS stats (per ceph
>> daemonperf mds.$(hostname -s)) indicate 2k-9k requests per second,
>> with a latency under 100.
>>
>> It is the other 20-ish percent I'm worried about. I'll check on it,
>> and it will be going 5-15 seconds with "0" requests and "0" latency,
>> then give me 2 seconds of reasonable response times, and then go back
>> to nothing. Clients are actually seeing blocked requests for this
>> period of time.
>>
>> The strange bit is that when I *reduce* the mds_cache_size, requests
>> and latencies go back to normal for a while. When it happens again,
>> I'll increase it back to where it was. It feels like the MDS decides
>> that some of these inodes can't be dropped from the cache unless the
>> cache size changes. Maybe something wrong with the LRU?
>>
>> I feel like I've got a reasonable cache size for my workload: 30 GB
>> on the small end, 55 GB on the large. There's no real reason for a
>> swing this large, except to potentially delay the recurrence for
>> longer after expansion.
>>
>> I also feel like there is probably some magic tunable to change how
>> inodes get stuck in the LRU, perhaps mds_cache_mid. Does anyone know
>> what this tunable actually does? The documentation is a little
>> sparse.
>>
>> I can grab logs from the MDS if needed; just let me know the settings
>> you'd like to see.
>>
>> --
>> Adam
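
For anyone following along, capturing what Gregory asked for can be done
over the MDS admin socket, roughly like this (a sketch run on the active
MDS host; the mds.$(hostname -s) naming and the output file names are
just assumptions matching the daemonperf invocation above):

    # dump the MDS performance counters (mds, mds_log, mds_cache, ...)
    ceph daemon mds.$(hostname -s) perf dump > perf-dump.$(date +%s).json

    # dump the ops currently in flight on the MDS
    ceph daemon mds.$(hostname -s) dump_ops_in_flight > ops-in-flight.$(date +%s).json

    # recently completed ops can help spot where the stalls sit
    ceph daemon mds.$(hostname -s) dump_historic_ops > historic-ops.$(date +%s).json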
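
The second-MDS-plus-pinning setup described at the top looks roughly like
the following (a sketch; "cephfs" as the filesystem name and
/cephfs/ec-volume as the EC-backed subtree are placeholders, and rank 1
is assumed to be the newly activated MDS):

    # allow a second active MDS rank on the filesystem
    ceph fs set cephfs max_mds 2

    # pin the EC-backed subtree to rank 1 via the ceph.dir.pin xattr
    setfattr -n ceph.dir.pin -v 1 /cephfs/ec-volume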
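
The cache-size toggling and the mds_cache_mid question can both be poked
at at runtime; a sketch, assuming the byte-based mds_cache_memory_limit
is the knob actually in play on 12.2.x (the values below are illustrative
of the 30 GB / 55 GB swing described, not the exact settings used):

    # check the current cache tunables
    ceph daemon mds.$(hostname -s) config get mds_cache_memory_limit
    ceph daemon mds.$(hostname -s) config get mds_cache_mid

    # shrink the cache limit at runtime (~30 GiB), then raise it again (~55 GiB)
    ceph tell mds.* injectargs '--mds_cache_memory_limit=32212254720'
    ceph tell mds.* injectargs '--mds_cache_memory_limit=59055800320'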