Okay, here's what I've got: https://www.paste.ie/view/abe8c712
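
(For anyone following along, dumps like these can be pulled from the
MDS admin socket on the host running the daemon, e.g.:

  ceph daemon mds.$(hostname -s) perf dump
  ceph daemon mds.$(hostname -s) dump_ops_in_flight

adjusting the daemon name to match however the mds is actually named.)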

Of note, I've changed things up a little bit for the moment: I've
activated a second mds to see whether a particular subtree is more
prone to issues, maybe EC vs. replica. The mds that is currently being
slow is the one my EC volume is pinned to.
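
(Roughly, with placeholder names for the filesystem, directory, and
rank, since the specifics will vary:

  ceph fs set <fsname> allow_multimds true
  ceph fs set <fsname> max_mds 2
  setfattr -n ceph.dir.pin -v <rank> /mnt/cephfs/<ec-backed-dir>

The ceph.dir.pin value is the rank of the mds that should own that
subtree.)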

--
Adam
On Mon, Oct 1, 2018 at 10:02 PM Gregory Farnum <gfar...@redhat.com> wrote:
>
> Can you grab the perf dump during this time, perhaps plus dumps of the ops in 
> progress?
>
> This is weird but given it’s somewhat periodic it might be something like the 
> MDS needing to catch up on log trimming (though I’m unclear why changing the 
> cache size would impact this).
>
> On Sun, Sep 30, 2018 at 9:02 PM Adam Tygart <mo...@ksu.edu> wrote:
>>
>> Hello all,
>>
>> I've got a ceph (12.2.8) cluster with 27 servers, 500 osds, and 1000
>> cephfs mounts (kernel client). We're currently only using 1 active
>> mds.
>>
>> Performance is great about 80% of the time. MDS stats (per ceph
>> daemonperf mds.$(hostname -s)) show 2k-9k requests per second, with
>> a latency under 100.
>>
>> It is the other 20ish percent I'm worried about. I'll check on it
>> and it will sit for 5-15 seconds showing "0" requests and "0"
>> latency, then give me 2 seconds of reasonable response times, and
>> then go back to nothing. Clients are actually seeing blocked
>> requests during these periods.
>>
>> The strange bit is that when I *reduce* the mds_cache_size, requests
>> and latencies go back to normal for a while. When it happens again,
>> I'll increase it back to where it was. It feels like the mds server
>> decides that some of these inodes can't be dropped from the cache
>> unless the cache size changes. Maybe something wrong with the LRU?
>>
>> I feel like I've got a reasonable cache size for my workload: 30GB
>> on the small end, 55GB on the large. There's no real reason for a
>> swing this large, except to potentially delay the problem recurring
>> for longer after I expand the cache again.
>>
>> I also feel like there is probably some magic tunable that changes
>> how inodes get stuck in the LRU, perhaps mds_cache_mid. Does anyone
>> know what this tunable actually does? The documentation is a little
>> sparse.
>>
>> I can grab logs from the mds if needed, just let me know the settings
>> you'd like to see.
>>
>> --
>> Adam
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
