> On Nov 17, 2014, at 15:52, Sage Weil <s...@newdream.net> wrote:
> 
> On Mon, 17 Nov 2014, Kevin Sumner wrote:
> >> I've got a test cluster together with ~500 OSDs, 5 MONs, and 1 MDS.  All
> >> the OSDs also mount CephFS at /ceph.  I've got Graphite pointing at a space
> >> under /ceph.  Over the weekend, I drove almost 2 million metrics; each
> >> metric is a ~3MB file in a hierarchical path and receives one datapoint
> >> per minute.  CephFS seemed to handle the writes ok while I was driving
> >> load.  The file for each metric lives at a path like this:
> >> /ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp
>> 
>> Today, however, with the load generator still running, reading metadata of
>> files (e.g. directory entries and stat(2) info) in the filesystem
>> (presumably MDS-managed data) seems nearly impossible, especially deeper
>> into the tree.  For example, in a shell cd seems to work but ls hangs,
>> seemingly indefinitely.  After turning off the load generator and allowing a
>> while for things to settle down, everything seems to behave better.
>> 
>> ceph status and ceph health both return good statuses the entire time.
> >>  During load generation, the ceph-mds process seems pegged at between 100%
> >> and 150% CPU; with load generation turned off, it varies widely, from
> >> near-idle up to a similar 100-150%.
>> 
> >> Hopefully, I've missed something in the CephFS tuning.  However, I'm
> >> looking for direction on figuring out whether this is indeed a tuning
> >> problem or whether this behavior is a symptom of the "not ready for
> >> production" banner in the documentation.
> 
> My first guess is that the MDS cache is just too small and it is 
> thrashing.  Try
> 
> ceph mds tell 0 injectargs '--mds-cache-size 1000000'
> 
> That's 10x bigger than the default, though be aware that it will eat up 10x
> as much RAM too.
> 
> We've also seen the cache behave in a non-optimal way when evicting 
> things, making it thrash more often than it should.  I'm hoping we can 
> implement something like MQ instead of our two-level LRU, but it isn't 
> high on the priority list right now.
> 
> sage


Thanks!  I'll pursue MDS cache size tuning.  Is there any guidance on setting
the cache size and other MDS tunables correctly, or is it an adjust-and-test
sort of thing?  Cursory searching doesn't turn up any relevant documentation
on ceph.com.  I'm plowing through some other list posts now.
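
In the meantime, my plan (untested, and assuming "mds cache size" is the
right option name to make the change persist across restarts rather than
re-running injectargs) is to set the larger cache in ceph.conf on the MDS
host and restart the daemon:

    [mds]
      # default is 100000 inodes; this matches the 10x bump Sage suggested
      mds cache size = 1000000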
--
Kevin Sumner
ke...@sumner.io
