Hi Thomas,

I looked over the mds config reference a bit yesterday, but mds cache size 
seems to be the most relevant tunable.
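For checking what the running daemon actually has, something like this over the MDS admin socket should work (a sketch; the socket path and daemon id are deployment-specific):

    # Show the running MDS's effective config and pick out the cache size.
    ceph daemon /var/run/ceph/ceph-mds.0.asok config show | grep mds_cache_size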

As suggested, I upped mds cache size to 1 million yesterday and started the 
load generator.  During load generation, we're seeing much the same behavior 
as before on both the filesystem and the MDS.  The MDS process is running a 
little hotter now, with a higher average CPU and an 11GB resident size (up 
from just under 10GB, IIRC).  Enumerating files on the filesystem, e.g. with 
ls, still hangs, though.
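For reference, here's roughly how I'm watching the cache while the load 
generator runs.  This is just a sketch: the admin socket path and the daemon 
id ("0") are assumptions that depend on the deployment, and the exact counter 
names vary by release:

    # Dump the live MDS perf counters over the admin socket; the mds_mem
    # section reports how many inodes/dentries/caps are currently cached.
    ceph daemon /var/run/ceph/ceph-mds.0.asok perf dump | python -m json.tool

If the cached inode count sits pinned at the mds-cache-size limit while 
clients keep touching new files, that would be consistent with the thrashing 
theory.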

With load generation disabled, the behavior is the same as before, i.e., things 
work as expected.

I’ve got a lot of memory and CPU headroom on the box hosting the MDS, so unless 
there’s a good reason not to, I’m going to keep increasing the MDS cache 
iteratively in the hope of finding a size that produces good behavior, roughly 
as sketched below.  Right now, I’d expect us to touch around 2 million inodes 
each minute, so a cache of 1 million is still undersized.  If that doesn’t 
work, we’re currently running Firefly on the cluster and I’ll upgrade it to 
Giant.
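Concretely, each iteration looks something like this (mds.0 follows Sage's 
earlier command; the doubling schedule is just my plan, not anything from the 
docs):

    # Raise the live cache limit, same mechanism Sage suggested; double
    # each round while watching MDS RSS and whether ls still hangs.
    ceph mds tell 0 injectargs '--mds-cache-size 2000000'

    # Once a value works out, persist it in ceph.conf on the MDS host so
    # it survives a restart:
    #   [mds]
    #       mds cache size = 2000000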
--
Kevin Sumner
ke...@sumner.io



> On Nov 18, 2014, at 1:36 AM, Thomas Lemarchand 
> <thomas.lemarch...@cloud-solutions.fr> wrote:
> 
> Hi Kevin,
> 
> All (I think) of the MDS tunables are listed on this page with a short
> description: http://ceph.com/docs/master/cephfs/mds-config-ref/
> 
> Can you tell us how your cluster behaves after the mds-cache-size
> change?  What is your MDS RAM consumption, before and after?
> 
> Thanks !
> -- 
> Thomas Lemarchand
> Cloud Solutions SAS - Information Systems Manager
> 
> 
> 
> On lun., 2014-11-17 at 16:06 -0800, Kevin Sumner wrote:
>>> On Nov 17, 2014, at 15:52, Sage Weil <s...@newdream.net> wrote:
>>> 
>>> On Mon, 17 Nov 2014, Kevin Sumner wrote:
>>>> I've got a test cluster together with ~500 OSDs, 5 MONs, and 1 MDS.
>>>> All the OSDs also mount CephFS at /ceph.  I've got Graphite pointing
>>>> at a space under /ceph.  Over the weekend, I drove almost 2 million
>>>> metrics, each of which creates a ~3MB file in a hierarchical path,
>>>> with each sending a datapoint into the metric file once a minute.
>>>> CephFS seemed to handle the writes OK while I was driving load.  All
>>>> files containing each metric are at paths like this:
>>>> /ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp
>>>> 
>>>> Today, however, with the load generator still running, reading
>>>> metadata of files (e.g. directory entries and stat(2) info) in the
>>>> filesystem (presumably MDS-managed data) seems nearly impossible,
>>>> especially deeper into the tree.  For example, in a shell, cd seems
>>>> to work but ls hangs, seemingly indefinitely.  After turning off the
>>>> load generator and allowing a while for things to settle down,
>>>> everything seems to behave better.
>>>> 
>>>> ceph status and ceph health both return good statuses the entire
>>>> time.  During load generation, the ceph-mds process seems pegged
>>>> between 100% and 150% CPU, but with load generation turned off, the
>>>> process varies from near-idle up to the same 100-150% CPU.
>>>> 
>>>> Hopefully, I've missed something in the CephFS tuning.  However, I'm
>>>> looking for direction on figuring out whether this is indeed a tuning
>>>> problem or whether this behavior is a symptom of the "not ready for
>>>> production" banner in the documentation.
>>> 
>>> My first guess is that the MDS cache is just too small and it is 
>>> thrashing.  Try
>>> 
>>> ceph mds tell 0 injectargs '--mds-cache-size 1000000'
>>> 
>>> That's 10x bigger than the default, though be aware that it will eat
>>> up 10x as much RAM too.
>>> 
>>> We've also seen the cache behave in a non-optimal way when evicting
>>> things, making it thrash more often than it should.  I'm hoping we can
>>> implement something like MQ instead of our two-level LRU, but it isn't
>>> high on the priority list right now.
>>> 
>>> sage
>> 
>> 
>> Thanks!  I'll pursue mds cache size tuning.  Is there any guidance on
>> setting the cache and other MDS tunables correctly, or is it an
>> adjust-and-test sort of thing?  Cursory searching doesn't turn up any
>> relevant documentation on ceph.com.  I'm plowing through some other
>> list posts now.
>> --
>> Kevin Sumner
>> ke...@sumner.io
>> 
> 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
