Making mds cache size 5 million seems to have helped significantly, but we’re still seeing issues occasionally on metadata reads while under load. Settings over 5 million don’t seem to have any noticeable impact on this problem. I’m starting the upgrade to Giant today.

--
Kevin Sumner
ke...@sumner.io
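P.S. In case it helps anyone following along, here is roughly how I’m applying
the setting. This is a sketch only: the runtime injection is just Sage’s
earlier command with a larger number, and the ceph.conf stanza makes the value
survive MDS restarts (assuming your config uses the same "mds cache size"
spelling that we do).

    # bump the cache on the running MDS (rank 0; we only have one MDS)
    ceph mds tell 0 injectargs '--mds-cache-size 5000000'

    # persist the same value in ceph.conf on the MDS host
    [mds]
        mds cache size = 5000000

The back-of-the-envelope reasoning for 5 million: we touch roughly 2 million
files a minute, so the cache needs room for at least that many inodes plus
their parent directories before it can stop thrashing.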
> On Nov 18, 2014, at 1:10 PM, Kevin Sumner <ke...@sumner.io> wrote:
>
> Hi Thomas,
>
> I looked over the mds config reference a bit yesterday, but mds cache size
> seems to be the most relevant tunable.
>
> As suggested, I upped mds-cache-size to 1 million yesterday and started the
> load generator. During load generation, we’re seeing similar behavior on the
> filesystem and the mds. The mds process is running a little hotter now, with
> a higher CPU average and 11GB resident size (it was just under 10GB, iirc).
> Enumerating files on the filesystem, e.g. with ls, is still hanging, though.
>
> With load generation disabled, the behavior is the same as before, i.e.,
> things work as expected.
>
> I’ve got a lot of memory and CPU headroom on the box hosting the mds, so
> unless there’s a good reason not to, I plan to continue increasing the mds
> cache iteratively in the hope of finding a size that produces good behavior.
> Right now, I’d expect us to hit around 2 million inodes each minute, so a
> cache of 1 million is still undersized. If that doesn’t work: we’re currently
> running Firefly on the cluster, and I’ll be upgrading it to Giant.
> --
> Kevin Sumner
> ke...@sumner.io
>
>
>> On Nov 18, 2014, at 1:36 AM, Thomas Lemarchand
>> <thomas.lemarch...@cloud-solutions.fr> wrote:
>>
>> Hi Kevin,
>>
>> Every MDS tunable (I think) is listed on this page with a short
>> description: http://ceph.com/docs/master/cephfs/mds-config-ref/
>>
>> Can you tell us how your cluster behaves after the mds-cache-size
>> change? What is your MDS RAM consumption, before and after?
>>
>> Thanks!
>> --
>> Thomas Lemarchand
>> Cloud Solutions SAS - Information Systems Manager
>>
>>
>>
>> On lun., 2014-11-17 at 16:06 -0800, Kevin Sumner wrote:
>>>> On Nov 17, 2014, at 15:52, Sage Weil <s...@newdream.net> wrote:
>>>>
>>>> On Mon, 17 Nov 2014, Kevin Sumner wrote:
>>>>> I’ve got a test cluster together with ~500 OSDs, 5 MONs, and 1 MDS.
>>>>> All the OSDs also mount CephFS at /ceph. I’ve got Graphite pointing
>>>>> at a space under /ceph. Over the weekend, I drove almost 2 million
>>>>> metrics, each of which creates a ~3MB file in a hierarchical path,
>>>>> each sending a datapoint into the metric file once a minute. CephFS
>>>>> seemed to handle the writes ok while I was driving load. All files
>>>>> containing each metric are at paths like this:
>>>>> /ceph/whisper/sandbox/cephtest-osd0013/2/3/4/5.wsp
>>>>>
>>>>> Today, however, with the load generator still running, reading
>>>>> metadata of files (e.g. directory entries and stat(2) info) in the
>>>>> filesystem (presumably MDS-managed data) seems nearly impossible,
>>>>> especially deeper into the tree. For example, in a shell, cd seems
>>>>> to work but ls hangs, seemingly indefinitely. After turning off the
>>>>> load generator and allowing a while for things to settle down,
>>>>> everything seems to behave better.
>>>>>
>>>>> ceph status and ceph health both return good statuses the entire
>>>>> time. During load generation, the ceph-mds process seems pegged at
>>>>> between 100% and 150% CPU, but with load generation turned off, the
>>>>> process shows high variability, from near-idle up to a similar
>>>>> 100-150% CPU.
>>>>>
>>>>> Hopefully, I’ve missed something in the CephFS tuning. However, I’m
>>>>> looking for direction on figuring out whether it is, indeed, a
>>>>> tuning problem or whether this behavior is a symptom of the “not
>>>>> ready for production” banner in the documentation.
>>>>
>>>> My first guess is that the MDS cache is just too small and it is
>>>> thrashing. Try
>>>>
>>>> ceph mds tell 0 injectargs '--mds-cache-size 1000000'
>>>>
>>>> That's 10x bigger than the default, though be aware that it will eat
>>>> up 10x as much RAM too.
>>>>
>>>> We've also seen the cache behave in a non-optimal way when evicting
>>>> things, making it thrash more often than it should. I'm hoping we can
>>>> implement something like MQ instead of our two-level LRU, but it
>>>> isn't high on the priority list right now.
>>>>
>>>> sage
>>>
>>>
>>> Thanks! I’ll pursue mds cache size tuning. Is there any guidance on
>>> setting the cache and other mds tunables correctly, or is it an
>>> adjust-and-test sort of thing? Cursory searching doesn’t return any
>>> relevant documentation on ceph.com. I’m plowing through some other
>>> list posts now.
>>> --
>>> Kevin Sumner
>>> ke...@sumner.io
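For anyone else chasing the same symptoms: a rough way to check whether the
cache is actually filling up to the configured limit (and whether the injected
value took effect) is to poke the MDS admin socket on the MDS host. A sketch,
not gospel: the daemon name mds.cephtest-mds01 below is made up, the socket
path may differ on your install, and the exact perf counter names can vary
between releases.

    # confirm the injected value took effect
    ceph daemon mds.cephtest-mds01 config get mds_cache_size

    # dump the perf counters and look at the "mds" section; the inode
    # counters there show roughly how full the cache is
    ceph daemon mds.cephtest-mds01 perf dump

    # equivalent, going through the admin socket path directly
    ceph --admin-daemon /var/run/ceph/ceph-mds.cephtest-mds01.asok perf dump

If the inode count sits pinned at the configured limit while ls hangs, that
lines up with Sage's thrashing theory; if it stays well below the limit, the
bottleneck is probably somewhere else.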
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com