On Mon, Jan 28, 2019 at 10:34 AM Albert Yue <transuranium....@gmail.com> wrote:
>
> Hi Yan Zheng,
>
> Our clients are also complaining about operations like 'du' or 'ncdu' being
> very slow. Is there any alternative tool for such kind of operation on
> CephFS? Thanks!
>

'du' traverses the whole directory tree to calculate sizes. CephFS maintains
recursive statistics for every directory, which avoids that walk; see the
ceph.dir.* virtual extended attributes for details.
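For example, on any client with the filesystem mounted (the mount point and
path below are just illustrative), a 'du'-style answer is a single metadata
lookup instead of a full tree walk. The values are maintained lazily by the
MDS, so they can lag slightly behind:

  # recursive size, in bytes, of everything under a directory
  getfattr -n ceph.dir.rbytes /mnt/cephfs/some/dir

  # recursive file, subdirectory and total entry counts
  getfattr -n ceph.dir.rfiles   /mnt/cephfs/some/dir
  getfattr -n ceph.dir.rsubdirs /mnt/cephfs/some/dir
  getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir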
> Best regards,
> Albert
>
> On Wed, Jan 23, 2019 at 11:04 AM Yan, Zheng <uker...@gmail.com> wrote:
>>
>> On Wed, Jan 23, 2019 at 10:02 AM Albert Yue <transuranium....@gmail.com>
>> wrote:
>> >
>> > But with enough memory on MDS, I can just cache all metadata into memory.
>> > Right now there is around 500GB of metadata on the SSDs. So this is not
>> > enough?
>> >
>>
>> The mds needs to track lots of extra information for each object. For 500G
>> of metadata, the mds may need 1T or more of memory.
>>
>> > On Tue, Jan 22, 2019 at 5:48 PM Yan, Zheng <uker...@gmail.com> wrote:
>> >>
>> >> On Tue, Jan 22, 2019 at 10:49 AM Albert Yue <transuranium....@gmail.com>
>> >> wrote:
>> >> >
>> >> > Hi Yan Zheng,
>> >> >
>> >> > In your opinion, can we resolve this issue by moving the MDS to a
>> >> > 512GB or 1TB memory machine?
>> >> >
>> >>
>> >> The problem is on the client side, especially clients with lots of
>> >> memory. I don't think enlarging the mds cache size is a good idea. You
>> >> can periodically check each kernel client's
>> >> /sys/kernel/debug/ceph/xxx/caps and run 'echo 2 >/proc/sys/vm/drop_caches'
>> >> on that client if it holds too many caps (for example 10k).
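In practice, the check described above looks something like this on each
client node, run as root. This is only a sketch: it assumes debugfs is
mounted at /sys/kernel/debug, the exact layout of the caps file varies
between kernel versions, and the 10k threshold is just the example figure
quoted above:

  # The per-mount 'caps' file reports how many caps the client holds
  # (newer kernels also list them individually).
  for d in /sys/kernel/debug/ceph/*; do
      echo "== $d =="
      cat "$d/caps"
  done

  # If a client is sitting on too many caps (for example more than ~10k),
  # drop its clean dentry/inode caches so the unused caps can be released
  # back to the MDS:
  echo 2 > /proc/sys/vm/drop_caches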
>> >> > On Mon, Jan 21, 2019 at 10:49 PM Yan, Zheng <uker...@gmail.com> wrote:
>> >> >>
>> >> >> On Mon, Jan 21, 2019 at 11:16 AM Albert Yue
>> >> >> <transuranium....@gmail.com> wrote:
>> >> >> >
>> >> >> > Dear Ceph Users,
>> >> >> >
>> >> >> > We have set up a CephFS cluster with 6 osd machines, each with 16 x
>> >> >> > 8TB hard disks. The Ceph version is luminous 12.2.5. We created one
>> >> >> > data pool on these hard disks and another metadata pool on 3 SSDs.
>> >> >> > We created an MDS with a 65GB cache size.
>> >> >> >
>> >> >> > But our users keep complaining that CephFS is too slow. What we
>> >> >> > observed is that CephFS is fast when we switch to a new MDS
>> >> >> > instance, but once the cache fills up (which happens very fast),
>> >> >> > clients become very slow when performing basic filesystem
>> >> >> > operations such as `ls`.
>> >> >> >
>> >> >>
>> >> >> It seems that clients hold lots of unused inodes in their icache,
>> >> >> which prevents the mds from trimming the corresponding objects from
>> >> >> its cache. mimic has the command "ceph daemon mds.x cache drop" to
>> >> >> ask clients to drop their caches. I'm also working on a patch that
>> >> >> makes the kernel client release unused inodes.
>> >> >>
>> >> >> For luminous, there is not much we can do, except periodically run
>> >> >> "echo 2 > /proc/sys/vm/drop_caches" on each client.
>> >> >>
>> >> >> > What we know is that our users are putting lots of small files into
>> >> >> > CephFS; there are now around 560 million files. We didn't see high
>> >> >> > CPU wait on the MDS instance, and the metadata pool only used
>> >> >> > around 200MB of space.
>> >> >> >
>> >> >> > My question is, what is the relationship between the metadata pool
>> >> >> > and the MDS? Is this performance issue caused by the hardware
>> >> >> > behind the metadata pool? Why has the metadata pool only used 200MB
>> >> >> > of space while we see 3k iops on each of these three SSDs, and why
>> >> >> > can't the MDS cache all of these 200MB in memory?
>> >> >> >
>> >> >> > Thanks very much!
>> >> >> >
>> >> >> > Best Regards,
>> >> >> >
>> >> >> > Albert

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com