Thanks for the replies.
I'll move our whole testbed installation to Luminous and redo the tests.

Cheers,
Tyanko

On 17 October 2017 at 10:14, Yan, Zheng <uker...@gmail.com> wrote:

> On Tue, Oct 17, 2017 at 1:07 AM, Tyanko Aleksiev
> <tyanko.alex...@gmail.com> wrote:
> > Hi,
> >
> > At UZH we are currently evaluating CephFS as a distributed file system
> > for the scratch space of an HPC installation. A slowdown of metadata
> > operations seems to occur under certain circumstances. In particular,
> > commands issued right after deleting a big file can take several seconds.
> >
> > Example:
> >
> > dd bs=$((1024*1024*128)) count=2048 if=/dev/zero of=./dd-test
> > 274877906944 bytes (275 GB, 256 GiB) copied, 224.798 s, 1.2 GB/s
> >
> > dd bs=$((1024*1024*128)) count=2048 if=./dd-test of=./dd-test2
> > 274877906944 bytes (275 GB, 256 GiB) copied, 1228.87 s, 224 MB/s
> >
> > ls; time rm dd-test2 ; time ls
> > dd-test  dd-test2
> >
> > real    0m0.004s
> > user    0m0.000s
> > sys     0m0.000s
> > dd-test
> >
> > real    0m8.795s
> > user    0m0.000s
> > sys     0m0.000s
> >
> > Additionally, the time it takes to complete the "ls" command appears to
> > be proportional to the size of the deleted file. The issue described
> > above is not limited to "ls" but extends to other commands:
> >
> > ls ; time rm dd-test2 ; time du -hs ./*
> > dd-test  dd-test2
> >
> > real    0m0.003s
> > user    0m0.000s
> > sys     0m0.000s
> > 128G    ./dd-test
> >
> > real    0m9.974s
> > user    0m0.000s
> > sys     0m0.000s
> >
> > What might be causing this behavior, and how could we improve it?
> >
>
> It looks like the MDS was waiting for a journal flush; it can wait up to
> 'mds_tick_interval'. This issue should be fixed in the Luminous release.
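>
> If you want to double-check this on the running cluster before upgrading,
> a rough sketch (untested here; "mds.a" is just a placeholder for your MDS
> name):
>
> # show the current tick interval (defaults to 5 seconds)
> ceph daemon mds.a config get mds_tick_interval
> # force a journal flush via the admin socket, if your build supports it
> ceph daemon mds.a flush journal
> # or temporarily lower the tick interval to see whether the stalls shrink
> ceph tell mds.a injectargs '--mds_tick_interval 1'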
>
> Regards
> Yan, Zheng
>
> > Setup:
> >
> > - ceph version: 10.2.9, OS: Ubuntu 16.04, kernel: 4.8.0-58-generic,
> > - 3 monitors,
> > - 1 mds,
> > - 3 storage nodes with 24 x 4TB disks on each node: 1 OSD per disk (72
> > OSDs in total). The 4TB disks are used for the cephfs_data pool;
> > journaling is on SSDs,
> > - we installed a 400GB NVMe disk on each storage node and aggregated the
> > three disks in a CRUSH rule. The cephfs_metadata pool was then created
> > using that rule and is therefore hosted on the NVMes. Journaling and data
> > share the same partition here (a rough sketch of the placement is below).
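> >
> > For illustration only: on 10.2.9 we did this with a custom CRUSH rule,
> > but with Luminous device classes the same placement would boil down to
> > something like the following (the rule name is made up):
> >
> > # replicated rule restricted to OSDs of device class "nvme"
> > ceph osd crush rule create-replicated nvme_rule default host nvme
> > # host the CephFS metadata pool on that rule
> > ceph osd pool set cephfs_metadata crush_rule nvme_rule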
> >
> > So far we are using the default ceph configuration settings.
> >
> > Clients are mounting the file system with the kernel driver using the
> > following options (again default):
> > "rw,noatime,name=admin,secret=<hidden>,acl,_netdev".
> >
> > Thank you in advance for the help.
> >
> > Cheers,
> > Tyanko
> >