Thanks for the replies. I'll move our entire testbed installation to Luminous and redo the tests.
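In the meantime, as a sanity check on the current Jewel setup, we may try lowering the MDS tick interval at runtime and repeating the rm/ls test to see whether the delay shrinks accordingly. A rough sketch, assuming the active MDS daemon is named mds.mds-01 (a placeholder, not our actual daemon name) and the first command is run on its host:

# read the current value from the running daemon's admin socket
ceph daemon mds.mds-01 config get mds_tick_interval

# temporarily lower it, then repeat "time rm dd-test2 ; time ls"
ceph tell mds.mds-01 injectargs '--mds_tick_interval 1'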
Cheers,
Tyanko

On 17 October 2017 at 10:14, Yan, Zheng <uker...@gmail.com> wrote:
> On Tue, Oct 17, 2017 at 1:07 AM, Tyanko Aleksiev
> <tyanko.alex...@gmail.com> wrote:
> > Hi,
> >
> > At UZH we are currently evaluating CephFS as a distributed file system for
> > the scratch space of an HPC installation. Some slowdown of the metadata
> > operations seems to occur under certain circumstances. In particular,
> > commands issued after a big file deletion can take several seconds.
> >
> > Example:
> >
> > dd bs=$((1024*1024*128)) count=2048 if=/dev/zero of=./dd-test
> > 274877906944 bytes (275 GB, 256 GiB) copied, 224.798 s, 1.2 GB/s
> >
> > dd bs=$((1024*1024*128)) count=2048 if=./dd-test of=./dd-test2
> > 274877906944 bytes (275 GB, 256 GiB) copied, 1228.87 s, 224 MB/s
> >
> > ls; time rm dd-test2 ; time ls
> > dd-test  dd-test2
> >
> > real    0m0.004s
> > user    0m0.000s
> > sys     0m0.000s
> > dd-test
> >
> > real    0m8.795s
> > user    0m0.000s
> > sys     0m0.000s
> >
> > Additionally, the time it takes to complete the "ls" command appears to be
> > proportional to the size of the deleted file. The issue described above is
> > not limited to "ls" but extends to other commands:
> >
> > ls ; time rm dd-test2 ; time du -hs ./*
> > dd-test  dd-test2
> >
> > real    0m0.003s
> > user    0m0.000s
> > sys     0m0.000s
> > 128G    ./dd-test
> >
> > real    0m9.974s
> > user    0m0.000s
> > sys     0m0.000s
> >
> > What might be causing this behavior, and how could we improve it?
> >
>
> It seems like the MDS was waiting for a journal flush; it can wait up to
> 'mds_tick_interval'. This issue should be fixed in the Luminous release.
>
> Regards
> Yan, Zheng
>
> > Setup:
> >
> > - ceph version: 10.2.9, OS: Ubuntu 16.04, kernel: 4.8.0-58-generic,
> > - 3 monitors,
> > - 1 MDS,
> > - 3 storage nodes with 24 x 4TB disks on each node: 1 OSD/disk (72 OSDs in
> > total). The 4TB disks are used for the cephfs_data pool. Journaling is on SSDs,
> > - we installed a 400GB NVMe disk on each storage node and aggregated the
> > three disks in a CRUSH rule. The cephfs_metadata pool was then created using
> > that rule and is therefore hosted on the NVMes. Journaling and data are on
> > the same partition here.
> >
> > So far we are using the default Ceph configuration settings.
> >
> > Clients are mounting the file system with the kernel driver using the
> > following options (again default):
> > "rw,noatime,name=admin,secret=<hidden>,acl,_netdev".
> >
> > Thank you in advance for the help.
> >
> > Cheers,
> > Tyanko
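P.S. For completeness, the NVMe-only placement of cephfs_metadata mentioned above is typically done on Jewel with a dedicated CRUSH root, since device classes only arrive in Luminous. One way to express it, with illustrative bucket names, OSD id and weight rather than the exact ones from our cluster:

# separate hierarchy containing only the NVMe OSDs
ceph osd crush add-bucket nvme root
ceph osd crush add-bucket node1-nvme host
ceph osd crush move node1-nvme root=nvme
ceph osd crush set osd.72 0.364 root=nvme host=node1-nvme   # repeat per node

# rule that places one replica per host under the nvme root
ceph osd crush rule create-simple nvme-rule nvme host

# point the metadata pool at that rule (ruleset number from 'ceph osd crush rule dump')
ceph osd pool set cephfs_metadata crush_ruleset 1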
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com