Nowhere in your test procedure do you mention syncing or flushing the files
to disk. That is almost certainly the cause of the slowness — the client
which wrote the data is required to flush it out before dropping enough
file "capabilities" for the other client to do the rm.
-Greg

On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch <d...@unimelb.edu.au> wrote:

> Hi all,
>
> We have identified some unexpected blocking behaviour by the ceph-fs
> kernel client.
>
> When performing 'rm' on large files (100+GB), there appears to be a
> significant delay of 10 seconds or more before a 'stat' operation can
> be performed on the same directory on the filesystem.
>
> Looking at the kernel client's mds inflight-ops, we observe that there
> are pending UNLINK operations corresponding to the deleted files.
>
> We have noted some correlation between files being in the client page
> cache and the blocking behaviour. For example, if the cache is dropped
> or the filesystem remounted, the blocking will not occur.
>
> Test scenario below:
>
> /mnt/cephfs_mountpoint type ceph
> (rw,relatime,name=ceph_filesystem,secret=<hidden>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
>
> Test1:
> 1) unmount & remount
> 2) Add 10 x 100GB files to a directory:
>    for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> 3) Delete all files in the directory:
>    for i in {1..10}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> 4) Immediately perform ls on the directory:
>    time ls /mnt/cephfs_mountpoint/test1
>
> Result: delay of ~16 seconds
>    real    0m16.818s
>    user    0m0.000s
>    sys     0m0.002s
>
> Test2:
> 1) unmount & remount
> 2) Add 10 x 100GB files to a directory:
>    for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> 3) Either a) unmount & remount; or b) drop caches:
>    echo 3 > /proc/sys/vm/drop_caches
> 4) Delete files in the directory:
>    for i in {1..10}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> 5) Immediately perform ls on the directory:
>    time ls /mnt/cephfs_mountpoint/test1
>
> Result: no delay
>    real    0m0.010s
>    user    0m0.000s
>    sys     0m0.001s
>
> Our understanding of ceph-fs' file deletion mechanism is that there
> should be no blocking observed on the client:
> http://docs.ceph.com/docs/mimic/dev/delayed-delete/
>
> It appears that if files are cached on the client, either by being
> created or accessed recently, the kernel client will block for reasons
> we have not identified. Is this a known issue, and are there any ways
> to mitigate this behaviour? Our production system relies on our
> clients' processes having concurrent access to the file system, and
> access contention must be avoided.
>
> An old mailing list post that discusses changes to the client's page
> cache behaviour may be relevant:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
>
> Client System:
> OS: RHEL7
> Kernel: 4.15.15-1
> Cluster: Ceph: Luminous 12.2.8
>
> Thanks,
> Dylan
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
