On Mon, Oct 22, 2018 at 9:46 AM Dylan McCulloch <d...@unimelb.edu.au> wrote:
>
> > On Mon, Oct 8, 2018 at 2:57 PM Dylan McCulloch <d...@unimelb.edu.au> wrote:
> >>
> >> Hi all,
> >>
> >> We have identified some unexpected blocking behaviour by the ceph-fs
> >> kernel client.
> >>
> >> When performing 'rm' on large files (100+GB), there appears to be a
> >> significant delay of 10 seconds or more before a 'stat' operation can be
> >> performed on the same directory on the filesystem.
> >>
> >> Looking at the kernel client's mds inflight-ops, we observe that there are
> >> pending UNLINK operations corresponding to the deleted files.
> >>
> >> We have noted some correlation between files being in the client page
> >> cache and the blocking behaviour. For example, if the cache is dropped or
> >> the filesystem remounted, the blocking will not occur.
> >>
> >> Test scenario below:
> >>
> >> /mnt/cephfs_mountpoint type ceph
> >> (rw,relatime,name=ceph_filesystem,secret=<hidden>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)
> >>
> >> Test1:
> >>
> >> 1) unmount & remount
> >>
> >> 2) Add 10 x 100GB files to a directory:
> >>
> >> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> >>
> >> 3) Delete all files in the directory:
> >>
> >> for i in {1..10}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> >>
> >> 4) Immediately perform ls on the directory:
> >>
> >> time ls /mnt/cephfs_mountpoint/test1
> >>
> >> Result: delay ~16 seconds
> >>
> >> real 0m16.818s
> >> user 0m0.000s
> >> sys 0m0.002s
> >>
> > Are the cephfs metadata pool and data pool on the same set of OSDs? Is it
> > possible that heavy data IO slowed down metadata IO?
>
> Test results are from a new pre-production cluster that does not have any
> significant data IO.
> We've also confirmed the same behaviour on another cluster with similar
> configuration. Both clusters have a separate device-class/crush rule for the
> metadata pool using NVMe OSDs, while the data pool uses HDD OSDs.
>
> Most metadata operations are unaffected. It appears that only metadata
> operations on files that were in the client page cache prior to the rm are
> delayed.
>
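For anyone following along, a rough way to watch the pending UNLINK operations mentioned above is to filter the kernel client's in-flight MDS request list. This is a sketch, not part of the original report: the `/sys/kernel/debug/ceph/*/mdsc` debugfs path and the lowercase "unlink" op name are assumptions about the kernel client's debug output format.

```shell
# Sketch: count pending unlink requests in mdsc-style text read from stdin.
# On a live client the input would come from debugfs, roughly:
#   cat /sys/kernel/debug/ceph/*/mdsc | count_unlinks
# (assumes debugfs is mounted and you are root; the directory name under
# /sys/kernel/debug/ceph varies per mount)
count_unlinks() {
    grep -c 'unlink'
}
```

Run it while the delayed 'ls' is blocking; a non-zero count alongside the stalled stat would line up with the behaviour described above.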
Ok. Please enable kernel debug when running 'ls' and send the kernel log to us:

echo 'module ceph +p' > /sys/kernel/debug/dynamic_debug/control
time ls /mnt/cephfs_mountpoint/test1
echo 'module ceph -p' > /sys/kernel/debug/dynamic_debug/control

Yan, Zheng

> >>
> >> Test2:
> >>
> >> 1) unmount & remount
> >>
> >> 2) Add 10 x 100GB files to a directory:
> >>
> >> for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done
> >>
> >> 3) Either a) unmount & remount; or b) drop caches:
> >>
> >> echo 3 > /proc/sys/vm/drop_caches
> >>
> >> 4) Delete files in the directory:
> >>
> >> for i in {1..10}; do rm -f /mnt/cephfs_mountpoint/file$i.txt; done
> >>
> >> 5) Immediately perform ls on the directory:
> >>
> >> time ls /mnt/cephfs_mountpoint/test1
> >>
> >> Result: no delay
> >>
> >> real 0m0.010s
> >> user 0m0.000s
> >> sys 0m0.001s
> >>
> >> Our understanding of ceph-fs' file deletion mechanism is that there
> >> should be no blocking observed on the client:
> >> http://docs.ceph.com/docs/mimic/dev/delayed-delete/
> >>
> >> It appears that if files are cached on the client, either by being created
> >> or accessed recently, the kernel client will block for reasons we have
> >> not identified.
> >>
> >> Is this a known issue? Are there any ways to mitigate this behaviour?
> >>
> >> Our production system relies on our clients' processes having concurrent
> >> access to the file system, and access contention must be avoided.
> >>
> >> An old mailing list post that discusses changes to the client's page cache
> >> behaviour may be relevant:
> >> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html
> >>
> >> Client System:
> >>
> >> OS: RHEL7
> >> Kernel: 4.15.15-1
> >>
> >> Cluster: Ceph Luminous 12.2.8
> >>
> >> Thanks,
> >>
> >> Dylan
> >>
> >> _______________________________________________
> >> ceph-users mailing list
> >> ceph-users@lists.ceph.com
> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com