Hi Greg,

> Nowhere in your test procedure do you mention syncing or flushing the files
> to disk. That is almost certainly the cause of the slowness.


We have tested performing a sync after file creation and the delay still occurs
(see Test3 results below).


To clarify: the delay is observed only when ls is performed on the same
directory from which the files were removed, and only if the files have
recently been cached.

e.g. rm -f /mnt/cephfs_mountpoint/file*; ls /mnt/cephfs_mountpoint


> the client which wrote the data is required to flush it out before dropping
> enough file "capabilities" for the other client to do the rm.


Our tests are performed on the same host.


In Test1, the rm and ls are performed by the same client ID. For the other
tests, in which an unmount & remount were performed, I would assume the unmount
causes that particular client ID to terminate and drop any caps it holds.
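
For reference, we can check whether any caps are still held after the remount
via the kernel client's debugfs interface. A minimal check, assuming debugfs is
mounted at /sys/kernel/debug and this is the only CephFS mount on the host (the
per-client directory is named <fsid>.client<id>):

cat /sys/kernel/debug/ceph/*/caps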


Do you still believe held caps are contributing to slowness in these test 
scenarios?


We’ve added 3 additional test cases below.

Test 3) Sync write (delay observed when writing files and syncing)

Test 4) Bypass cache (no delay observed when files are not written to cache)

Test 5) Read test (delay observed when removing files that have recently been 
read into the cache)


Test3: Sync Write - File creation, with sync after write.


1) unmount & remount:


2) Add 5 x 100GB files to a directory:


for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done


3) sync


4) Delete all files in directory:


for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint

real    0m8.765s

user    0m0.001s

sys     0m0.000s



Test4: Bypass cache - File creation, with nocache options for dd.


1) unmount & remount:


2) Add 5 x 100GB files to a directory:


for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576 oflag=nocache,sync iflag=nocache; done
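
(oflag=nocache,sync requests synchronized writes and asks the kernel to drop
cached pages for the output file as it goes; iflag=nocache does the same on the
input side, so the file data should not remain in the page cache afterwards.)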


3) sync


4) Delete all files in directory:


for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint

real    0m0.003s

user    0m0.000s

sys     0m0.001s



Test5: Read test - Read files into an empty page cache before deletion.

1) unmount & remount


2) Add 5 x 100GB files to a directory:


for i in {1..5}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done


3) sync


4) unmount & remount (to empty the cache)


5) Read files (to add them back to the cache):

for i in {1..5};do cat /mnt/cephfs_mountpoint/file$i.txt > /dev/null; done
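
# (Optional: if available, fincore from util-linux can confirm the pages are
# resident before the rm, e.g.:)
fincore /mnt/cephfs_mountpoint/file1.txt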


6) Delete all files in directory:


for i in {1..5};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


7) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint

real    0m8.723s

user    0m0.000s

sys     0m0.001s
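
For reference, while the ls is blocked, the kernel client's in-flight MDS
requests (including the pending unlink requests) can be watched via the same
debugfs interface mentioned above, again assuming a single CephFS mount:

watch -n 1 'cat /sys/kernel/debug/ceph/*/mdsc'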

Regards,

Dylan

________________________________
From: Gregory Farnum <gfar...@redhat.com>
Sent: Wednesday, October 10, 2018 4:37:49 AM
To: Dylan McCulloch
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] cephfs kernel client blocks when removing large files

Nowhere in your test procedure do you mention syncing or flushing the files to 
disk. That is almost certainly the cause of the slowness — the client which 
wrote the data is required to flush it out before dropping enough file 
"capabilities" for the other client to do the rm.
-Greg

On Sun, Oct 7, 2018 at 11:57 PM Dylan McCulloch <d...@unimelb.edu.au> wrote:

Hi all,


We have identified some unexpected blocking behaviour by the CephFS kernel 
client.


When performing 'rm' on large files (100+ GB), there appears to be a significant 
delay of 10 seconds or more before a 'stat' operation can be performed on the 
same directory of the filesystem.


Looking at the kernel client's MDS in-flight ops, we observe pending UNLINK 
operations corresponding to the deleted files.


We have noted some correlation between files being in the client page cache and 
the blocking behaviour. For example, if the cache is dropped or the filesystem 
is remounted, the blocking does not occur.


Test scenario below:


/mnt/cephfs_mountpoint type ceph (rw,relatime,name=ceph_filesystem,secret=<hidden>,noshare,acl,wsize=16777216,rasize=268439552,caps_wanted_delay_min=1,caps_wanted_delay_max=1)


Test1:

1) unmount & remount:


2) Add 10 x 100GB files to a directory:


for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done


3) Delete all files in directory:


for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


4) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint/test1


Result: delay ~16 seconds

real    0m16.818s

user    0m0.000s

sys     0m0.002s



Test2:


1) unmount & remount


2) Add 10 x 100GB files to a directory

for i in {1..10}; do dd if=/dev/zero of=/mnt/cephfs_mountpoint/file$i.txt count=102400 bs=1048576; done


3) Either a) unmount & remount; or b) drop caches


echo 3 >/proc/sys/vm/drop_caches
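# (3 frees the page cache plus dentries and inodes)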


4) Delete files in directory:


for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done


5) Immediately perform ls on directory:


time ls /mnt/cephfs_mountpoint/test1


Result: no delay

real    0m0.010s

user    0m0.000s

sys     0m0.001s


Our understanding of CephFS' file deletion mechanism is that there should be no 
blocking observed on the client 
(http://docs.ceph.com/docs/mimic/dev/delayed-delete/).

It appears that if files are cached on the client, either by being created or 
accessed recently, the kernel client will block for reasons we have not 
identified.
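
Per Test2 above, dropping the page cache immediately before the rm avoids the
delay, but it discards all cached data on the host, so it is a workaround
rather than a fix:

sync
echo 3 > /proc/sys/vm/drop_caches
for i in {1..10};do rm -f /mnt/cephfs_mountpoint/file$i.txt; done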


Is this a known issue, and are there any ways to mitigate this behaviour?

Our production system relies on our clients' processes having concurrent access 
to the filesystem, and access contention must be avoided.


An old mailing list post that discusses changes to the client's page cache 
behaviour may be relevant:

http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005692.html


Client System:


OS: RHEL7

Kernel: 4.15.15-1


Cluster: Ceph Luminous 12.2.8


Thanks,

Dylan

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
