Hello everyone, something very strange is driving me crazy with CephFS (kernel driver). I copy a large directory onto the CephFS from one node. If I then run 'time ls -alR' on that directory from the same node, it completes in less than one second. If I run the same 'time ls -alR' from another node, it takes several minutes. No matter how many times I repeat the command, the speed stays abysmal. The ls is only fast on the node the initial copy was made from. This happens with every directory I have tried, regardless of what kind of data is inside.
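To make the sequence concrete, this is roughly what I do (host names and paths below are just examples, not my real ones):

node-a# cp -a /root/bigdir /mnt/cephfs/bigdir
node-a# time ls -alR /mnt/cephfs/bigdir > /dev/null     (finishes in well under a second)

node-b# time ls -alR /mnt/cephfs/bigdir > /dev/null     (takes several minutes, every time)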
After lots of experimenting I have found that, in order to get fast ls speed for that directory from every node, I need to flush the Linux page cache on the original node:

# echo 3 > /proc/sys/vm/drop_caches

Unmounting and remounting the CephFS on that node does the trick too.

Does anyone have a clue about what's happening here? Could this be a problem with the writeback fscache for CephFS? Any help appreciated! Thanks and regards. :)

# uname -r
3.10.80-1.el6.elrepo.x86_64

# ceph -v
ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)

# ceph -s
    cluster f9ffbbd7-186b-483a-96ea-90cdadb81f2a
     health HEALTH_OK
     monmap e1: 3 mons at {[omissis]}
            election epoch 60, quorum 0,1,2 [omissis]
     mdsmap e59: 1/1/1 up {0=[omissis]=up:active}, 2 up:standby
     osdmap e146: 2 osds: 2 up, 2 in
      pgmap v122287: 256 pgs, 2 pools, 30709 MB data, 75239 objects
            62432 MB used, 860 GB / 921 GB avail
                 256 active+clean
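P.S. For completeness, the remount workaround on the original node is just the usual kernel-client umount/mount; the monitor address and options here are placeholders, not my actual configuration:

node-a# umount /mnt/cephfs
node-a# mount -t ceph 192.168.0.1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret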