Fixed! At least it looks fixed. After migrating every node (both servers and clients) from kernel 3.10.80-1 to 4.0.4-1, the issue seems to have disappeared. Now I get decent speeds from every node, both for reading files and for getting stats.

Thanks everyone!
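(In case it's useful to anyone else hitting this: a quick way to double-check that every node really ended up on the new kernel is a loop like the one below. Just a sketch; the hostnames are placeholders for your own servers and clients.)

for h in nodeA nodeB mon1 mds1 osd1; do
    printf '%s: ' "$h"
    ssh "$h" uname -r      # expect a 4.0.4-1 kernel everywhere after the migration
done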
On Tue, Jun 16, 2015 at 1:00 PM, negillen negillen <negil...@gmail.com> wrote:

> Thanks everyone,
>
> update: I tried running on "node A":
>
> # vmtouch -ev /storage/
> # sync; sync
>
> The problem persisted; one minute needed to 'ls -Ral' the dir (from node B).
>
> After that I ran on node A:
>
> # echo 2 > /proc/sys/vm/drop_caches
>
> And everything became suddenly fast on node B. ls, du, tar, all of them take a fraction of a second to complete on node B after dropping cache on A.
>
> On Tue, Jun 16, 2015 at 12:52 PM, Jan Schermer <j...@schermer.cz> wrote:
>
>> Have you tried just running “sync;sync” on the originating node? Does that achieve the same thing or not? (I guess it could/should).
>>
>> Jan
>>
>> On 16 Jun 2015, at 13:37, negillen negillen <negil...@gmail.com> wrote:
>>
>> Thanks again,
>>
>> even 'du' performance is terrible on node B (testing on a directory taken from Phoronix):
>>
>> # time du -hs /storage/test9/installed-tests/pts/pgbench-1.5.1/
>> 73M    /storage/test9/installed-tests/pts/pgbench-1.5.1/
>> real   0m21.044s
>> user   0m0.010s
>> sys    0m0.067s
>>
>> Reading the files from node B doesn't seem to help with subsequent accesses in this case:
>>
>> # time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/ >/dev/null
>> real   1m47.650s
>> user   0m0.041s
>> sys    0m0.212s
>>
>> # time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/ >/dev/null
>> real   1m45.636s
>> user   0m0.042s
>> sys    0m0.214s
>>
>> # time ls -laR /storage/test9/installed-tests/pts/pgbench-1.5.1 >/dev/null
>> real   1m43.180s
>> user   0m0.069s
>> sys    0m0.236s
>>
>> Of course, once I dismount the CephFS on node A everything gets as fast as it can be.
>>
>> Am I missing something obvious here?
>> Yes I could drop the Linux cache as a 'fix' but that would drop the entire system's cache, sounds a bit extreme! :P
>> Unless is there a way to drop the cache only for that single dir...?
>>
>> On Tue, Jun 16, 2015 at 12:15 PM, Gregory Farnum <g...@gregs42.com> wrote:
>>
>>> On Tue, Jun 16, 2015 at 12:11 PM, negillen negillen <negil...@gmail.com> wrote:
>>> > Thank you very much for your reply!
>>> >
>>> > Is there anything I can do to go around that? e.g. setting access caps to be released after a short while? Or is there a command to manually release access caps (so that I could run it in cron)?
>>>
>>> Well, you can drop the caches. ;)
>>>
>>> More generally, you're running into a specific hole here. If your clients are actually *accessing* the files then they should go into shared mode and this will be much faster on subsequent accesses.
>>>
>>> > This is quite a problem because we have several applications that need to access a large number of files and when we set them to work on CephFS latency skyrockets.
>>>
>>> What kind of shared-file access do they have? If you have a bunch of files being shared for read I'd expect this to be very fast. If different clients are writing small amounts to them in round-robin then that's unfortunately not going to work well. :(
>>> -Greg
>>>
>>> > Thanks again and regards.
>>> >
>>> > On Tue, Jun 16, 2015 at 10:59 AM, Gregory Farnum <g...@gregs42.com> wrote:
>>> >>
>>> >> On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen <negil...@gmail.com> wrote:
>>> >> > Hello everyone,
>>> >> >
>>> >> > something very strange is driving me crazy with CephFS (kernel driver).
>>> >> > I copy a large directory on the CephFS from one node. If I try to perform a 'time ls -alR' on that directory it gets executed in less than one second. If I try to do the same 'time ls -alR' from another node it takes several minutes. No matter how many times I repeat the command, the speed is always abysmal. The ls works fine on the node where the initial copy was executed from. This happens with any directory I have tried, no matter what kind of data is inside.
>>> >> >
>>> >> > After lots of experimenting I have found that in order to have fast ls speed for that dir from every node I need to flush the Linux cache on the original node:
>>> >> >
>>> >> > echo 3 > /proc/sys/vm/drop_caches
>>> >> >
>>> >> > Unmounting and remounting the CephFS on that node does the trick too.
>>> >> >
>>> >> > Anyone has a clue about what's happening here? Could this be a problem with the writeback fscache for the CephFS?
>>> >> >
>>> >> > Any help appreciated! Thanks and regards. :)
>>> >>
>>> >> This is a consequence of the CephFS "file capabilities" that we use to do distributed locking on file states. When you copy the directory on client A, it has full capabilities on the entire tree. When client B tries to do a stat on each file in that tree, it doesn't have any capabilities. So it sends a stat request to the MDS, which sends a cap update to client A requiring it to pause updates on the file and share its current state. Then the MDS tells client A it can keep going and sends the stat to client B.
>>> >> So that's:
>>> >> B -> MDS
>>> >> MDS -> A
>>> >> A -> MDS
>>> >> MDS -> B | MDS -> A
>>> >> for every file you touch.
>>> >>
>>> >> I think the particular oddity you're encountering here is that CephFS generally tries not to make clients drop their "exclusive" access caps just to satisfy a stat. If you had client B doing something with the files (like reading them) you would probably see different behavior. I'm not sure if there's something effective we can do here or not (it's just a bunch of heuristics when we should or should not drop caps), but please file a bug on the tracker (tracker.ceph.com) with this case. :)
>>> >> -Greg
>>
>> _______________________________________________
>> ceph-users mailing list
>> ceph-users@lists.ceph.com
>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
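For anyone who finds this thread later and is stuck on an older kernel: the reproduction and the interim workaround discussed above boil down to roughly the sequence below. This is only a sketch; /storage and /storage/test9 are the CephFS mount and test directory used in this thread, the source path is made up, and "node A"/"node B" are the writing and reading clients.

# --- on node A (the client that originally wrote the tree) ---
cp -a /path/to/source /storage/test9       # populate a directory over CephFS (source path is hypothetical)
time ls -alR /storage/test9 >/dev/null     # fast here: node A holds the caps for the new files

# --- on node B (any other CephFS client) ---
time ls -alR /storage/test9 >/dev/null     # slow: every stat has to go through the MDS and node A

# --- interim workaround, run on node A ---
# note: this drops the whole node's dentry/inode caches, not just the CephFS ones
echo 2 > /proc/sys/vm/drop_caches

# --- back on node B ---
time ls -alR /storage/test9 >/dev/null     # now completes in a fraction of a second

On the 4.0.4-1 kernels mentioned at the top of this message, the workaround no longer appears to be necessary.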
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com