Fixed! At least it looks like it's fixed.

It seems that after migrating every node (both servers and clients) from
kernel 3.10.80-1 to 4.0.4-1, the issue disappeared.
Now I get decent speeds on every node, both for reading files and for
getting stats.
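
(In case it helps anyone comparing setups, a quick way to confirm the running
kernel on every node -- the hostnames below are just placeholders:)

# for n in nodeA nodeB mds1 osd1; do echo -n "$n: "; ssh "$n" uname -r; done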

Thanks everyone!


On Tue, Jun 16, 2015 at 1:00 PM, negillen negillen <negil...@gmail.com>
wrote:

> Thanks everyone,
>
> update: I tried running on "node A":
> # vmtouch -ev /storage/
> # sync; sync
>
> The problem persisted: it still took a minute to 'ls -Ral' the dir (from
> node B).
>
> After that I ran on node A:
> # echo 2 > /proc/sys/vm/drop_caches
>
> And everything suddenly became fast on node B: ls, du, tar all take a
> fraction of a second to complete on node B after dropping the cache on A.
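>
> (Recap of the sequence as commands; /storage is the mount point used above,
> and the comments reflect the documented drop_caches semantics:)
>
> # vmtouch -ev /storage/               # evict page cache for files under /storage only
> # sync; sync                          # flush dirty data -- did not help here
> # echo 1 > /proc/sys/vm/drop_caches   # free page cache, system-wide
> # echo 2 > /proc/sys/vm/drop_caches   # free dentries and inodes -- this is what helped
> # echo 3 > /proc/sys/vm/drop_caches   # both of the above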
>
>
>
>
> On Tue, Jun 16, 2015 at 12:52 PM, Jan Schermer <j...@schermer.cz> wrote:
>
>> Have you tried just running “sync;sync” on the originating node? Does
>> that achieve the same thing or not? (I guess it could/should).
>>
>> Jan
>>
>>
>> On 16 Jun 2015, at 13:37, negillen negillen <negil...@gmail.com> wrote:
>>
>> Thanks again,
>>
>> Even 'du' performance is terrible on node B (testing on a directory taken
>> from Phoronix):
>>
>> # time du -hs /storage/test9/installed-tests/pts/pgbench-1.5.1/
>> 73M     /storage/test9/installed-tests/pts/pgbench-1.5.1/
>> real    0m21.044s
>> user    0m0.010s
>> sys     0m0.067s
>>
>>
>> Reading the files from node B doesn't seem to help with subsequent
>> accesses in this case:
>>
>> # time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/>/dev/null
>> real    1m47.650s
>> user    0m0.041s
>> sys     0m0.212s
>>
>> # time tar c /storage/test9/installed-tests/pts/pgbench-1.5.1/>/dev/null
>> real    1m45.636s
>> user    0m0.042s
>> sys     0m0.214s
>>
>> # time ls -laR /storage/test9/installed-tests/pts/pgbench-1.5.1>/dev/null
>>
>> real    1m43.180s
>> user    0m0.069s
>> sys     0m0.236s
>>
>>
>> Of course, once I unmount the CephFS on node A, everything gets as fast
>> as it can be.
>>
>> Am I missing something obvious here?
>> Yes, I could drop the Linux cache as a 'fix', but that would drop the
>> entire system's cache, which sounds a bit extreme! :P
>> Unless there is a way to drop the cache only for that single dir...?
>>
>>
>> On Tue, Jun 16, 2015 at 12:15 PM, Gregory Farnum <g...@gregs42.com>
>> wrote:
>>
>>> On Tue, Jun 16, 2015 at 12:11 PM, negillen negillen <negil...@gmail.com>
>>> wrote:
>>> > Thank you very much for your reply!
>>> >
>>> > Is there anything I can do to work around that? E.g. setting access
>>> > caps to be released after a short while? Or is there a command to
>>> > manually release access caps (so that I could run it in cron)?
>>>
>>> Well, you can drop the caches. ;)
>>>
>>> More generally, you're running into a specific hole here. If your
>>> clients are actually *accessing* the files then they should go into
>>> shared mode and this will be much faster on subsequent accesses.
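>>>
>>> (A minimal way to test that from the other client, reusing the test
>>> directory quoted earlier in the thread: read everything once, then
>>> re-time the stat-heavy command.)
>>>
>>> # find /storage/test9/installed-tests/pts/pgbench-1.5.1/ -type f -exec cat {} + >/dev/null
>>> # time ls -laR /storage/test9/installed-tests/pts/pgbench-1.5.1 >/dev/null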
>>>
>>> > This is quite a problem because we have several applications that need
>>> > to access a large number of files, and when we set them to work on
>>> > CephFS, latency skyrockets.
>>>
>>> What kind of shared-file access do they have? If you have a bunch of
>>> files being shared for read I'd expect this to be very fast. If
>>> different clients are writing small amounts to them in round-robin
>>> then that's unfortunately not going to work well. :(
>>> -Greg
>>>
>>> >
>>> > Thanks again and regards.
>>> >
>>> > On Tue, Jun 16, 2015 at 10:59 AM, Gregory Farnum <g...@gregs42.com>
>>> > wrote:
>>> >>
>>> >> On Mon, Jun 15, 2015 at 11:34 AM, negillen negillen
>>> >> <negil...@gmail.com> wrote:
>>> >> > Hello everyone,
>>> >> >
>>> >> > something very strange is driving me crazy with CephFS (kernel
>>> >> > driver).
>>> >> > I copy a large directory on the CephFS from one node. If I try to
>>> >> > perform a 'time ls -alR' on that directory it gets executed in less
>>> >> > than one second. If I try to do the same 'time ls -alR' from another
>>> >> > node it takes several minutes. No matter how many times I repeat the
>>> >> > command, the speed is always abysmal. The ls works fine on the node
>>> >> > where the initial copy was executed from. This happens with any
>>> >> > directory I have tried, no matter what kind of data is inside.
>>> >> >
>>> >> > After lots of experimenting I have found that in order to have fast
>>> >> > ls speed for that dir from every node I need to flush the Linux cache
>>> >> > on the original node:
>>> >> > echo 3 > /proc/sys/vm/drop_caches
>>> >> > Unmounting and remounting the CephFS on that node does the trick too.
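>>> >> >
>>> >> > (Both workarounds as explicit commands, for reference -- the mount
>>> >> > point and monitor address below are placeholders, and syncing first
>>> >> > matters because drop_caches only frees clean, unreferenced objects:)
>>> >> >
>>> >> > # sync; echo 3 > /proc/sys/vm/drop_caches
>>> >> > # umount /storage && mount -t ceph mon1:6789:/ /storage -o name=admin,secretfile=/etc/ceph/admin.secret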
>>> >> >
>>> >> > Does anyone have a clue about what's happening here? Could this be a
>>> >> > problem with the writeback fscache for the CephFS?
>>> >> >
>>> >> > Any help appreciated! Thanks and regards. :)
>>> >>
>>> >> This is a consequence of the CephFS "file capabilities" that we use to
>>> >> do distributed locking on file states. When you copy the directory on
>>> >> client A, it has full capabilities on the entire tree. When client B
>>> >> tries to do a stat on each file in that tree, it doesn't have any
>>> >> capabilities. So it sends a stat request to the MDS, which sends a cap
>>> >> update to client A requiring it to pause updates on the file and share
>>> >> its current state. Then the MDS tells client A it can keep going and
>>> >> sends the stat to client B.
>>> >> So that's:
>>> >> B -> MDS
>>> >> MDS -> A
>>> >> A -> MDS
>>> >> MDS -> B | MDS -> A
>>> >> for every file you touch.
>>> >>
>>> >> I think the particular oddity you're encountering here is that CephFS
>>> >> generally tries not to make clients drop their "exclusive" access caps
>>> >> just to satisfy a stat. If you had client B doing something with the
>>> >> files (like reading them) you would probably see different behavior.
>>> >> I'm not sure if there's something effective we can do here or not
>>> >> (it's just a bunch of heuristics about when we should or should not
>>> >> drop caps), but please file a bug on the tracker (tracker.ceph.com) with
>>> >> this case. :)
>>> >> -Greg
>>> >
>>> >
>>>
>>
>>
>>
>>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
