Thanks for your response on that, Jeff. Pretty sure this has nothing to do
with Ceph or Ganesha, sorry for wasting your time. What I'm seeing is
related to writeback on the client. I can mitigate the behaviour a bit by
playing around with the vm.dirty* parameters.
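
For reference, the kind of tuning that helped is switching from the default
percentage-of-RAM thresholds to absolute byte limits, so writeback starts
earlier and writers get throttled sooner. The values below are illustrative
starting points only, not settled recommendations:

```
# /etc/sysctl.d/90-nfs-writeback.conf -- illustrative values, tune per workload
# Kick off background writeback once 64 MiB of pages are dirty...
vm.dirty_background_bytes = 67108864
# ...and block writers once 256 MiB is dirty, rather than letting the
# default vm.dirty_ratio (a percentage of RAM) accumulate gigabytes.
vm.dirty_bytes = 268435456
```

Apply with `sysctl --system` (setting the *_bytes variants automatically
zeroes the corresponding *_ratio variants).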

On Tue, Apr 16, 2019 at 7:07 PM Jeff Layton <jlay...@poochiereds.net> wrote:

> On Tue, Apr 16, 2019 at 10:36 AM David C <dcsysengin...@gmail.com> wrote:
> >
> > Hi All
> >
> > I have a single export of my cephfs using the ceph_fsal [1]. A CentOS 7
> machine mounts a sub-directory of the export [2] and is using it for the
> home directory of a user (e.g. everything under ~ is on the server).
> >
> > This works fine until I start a long sequential write into the home
> directory such as:
> >
> > dd if=/dev/zero of=~/deleteme bs=1M count=8096
> >
> > This saturates the 1GbE link on the client, which is great, but during
> the transfer, apps that are accessing files in home start to lock up.
> Google Chrome, for example, which puts its config in
> ~/.config/google-chrome/, locks up during the transfer (e.g. I can't move
> between tabs); as soon as the transfer finishes, Chrome goes back to
> normal. Essentially the desktop environment reacts as I'd expect if the
> server went away. I'm using the MATE DE.
> >
> > However, if I mount a separate directory from the same export on the
> machine [3] and do the same write into that directory, my desktop
> experience isn't affected.
> >
> > I hope that makes some sense; it's a bit of a weird one to describe.
> This feels like a locking issue to me, although I can't explain why a
> single write into the root of a mount would affect access to other files
> under that same mount.
> >
>
> It's not a single write. You're doing 8G worth of 1M I/Os. The server
> then has to do all of those to the OSD backing store.
>
> > [1] CephFS export:
> >
> > EXPORT
> > {
> >     Export_ID=100;
> >     Protocols = 4;
> >     Transports = TCP;
> >     Path = /;
> >     Pseudo = /ceph/;
> >     Access_Type = RW;
> >     Attr_Expiration_Time = 0;
> >     Disable_ACL = FALSE;
> >     Manage_Gids = TRUE;
> >     Filesystem_Id = 100.1;
> >     FSAL {
> >         Name = CEPH;
> >     }
> > }
> >
> > [2] Home directory mount:
> >
> > 10.10.10.226:/ceph/homes/username on /homes/username type nfs4
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
> >
> > [3] Test directory mount:
> >
> > 10.10.10.226:/ceph/testing on /tmp/testing type nfs4
> (rw,relatime,vers=4.1,rsize=1048576,wsize=1048576,namlen=255,soft,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=10.10.10.135,local_lock=none,addr=10.10.10.226)
> >
> > Versions:
> >
> > Luminous 12.2.10
> > nfs-ganesha-2.7.1-0.1.el7.x86_64
> > nfs-ganesha-ceph-2.7.1-0.1.el7.x86_64
> >
> > Ceph.conf on nfs-ganesha server:
> >
> > [client]
> >         mon host = 10.10.10.210:6789, 10.10.10.211:6789,
> 10.10.10.212:6789
> >         client_oc_size = 8388608000
> >         client_acl_type=posix_acl
> >         client_quota = true
> >         client_quota_df = true
> >
>
> No magic bullets here, I'm afraid.
>
> Sounds like ganesha is probably just too swamped with write requests
> to do much else, but you'll probably want to do the legwork starting
> with the hanging application, and figure out what it's doing that
> takes so long. Is it some syscall? Which one?
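
For anyone following along: `strace -f -T -p <PID>` on the hanging app shows
the time spent in each syscall, and `strace -c` summarizes it. As a rough
stand-in, here is a minimal sketch that times individual file operations from
Python; the path is a scratch placeholder, and on an affected client you would
point it at a file under the NFS-mounted home instead:

```python
import os
import tempfile
import time

def time_call(label, fn):
    """Run fn(), print how long it took in milliseconds, and return its result."""
    t0 = time.monotonic()
    result = fn()
    print(f"{label}: {(time.monotonic() - t0) * 1000:.2f} ms")
    return result

# Placeholder scratch path; on the client this would be something the stalled
# app actually touches, e.g. a file under ~/.config/google-chrome/.
path = os.path.join(tempfile.mkdtemp(), "probe")

fd = time_call("open", lambda: os.open(path, os.O_CREAT | os.O_WRONLY, 0o600))
time_call("write 4 KiB", lambda: os.write(fd, b"\0" * 4096))
time_call("fsync", lambda: os.fsync(fd))
time_call("close", lambda: os.close(fd))
time_call("stat", lambda: os.stat(path))
```

Running this in a loop while the big dd is in flight would show which
operation's latency blows up.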
>
> From there you can start looking at statistics in the NFS client to
> see what's going on there. Are certain RPCs taking longer than they
> should? Which ones?
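
The per-op RPC numbers live in /proc/self/mountstats on the client (also
surfaced by the `mountstats` and `nfsiostat` tools from nfs-utils). A minimal
sketch of extracting average RTT per RPC op; the sample text is illustrative,
not captured from this system, and a real run would read the proc file
instead:

```python
import re

# Abridged text in the shape of /proc/self/mountstats for an NFS mount.
# Illustrative numbers only; on a real client, read the proc file instead.
SAMPLE = """\
device 10.10.10.226:/ceph/homes/username mounted on /homes/username with fstype nfs4 statvers=1.1
        per-op statistics
            WRITE: 1200 1200 0 1258291200 9600 45000 338000 383500
            GETATTR: 400 400 0 25600 19200 12 340 360
"""

def per_op_avg_rtt(text):
    """Return {op: average RTT in ms} from mountstats per-op lines.

    Per-op fields are cumulative: ops, transmissions, major timeouts,
    bytes sent, bytes received, queue ms, RTT ms, execute ms.
    """
    stats = {}
    for m in re.finditer(r"^\s+(\w+): ((?:\d+ ?)+)$", text, re.MULTILINE):
        op, nums = m.group(1), [int(n) for n in m.group(2).split()]
        ops, rtt_total = nums[0], nums[6]
        if ops:
            stats[op] = rtt_total / ops  # average round-trip time per call
    return stats

print(per_op_avg_rtt(SAMPLE))
```

An op whose average RTT is far above the others (WRITE, or a metadata op like
GETATTR stuck behind writes) would point at where the client is waiting.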
>
> Once you know what's going on with the client, you can better tell
> what's going on with the server.
> --
> Jeff Layton <jlay...@poochiereds.net>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
