VMWare can be quite picky about NFS servers.
Some things you should test before deploying anything like this in
production:

* failover
* reconnects after NFS reboots or outages
* NFS3 vs NFS4
* Kernel NFS (which kernel version? cephfs-fuse or cephfs-kernel?) vs NFS
Ganesha (VFS FSAL vs. Ceph FSAL) -- see the example export below
* Stress tests with lots of VMWare clients - we had a setup that ran fine
with 5 big VMWare hypervisors but started to get random deadlocks once we
added 5 more
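
A minimal Ganesha export block using the Ceph FSAL looks roughly like this
(the export ID, paths and cephx user are just placeholders; swap Name = CEPH
for Name = VFS if you are re-exporting a kernel or FUSE mount instead):

    EXPORT {
        Export_Id = 1;
        Path = "/";              # path within CephFS
        Pseudo = "/cephfs";      # NFSv4 pseudo path the clients see
        Access_Type = RW;
        Squash = No_Root_Squash;
        Protocols = 3, 4;
        Transports = TCP;
        FSAL {
            Name = CEPH;         # Ceph FSAL talks to CephFS directly
            User_Id = "admin";   # cephx user Ganesha authenticates as
        }
    }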

We are running CephFS + NFS + VMWare in production, but we ran into *a lot*
of problems before we got it stable for a few specific configurations.
Be prepared to debug NFS problems at a low level with tcpdump and a careful
read of the RFC and NFS server source ;)
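
If it comes to that, capturing the raw NFS traffic is usually the starting
point, e.g. something like

    tcpdump -i eth0 -s 0 -w nfs-debug.pcap port 2049

(interface and file name are just examples) and then walking through the
captured RPCs against RFC 1813 (NFSv3) or RFC 7530 (NFSv4.0).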

Paul

2018-06-29 18:48 GMT+02:00 Nick Fisk <n...@fisk.me.uk>:

> This is for us peeps using Ceph with VMWare.
>
>
>
> My current favoured solution for consuming Ceph in VMWare is via RBDs
> formatted with XFS and exported via NFS to ESXi. This seems to perform
> better than iSCSI+VMFS, which doesn't seem to play nicely with Ceph's PG
> contention issues, particularly when working with thin-provisioned VMDKs.
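>
> For anyone wanting to reproduce the setup, the chain is roughly as follows
> (sizes, names and paths are just examples):
>
>     rbd create vmware/datastore1 --size 4T
>     rbd map vmware/datastore1
>     mkfs.xfs /dev/rbd0
>     mount /dev/rbd0 /export/datastore1
>
>     # /etc/exports on the NFS server
>     /export/datastore1  esxi-host(rw,no_root_squash,sync)
>
> with the export then added as an NFS datastore on each ESXi host.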
>
>
>
> I've still been noticing some performance issues, however, mainly when
> doing any form of storage migration. This is largely due to the way
> vSphere transfers VMs in 64KB IOs at a queue depth of 32. vSphere does
> this so arrays with QoS can balance the IO more easily than if larger IOs
> were submitted. However, Ceph's PG locking means that only one or two of
> these IOs can be in flight at a time, seriously lowering throughput.
> Typically you won't be able to push more than 20-25MB/s during a storage
> migration.
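>
> (The arithmetic roughly backs this up: if PG locking effectively
> serialises things down to a single 64KB IO in flight at, say, ~3ms per
> write, that works out at 64KB / 3ms, or roughly 21MB/s, rather than the
> full 32 x 64KB the client is submitting.)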
>
>
>
> There is also another issue: the IO needed for the XFS journal on the RBD
> can cause contention, and it effectively means every NFS write IO sends
> two IOs down to Ceph. This can have an impact on latency as well. Due to
> the possible PG contention caused by the XFS journal updates when multiple
> IOs are in flight, you normally end up creating more and more RBDs to try
> and spread the load. That in turn means more storage migrations... you can
> see where I'm going with this.
>
>
>
> I’ve been thinking for a while that CephFS works around a lot of these
> limitations.
>
>
>
> 1. It supports fancy striping, which should mean less per-object
> contention (see the layout example below)
>
> 2. There is no filesystem in the middle maintaining a journal and other
> associated IO
>
> 3. A single large NFS mount should have none of the disadvantages seen
> with a single RBD
>
> 4. No need to migrate VMs about because of #3
>
> 5. No need to fstrim after deleting VMs
>
> 6. Potential to do away with Pacemaker and use LVS to do active/active
> NFS, as ESXi does its own locking with files
>
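> On the striping point, CephFS file layouts are just extended attributes on
> a directory, so something like this (paths and values are only examples):
>
>     setfattr -n ceph.dir.layout.stripe_unit -v 1048576 /mnt/cephfs/vmware
>     setfattr -n ceph.dir.layout.stripe_count -v 4 /mnt/cephfs/vmware
>     getfattr -n ceph.dir.layout /mnt/cephfs/vmware
>
> New files created under that directory then inherit the layout.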
>
>
> With this in mind I exported a CephFS mount via NFS and then mounted it to
> an ESXi host as a test.
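>
> The moving parts are just a kernel CephFS mount re-exported over NFS and
> then added as a datastore, something along these lines (hostnames, paths
> and names are placeholders):
>
>     mount -t ceph mon1:6789:/ /mnt/cephfs \
>         -o name=admin,secretfile=/etc/ceph/admin.secret
>
>     # /etc/exports on the NFS server
>     /mnt/cephfs/vmware  esxi-host(rw,no_root_squash,sync)
>
>     # on the ESXi host
>     esxcli storage nfs add --host=nfs-server \
>         --share=/mnt/cephfs/vmware --volume-name=cephfs-ds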
>
>
>
> Initial results are looking very good. I'm seeing storage migrations to
> the NFS mount going at over 200MB/s, which equates to several thousand IOs
> a second, and it seems to be writing at the intended QD32.
>
>
>
> I need to do more testing to make sure everything works as intended, but
> like I say, promising initial results.
>
>
>
> Further testing needs to be done to see what sort of MDS performance is
> required; I would imagine that since we are mainly dealing with large
> files, it might not be that critical. I also need to consider the
> stability of CephFS: RBD is relatively simple and is in use by a large
> proportion of the Ceph community, whereas CephFS is a lot easier to
> "upset".
>
>
>
> Nick
>


-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
