VMWare can be quite picky about NFS servers. Some things you should test before putting such a setup into production:
* failover
* reconnects after NFS server reboots or outages
* NFS3 vs NFS4
* kernel NFS (which kernel version? cephfs-fuse or cephfs-kernel underneath?) vs NFS Ganesha (VFS FSAL vs. Ceph FSAL) - a rough Ganesha example is appended below the quote
* stress tests with lots of VMWare clients - we had a setup that ran fine with 5 big VMWare hypervisors but started to get random deadlocks once we added 5 more

We are running CephFS + NFS + VMWare in production, but we encountered *a lot* of problems before we got it stable for a few configurations. Be prepared to debug NFS problems at a low level with tcpdump and a careful read of the RFC and the NFS server source ;)

Paul

2018-06-29 18:48 GMT+02:00 Nick Fisk <n...@fisk.me.uk>:
> This is for us peeps using Ceph with VMWare.
>
> My current favoured solution for consuming Ceph in VMWare is via RBDs
> formatted with XFS and exported via NFS to ESXi. This seems to perform
> better than iSCSI+VMFS, which doesn't play nicely with Ceph's PG
> contention issues, particularly when working with thin-provisioned VMDKs.
>
> I've still been noticing some performance issues, however, mainly when
> doing any form of storage migration. This is largely due to the way
> vSphere transfers VMs in 64KB IOs at a queue depth of 32. vSphere does
> this so arrays with QoS can balance the IO more easily than if larger
> IOs were submitted. However, Ceph's PG locking means that only one or
> two of these IOs can happen at a time, seriously lowering throughput.
> Typically you won't be able to push more than 20-25MB/s during a
> storage migration.
>
> There is also another issue: the IO needed for the XFS journal on the
> RBD can cause contention, and it effectively means every NFS write IO
> sends two IOs down to Ceph. This can have an impact on latency as well.
> Due to possible PG contention caused by the XFS journal updates when
> multiple IOs are in flight, you normally end up creating more and more
> RBDs to try to spread the load. This normally means you end up having
> to do storage migrations... you can see where I'm going with this.
>
> I've been thinking for a while that CephFS works around a lot of these
> limitations:
>
> 1. It supports fancy striping, so there should be less per-object
> contention
> 2. There is no FS in the middle to maintain a journal and other
> associated IO
> 3. A single large NFS mount should have none of the disadvantages seen
> with a single RBD
> 4. No need to migrate VMs around because of #3
> 5. No need to fstrim after deleting VMs
> 6. Potential to do away with Pacemaker and use LVS to do active/active
> NFS, as ESXi does its own locking with files
>
> With this in mind I exported a CephFS mount via NFS and then mounted it
> on an ESXi host as a test.
>
> Initial results are looking very good. I'm seeing storage migrations to
> the NFS mount going at over 200MB/s, which equates to several thousand
> IOs and seems to be writing at the intended QD32.
>
> I need to do more testing to make sure everything works as intended,
> but like I say, promising initial results.
>
> Further testing needs to be done to see what sort of MDS performance is
> required; I would imagine that since we are mainly dealing with large
> files, it might not be that critical. I also need to consider the
> stability of CephFS: RBD is relatively simple and is in use by a large
> proportion of the Ceph community, while CephFS is a lot easier to
> "upset".
>
> Nick
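Since the kernel-NFS-vs-Ganesha point keeps coming up, here is the rough Ganesha example promised above: a minimal Ceph FSAL export as a starting point for testing, not a reference config. The Export_ID, paths and the cephx user ("ganesha") are placeholders I made up, and older ESXi only speaks NFSv3, hence Protocols = 3.

    EXPORT {
        Export_ID = 1;
        Path = "/";                # CephFS path to export
        Pseudo = "/cephfs";        # NFSv4 pseudo path (ignored by v3 clients)
        Access_Type = RW;
        Squash = No_Root_Squash;   # ESXi mounts as root
        Transports = TCP;
        Protocols = 3;             # ESXi before 6.0 only speaks NFSv3
        FSAL {
            Name = CEPH;           # talk to CephFS directly via libcephfs
            User_Id = "ganesha";   # cephx user, without the "client." prefix
        }
    }

For the VFS FSAL variant you would instead mount CephFS (kernel or FUSE client) on the gateway and export the mountpoint with Name = VFS - that is exactly the matrix from the list above that needs testing.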
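On Nick's point 1: file layouts in CephFS are plain virtual xattrs, so striping is easy to experiment with per directory. A quick sketch (path and values are made up; a new layout only applies to files created afterwards):

    # show the current layout (errors with "No such attribute" if the
    # directory has no explicit layout yet)
    getfattr -n ceph.dir.layout /mnt/cephfs/vmware

    # stripe new files across 8 objects in 1 MiB units
    setfattr -n ceph.dir.layout.stripe_unit  -v 1048576 /mnt/cephfs/vmware
    setfattr -n ceph.dir.layout.stripe_count -v 8       /mnt/cephfs/vmware

Whether a wider stripe actually helps with the 64KB/QD32 migration pattern is something to benchmark, not assume.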
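On point 6: if ESXi really handles all locking itself, the LVS side could be as small as one virtual service with source hashing, so each hypervisor sticks to a single NFS head. Purely hypothetical sketch (VIP and backends invented; for NFSv3 the sideband protocols like portmapper and mountd would need the same treatment):

    # virtual service on the VIP; source hashing (-s sh) pins each
    # client to one backend
    ipvsadm -A -t 10.0.0.100:2049 -s sh

    # two NFS heads behind it, NAT mode
    ipvsadm -a -t 10.0.0.100:2049 -r 10.0.0.11:2049 -m
    ipvsadm -a -t 10.0.0.100:2049 -r 10.0.0.12:2049 -m

Given the failover and reconnect issues listed at the top of this mail, I would still test server death extensively before trusting this.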
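And for anyone who wants to reproduce the test setup: a kernel CephFS mount exported by knfsd could look roughly like this (hostnames, subnet, secret file and datastore name are all placeholders; ESXi needs no_root_squash):

    # /etc/fstab on the NFS gateway: kernel CephFS mount
    mon1,mon2,mon3:/ /mnt/cephfs ceph name=admin,secretfile=/etc/ceph/admin.secret,noatime,_netdev 0 0

    # /etc/exports: sync + no_root_squash for ESXi
    /mnt/cephfs 10.0.0.0/24(rw,sync,no_root_squash,no_subtree_check)

    # on the ESXi host: attach the NFS datastore
    esxcli storage nfs add -H nfs-gw.example.com -s /mnt/cephfs -v cephfs_ds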
--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com