This is an old thread, but it could be useful for others. I found out that the
discrepancy in VMware vMotion speed under iSCSI is probably due to the
"emulate_3pc" config attribute on the LIO target. If it is set to 0, then yes,
VMware will issue IO in 64KB blocks and the bandwidth will indeed be
around 25 MB/s. If emulate_3pc is set to 1, VMware will use VAAI extended
copy instead, which activates LIO's xcopy functionality; that uses 512KB
blocks by default. We also bumped the xcopy block size to 4M (the rbd object
size), which gives around 400 MB/s vMotion speed. The same speed can also be
achieved with Veeam backups.
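For anyone who wants to check this on a stock LIO/targetcli setup, something
along these lines should do it (the backstore name is just a placeholder; the
4M xcopy block size bump is not a standard targetcli attribute as far as I
know, so it is not shown here):

    # check the current value (0 = EXTENDED COPY disabled, ESXi falls back to 64KB IO)
    targetcli /backstores/block/myrbd get attribute emulate_3pc
    # enable it so ESXi can offload the copy via VAAI XCOPY
    targetcli /backstores/block/myrbd set attribute emulate_3pc=1
    targetcli saveconfig

The same flag is also visible under /sys/kernel/config/target/core/ if you
prefer poking configfs directly.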
/Maged
On 02/07/2018 14:36, Maged Mokhtar wrote:
Hi Nick,
With iSCSI we reach over 150 MB/s vMotion for a single VM, and around 1 GB/s
for 7-8 concurrent VM migrations. Since these are 64KB block sizes,
latency/IOPS is a large factor: you need either controllers with write-back
cache or all flash. HDDs without write cache will suffer even with external
WAL/DB on SSDs, giving around 80 MB/s vMotion migration. It may be possible
to get higher vMotion speeds by using fancy striping, but I would not
recommend this unless the total queue depth across all your VMs is small
compared to the number of OSDs.
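If someone does want to experiment with fancy striping despite that caveat,
it is just extra flags at image creation time; the pool/image name and values
below are purely illustrative:

    # 4M objects (the default), written in 1M stripe units across 4 objects at a time
    # (older rbd releases want --order and the stripe unit in bytes, e.g. 1048576)
    rbd create mypool/vmstore01 --size 2T --object-size 4M \
        --stripe-unit 1M --stripe-count 4
    rbd info mypool/vmstore01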
Regarding thin provisioning, a VMDK provisioned as lazy zeroed does have a
large initial impact on random write performance; it can be up to 10x
slower. If you write a random 64KB to an unallocated VMFS block, VMFS will
first write 1MB to fill the block with zeros and then write the 64KB of
client data, so although a lot of data is being written, the perceived
client bandwidth is very low. The performance gradually gets better over
time until the disk is fully provisioned. It is also possible to provision
the VMDK as thick eager zeroed at creation time. Again, this is more
apparent with random writes than with sequential or vMotion load.
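To put rough numbers on that first-touch penalty, assuming the 1MB VMFS block
size above: each random 64KB write into an unallocated block means roughly
1MB + 64KB hitting the backend for 64KB of acknowledged client data, i.e.
about 17x write amplification. A backend sustaining ~170 MB/s of actual
writes would then look like ~10 MB/s to the guest until the blocks are
allocated, which is the same order of magnitude as the "up to 10x slower"
figure above.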
Maged
On 2018-06-29 18:48, Nick Fisk wrote:
This is for us peeps using Ceph with VMware.
My current favoured solution for consuming Ceph in VMware is via RBDs
formatted with XFS and exported via NFS to ESXi. This seems to perform
better than iSCSI+VMFS, which does not play nicely with Ceph’s PG contention
issues, particularly when working with thin-provisioned VMDKs.
I’ve still been noticing some performance issues, however, mainly when doing
any form of storage migration. This is largely due to the way vSphere
transfers VMs in 64KB IOs at a queue depth of 32. vSphere does this so
arrays with QoS can balance the IO more easily than if larger IOs were
submitted. However, Ceph’s PG locking means that only one or two of these
IOs can happen at a time, seriously lowering throughput. Typically you won’t
be able to push more than 20-25 MB/s during a storage migration.
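For a sense of scale: 20-25 MB/s at 64KB per IO is only around 320-400 IOPS,
so although ESXi is submitting at QD32, the PG serialisation means the
effective concurrency is closer to one or two IOs in flight at a few
milliseconds each. If the full QD32 were actually getting through at a
similar per-IO latency, you would expect throughput an order of magnitude
higher, which is roughly what the CephFS result further down shows.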
There is also another issue: the IO needed for the XFS journal on the RBD
can cause contention, and it effectively means every NFS write IO sends two
IOs down to Ceph. This can have an impact on latency as well. Due to
possible PG contention caused by the XFS journal updates when multiple IOs
are in flight, you normally end up creating more and more RBDs to try and
spread the load. This normally means you end up having to do storage
migrations….. you can see where I’m going with this.
I’ve been thinking for a while that CephFS works around a lot of
these limitations.
1. It supports fancy striping, so there should be less per-object contention
(see the layout example after this list)
2. There is no FS in the middle to maintain a journal and other
associated IO
3. A single large NFS mount should have none of the disadvantages seen
with a single RBD
4. No need to migrate VMs around because of #3
5. No need to fstrim after deleting VMs
6. Potential to do away with Pacemaker and use LVS to do active/active
NFS, as ESXi does its own locking with files
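On point 1, CephFS exposes the striping parameters as virtual xattrs; a
directory layout such as the following (values purely illustrative) applies
to files created under it afterwards:

    setfattr -n ceph.dir.layout.stripe_unit  -v 1048576 /mnt/cephfs/vmware
    setfattr -n ceph.dir.layout.stripe_count -v 4       /mnt/cephfs/vmware
    setfattr -n ceph.dir.layout.object_size  -v 4194304 /mnt/cephfs/vmware
    getfattr -n ceph.dir.layout /mnt/cephfs/vmware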
With this in mind I exported a CephFS mount via NFS and then mounted
it to an ESXi host as a test.
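One way to wire this up (not necessarily how it was done here; host names,
paths and export options are placeholders) is a kernel CephFS mount
re-exported via the standard kernel NFS server, then added on ESXi as an
NFSv3 datastore:

    # on the NFS gateway
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    echo '/mnt/cephfs/vmware 10.0.0.0/24(rw,no_root_squash,sync)' >> /etc/exports
    exportfs -ra

    # on the ESXi host
    esxcli storage nfs add -H nfs-gw1 -s /mnt/cephfs/vmware -v cephfs-nfs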
Initial results are looking very good. I’m seeing storage migrations
to the NFS mount going at over 200 MB/s, which equates to several
thousand IOs and seems to be writing at the intended QD32.
I need to do more testing to make sure everything works as intended,
but like I say, promising initial results.
Further testing needs to be done to see what sort of MDS performance
is required; I would imagine that since we are mainly dealing with
large files, it might not be that critical. I also need to consider
the stability of CephFS: RBD is relatively simple and is in use by a
large proportion of the Ceph community, whereas CephFS is a lot easier to
“upset”.
Nick