This is for us peeps using Ceph with VMware.

My current favoured solution for consuming Ceph in VMware is via RBDs
formatted with XFS and exported via NFS to ESXi. This seems to perform
better than iSCSI+VMFS, which doesn't seem to play nicely with Ceph's PG
contention issues, particularly when working with thin provisioned VMDKs.

I've still been noticing some performance issues, however, mainly when doing
any form of storage migration. This is largely due to the way vSphere
transfers VMs in 64KB IOs at a QD of 32. vSphere does this so that arrays
with QoS can balance the IO more easily than if larger IOs were submitted.
However, Ceph's PG locking means that only one or two of these IOs can happen
at a time, seriously lowering throughput. Typically you won't be able to push
more than 20-25MB/s during a storage migration.
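
To put rough numbers on that, here is a small back-of-the-envelope sketch in
Python. It converts the 20-25MB/s figure into 64KB IOs per second and then
uses Little's Law (latency = IOs in flight / IOs per second) to show what
per-IO latency that implies if only one or two IOs really are in flight. The
function names are made up for illustration, and treating MB as MiB is an
assumption to keep the arithmetic simple.

```python
KIB = 1024
MIB = 1024 * KIB


def iops_for_throughput(throughput_mib_s: float, io_size_kib: int = 64) -> float:
    """IOs per second needed to sustain a throughput at a fixed IO size."""
    return throughput_mib_s * MIB / (io_size_kib * KIB)


def implied_latency_ms(iops: float, ios_in_flight: int) -> float:
    """Little's Law: mean latency = concurrency / completion rate."""
    return ios_in_flight / iops * 1000.0


if __name__ == "__main__":
    # Observed migration throughput (assumed MiB/s) and the "one or two
    # IOs at a time" effective concurrency described above.
    for mib_s in (20, 25):
        iops = iops_for_throughput(mib_s)
        for in_flight in (1, 2):
            latency = implied_latency_ms(iops, in_flight)
            print(f"{mib_s} MiB/s = {iops:.0f} IOPS; "
                  f"{in_flight} in flight implies ~{latency:.1f} ms per IO")
```

In other words, a few hundred 64KB IOs per second with only a couple in
flight is consistent with each IO taking a few milliseconds; the nominal QD
of 32 just isn't being used.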

 

There is also another issue: the IO needed for the XFS journal on the RBD can
cause contention, and it effectively means every NFS write IO sends two IOs
down to Ceph. This can have an impact on latency as well. Because of the
possible PG contention caused by XFS journal updates when multiple IOs are in
flight, you normally end up making more and more RBDs to try to spread the
load. This normally means you end up having to do storage migrations... you
can see where I'm going with this.

I've been thinking for a while that CephFS works around a lot of these
limitations:

1. It supports fancy striping, so there should be less per-object contention
2. There is no FS in the middle to maintain a journal and other associated IO
3. A single large NFS mount should have none of the disadvantages seen with a
   single RBD
4. No need to migrate VMs about, because of #3
5. No need to fstrim after deleting VMs
6. Potential to do away with Pacemaker and use LVS to do active/active NFS,
   as ESXi does its own locking with files
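
To illustrate point 1, here is a rough Python sketch of how Ceph's file
striping maps a byte offset to an object number, and how many distinct
objects a burst of 32 sequential 64KB IOs would touch under two layouts. The
first layout is what I understand to be the CephFS default (4MB stripe unit,
stripe count 1, 4MB objects); the second (64KB stripe unit, stripe count 8)
is a hypothetical choice purely for illustration, not something I have
tested. The function names are made up.

```python
KIB = 1024
MIB = 1024 * KIB


def object_for_offset(offset, stripe_unit, stripe_count, object_size):
    """Map a file byte offset to an object index using Ceph-style striping."""
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit            # which stripe unit overall
    stripe_no = block_no // stripe_count        # which stripe (row)
    stripe_pos = block_no % stripe_count        # position within the stripe
    object_set_no = stripe_no // stripes_per_object
    return object_set_no * stripe_count + stripe_pos


def objects_touched(io_size, queue_depth, **layout):
    """Distinct objects hit by `queue_depth` sequential IOs of `io_size`."""
    return {object_for_offset(i * io_size, **layout) for i in range(queue_depth)}


# Assumed default layout: every 64KB IO in the burst lands in one object.
default_layout = dict(stripe_unit=4 * MIB, stripe_count=1, object_size=4 * MIB)
# Hypothetical "fancy" layout: consecutive 64KB IOs rotate over 8 objects.
striped_layout = dict(stripe_unit=64 * KIB, stripe_count=8, object_size=4 * MIB)

print(len(objects_touched(64 * KIB, 32, **default_layout)))  # 1
print(len(objects_touched(64 * KIB, 32, **striped_layout)))  # 8
```

Different objects are not guaranteed to land in different PGs, so this only
shows that the IOs get a chance to spread out. On a mounted CephFS the
layout is set per directory via the ceph.dir.layout.* virtual xattrs
(stripe_unit, stripe_count, object_size).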

 

With this in mind I exported a CephFS mount via NFS and then mounted it to an 
ESXi host as a test.

 

Initial results are looking very good. I'm seeing storage migrations to the
NFS mount going at over 200MB/s, which at 64KB per IO equates to several
thousand IOs per second, and it seems to be writing at the intended QD32.

I need to do more testing to make sure everything works as intended, but like I 
say, promising initial results. 

 

Further testing needs to be done to see what sort of MDS performance is
required; I would imagine that since we are mainly dealing with large files,
it might not be that critical. I also need to consider the stability of
CephFS. RBD is relatively simple and is in use by a large proportion of the
Ceph community, whereas CephFS is a lot easier to "upset".

Nick

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
