Hi Matt,

 

Well-behaved applications are the problem here. ESXi sends all writes as sync 
writes, so although guest OSes will still do their own buffering, any 
ESXi-level operation is done entirely as sync. This is probably most visible 
when migrating VMs between datastores: everything is done as sync 64KB IOs, 
which means copying a 1TB VM can often take nearly 24 hours.
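
As a back-of-envelope check on that figure (purely illustrative numbers; the 
per-IO latency below is an assumption, not a measurement), fully serialised 
sync 64KB IOs at around 5ms each land in the right ballpark:

    # rough sanity check of the ~24 hour migration anecdote (Python)
    VM_SIZE = 1 * 1024**4          # 1 TiB to copy
    IO_SIZE = 64 * 1024            # ESXi moves data in 64 KiB IOs
    ASSUMED_LATENCY_S = 0.005      # assumed per-IO sync write latency (5 ms)

    ios = VM_SIZE // IO_SIZE
    hours = ios * ASSUMED_LATENCY_S / 3600
    print(f"{ios} IOs -> ~{hours:.1f} hours if every IO is serialised")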

 

Osama, can you describe the difference in performance you see between OpenStack 
and ESXi, and what types of operations these are? Sync writes should be the same 
no matter the client, except that in the NFS case you will have an extra network 
hop and potentially a little bit of PG congestion around the FS journal on the 
RBD device.

 

Osama, you can’t compare Ceph to a SAN. Just in terms of network latency you 
have an extra 2 hops. In an ideal scenario you might be able to get Ceph write 
latency down to 0.5-1ms for a 4KB IO, compared to about 0.1-0.3ms for a 
storage array. However, what you will find with Ceph is that other things start 
to increase this average long before you would start to see this on storage 
arrays. 

 

The migration is a good example of this. As I said, ESXi migrates a VM in 64KB 
IOs, but issues 32 of these blocks in parallel at a time. On storage arrays, 
these 64KB IOs are coalesced in the battery-protected write cache into bigger 
IOs before being persisted to disk. The storage array can also accept all 32 
of these requests at once.
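
To make the coalescing idea concrete, here is a toy sketch in Python (not how 
any particular array implements its cache, just the write-combining behaviour 
being described; the 1MB ceiling is an arbitrary assumption):

    def coalesce(writes, max_io=1024 * 1024):
        """Merge adjacent (offset, length) writes into fewer, larger IOs."""
        merged = []
        for off, length in sorted(writes):
            prev = merged[-1] if merged else None
            if prev and off == prev[0] + prev[1] and prev[1] + length <= max_io:
                merged[-1] = (prev[0], prev[1] + length)
            else:
                merged.append((off, length))
        return merged

    # 32 sequential 64KB writes, as a storage vMotion might issue them:
    incoming = [(i * 65536, 65536) for i in range(32)]
    print(len(coalesce(incoming)), "IO(s) reach the backend")   # -> 2 x 1MB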

 

A similar thing happens in Ceph/RBD/NFS via the Ceph filestore journal, but 
that coalescing is now an extra 2 hops away, and with the extra latency 
introduced by the Ceph code we are already a bit slower. But here’s the 
killer: PG locking! You can’t write 32 IOs in parallel to the same 
object/PG; each one has to be processed sequentially because of the locks 
(please someone correct me if I’m wrong here). If your 64KB write latency is 
2ms, then you can only do 500 64KB IOs a second. 64KB * 500 = ~30MB/s, versus a 
storage array which would be doing the operation in the hundreds of MB/s range.
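
The same numbers as a quick Python sketch (the 2ms latency and the queue depth 
of 32 are the illustrative figures from above, not measurements):

    IO_SIZE = 64 * 1024          # ESXi storage vMotion transfer size, bytes
    WRITE_LATENCY_S = 0.002      # assumed per-IO sync write latency (2 ms)
    QUEUE_DEPTH = 32             # IOs ESXi issues in parallel

    # If PG locking serialises every IO to the same object/PG, parallelism
    # buys nothing: throughput is bounded by 1 / latency.
    serialised_iops = 1 / WRITE_LATENCY_S
    print(f"serialised: {serialised_iops:.0f} IOPS "
          f"~= {serialised_iops * IO_SIZE / 2**20:.0f} MiB/s")

    # If the backend could service the whole queue depth concurrently
    # (roughly what an array's write cache does), throughput scales with it.
    parallel_iops = QUEUE_DEPTH / WRITE_LATENCY_S
    print(f"parallel:   {parallel_iops:.0f} IOPS "
          f"~= {parallel_iops * IO_SIZE / 2**20:.0f} MiB/s")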

 

Note: When proper iSCSI for RBD support is finished, you might be able to use 
the VAAI offloads, which would dramatically increase performance for migrations 
as well.

 

Also, once persistent SSD write caching for librbd becomes available, a lot of 
these problems will go away, as the SSD will behave like a storage array’s 
write cache and will only be 1 hop away from the client as well.

 

From: Matt Benjamin [mailto:mbenj...@redhat.com] 
Sent: 16 August 2017 14:49
To: Osama Hasebou <osama.hase...@csc.fi>
Cc: n...@fisk.me.uk; ceph-users <ceph-users@lists.ceph.com>
Subject: Re: [ceph-users] VMware + Ceph using NFS sync/async ?

 

Hi Osama,

I don't have a clear sense of the application workflow here--and Nick 
appears to--but I thought it worth noting that NFSv3 and NFSv4 clients 
shouldn't normally need the sync mount option to achieve i/o stability with 
well-behaved applications.  In both versions of the protocol, an application 
write that is synchronous (or, more typically, the equivalent application sync 
barrier) should not succeed until an NFS-protocol COMMIT (or in some cases 
w/NFSv4, WRITE w/stable flag set) has been acknowledged by the NFS server.  If 
the NFS i/o stability model is insufficient for your workflow, moreover, I'd 
be worried that -o sync writes (which might be incompletely applied during a 
failure event) may not be correctly enforcing your invariant, either.
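
A minimal sketch of the "application sync barrier" being referred to (the path 
is illustrative): on an NFS client mounted without the sync option, the write() 
may be cached, but fsync() should not return until the client has received a 
successful COMMIT (or stable WRITE) from the server.

    import os

    fd = os.open("/mnt/nfs/datafile", os.O_WRONLY | os.O_CREAT, 0o644)
    try:
        os.write(fd, b"data that must survive a crash")
        os.fsync(fd)   # i/o stability point: durable on the server after this
    finally:
        os.close(fd)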

 

Matt

 

On Wed, Aug 16, 2017 at 8:33 AM, Osama Hasebou <osama.hase...@csc.fi> wrote:

Hi Nick,

 

Thanks for replying! If Ceph is combined with OpenStack, does that mean that 
when OpenStack writes are happening, the data is not fully synced (as in 
written to disk) before more data is accepted, i.e. acting as async? In that 
scenario, is there a chance of data loss if things go bad, e.g. a power outage 
or something like that?

 

As for the slow operations, reading is quite fine when I compare it to a SAN 
storage system connected to VMware. It is writing data, whether small chunks or 
big ones, that suffers when trying to use the sync option with FIO for 
benchmarking.

 

In that case, I wonder, is no one using Ceph with VMware in a production 
environment?

 

Cheers.

 

Regards,
Ossi

 

 

 

Hi Osama,

 

This is a known problem with many software-defined storage stacks, but 
potentially slightly worse with Ceph due to extra overheads. Sync writes have 
to wait until all copies of the data are written to disk by the OSDs and 
acknowledged back to the client. The extra network hops for replication and NFS 
gateways add significant latency, which impacts the time it takes to carry out 
small writes. The Ceph code also takes time to process each IO request.
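
As a rough illustration of how those hops add up for a single sync write (all 
of the per-hop numbers below are placeholders, not measurements of any real 
cluster):

    # very rough additive latency budget: ESXi -> NFS gateway -> RBD -> OSDs
    budget_us = {
        "ESXi -> NFS gateway (network)":      150,
        "NFS server / filesystem overhead":   100,
        "gateway -> primary OSD (network)":   150,
        "Ceph OSD request processing":        400,
        "primary -> replica OSDs + acks":     300,
        "journal / device commit":            500,
    }

    total = sum(budget_us.values())
    print(f"~{total} us (~{total / 1000:.1f} ms) per sync write, "
          "before any queueing effects")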

 

What particular operations are you finding slow? Storage vMotions are just bad, 
and I don’t think there is much that can be done about them, as they are split 
into lots of 64KB IOs.

 

One thing you can try is to force the CPUs on your OSD nodes to stay in the C1 
C-state and force their minimum frequency to 100%. This can have quite a large 
impact on latency. Also, you don’t specify your network, but 10G is a must.
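
One way to script that tuning (a sketch only; the cpufreq sysfs files and 
/dev/cpu_dma_latency are standard Linux interfaces, but check them on your 
distro -- tuned's latency-performance profile does much the same thing; run as 
root on each OSD node):

    import glob, os, struct

    # Pin the scaling governor to "performance" so cores stay at max frequency.
    for gov in glob.glob("/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor"):
        with open(gov, "w") as f:
            f.write("performance")

    # Holding /dev/cpu_dma_latency open with a value of 0 asks the kernel to
    # keep CPUs out of deep C-states for as long as the file remains open.
    fd = os.open("/dev/cpu_dma_latency", os.O_WRONLY)
    os.write(fd, struct.pack("i", 0))
    # ... keep fd open for the lifetime of the latency-sensitive workload ...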

 

Nick

 

 

From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf Of Osama Hasebou
Sent: 14 August 2017 12:27
To: ceph-users <ceph-users@lists.ceph.com>
Subject: [ceph-users] VMware + Ceph using NFS sync/async ?

 

Hi Everyone,

 

We started testing the idea of using Ceph storage with VMware. The idea was to 
provide Ceph storage to VMware through OpenStack by creating a virtual machine, 
backed by Ceph + OpenStack, which acts as an NFS gateway, and then mounting 
that storage on the VMware cluster.

 

When mounting the NFS exports with the sync option, we noticed a huge 
degradation in performance, which makes it too slow to use in production. The 
async option makes it much better, but then there is the risk that if a failure 
happens, some data might be lost in that scenario.

 

Now I understand that some people in the Ceph community are using Ceph with 
VMware via NFS gateways, so if you could kindly shed some light on your 
experience, that would be great: do you use it in production, and how did you 
handle the sync/async trade-off while keeping write performance?

 

 

Thank you!!!

 

Regards,
Ossi






 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
