> I'd be interested in details of this small versus large bit.

The smaller shares are simply there to distribute the workload over more RBDs so 
the bottleneck doesn’t become the RBD device. The size itself doesn’t 
particularly matter; the idea is to distribute VMs across many shares rather 
than a few large datastores.
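As a rough sketch of that idea, assuming a per-VM IOPS estimate is available (for example from vCenter performance stats), VMs can be placed heaviest-first on whichever share currently carries the least load. The share and VM names below are hypothetical, and the thread doesn't say how VMs are actually placed; this only illustrates spreading load over many RBD-backed shares instead of filling one large datastore.

```python
import heapq


def place_vms(vm_iops: dict[str, float], shares: list[str]) -> dict[str, list[str]]:
    """Greedily assign each VM to the share with the lowest expected IOPS so far.

    VMs are placed heaviest-first (a longest-processing-time heuristic),
    which keeps the per-share load roughly balanced.
    """
    # Min-heap of (current expected IOPS, share name).
    heap = [(0.0, name) for name in shares]
    heapq.heapify(heap)
    placement: dict[str, list[str]] = {name: [] for name in shares}
    for vm, iops in sorted(vm_iops.items(), key=lambda kv: kv[1], reverse=True):
        load, share = heapq.heappop(heap)  # least-loaded share
        placement[share].append(vm)
        heapq.heappush(heap, (load + iops, share))
    return placement


# Hypothetical per-VM IOPS estimates and share names.
print(place_vms({"db1": 300, "web1": 50, "web2": 50, "app1": 120}, ["nfs01", "nfs02"]))
```

Heaviest-first ordering matters: placing small VMs first tends to leave one share holding the large VM on top of an already non-trivial load.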

We originally started with 10TB shares, just because we had the space, but we 
found performance was running out before capacity did. It has since become 
apparent that the limitation appears to be at the RBD level, particularly with 
writes. So under heavy usage, for example VMware snapshot backups, VMs get 
impacted by higher latency to the point that some VMs become unresponsive for 
short periods. The Ceph cluster itself has plenty of performance available and 
handles far heavier workload periods, but individual RBD devices just seem to 
hit the wall.

For example, one of our shares will sit there all day happily doing 300-400 
read IOPS at very low latencies. During the backup period we get heavier writes 
as snapshots are created and cleaned up. That increased write activity pushes 
the RBD to 100% busy, and read latencies go up from 1-2ms to 20-30ms, even 
though the number of reads doesn’t change that much. The devices can handle 
more, though: I can see periods of up to 1800 read IOPS and 800 write IOPS.
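One way to read those numbers is Little's law (average I/Os in flight = IOPS x latency). The sketch below just applies it to the read figures above; it assumes a steady state and treats the quoted latencies as averages.

```python
def outstanding_ios(iops: float, latency_ms: float) -> float:
    """Little's law: average number of I/Os in flight = throughput x time in system."""
    return iops * (latency_ms / 1000.0)


# Roughly the same read rate, before and during the backup window.
quiet = outstanding_ios(400, 2)   # about 0.8 reads in flight
busy = outstanding_ios(400, 30)   # about 12 reads in flight
print(f"quiet: {quiet:.1f} in flight, backup window: {busy:.1f} in flight")
```

The reads aren't arriving any faster; each one is spending longer in the device queue, which is consistent with them waiting behind the snapshot write traffic on the same RBD.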

There is probably more tuning that can be applied at the XFS/NFS level, but for 
the moment that’s the direction we are taking - creating more shares.

>
> Would you say that the IOPS starvation is more an issue of the large
> filesystem than the underlying Ceph/RBD?

As above, I think it’s more to do with an IOPS limitation at the RBD device 
level, likely due to sync write latency limiting the number of effective IOs. 
That might be XFS as well, but I have not had the chance to dial that in more.
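To put rough numbers on that: if each sync write must be acknowledged before the next one in the same queue slot can be issued, a device can't exceed queue_depth / latency IOPS. The latency and device counts below are illustrative assumptions, not measurements from this cluster, and the model assumes the devices don't contend with each other, which is only reasonable while the cluster itself still has headroom.

```python
def sync_write_ceiling(latency_ms: float, queue_depth: int = 1, devices: int = 1) -> float:
    """Upper bound on write IOPS when every write waits for its acknowledgement.

    Each of `queue_depth` slots per device completes at most one write per
    `latency_ms`; independent devices add their ceilings together.
    """
    if latency_ms <= 0 or queue_depth < 1 or devices < 1:
        raise ValueError("latency_ms must be > 0; queue_depth and devices must be >= 1")
    per_device = queue_depth / (latency_ms / 1000.0)
    return per_device * devices


# Hypothetical 2 ms sync write latency at queue depth 1.
print(sync_write_ceiling(2.0))             # one large share
print(sync_write_ceiling(2.0, devices=8))  # the same VMs split over 8 shares
```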

> With a cache-tier in place I'd expect all hot FS objects (inodes, etc) to be
> there and thus be as fast as it gets from a Ceph perspective.

Yeah, the cache tier takes a fair bit of the heat and improves the response 
considerably for the SATA environments; it makes a significant difference. 
The SSD-only pool images behave in a similar way but reach a much higher 
performance level before they start showing issues.
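A simplified way to see why the cache tier helps the SATA pools: average latency is a weighted mix of hits served from SSD and misses that fall through to SATA. This model ignores promotion and flush/evict overhead, and the hit ratios and latencies in the example are made up for illustration.

```python
def effective_latency_ms(hit_ratio: float, hit_ms: float, miss_ms: float) -> float:
    """Expected per-I/O latency for a two-level store (simplified cache model)."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * hit_ms + (1.0 - hit_ratio) * miss_ms


# Hypothetical: 1 ms on the SSD cache tier, 10 ms on the SATA base pool.
for h in (0.5, 0.9, 0.99):
    print(f"hit ratio {h:.0%}: {effective_latency_ms(h, 1.0, 10.0):.2f} ms")
```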

> OTOH lots of competing accesses to same journal, inodes would be a
> limitation inherent to the FS.

It’s likely there is tuning there to improve XFS performance, and there might 
be more impact further up the stack. But the stats from the RBD devices show 
the latencies going up, so the underlying device does reflect the change in 
performance.

>
> Christian
>
> >
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Osama Hasebou
> > Sent: Wednesday, 16 August 2017 10:34 PM
> > To: n...@fisk.me.uk
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] VMware + Ceph using NFS sync/async ?
> >
> > Hi Nick,
> >
> > Thanks for replying! If Ceph is combined with OpenStack, does that
> mean that when OpenStack writes are happening, they are not fully sync'd
> (as in written to disks) before more data starts arriving, so it acts as
> async? In that scenario, is there a chance of data loss if things go bad,
> e.g. a power outage or something similar?
> >
> > As for the slow operations, reading is quite fine when I compare it to a SAN
> storage system connected to VMware. It is writing data, small chunks or big
> ones, that suffers when trying to use the sync option with fio for
> benchmarking.
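For a quick look at the same sync penalty without fio, the sketch below times small writes with and without an fsync after each one. It is a rough probe, not a substitute for fio: point `target` at a file on the NFS-backed datastore (the temp-dir path and the file name `sync_probe.bin` are placeholders and will usually be local disk), and note that the non-sync numbers largely measure the page cache.

```python
import os
import tempfile
import time


def time_small_writes(path: str, count: int = 200, block_size: int = 4096,
                      sync: bool = True) -> float:
    """Write `count` blocks of `block_size` bytes; return average latency in ms.

    With sync=True every write is followed by os.fsync, so the loop does not
    continue until the storage stack reports the data as durable.
    """
    buf = os.urandom(block_size)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        start = time.perf_counter()
        for _ in range(count):
            os.write(fd, buf)
            if sync:
                os.fsync(fd)
        elapsed = time.perf_counter() - start
    finally:
        os.close(fd)
    return elapsed / count * 1000.0


if __name__ == "__main__":
    # Placeholder target; replace with a path on the NFS mount under test.
    target = os.path.join(tempfile.gettempdir(), "sync_probe.bin")
    for mode in (False, True):
        print(f"sync={mode}: {time_small_writes(target, sync=mode):.3f} ms/write")
    os.remove(target)
```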
> >
> > In that case, I wonder, is no one using Ceph with VMware in a production
> environment?
> >
> > Cheers.
> >
> > Regards,
> > Ossi
> >
> >
> >
> > Hi Osama,
> >
> > This is a known problem with many software-defined storage stacks, but
> potentially slightly worse with Ceph due to extra overheads. Sync writes
> have to wait until all copies of the data are written to disk by the OSD and
> acknowledged back to the client. The extra network hops for replication and
> NFS gateways add significant latency which impacts the time it takes to carry
> out small writes. The Ceph code also takes time to process each IO request.
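To make the "extra hops" above concrete, a sync write's latency can be modelled as the sum of its serial steps, with the primary OSD's local commit overlapping the replica round trip. Every number below is a made-up placeholder, not a measurement, and the model is deliberately simplified (it ignores replica-side processing and queueing); the point is that at queue depth 1 the IOPS ceiling is just the inverse of this total.

```python
from dataclasses import dataclass


@dataclass
class SyncWritePath:
    # All fields are hypothetical per-write latencies in milliseconds.
    nfs_rtt_ms: float         # ESXi host <-> NFS gateway round trip
    gateway_ms: float         # NFS server, filesystem and RBD client work on the gateway
    client_osd_rtt_ms: float  # gateway <-> primary OSD round trip
    osd_processing_ms: float  # Ceph OSD code path for the request
    disk_commit_ms: float     # journal/WAL commit on one OSD
    replica_rtt_ms: float     # primary <-> replica OSD round trip

    def total_ms(self) -> float:
        # Simplification: the primary commits locally while the replicas
        # commit, so the slower of the two branches dominates.
        replica_branch = self.replica_rtt_ms + self.disk_commit_ms
        return (self.nfs_rtt_ms + self.gateway_ms + self.client_osd_rtt_ms
                + self.osd_processing_ms + max(self.disk_commit_ms, replica_branch))

    def qd1_iops(self) -> float:
        # One write outstanding at a time: IOPS = 1 / latency.
        return 1000.0 / self.total_ms()


path = SyncWritePath(0.2, 0.3, 0.2, 0.5, 0.5, 0.2)
print(f"{path.total_ms():.1f} ms per sync write, ~{path.qd1_iops():.0f} IOPS at QD1")
```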
> >
> > What particular operations are you finding slow? Storage vMotions are just
> bad, and I don’t think there is much that can be done about them as they are
> split into lots of 64KB IOs.
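A back-of-the-envelope estimate of why that hurts: count the 64KB IOs and multiply by per-IO latency. The VMDK size, latency and number of IOs in flight below are hypothetical inputs (the thread doesn't say how many IOs a Storage vMotion keeps outstanding), and the estimate ignores the read side and any other overlap.

```python
def vmotion_estimate(vmdk_gib: float, io_kib: int = 64, latency_ms: float = 2.0,
                     in_flight: int = 1) -> tuple[float, float]:
    """Return (number of IOs, seconds) to copy a VMDK in fixed-size sync writes."""
    ios = vmdk_gib * 1024 * 1024 / io_kib  # GiB -> KiB, then divide by IO size
    seconds = ios * (latency_ms / 1000.0) / in_flight
    return ios, seconds


# Hypothetical 100 GiB disk, 64 KiB IOs, 2 ms each, one at a time.
ios, secs = vmotion_estimate(100)
print(f"{ios:,.0f} IOs, about {secs / 60:.0f} minutes")
```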
> >
> > One thing you can try is to force the CPUs on your OSD nodes to run at the C1
> C-state and force their minimum frequency to 100%. This can have quite a
> large impact on latency. Also, you don’t specify your network, but 10G is a
> must.
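To confirm what the OSD nodes are actually doing before and after that change, a read-only check of the standard Linux cpufreq and cpuidle sysfs files could look like the sketch below. It only reports; the change itself is usually made in the BIOS or through kernel or cpupower settings. Idle-state names vary by driver, so the `allowed_idle` default is an assumption to adjust per platform. On an OSD node, run `print(audit_cpus())`.

```python
import glob
import os


def _read(path: str) -> str:
    with open(path) as f:
        return f.read().strip()


def audit_cpus(sysfs_root: str = "/sys/devices/system/cpu",
               allowed_idle: tuple[str, ...] = ("POLL", "C1")) -> list[str]:
    """List CPU settings that may add wake-up or clock-ramp latency (read-only)."""
    findings: list[str] = []
    for cpu_dir in sorted(glob.glob(os.path.join(sysfs_root, "cpu[0-9]*"))):
        cpu = os.path.basename(cpu_dir)
        freq_dir = os.path.join(cpu_dir, "cpufreq")
        if os.path.isdir(freq_dir):
            governor = _read(os.path.join(freq_dir, "scaling_governor"))
            min_khz = int(_read(os.path.join(freq_dir, "scaling_min_freq")))
            max_khz = int(_read(os.path.join(freq_dir, "cpuinfo_max_freq")))
            if governor != "performance":
                findings.append(f"{cpu}: governor is {governor}")
            if min_khz < max_khz:
                findings.append(f"{cpu}: min freq {min_khz} kHz < max {max_khz} kHz")
        # Any enabled idle state outside the allowed set is a deeper C-state.
        for state_dir in sorted(glob.glob(os.path.join(cpu_dir, "cpuidle", "state[0-9]*"))):
            name = _read(os.path.join(state_dir, "name"))
            enabled = _read(os.path.join(state_dir, "disable")) == "0"
            if enabled and name not in allowed_idle:
                findings.append(f"{cpu}: idle state {name} is enabled")
    return findings
```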
> >
> > Nick
> >
> >
> > From: ceph-users [mailto:ceph-users-boun...@lists.ceph.com] On Behalf
> > Of Osama Hasebou
> > Sent: 14 August 2017 12:27
> > To: ceph-users
> > <ceph-users@lists.ceph.com>
> > Subject: [ceph-users] VMware + Ceph using NFS sync/async ?
> >
> > Hi Everyone,
> >
> > We started testing the idea of using Ceph storage with VMware. The idea
> was to provide Ceph storage through OpenStack to VMware by creating a
> virtual machine backed by Ceph + OpenStack, which acts as an NFS
> gateway, and then mount that storage on the VMware cluster.
> >
> > When mounting the NFS exports using the sync option, we noticed a huge
> degradation in performance, which makes it too slow to use in production.
> The async option makes it much better, but then there is the risk that
> if a failure happens, some data might be lost.
> >
> > Now I understand that some people in the Ceph community are using Ceph
> with VMware through NFS gateways, so if you could kindly shed some light on
> your experience, whether you use it in production, and how you handled the
> sync/async trade-off while keeping write performance, that would be
> great.
> >
> >
> > Thank you!
> >
> > Regards,
> > Ossi
> >
> >
> > Confidentiality: This email and any attachments are confidential and may be
> subject to copyright, legal or some other professional privilege. They are
> intended solely for the attention and use of the named addressee(s). They
> may only be copied, distributed or disclosed with the consent of the
> copyright owner. If you have received this email by mistake or by breach of
> the confidentiality clause, please notify the sender immediately by return
> email and delete or destroy all copies of the email. Any confidentiality,
> privilege or copyright is not waived or lost because this email has been sent
> to you by mistake.
>
>
>
>
> --
> Christian Balzer        Network/Systems Engineer
> ch...@gol.com   Rakuten Communications
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
