On Fri, Jul 6, 2018 at 11:19 AM, Matthew Stroud <mattstr...@overstock.com> wrote:
> Good to note about the replica set, we will stick with 3. We really aren't
> concerned about the overhead, but about the additional IO that occurs during
> writes that carry an additional copy.
>
> To be clear, we aren't using ceph in place of FC, nor the other way around.
> We have discovered that SAN storage is cheaper (this one was surprising to
> me) and performs better than direct attached storage (DAS) at the small
> scale we are building at (20T to about 100T). I'm sure that would switch if
> we were much larger, but for now SAN is better. In summary, we are using the
> SAN pretty much as DAS, and ceph uses those SAN disks for OSDs.

That is interesting to know.

> The biggest issue we see is slow requests during rebuilds or node/osd
> failures, yet the disks and network just aren't being used to their fullest.
> That would lead me to believe that there are some host and/or osd process
> bottlenecks going on. Other than that, just increasing the performance of
> our ceph cluster would be a plus, and that is what I'm exploring.
>
> As for test numbers, I can't run those right now because the systems we have
> are in prod and I don't want to impact them with io testing. However, we do
> have a new cluster coming online shortly, and I could do some benchmarking
> there and get that back to you.

no problem, thanks.
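[Editor's note: for that benchmarking pass on the new cluster, a driver along
these lines could reproduce the kind of mixed workload discussed further down
in the thread. This is only a minimal sketch, assuming fio was built with the
rbd ioengine; the pool, image and cephx user names are placeholders, not
values from this thread.]

#!/usr/bin/env python3
"""Rough fio driver: a 50/50 random read/write job against an RBD image,
printing iops and mean latency. Assumes fio was built with the rbd ioengine;
pool, image and cephx user names below are placeholders."""
import json
import subprocess

POOL = "volumes"      # hypothetical pool name -- substitute your own
IMAGE = "bench-img"   # hypothetical RBD image created only for testing
CLIENT = "admin"      # cephx user (without the "client." prefix)

cmd = [
    "fio",
    "--name=rbd-randrw",
    "--ioengine=rbd",
    "--clientname=%s" % CLIENT,
    "--pool=%s" % POOL,
    "--rbdname=%s" % IMAGE,
    "--direct=1",
    "--rw=randrw",            # mixed random read/write
    "--rwmixread=50",         # 50% reads, 50% writes
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=4",
    "--time_based",
    "--runtime=120",
    "--group_reporting",
    "--output-format=json",
]

out = subprocess.run(cmd, capture_output=True, text=True, check=True)
job = json.loads(out.stdout)["jobs"][0]
# lat_ns is the fio 3.x JSON layout; older releases report latency in usec.
for op in ("read", "write"):
    print("%-5s iops %8.0f  mean lat %6.2f ms"
          % (op, job[op]["iops"], job[op]["lat_ns"]["mean"] / 1e6))

[Running it from one client and then from ten in parallel would roughly mirror
the single-VM versus 10-server comparison quoted below.]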
> However, as memory serves, we were only getting about 90-100k iops and about
> 15 - 50 ms latency with 10 servers running fio with a 50/50 mix of random
> and sequential workloads. With a single vm, we were getting about 14k iops
> with about 10 - 30 ms of latency.

You will have to account for the network traffic/bandwidth too, as it has to
replicate across nodes.

> Thanks,
> Matthew Stroud
>
> On 7/6/18, 11:12 AM, "Vasu Kulkarni" <vakul...@redhat.com> wrote:
>
> On Fri, Jul 6, 2018 at 8:38 AM, Matthew Stroud <mattstr...@overstock.com> wrote:
> >
> > Thanks for the reply.
> >
> > Actually we are using fiber channel (it's so much more performant than
> > iscsi in our tests) as the primary storage, and this is serving up traffic
> > for RBD for openstack, so this isn't for backups.
> >
> > Our biggest bottleneck is trying to utilize the host and/or osd process
> > correctly. The disks are running at sub-millisecond latency, with about
> > 90% of the IO being pulled from the array's cache (a.k.a. not even hitting
> > the disks). According to the host, we never get north of 20% disk
> > utilization, unless there is a deep scrub going on.
> >
> > We have debated putting the replica size to 2 instead of 3. However, this
> > isn't much of a win for the purestorage, which dedupes on the backend, so
> > having copies of data is relatively free for that unit. 1 wouldn't work
> > because this is hosting a production workload.
>
> It is a mistake to use a replica count of 2 for production; when one of the
> copies is corrupted it is hard to fix things. If you are concerned about
> storage overhead, there is an option to use EC pools in luminous. To get
> back to your original question: if you are comparing the network/disk
> utilization with FC numbers, that is the wrong comparison. They are two
> different storage systems with different purposes. Ceph is a scale-out
> object storage system, unlike FC systems: you can use commodity hardware and
> grow as you need, and you generally don't need hba/fc enclosed disks, but
> nothing is stopping you from using your existing system. You also generally
> don't need any raid mirroring configuration in the backend, since ceph will
> handle the redundancy for you. Scale-out systems have more work to do than
> traditional FC systems. There are minimal configuration options for
> bluestore. What kind of disk/network utilization slowdown are you seeing?
> Can you publish your numbers and test data?
>
> > Thanks,
> > Matthew Stroud
> >
> > From: Maged Mokhtar <mmokh...@petasan.org>
> > Date: Friday, July 6, 2018 at 7:01 AM
> > To: Matthew Stroud <mattstr...@overstock.com>
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] Performance tuning for SAN SSD config
> >
> > On 2018-06-29 18:30, Matthew Stroud wrote:
> >
> > We back some of our ceph clusters with SAN SSD disk, particularly VSP G/F
> > and Purestorage. I'm curious what settings we should look into modifying
> > to take advantage of our SAN arrays. We had to manually set the class for
> > the luns to the SSD class, which was a big improvement. However, we still
> > see situations where we get slow requests while the underlying disks and
> > network are underutilized.
> >
> > More info about our setup: we are running centos 7 with Luminous as our
> > ceph release. We have 4 osd nodes with 5x2TB disks each, and they are set
> > up as bluestore. Our ceph.conf is attached with some information removed
> > for security reasons.
> >
> > Thanks ahead of time.
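[Editor's note: the point above about replication traffic, and the EC-pool
alternative mentioned in the quoted reply, can be made concrete with some
back-of-the-envelope arithmetic. A rough sketch with illustrative numbers, not
taken from the thread:]

#!/usr/bin/env python3
"""Write amplification estimate: every client write to a replica-3 pool is
written by the primary OSD and forwarded to two more OSDs, so the cluster
network and backend disks see roughly 3x the client write rate. An EC pool
trades that for (k+m)/k, e.g. 1.5x for k=4, m=2, at the cost of extra CPU."""

def replicated_backend_load(client_mb_s, replicas=3):
    """Approximate backend disk/network write load for a replicated pool
    (ignores WAL/journaling overhead)."""
    return client_mb_s * replicas

def ec_backend_load(client_mb_s, k=4, m=2):
    """Same estimate for an erasure-coded pool: k data + m coding chunks."""
    return client_mb_s * (k + m) / k

if __name__ == "__main__":
    client = 500.0  # hypothetical aggregate client write rate, MB/s
    print("replica 3 backend load: %.0f MB/s" % replicated_backend_load(client))
    print("EC 4+2 backend load:    %.0f MB/s" % ec_backend_load(client))

[With only 20 OSDs in this cluster (4 nodes x 5 disks), the per-OSD operation
rate implied by the client numbers above is substantial, which is consistent
with the host/osd processes, rather than the SAN disks, looking like the
bottleneck.]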
> > Thanks,
> > Matthew Stroud
>
> > If I understand correctly, you are using luns (via iSCSI) from your
> > external SAN as OSDs, created a separate pool with these OSDs with device
> > class SSD, and are using this pool for backup.
> >
> > Some comments:
> >
> > Using external disks as OSDs is probably not that common. It may be better
> > to keep the SAN and Ceph cluster separate and have your backup tool access
> > both; it will also be safer, since in case of a disaster to the cluster
> > your backup will be on a separate system.
> >
> > What backup tool/script are you using? It is better that this tool uses a
> > high queue depth, large block sizes and memory/page cache to increase
> > performance during copies.
> >
> > To try to pin down where your current bottleneck is, I would run
> > benchmarks (eg fio) using the block sizes used by your backup tool on the
> > raw luns before they are added as OSDs (as pure iSCSI disks), as well as
> > on both the main and backup pools. Have a resource tool (eg
> > atop/sysstat/collectl) run during these tests to check for resources:
> > disks %busy / cores %busy / io_wait.
> >
> > You probably can use a replica count of 1 for the SAN OSDs since they
> > include their own RAID redundancy.
> >
> > Maged
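[Editor's note: the resource monitoring Maged describes is easiest with atop,
sar -d or collectl as he says; purely as an illustration of what "disk %busy"
measures, a minimal stand-in could sample /proc/diskstats directly. The device
names below are placeholders for the OSD data disks:]

#!/usr/bin/env python3
"""Tiny stand-in for atop/sysstat: samples /proc/diskstats once a second and
prints per-device %busy, so OSD disks can be watched while a benchmark runs."""
import time

DEVICES = ["sdb", "sdc", "sdd"]   # hypothetical OSD disk names -- adjust

def io_ms():
    """Return {device: milliseconds spent doing I/O} from /proc/diskstats."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in DEVICES:
                stats[fields[2]] = int(fields[12])  # field 13: ms doing I/O
    return stats

prev, t_prev = io_ms(), time.monotonic()
while True:
    time.sleep(1)
    cur, t_cur = io_ms(), time.monotonic()
    elapsed_ms = (t_cur - t_prev) * 1000.0
    print(", ".join("%s %3.0f%%" % (d, 100.0 * (cur[d] - prev[d]) / elapsed_ms)
                    for d in sorted(cur)))
    prev, t_prev = cur, t_cur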
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com