On Fri, Jul 6, 2018 at 11:19 AM, Matthew Stroud <mattstr...@overstock.com> wrote:
> Good to note about the replica set, we will stick with 3. We really aren't
> concerned about the overhead, but about the additional IO that occurs during
> writes that carry an additional copy.
>
> To be clear, we aren't using ceph in place of FC, nor the other way around.
> We have discovered that SAN storage is cheaper (this one was surprising to
> me) and performs better than direct attached storage (DAS) at the small
> scale we are building at (20T to about 100T). I'm sure that would switch if
> we were much larger, but for now SAN is better. In summary, we are using the
> SAN pretty much as DAS, and ceph uses those SAN disks for OSDs.

That is interesting to know.

> The biggest issue we see is slow requests during rebuilds or node/osd
> failures, yet the disks and network just aren't being used to their fullest.
> That would lead me to believe that there are some host and/or osd process
> bottlenecks going on. Other than that, just increasing the performance of
> our ceph cluster would be a plus, and that is what I'm exploring.
>
> As for test numbers, I can't run those right now because the systems we have
> are in prod and I don't want to impact them with io testing. However, we do
> have a new cluster coming online shortly, and I could do some benchmarking
> there and get that back to you.

no problem, thanks.
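[Editor's note: for that benchmarking pass on the new cluster, a driver along
these lines could reproduce the kind of mixed workload discussed further down
in the thread. This is only a minimal sketch, assuming fio was built with the
rbd ioengine; the pool, image and cephx user names are placeholders, not
values from this thread.]

#!/usr/bin/env python3
"""Rough fio driver: a 50/50 random read/write job against an RBD image,
printing iops and mean latency. Assumes fio was built with the rbd ioengine;
pool, image and cephx user names below are placeholders."""
import json
import subprocess

POOL = "volumes"      # hypothetical pool name -- substitute your own
IMAGE = "bench-img"   # hypothetical RBD image created only for testing
CLIENT = "admin"      # cephx user (without the "client." prefix)

cmd = [
    "fio",
    "--name=rbd-randrw",
    "--ioengine=rbd",
    "--clientname=%s" % CLIENT,
    "--pool=%s" % POOL,
    "--rbdname=%s" % IMAGE,
    "--direct=1",
    "--rw=randrw",            # mixed random read/write
    "--rwmixread=50",         # 50% reads, 50% writes
    "--bs=4k",
    "--iodepth=32",
    "--numjobs=4",
    "--time_based",
    "--runtime=120",
    "--group_reporting",
    "--output-format=json",
]

out = subprocess.run(cmd, capture_output=True, text=True, check=True)
job = json.loads(out.stdout)["jobs"][0]
# lat_ns is the fio 3.x JSON layout; older releases report latency in usec.
for op in ("read", "write"):
    print("%-5s iops %8.0f  mean lat %6.2f ms"
          % (op, job[op]["iops"], job[op]["lat_ns"]["mean"] / 1e6))

[Running it from one client and then from ten in parallel would roughly mirror
the single-VM versus 10-server comparison quoted below.]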
> However, as memory serves, we were only getting about 90-100k iops and about
> 15 - 50 ms latency with 10 servers running fio with a 50/50 mix of random
> and sequential workloads. With a single vm, we were getting about 14k iops
> with about 10 - 30 ms of latency.

You will have to account for the network traffic/bandwidth too, as it has to
replicate across nodes.

> Thanks,
> Matthew Stroud
>
> On 7/6/18, 11:12 AM, "Vasu Kulkarni" <vakul...@redhat.com> wrote:
>
> On Fri, Jul 6, 2018 at 8:38 AM, Matthew Stroud <mattstr...@overstock.com> wrote:
> >
> > Thanks for the reply.
> >
> > Actually we are using fiber channel (it's so much more performant than
> > iscsi in our tests) as the primary storage, and this is serving up traffic
> > for RBD for openstack, so this isn't for backups.
> >
> > Our biggest bottleneck is trying to utilize the host and/or osd process
> > correctly. The disks are running at sub-millisecond latency, with about
> > 90% of the IO being pulled from the array's cache (a.k.a. not even hitting
> > the disks). According to the host, we never get north of 20% disk
> > utilization, unless there is a deep scrub going on.
> >
> > We have debated putting the replica size to 2 instead of 3. However, this
> > isn't much of a win for the purestorage, which dedupes on the backend, so
> > having copies of data is relatively free for that unit. 1 wouldn't work
> > because this is hosting a production workload.
>
> It is a mistake to use a replica count of 2 for production; when one of the
> copies is corrupted it is hard to fix things. If you are concerned about
> storage overhead, there is an option to use EC pools in luminous. To get
> back to your original question: if you are comparing the network/disk
> utilization with FC numbers, that is the wrong comparison. They are two
> different storage systems with different purposes. Ceph is a scale-out
> object storage system, unlike FC systems: you can use commodity hardware and
> grow as you need, and you generally don't need hba/fc enclosed disks, but
> nothing is stopping you from using your existing system. You also generally
> don't need any raid mirroring configuration in the backend, since ceph will
> handle the redundancy for you. Scale-out systems have more work to do than
> traditional FC systems. There are minimal configuration options for
> bluestore. What kind of disk/network utilization slowdown are you seeing?
> Can you publish your numbers and test data?
>
> > Thanks,
> > Matthew Stroud
> >
> > From: Maged Mokhtar <mmokh...@petasan.org>
> > Date: Friday, July 6, 2018 at 7:01 AM
> > To: Matthew Stroud <mattstr...@overstock.com>
> > Cc: ceph-users <ceph-users@lists.ceph.com>
> > Subject: Re: [ceph-users] Performance tuning for SAN SSD config
> >
> > On 2018-06-29 18:30, Matthew Stroud wrote:
> >
> > We back some of our ceph clusters with SAN SSD disk, particularly VSP G/F
> > and Purestorage. I'm curious what settings we should look into modifying
> > to take advantage of our SAN arrays. We had to manually set the class for
> > the luns to the SSD class, which was a big improvement. However, we still
> > see situations where we get slow requests while the underlying disks and
> > network are underutilized.
> >
> > More info about our setup: we are running centos 7 with Luminous as our
> > ceph release. We have 4 osd nodes with 5x2TB disks each, and they are set
> > up as bluestore. Our ceph.conf is attached with some information removed
> > for security reasons.
> >
> > Thanks ahead of time.
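[Editor's note: the point above about replication traffic, and the EC-pool
alternative mentioned in the quoted reply, can be made concrete with some
back-of-the-envelope arithmetic. A rough sketch with illustrative numbers, not
taken from the thread:]

#!/usr/bin/env python3
"""Write amplification estimate: every client write to a replica-3 pool is
written by the primary OSD and forwarded to two more OSDs, so the cluster
network and backend disks see roughly 3x the client write rate. An EC pool
trades that for (k+m)/k, e.g. 1.5x for k=4, m=2, at the cost of extra CPU."""

def replicated_backend_load(client_mb_s, replicas=3):
    """Approximate backend disk/network write load for a replicated pool
    (ignores WAL/journaling overhead)."""
    return client_mb_s * replicas

def ec_backend_load(client_mb_s, k=4, m=2):
    """Same estimate for an erasure-coded pool: k data + m coding chunks."""
    return client_mb_s * (k + m) / k

if __name__ == "__main__":
    client = 500.0  # hypothetical aggregate client write rate, MB/s
    print("replica 3 backend load: %.0f MB/s" % replicated_backend_load(client))
    print("EC 4+2 backend load:    %.0f MB/s" % ec_backend_load(client))

[With only 20 OSDs in this cluster (4 nodes x 5 disks), the per-OSD operation
rate implied by the client numbers above is substantial, which is consistent
with the host/osd processes, rather than the SAN disks, looking like the
bottleneck.]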
> > Thanks,
> > Matthew Stroud
>
> > If I understand correctly, you are using luns (via iSCSI) from your
> > external SAN as OSDs, created a separate pool with these OSDs with device
> > class SSD, and are using this pool for backup.
> >
> > Some comments:
> >
> > Using external disks as OSDs is probably not that common. It may be better
> > to keep the SAN and Ceph cluster separate and have your backup tool access
> > both; it will also be safer, since in case of a disaster to the cluster
> > your backup will be on a separate system.
> >
> > What backup tool/script are you using? It is better that this tool uses a
> > high queue depth, large block sizes and memory/page cache to increase
> > performance during copies.
> >
> > To try to pin down where your current bottleneck is, I would run
> > benchmarks (eg fio) using the block sizes used by your backup tool on the
> > raw luns before they are added as OSDs (as pure iSCSI disks), as well as
> > on both the main and backup pools. Have a resource tool (eg
> > atop/sysstat/collectl) run during these tests to check for resources:
> > disks %busy / cores %busy / io_wait.
> >
> > You probably can use a replica count of 1 for the SAN OSDs since they
> > include their own RAID redundancy.
> >
> > Maged
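[Editor's note: the resource monitoring Maged describes is easiest with atop,
sar -d or collectl as he says; purely as an illustration of what "disk %busy"
measures, a minimal stand-in could sample /proc/diskstats directly. The device
names below are placeholders for the OSD data disks:]

#!/usr/bin/env python3
"""Tiny stand-in for atop/sysstat: samples /proc/diskstats once a second and
prints per-device %busy, so OSD disks can be watched while a benchmark runs."""
import time

DEVICES = ["sdb", "sdc", "sdd"]   # hypothetical OSD disk names -- adjust

def io_ms():
    """Return {device: milliseconds spent doing I/O} from /proc/diskstats."""
    stats = {}
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] in DEVICES:
                stats[fields[2]] = int(fields[12])  # field 13: ms doing I/O
    return stats

prev, t_prev = io_ms(), time.monotonic()
while True:
    time.sleep(1)
    cur, t_cur = io_ms(), time.monotonic()
    elapsed_ms = (t_cur - t_prev) * 1000.0
    print(", ".join("%s %3.0f%%" % (d, 100.0 * (cur[d] - prev[d]) / elapsed_ms)
                    for d in sorted(cur)))
    prev, t_prev = cur, t_cur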
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com