Going to resurrect this thread to provide another option: LVM-cache, i.e. putting a cache device in front of the bluestore LVM LV.
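
For anyone who hasn't looked at it, the mechanics are just plain LVM. Roughly something like the sketch below; all device paths, VG/LV names and sizes are placeholders, and the exact commands vary a bit by LVM version, so treat it as a sketch rather than a tested recipe:

# Sketch only: assumes the OSD data already lives on ceph-vg/osd-block-0
# (slow HDD) and /dev/nvme0n1p1 is a spare NVMe partition. Names are made up.

# Add the NVMe partition to the volume group that holds the slow OSD LV
vgextend ceph-vg /dev/nvme0n1p1

# Create a cache pool on the NVMe partition
lvcreate --type cache-pool -n osd0-cache -L 100G ceph-vg /dev/nvme0n1p1

# Attach it as a writeback cache in front of the OSD's data LV
lvconvert --type cache --cachepool ceph-vg/osd0-cache \
          --cachemode writeback ceph-vg/osd-block-0

# To detach again later (flushes dirty blocks back to the HDD first):
# lvconvert --splitcache ceph-vg/osd-block-0
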
I only mention this because I noticed it in the SUSE documentation for SES6 (based on Nautilus) here: https://documentation.suse.com/ses/6/html/ses-all/lvmcache.html

> If you plan to use a fast drive as an LVM cache for multiple OSDs, be aware
> that all OSD operations (including replication) will go through the caching
> device. All reads will be queried from the caching device, and are only
> served from the slow device in case of a cache miss. Writes are always
> applied to the caching device first, and are flushed to the slow device at a
> later time ('writeback' is the default caching mode).
>
> When deciding whether to utilize an LVM cache, verify whether the fast drive
> can serve as a front for multiple OSDs while still providing an acceptable
> amount of IOPS. You can test it by measuring the maximum amount of IOPS that
> the fast device can serve, and then dividing the result by the number of OSDs
> behind the fast device. If the result is lower or close to the maximum amount
> of IOPS that the OSD can provide without the cache, LVM cache is probably not
> suited for this setup.
>
> The interaction of the LVM cache device with OSDs is important. Writes are
> periodically flushed from the caching device to the slow device. If the
> incoming traffic is sustained and significant, the caching device will
> struggle to keep up with incoming requests as well as the flushing process,
> resulting in performance drop. Unless the fast device can provide much more
> IOPS with better latency than the slow device, do not use LVM cache with a
> sustained high volume workload. Traffic in a burst pattern is more suited for
> LVM cache as it gives the cache time to flush its dirty data without
> interfering with client traffic. For a sustained low traffic workload, it is
> difficult to guess in advance whether using LVM cache will improve
> performance. The best test is to benchmark and compare the LVM cache setup
> against the WAL/DB setup. Moreover, as small writes are heavy on the WAL
> partition, it is suggested to use the fast device for the DB and/or WAL
> instead of an LVM cache.

So it sounds like you could partition your NVMe for either LVM-cache, DB/WAL, or both? Just figured this sounded a bit more akin to what you were looking for in your original post. I don't use this myself, but figured I would share it.
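
If you want to run the IOPS test they describe, something like the fio run below is probably close enough. Treat it as a sketch: the device path is a placeholder, and writing to a raw device is destructive, so only point it at a scratch device.

# Rough measure of the sustained random-write IOPS the NVMe can deliver.
# WARNING: this writes to the raw device and destroys whatever is on it.
fio --name=nvme-randwrite --filename=/dev/nvme0n1 \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=4k \
    --iodepth=32 --numjobs=4 --group_reporting \
    --time_based --runtime=60

# Divide the reported IOPS by the number of OSDs the device would front.
# If that per-OSD share is close to (or below) what a bare HDD OSD already
# delivers, the SUSE docs suggest the LVM cache is probably not worth it.
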
Reed

> On Apr 4, 2020, at 9:12 AM, jes...@krogh.cc wrote:
>
> Hi.
>
> We have a need for "bulk" storage - but with decent write latencies.
> Normally we would do this with a DAS with a RAID5 and a 2GB battery-backed
> write cache in front - as cheap as possible, but still getting the
> scalability features of Ceph.
>
> In our "first" Ceph cluster we did the same - just stuffed BBWC into the
> OSD nodes and we're fine - but now we're onto the next one, and systems
> like:
> https://www.supermicro.com/en/products/system/1U/6119/SSG-6119P-ACR12N4L.cfm
> do not support a RAID controller like that - yet are branded as "Ceph
> Storage Solutions".
>
> They do, however, support 4 NVMe slots in the front - so some level of
> "tiering" using the NVMe drives seems to be what is "suggested" - but what
> do people do? What is recommended?
>
> I see multiple options:
>
> Ceph tiering at the "pool" layer:
> https://docs.ceph.com/docs/master/rados/operations/cache-tiering/
> and rumors that it is "deprecated":
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2.0/html/release_notes/deprecated_functionality
>
> Pro: Abstraction layer.
> Con: Deprecated? Lots of warnings?
>
> Offloading block.db onto NVMe / SSD:
> https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
>
> Pro: Easy to deal with - seems heavily supported.
> Con: As far as I can tell, this only benefits the OSD's metadata, not the
> actual data. Thus a data commit to the OSD will still be dominated by the
> write latency of the underlying - very slow - HDD.
>
> Bcache:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027713.html
>
> Pro: Closest to the BBWC mentioned above - but with way, way larger cache
> sizes.
> Con: Hard to tell whether I would end up being the only one on the planet
> using this solution.
>
> Eat it - writes will be as slow as hitting dead rust; anything that cannot
> live with that needs to be entirely on SSD/NVMe.
>
> Other?
>
> Thanks for your input.
>
> Jesper
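
PS: If you do end up benchmarking the LVM cache against the DB/WAL setup as the SUSE docs suggest, the DB/WAL variant is usually created with ceph-volume. A minimal sketch, with placeholder device paths and no attempt at proper DB sizing:

# New bluestore OSD: data on the slow HDD, RocksDB (block.db) on an NVMe
# partition. With only --block.db given, the WAL is stored alongside the DB.
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p2

# Optionally split the WAL onto its own partition as well:
# ceph-volume lvm create --bluestore --data /dev/sdb \
#     --block.db /dev/nvme0n1p2 --block.wal /dev/nvme0n1p3
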
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io