Going to resurrect this thread to provide another option:

LVM cache, i.e. putting a cache device in front of the bluestore LVM LV.
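
Mechanically it is just standard lvmcache plumbing on top of whatever VG holds the OSD's block LV. A rough, untested sketch of the idea (every VG/LV/device name and the cache size below is made up - check lvmcache(7) and your LVM version before trusting any of it):

#!/usr/bin/env python3
"""Untested sketch: attach an NVMe-backed writeback cache LV to an existing
bluestore data LV with lvmcache. Every name and size below is made up."""
import subprocess

VG = "ceph-0b1c2d3e"            # hypothetical VG holding the OSD's block LV
OSD_LV = "osd-block-0"          # hypothetical bluestore data LV inside that VG
NVME_PART = "/dev/nvme0n1p1"    # hypothetical NVMe partition reserved for this OSD

def run(*cmd):
    """Print and execute one LVM command, stopping on any error."""
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

run("pvcreate", NVME_PART)                      # initialise the NVMe partition as a PV
run("vgextend", VG, NVME_PART)                  # add it to the OSD's volume group
run("lvcreate", "-n", "cache0", "-L", "100G",   # carve a cache volume out of it
    VG, NVME_PART)
run("lvconvert", "--yes", "--type", "cache",    # put it in front of the bluestore LV
    "--cachevol", "cache0", "--cachemode", "writeback",
    f"{VG}/{OSD_LV}")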

I only mention this because I noticed it in the SUSE documentation for SES6 
(based on Nautilus) here: 
https://documentation.suse.com/ses/6/html/ses-all/lvmcache.html

>  If you plan to use a fast drive as an LVM cache for multiple OSDs, be aware 
> that all OSD operations (including replication) will go through the caching 
> device. All reads will be queried from the caching device, and are only 
> served from the slow device in case of a cache miss. Writes are always 
> applied to the caching device first, and are flushed to the slow device at a 
> later time ('writeback' is the default caching mode).
> When deciding whether to utilize an LVM cache, verify whether the fast drive 
> can serve as a front for multiple OSDs while still providing an acceptable 
> amount of IOPS. You can test it by measuring the maximum amount of IOPS that 
> the fast device can serve, and then dividing the result by the number of OSDs 
> behind the fast device. If the result is lower or close to the maximum amount 
> of IOPS that the OSD can provide without the cache, LVM cache is probably not 
> suited for this setup.
> The interaction of the LVM cache device with OSDs is important. Writes are 
> periodically flushed from the caching device to the slow device. If the 
> incoming traffic is sustained and significant, the caching device will 
> struggle to keep up with incoming requests as well as the flushing process, 
> resulting in performance drop. Unless the fast device can provide much more 
> IOPS with better latency than the slow device, do not use LVM cache with a 
> sustained high volume workload. Traffic in a burst pattern is more suited for 
> LVM cache as it gives the cache time to flush its dirty data without 
> interfering with client traffic. For a sustained low traffic workload, it is 
> difficult to guess in advance whether using LVM cache will improve 
> performance. The best test is to benchmark and compare the LVM cache setup 
> against the WAL/DB setup. Moreover, as small writes are heavy on the WAL 
> partition, it is suggested to use the fast device for the DB and/or WAL 
> instead of an LVM cache.
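
The sizing check in that last paragraph is just arithmetic; here is a back-of-the-envelope version with made-up numbers (measure your own devices, e.g. with fio):

# Back-of-the-envelope version of the SUSE sizing check quoted above.
# All numbers are placeholders -- measure your own hardware.
nvme_iops = 200_000      # measured max IOPS of the fast (cache) device
hdd_osd_iops = 250       # IOPS a single HDD-backed OSD manages without a cache
osds_behind_nvme = 12    # OSDs sharing this one fast device

share_per_osd = nvme_iops / osds_behind_nvme
print(f"IOPS share per OSD: {share_per_osd:.0f}")

# "Lower or close to" is not quantified in the doc; 1.5x is an arbitrary margin.
if share_per_osd <= 1.5 * hdd_osd_iops:
    print("LVM cache is probably not suited for this setup")
else:
    print("worth benchmarking against a plain WAL/DB-on-NVMe layout")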


So it sounds like you could partition your NVMe for either an LVM cache, the DB/WAL,
or both?
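
If you did split it, the capacity math could look something like the sketch below - the ~4% block.db rule of thumb comes from the BlueStore docs, everything else is an invented example:

# Sketch of splitting one NVMe between block.db LVs and lvmcache LVs for the
# HDD OSDs behind it. The ~4% block.db rule of thumb is from the BlueStore
# docs; all other numbers are made-up examples.
TB = 10**12
nvme_size = 4 * TB        # one front NVMe slot
hdd_size = 12 * TB        # one spinning OSD
osds_per_nvme = 3         # e.g. 12 HDDs spread over 4 NVMe drives

db_per_osd = 0.04 * hdd_size                         # block.db carve-out per OSD
cache_total = nvme_size - osds_per_nvme * db_per_osd
cache_per_osd = cache_total / osds_per_nvme          # what is left for lvmcache

print(f"block.db per OSD : {db_per_osd / TB:.2f} TB")
print(f"cache LV per OSD : {cache_per_osd / TB:.2f} TB")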

Just figured this sounded a bit more akin to what you were looking for in your
original post. I don't use it myself, but wanted to share it.

Reed

> On Apr 4, 2020, at 9:12 AM, jes...@krogh.cc wrote:
> 
> Hi.
> 
> We have a need for "bulk" storage - but with decent write latencies.
> Normally we would do this with a DAS with a RAID5 and a 2GB battery-backed
> write cache in front - as cheap as possible while still getting the
> scalability of Ceph.
> 
> In our "first" ceph cluster we did the same - just stuffed in BBWC
> in the OSD nodes and we're fine - but now we're onto the next one and
> systems like:
> https://www.supermicro.com/en/products/system/1U/6119/SSG-6119P-ACR12N4L.cfm
> Does not support a Raid controller like that - but is branded as for "Ceph
> Storage Solutions".
> 
> It does, however, support 4 NVMe slots in the front - so some level of
> "tiering" using the NVMe drives seems to be what is "suggested" - but what
> do people do? What is recommended? I see multiple options:
> 
> Ceph tiering at the "pool - layer":
> https://docs.ceph.com/docs/master/rados/operations/cache-tiering/
> And rumors that it is "deprecated":
> https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/2.0/html/release_notes/deprecated_functionality
> 
> Pro: Abstract layer
> Con: Deprecated? - Lots of warnings?
> 
> Offloading the block.db on NVMe / SSD:
> https://docs.ceph.com/docs/mimic/rados/configuration/bluestore-config-ref/
> 
> Pro: Easy to deal with - seems heavily supported.
> Con: As far as I can tell, this will only benefit the metadata of the
> OSD - not the actual data. Thus a data commit to the OSD will still be
> dominated by the write latency of the underlying - very slow - HDD.
> 
> Bcache:
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-June/027713.html
> 
> Pro: Closest to the BBWC mentioned above - but with way, way larger cache
> sizes.
> Con: It is hard to tell whether I would end up being the only one on the
> planet using this solution.
> 
> Eat it - writes will be as slow as hitting dead rust - anything that
> cannot live with that needs to be entirely on SSD/NVMe.
> 
> Other?
> 
> Thanks for your input.
> 
> Jesper