ceph version: 17.2.0 on Ubuntu 22.04
non-containerized ceph from Ubuntu repos
cluster started on luminous

I have been using bcache on filestore on rotating disks for many years without problems. Now that I am converting OSDs to bluestore, there are some strange effects.

If I create the bcache device, set its rotational flag to '1', then do

ceph-volume lvm create ... --crush-device-class=hdd

the OSD comes up with the right parameters and much improved latency compared to an OSD directly on /dev/sdX. ceph osd metadata ... shows

"bluestore_bdev_type": "hdd",
"rotational": "1"

But after a reboot, the bcache rotational flag is set to '0' again, and the OSD now comes up with

"rotational": "0"

Latency immediately starts to increase (and keeps increasing over the next days, possibly due to accumulating fragmentation).

These wrong settings stay in place even if I stop the OSD, set the bcache rotational flag to '1' again and restart the OSD. I have found no way to get back to the original settings other than destroying and recreating the OSD. I guess I am just not seeing something obvious, like where these settings are read from at OSD startup.

I even created udev rules to set bcache rotational=1 at boot time, before any ceph daemon starts, but it did not help. Something running after these rules resets the bcache rotational flags back to 0. I haven't found the culprit yet, and I am not sure it even matters.

Are these OSD settings (bluestore_bdev_type, rotational) persisted somewhere, and can they be edited and pinned?

Alternatively, can I manually set and persist the relevant bluestore tunables (per OSD / per device class) so as to make the bcache rotational flag irrelevant after the OSD is first created?

Regards
Matthias

On Fri, Apr 08, 2022 at 03:05:38PM +0300, Igor Fedotov wrote:
> Hi Frank,
>
> in fact this parameter impacts OSD behavior both at build time and during
> regular operation. It simply replaces hdd/ssd auto-detection with manual
> specification, and the relevant config parameters are applied accordingly.
> If e.g. min_alloc_size is persistent after OSD creation, it wouldn't be
> updated.
> But if a specific setting can be changed at run time, it would be altered.
>
> So the proper usage would definitely be to select ssd/hdd mode manually
> before the OSD is first created and to keep it in that mode for the whole
> OSD lifecycle. But technically one can change the mode at any arbitrary
> point in time, which would leave the run-time settings out of sync with the
> creation-time ones, with some unclear side effects.
>
> Please also note that this setting was originally intended mostly for
> development/testing purposes, not regular usage. Hence it's flexible but
> rather unsafe if used improperly.
>
>
> Thanks,
>
> Igor
>
> On 4/7/2022 2:40 PM, Frank Schilder wrote:
> > Hi Richard and Igor,
> >
> > are these tweaks required at build time (osd prepare) only, or are they
> > required for every restart?
> >
> > Is this setting "bluestore debug enforce settings=hdd" in the ceph config
> > database or set somewhere else? How does this work if deploying HDD- and
> > SSD-OSDs at the same time?
> >
> > Ideally, all these tweaks should be applicable and settable at creation
> > time only without affecting generic settings (that is, at the ceph-volume
> > command line and not via config side effects). Otherwise it becomes really
> > tedious to manage these.
> >
> > For example, would the following workflow apply the correct settings
> > *permanently* across restarts:
> >
> > 1) Prepare OSD on fresh HDD with ceph-volume lvm batch --prepare ...
> > 2) Assign dm_cache to logical OSD volume created in step 1
> > 3) Start OSD, restart OSDs, boot server ...
> >
> > I would assume that the HDD settings are burned into the OSD in step 1 and
> > will be used in all future (re-)starts without the need to do anything,
> > even though the device is detected as non-rotational after step 2. Is this
> > assumption correct?
> >
> > Thanks and best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Richard Bade <hitr...@gmail.com>
> > Sent: 06 April 2022 00:43:48
> > To: Igor Fedotov
> > Cc: Ceph Users
> > Subject: [Warning Possible spam] [ceph-users] Re: Ceph Bluestore tweaks
> > for Bcache
> >
> > Just for completeness, for anyone who is following this thread: Igor
> > added that setting in Octopus, so unfortunately I am unable to use it
> > as I am still on Nautilus.
> >
> > Thanks,
> > Rich
> >
> > On Wed, 6 Apr 2022 at 10:01, Richard Bade <hitr...@gmail.com> wrote:
> > > Thanks Igor for the tip. I'll see if I can use this to reduce the
> > > number of tweaks I need.
> > >
> > > Rich
> > >
> > > On Tue, 5 Apr 2022 at 21:26, Igor Fedotov <igor.fedo...@croit.io> wrote:
> > > > Hi Richard,
> > > >
> > > > just FYI: one can use "bluestore debug enforce settings=hdd" config
> > > > parameter to manually enforce HDD-related settings for a BlueStore
> > > >
> > > >
> > > > Thanks,
> > > >
> > > > Igor
> > > >
> > > > On 4/5/2022 1:07 AM, Richard Bade wrote:
> > > > > Hi Everyone,
> > > > > I just wanted to share a discovery I made about running bluestore on
> > > > > top of Bcache in case anyone else is doing this or considering it.
> > > > > We've run Bcache under Filestore for a long time with good results but
> > > > > recently rebuilt all the OSDs on bluestore. This caused some
> > > > > degradation in performance that I couldn't quite put my finger on.
> > > > > Bluestore OSDs have some smarts to detect the disk type.
> > > > > Unfortunately, Bcache is detected as SSD, when in fact
> > > > > the HDD parameters are better suited.
> > > > > I changed the following parameters to match the HDD default values and
> > > > > immediately saw my average OSD latency under normal workload drop
> > > > > from 6ms to 2ms.
> > > > > Peak performance didn't really change, but a test
> > > > > machine that I have running a constant IOPS workload was much more
> > > > > stable, as was the average latency.
> > > > > Performance has returned to Filestore levels or better.
> > > > > Here are the parameters.
> > > > >
> > > > > ; Make sure that we use values appropriate for HDD not SSD - Bcache
> > > > > ; gets detected as SSD
> > > > > bluestore_prefer_deferred_size = 32768
> > > > > bluestore_compression_max_blob_size = 524288
> > > > > bluestore_deferred_batch_ops = 64
> > > > > bluestore_max_blob_size = 524288
> > > > > bluestore_min_alloc_size = 65536
> > > > > bluestore_throttle_cost_per_io = 670000
> > > > >
> > > > > ; Try to improve responsiveness when some disks are fully utilised
> > > > > osd_op_queue = wpq
> > > > > osd_op_queue_cut_off = high
> > > > >
> > > > > Hopefully someone else finds this useful.
> > > > > _______________________________________________
> > > > > ceph-users mailing list -- ceph-users@ceph.io
> > > > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > > > --
> > > > Igor Fedotov
> > > > Ceph Lead Developer
> > > >
> > > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > > >
> > > > croit GmbH, Freseniusstr. 31h, 81247 Munich
> > > > CEO: Martin Verges - VAT-ID: DE310638492
> > > > Com. register: Amtsgericht Munich HRB 231263
> > > > Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
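To spot which OSDs have fallen back to SSD mode after a reboot, one can compare the "rotational" and "bluestore_bdev_type" fields that Matthias checks with `ceph osd metadata` against what is expected for the bcache-backed OSDs. The Python sketch below assumes that `ceph osd metadata --format json`, run without an OSD id, prints a JSON array with one object per OSD, each carrying an integer "id" and the two fields as strings. `load_osd_metadata` and `find_mismatches` are hypothetical helper names, not part of any Ceph tooling.

```python
from __future__ import annotations

import json
import subprocess
from typing import Any


def load_osd_metadata() -> list[dict[str, Any]]:
    # Assumption: without an OSD id, `ceph osd metadata` prints a JSON array
    # with one object per OSD, each including an integer "id".
    result = subprocess.run(
        ["ceph", "osd", "metadata", "--format", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


def find_mismatches(
    metadata: list[dict[str, Any]], osd_ids: set[int], expected: str = "hdd"
) -> list[int]:
    # "hdd" should pair with rotational "1" and "ssd" with rotational "0";
    # both fields are compared as strings, matching the quoted output.
    want_rotational = "1" if expected == "hdd" else "0"
    bad = []
    for osd in metadata:
        if osd.get("id") not in osd_ids:
            continue  # not one of the OSDs we care about
        if (osd.get("rotational") != want_rotational
                or osd.get("bluestore_bdev_type") != expected):
            bad.append(osd["id"])
    return sorted(bad)
```

For example, `find_mismatches(load_osd_metadata(), {0, 1, 2, 3})` would list which of (hypothetically) bcache-backed OSDs 0 to 3 no longer report hdd/rotational=1. Keeping the CLI call separate from the comparison makes the check easy to run against saved metadata as well.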
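Matthias's second question (per-OSD or per-device-class tunables) maps onto Ceph's centralized config database, which accepts either a single daemon such as `osd.12` or a device-class mask such as `osd/class:hdd` as the target of `ceph config set`. The sketch below only builds the argument lists, using Richard's values from his ceph.conf snippet. Assumptions: `bluestore_debug_enforce_settings` is the config-database spelling of the option Igor calls "bluestore debug enforce settings" (Octopus or later, per Richard), and the `osd/class:hdd` mask matches OSDs created with `--crush-device-class=hdd`. Following Igor's remark, `bluestore_min_alloc_size` is flagged because it is fixed when the OSD is created. `config_set_commands` is a hypothetical helper.

```python
from __future__ import annotations

# Richard's HDD values from his ceph.conf snippet (his Nautilus workaround).
HDD_TWEAKS: dict[str, str] = {
    "bluestore_prefer_deferred_size": "32768",
    "bluestore_compression_max_blob_size": "524288",
    "bluestore_deferred_batch_ops": "64",
    "bluestore_max_blob_size": "524288",
    "bluestore_min_alloc_size": "65536",
    "bluestore_throttle_cost_per_io": "670000",
}

# Per Igor, min_alloc_size is persisted at OSD creation, so changing it
# later should only affect OSDs created afterwards.
CREATION_TIME_ONLY = {"bluestore_min_alloc_size"}


def config_set_commands(
    settings: dict[str, str], who: str = "osd/class:hdd"
) -> tuple[list[list[str]], list[str]]:
    """Build (but do not run) one `ceph config set` argv list per setting.

    Assumption: `who` is an OSD name such as "osd.12" or a device-class
    mask such as "osd/class:hdd". Also returns the keys that only take
    effect when an OSD is created.
    """
    commands = [
        ["ceph", "config", "set", who, key, value]
        for key, value in sorted(settings.items())
    ]
    creation_time_keys = sorted(k for k in settings if k in CREATION_TIME_ONLY)
    return commands, creation_time_keys
```

On Octopus or later, Igor's single switch would be `config_set_commands({"bluestore_debug_enforce_settings": "hdd"})`, keeping in mind his warning that it is meant mainly for development/testing and that switching modes on an existing OSD can leave run-time and creation-time settings out of sync. Each returned argv could be run with `subprocess.run(cmd, check=True)`, and `ceph config show osd.<id>` should show whether a restarted OSD actually picked the values up.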