Ok, so I did some testing on each of these parameters one by one: removing
each from the config, watching the latency for a few minutes, then adding it
back again. None of them had any conclusive, statistically significant impact
on the latency except bluestore_prefer_deferred_size. I removed it like this:

  sudo ceph config rm osd/class:hdd bluestore_prefer_deferred_size

and my latency immediately increased from 2ms to 6ms. So I added it back again:

  sudo ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768

and the latency immediately dropped back to 2ms.
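If you want to repeat the check, the cycle looks roughly like this (osd.0 is
just an example id, and "ceph osd perf" is only one way of watching the
effect):

  # remove the override and watch commit/apply latency for a few minutes
  sudo ceph config rm osd/class:hdd bluestore_prefer_deferred_size
  ceph osd perf

  # put it back and watch the latency drop again
  sudo ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768
  ceph osd perf

  # confirm what a running OSD reports for the option
  ceph config show osd.0 bluestore_prefer_deferred_size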
So this parameter can definitely be applied at runtime and makes a difference
to how my osds perform. As I am using separate db partitions on the ssd this
is to be expected: I believe the parameter pushes more (deferred) writes
through the wal, which in my case lives on the ssd, and I also notice that
wal activity increases across these osds when it is set. I also tested it the
other way around, removing all the parameters I mentioned earlier and adding
just this one. The results were the same.

So, as an update to my original post: when using bcache, make sure you at
least tweak bluestore_prefer_deferred_size. The
bluestore_prefer_deferred_size_hdd value of 32768 works well for me, but
there may be other values that are better.

Rich

On Mon, 11 Apr 2022 at 09:23, Richard Bade <hitr...@gmail.com> wrote:
>
> Hi Frank,
> Thanks for your insight on this. I had done a bunch of testing on this
> over a year ago and found improvements with these settings. I then
> applied them all at once to our production cluster and confirmed the
> 3x reduction in latency, however I did not test the settings
> individually.
> It could well be, as you say, that the settings cannot be changed at
> runtime and that in fact only the other settings, such as op queue and
> throttle cost, are making the difference. I'll attempt to test the
> settings again this week and see which ones actually affect latency
> when changed at runtime.
>
> > I'm not sure why with your OSD creation procedure the data part is created
> > with the correct HDD parameters.
> I believe that at prepare time my osds get all SSD parameters. That's
> why I manually change the class and these runtime settings.
>
> Rich
>
> On Sat, 9 Apr 2022 at 00:22, Frank Schilder <fr...@dtu.dk> wrote:
> >
> > Hi Richard,
> >
> > thanks for the additional info, now I understand the whole scenario and
> > what might be different when using lvm and dm_cache.
> >
> > > In my process, bcache is added before osd creation as bcache creates a
> > > disk device called /dev/bcache0 for example. This is used for the data
> >
> > This is an important detail. As far as I know, dm_cache is transparent. It
> > can be added/removed at run time and doesn't create a new device. However,
> > I don't know if it changes the rotational attribute of the LVM device.
> >
> > I'm not sure why with your OSD creation procedure the data part is created
> > with the correct HDD parameters. I believe at least these, if not more,
> > parameters are used at prepare time only and cannot be changed after the
> > OSD is created:
> >
> > bluestore_prefer_deferred_size = 32768
> > bluestore_compression_max_blob_size = 524288
> > bluestore_max_blob_size = 524288
> > bluestore_min_alloc_size = 65536
> >
> > If you set these for osd/class:hdd they should *not* be used if the initial
> > device class is ssd. If I understood you correctly, you create an OSD with
> > class=ssd and then change its class to class=hdd. At this point, however,
> > it is too late; the hard-coded ssd options should persist. I wonder if
> > using a command like
> >
> > ceph-volume lvm batch --crush-device-class hdd ...
> >
> > will select the right parameters irrespective of the rotational flag. How
> > did you do it? I believe the only way to get the burned-in bluestore values
> > is to start an OSD with high debug logging. The "config show" commands
> > will show what is in the config DB and not what is burned onto disk (and
> > actually used).
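[A rough sketch of the two things Frank describes above - untested, just to
make the suggestion concrete. /dev/bcache0, /dev/sdX (the ssd db device) and
osd.0 are placeholders, and the restart assumes a non-containerised
deployment:

  # force the crush device class at prepare time instead of relying on the
  # rotational flag reported by the cached device
  ceph-volume lvm batch --crush-device-class hdd /dev/bcache0 --db-devices /dev/sdX

  # to see the values bluestore actually burned in, raise bluestore debug
  # logging, restart the OSD and read the startup lines of its log
  ceph config set osd.0 debug_bluestore 20
  systemctl restart ceph-osd@0
]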
The "config show" commands > > will show what is in the config DB and not what is burned onto disk (and > > actually used). > > > > Best regards, > > ================= > > Frank Schilder > > AIT Risø Campus > > Bygning 109, rum S14 > > > > ________________________________________ > > From: Richard Bade <hitr...@gmail.com> > > Sent: 08 April 2022 00:08 > > To: Frank Schilder > > Cc: Igor Fedotov; Ceph Users > > Subject: [Warning Possible spam] Re: [Warning Possible spam] Re: [Warning > > Possible spam] [ceph-users] Re: Ceph Bluestore tweaks for Bcache > > > > Hi Frank, > > Yes, I think you have got to the crux of the issue. > > > - some_config_value_hdd is used for "rotational=0" devices and > > > - osd/class:hdd values are used for "device_class=hdd" OSDs, > > > > The class is something that is user defined and you can actually > > define your own class names. By default the class is set to ssd for > > rotational=0 and hdd for rotational=1. I override this so my osds end > > up in the right pools as my pools are class based. I also have another > > class called nvme for all nvme storage. > > So the rotational=0 and the class=ssd are actually disconnected and > > used for two different purposes. > > > > > Or are you observing that an HDD+bcache OSD comes up in device class hdd > > > *but* bluestore thinks it is an ssd and applies SSD defaults > > > (some_config_value_ssd) *unless* you explicitly set the config option for > > > device class hdd? > > > > Yes, this is what I am observing, because I am manually changing the > > device class to HDD. > > > > > - OSD is prepared on HDD and put into device class hdd (with correct > > > persistent prepare-time options) > > > - bcache is added *after* OSD creation (???) > > > - after this, on (re-)start the OSD comes up in device class hdd but > > > bluestore thinks now its an SSD and uses some incorrect run-time config > > > option defaults > > > - to fix the incorrect run-time options, you explicitly copy some > > > hdd-defaults to the config data base with filter "osd/class:hdd" > > > > In my process, bcache is added before osd creation as bcache creates a > > disk device called /dev/bcache0 for example. This is used for the data > > disk. As you have surmised bluestore thinks my disks are ssd and > > applies settings as such. I set the class to HDD and then I correct > > runtime settings based on the class. > > > > > There is actually an interesting follow up on this. With bcache/dm_cache > > > large enough it should make sense to use SSD rocks-DB settings, because > > > the data base will fit into the cache. Are there any recommendations for > > > tweaking the prepare-time config options, in particular, the rocks-db > > > options for such hybrid drives? > > > > In my case, this doesn't apply as I have used volumes on the ssd > > specifically for the db. This means I know the db will always be on > > the fast storage. > > But yes, a larger cache size may change the performance and make it > > closer to what ceph expects from an ssd. In my experience the ssd > > settings made performance considerably worse than the hdd settings (3x > > average latency) on bcache. > > > > Regards, > > Rich > > > > On Fri, 8 Apr 2022 at 02:03, Frank Schilder <fr...@dtu.dk> wrote: > > > > > > Hi Richard, > > > > > > so you are tweaking run-time config values, not OSD prepare-time config > > > values. 
> >
> > On Fri, 8 Apr 2022 at 02:03, Frank Schilder <fr...@dtu.dk> wrote:
> > >
> > > Hi Richard,
> > >
> > > so you are tweaking run-time config values, not OSD prepare-time config
> > > values. There is something I don't understand here:
> > >
> > > > What I do for my settings is to set them for the hdd class (ceph config
> > > > set osd/class:hdd bluestore_setting_blah=blahblah.
> > > > I think that's the correct syntax, but I'm not currently at a computer)
> > > > in the config database.
> > >
> > > If the OSD comes up as class=hdd, then the hdd defaults should be applied
> > > anyway and there is no point setting these values explicitly to their
> > > defaults. How do you make the OSD come up in class hdd - wasn't it your
> > > original problem that the OSDs came up in class ssd? Or are you observing
> > > that an HDD+bcache OSD comes up in device class hdd *but* bluestore
> > > thinks it is an ssd and applies SSD defaults (some_config_value_ssd)
> > > *unless* you explicitly set the config option for device class hdd?
> > >
> > > I think I am confused about the OSD device class, the drive type detected
> > > by bluestore and what options are used if there is a mismatch - if there
> > > is any. If I understand you correctly, it seems you observe that:
> > >
> > > - OSD is prepared on HDD and put into device class hdd (with correct
> > > persistent prepare-time options)
> > > - bcache is added *after* OSD creation (???)
> > > - after this, on (re-)start the OSD comes up in device class hdd but
> > > bluestore thinks it is now an SSD and uses some incorrect run-time config
> > > option defaults
> > > - to fix the incorrect run-time options, you explicitly copy some
> > > hdd defaults to the config database with the filter "osd/class:hdd"
> > >
> > > If this is correct, then I believe the underlying issue is that:
> > >
> > > - some_config_value_hdd is used for "rotational=1" devices and
> > > - osd/class:hdd values are used for "device_class=hdd" OSDs,
> > >
> > > which is not the same despite the string "hdd" indicating that it is.
> > >
> > > There is actually an interesting follow-up on this. With bcache/dm_cache
> > > large enough it should make sense to use SSD RocksDB settings, because
> > > the database will fit into the cache. Are there any recommendations for
> > > tweaking the prepare-time config options, in particular the RocksDB
> > > options, for such hybrid drives?
> > >
> > > Best regards,
> > > =================
> > > Frank Schilder
> > > AIT Risø Campus
> > > Bygning 109, rum S14
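[To put the "copy some hdd defaults to the config database with the filter
osd/class:hdd" step into concrete commands - the values are simply the ones
Frank listed above; per the test at the top of this mail, only
bluestore_prefer_deferred_size demonstrably changed behaviour at runtime for
me, and bluestore_min_alloc_size is baked in at prepare time and cannot be
set this way:

  ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768
  ceph config set osd/class:hdd bluestore_compression_max_blob_size 524288
  ceph config set osd/class:hdd bluestore_max_blob_size 524288
  ceph config dump | grep class:hdd   # check what ended up in the config DB
]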