Ok, so I did some testing on each of these parameters one by one:
removing each from the config, watching the latency for a few minutes,
then adding it back again.
None of them had any conclusive, statistically significant impact on
latency except bluestore_prefer_deferred_size.
I removed it like this:
sudo ceph config rm osd/class:hdd bluestore_prefer_deferred_size

and my latency immediately increased from 2ms to 6ms. So I added it back again:
sudo ceph config set osd/class:hdd bluestore_prefer_deferred_size 32768

latency immediately dropped back to 2ms.
So this parameter can definitely be applied at runtime and makes a
difference to how my osds perform. As I am using separate db
partitions on the ssd, this is to be expected when more is being
pushed through the wal, which I believe is what this parameter
controls. I also noticed that wal activity increases across these
osds when the setting is applied.
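To make that concrete, here's a rough sketch of my understanding of the
threshold (this is an illustrative simplification, not the actual
BlueStore code path): writes smaller than bluestore_prefer_deferred_size
are deferred, i.e. committed to the WAL on the fast device first and
flushed to the slow device later.

```shell
# Simplified model of the deferral decision (illustration only).
prefer_deferred_size=32768

is_deferred() {
    # $1 = write size in bytes; writes below the threshold go via the WAL
    [ "$1" -gt 0 ] && [ "$1" -lt "$prefer_deferred_size" ]
}

is_deferred 4096 && echo "4 KiB write: deferred via WAL"
is_deferred 65536 || echo "64 KiB write: written directly to the data device"
```

With the hdd default of 32768 in effect, small random writes land on the
ssd-backed wal first, which would explain both the latency drop and the
extra wal activity I'm seeing.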

I also tested the other way, by removing all the parameters I
mentioned earlier and just adding this one. The results were the same.

So I guess an update to my original post is: when using bcache, make
sure that you at least tweak bluestore_prefer_deferred_size. The
bluestore_prefer_deferred_size_hdd value of 32768 works well for me,
but there may be other values that are better.
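For anyone wanting to check what their osds are actually using, something
like the following should show both the config database value and what a
running daemon reports (osd.0 is just a placeholder for one of your
bcache-backed osds; the daemon command needs to be run on that osd's host
with access to its admin socket):

```
# What's stored in the config database
sudo ceph config dump | grep bluestore_prefer_deferred_size

# What the running daemon reports as its current value
sudo ceph daemon osd.0 config get bluestore_prefer_deferred_size
```

As Frank noted below, this shows the runtime value, not any value burned
in at prepare time.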

Rich

On Mon, 11 Apr 2022 at 09:23, Richard Bade <hitr...@gmail.com> wrote:
>
> Hi Frank,
> Thanks for your insight on this. I had done a bunch of testing on this
> over a year ago and found improvements with these settings. I then
> applied them all at once to our production cluster and confirmed the
> 3x reduction in latency, however I did not test the settings
> individually.
> It could well be, as you say, that the settings cannot be changed at
> runtime and that in fact only the other settings such as op queue and
> throttle cost are making the difference. I'll attempt to test the
> settings again this week and see which ones are actually affecting
> latency during runtime setting.
>
> > I'm not sure why with your OSD creation procedure the data part is created 
> > with the correct HDD parameters.
> I believe that at prepare time my osds get all SSD parameters. That's
> why I manually change the class and these runtime settings.
>
> Rich
>
> On Sat, 9 Apr 2022 at 00:22, Frank Schilder <fr...@dtu.dk> wrote:
> >
> > Hi Richard,
> >
> > thanks for the additional info, now I understand the whole scenario and 
> > what might be different when using lvm and dm_cache.
> >
> > > In my process, bcache is added before osd creation as bcache creates a
> > > disk device called /dev/bcache0 for example. This is used for the data
> >
> > This is an important detail. As far as I know, dm_cache is transparent. It 
> > can be added/removed at run time and doesn't create a new device. However, 
> > I don't know if it changes the rotational attribute of the LVM device.
> >
> > I'm not sure why with your OSD creation procedure the data part is created 
> > with the correct HDD parameters. I believe at least these if not more 
> > parameters are used at prepare time only and cannot be changed after the 
> > OSD is created:
> >
> > bluestore_prefer_deferred_size = 32768
> > bluestore_compression_max_blob_size = 524288
> > bluestore_max_blob_size = 524288
> > bluestore_min_alloc_size = 65536
> >
> > If you set these for osd/class:hdd they should *not* be used if the initial 
> > device class is ssd. If I understood you correctly, you create an OSD with 
> > class=ssd and then change its class to class=hdd. At this point, however, 
> > it is too late, the hard-coded ssd options should persist. I wonder if 
> > using a command like
> >
> > ceph-volume lvm batch --crush-device-class hdd ...
> >
> > will select the right parameters irrespective of the rotational flag. How 
> > did you do it? I believe the only way to get the burned-in bluestore values 
> > was to start an OSD with high debug logging. The "config show" commands 
> > will show what is in the config DB and not what is burned onto disk (and 
> > actually used).
> >
> > Best regards,
> > =================
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > ________________________________________
> > From: Richard Bade <hitr...@gmail.com>
> > Sent: 08 April 2022 00:08
> > To: Frank Schilder
> > Cc: Igor Fedotov; Ceph Users
> > Subject: Re: [ceph-users] Re: Ceph Bluestore tweaks for Bcache
> >
> > Hi Frank,
> > Yes, I think you have got to the crux of the issue.
> > > - some_config_value_hdd is used for "rotational=0" devices and
> > > - osd/class:hdd values are used for "device_class=hdd" OSDs,
> >
> > The class is something that is user defined and you can actually
> > define your own class names. By default the class is set to ssd for
> > rotational=0 and hdd for rotational=1. I override this so my osds end
> > up in the right pools as my pools are class based. I also have another
> > class called nvme for all nvme storage.
> > So the rotational=0 and the class=ssd are actually disconnected and
> > used for two different purposes.
> >
> > > Or are you observing that an HDD+bcache OSD comes up in device class hdd 
> > > *but* bluestore thinks it is an ssd and applies SSD defaults 
> > > (some_config_value_ssd) *unless* you explicitly set the config option for 
> > > device class hdd?
> >
> > Yes, this is what I am observing, because I am manually changing the
> > device class to HDD.
> >
> > > - OSD is prepared on HDD and put into device class hdd (with correct 
> > > persistent prepare-time options)
> > > - bcache is added *after* OSD creation (???)
> > > - after this, on (re-)start the OSD comes up in device class hdd but 
> > > bluestore thinks now its an SSD and uses some incorrect run-time config 
> > > option defaults
> > > - to fix the incorrect run-time options, you explicitly copy some 
> > > hdd-defaults to the config data base with filter "osd/class:hdd"
> >
> > In my process, bcache is added before osd creation as bcache creates a
> > disk device called /dev/bcache0 for example. This is used for the data
> > disk. As you have surmised bluestore thinks my disks are ssd and
> > applies settings as such. I set the class to HDD and then I correct
> > runtime settings based on the class.
> >
> > > There is actually an interesting follow up on this. With bcache/dm_cache 
> > > large enough it should make sense to use SSD rocks-DB settings, because 
> > > the data base will fit into the cache. Are there any recommendations for 
> > > tweaking the prepare-time config options, in particular, the rocks-db 
> > > options for such hybrid drives?
> >
> > In my case, this doesn't apply as I have used volumes on the ssd
> > specifically for the db. This means I know the db will always be on
> > the fast storage.
> > But yes, a larger cache size may change the performance and make it
> > closer to what ceph expects from an ssd. In my experience the ssd
> > settings made performance considerably worse than the hdd settings (3x
> > average latency) on bcache.
> >
> > Regards,
> > Rich
> >
> > On Fri, 8 Apr 2022 at 02:03, Frank Schilder <fr...@dtu.dk> wrote:
> > >
> > > Hi Richard,
> > >
> > > so you are tweaking run-time config values, not OSD prepare-time config 
> > > values. There is something I don't understand here:
> > >
> > > > What I do for my settings is to set them for the hdd class (ceph config 
> > > > set osd/class:hdd bluestore_setting_blah=blahblah.
> > > > I think that's the correct syntax, but I'm not currently at a computer) 
> > > > in the config database.
> > >
> > > If the OSD comes up as class=hdd, then the hdd defaults should be applied 
> > > any way and there is no point setting these values explicitly to their 
> > > defaults. How do you make the OSD come up in class hdd, wasn't it your 
> > > original problem that the OSDs came up in class ssd? Or are you observing 
> > > that an HDD+bcache OSD comes up in device class hdd *but* bluestore 
> > > thinks it is an ssd and applies SSD defaults (some_config_value_ssd) 
> > > *unless* you explicitly set the config option for device class hdd?
> > >
> > > I think I am confused about the OSD device class, the drive type detected 
> > > by bluestore and what options are used if there is a mis-match - if there 
> > > is any. If I understand you correctly, it seems you observe that:
> > >
> > > - OSD is prepared on HDD and put into device class hdd (with correct 
> > > persistent prepare-time options)
> > > - bcache is added *after* OSD creation (???)
> > > - after this, on (re-)start the OSD comes up in device class hdd but 
> > > bluestore thinks now its an SSD and uses some incorrect run-time config 
> > > option defaults
> > > - to fix the incorrect run-time options, you explicitly copy some 
> > > hdd-defaults to the config data base with filter "osd/class:hdd"
> > >
> > > If this is correct, then I believe the underlying issue is that:
> > >
> > > - some_config_value_hdd is used for "rotational=0" devices and
> > > - osd/class:hdd values are used for "device_class=hdd" OSDs,
> > >
> > > which is not the same despite the string "hdd" indicating that it is.
> > >
> > > There is actually an interesting follow up on this. With bcache/dm_cache 
> > > large enough it should make sense to use SSD rocks-DB settings, because 
> > > the data base will fit into the cache. Are there any recommendations for 
> > > tweaking the prepare-time config options, in particular, the rocks-db 
> > > options for such hybrid drives?
> > >
> > > Best regards,
> > > =================
> > > Frank Schilder
> > > AIT Risø Campus
> > > Bygning 109, rum S14
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
