Dear Maarten,

For a cluster that size, I would not enable the autoscaler immediately; I would first enable it in "warn" mode to sanity-check what it would plan to do:

  # ceph osd pool set <pool> pg_autoscale_mode warn

Please share the output of "ceph osd pool autoscale-status" so we can help guide what you do next.
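If you have many pools, a small loop can flip them all to warn mode in one go. A minimal sketch, assuming every pool should be included (prune the list otherwise):

  # Put each pool in "warn" mode: the autoscaler reports what it
  # would do, but does not change pg_num or move any data.
  for pool in $(ceph osd pool ls); do
      ceph osd pool set "$pool" pg_autoscale_mode warn
  done

  # Review the plan: compare the PG_NUM and NEW PG_NUM columns,
  # and check that TARGET SIZE matches what you expect per pool.
  ceph osd pool autoscale-status

In warn mode, proposed changes also surface as POOL_TOO_FEW_PGS / POOL_TOO_MANY_PGS health warnings, so nothing will move until you explicitly set a pool's pg_autoscale_mode to "on".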
Also, you should be aware that there are some rare but unpleasant bugs that may be related to PG splitting (autoscaling); see https://tracker.ceph.com/issues/53729. You may want to wait until that issue is resolved before permanently enabling the autoscaler.

Best Regards,

Dan

> On 02/07/2022 12:31 PM Maarten van Ingen <maarten.vanin...@surf.nl> wrote:
>
> Hi,
>
> We are about to enable the PG autoscaler on Ceph. We are currently running the latest point release of Nautilus with BlueStore and LVM. The autoscaler module is enabled, but autoscaling is currently turned off on all pools.
>
> To make sure we do not kill anything (performance and/or data), I'd like some advice on how to proceed.
>
> We have about 11 PiB of raw HDD storage, of which 40ish% is in use, and about 550 TiB of NVMe storage. In total we have about 1,250 OSDs, of which about 300 are NVMe-only. We have CRUSH rules to allow for NVMe-only storage pools and HDD-only storage pools.
>
> For every pool we have set a target-size to guide the autoscaler a bit, and we have also set a minimum of 256 PGs per pool.
>
> What now happens is that a few pools would have their number of PGs changed by factors ranging from 4x to 16x. We have never changed the number of PGs in a pool by such factors (no more than 2x in a single go), and that was with a lot less data, so we have no clear idea of what will happen when we enable the autoscaler.
>
> For example, one pool holding about 1 PiB of user data would grow from 4k PGs to 16k PGs, which will of course involve a lot of data movement. Another pool with 100 TiB of data would grow from 512 to 8k PGs.
>
> All pools are set with a size of 3, so the abovementioned 1 PiB is 3 PiB of raw data; we currently have no erasure-coded pools.
>
> Can somebody help me out with a safe way to enable the autoscaler, or tell me if it's OK to just enable it? We will enable it per pool to limit the number of affected pools.
>
> Met vriendelijke groet,
>
> Kind Regards,
> Maarten van Ingen

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io