Just to update the case for others: setting

    ceph config set osd/class:ssd osd_recovery_sleep 0.001
    ceph config set osd/class:hdd osd_recovery_sleep 0.05

had the desired effect.
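As a quick check that such per-class overrides are actually applied, one can ask a running OSD for its effective value. This is only a sketch; osd.0 and osd.12 below are placeholder ids for an SSD-backed and an HDD-backed OSD, not ids taken from this cluster:

    ceph config show osd.0 osd_recovery_sleep      # effective value on an SSD-backed OSD
    ceph config show osd.12 osd_recovery_sleep     # effective value on an HDD-backed OSD
    ceph config dump | grep osd_recovery_sleep     # all overrides stored in the mon config database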
I'm running another massive rebalancing operation right now and these settings seem to help. It would be nice if one could use a pool name in a filter, though (osd/pool:NAME). I have two different pools on the same SSDs and only objects from one of these pools require the lower sleep setting.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Joachim Kraftmayer <joachim.kraftma...@clyso.com>
Sent: 03 December 2020 16:49:51
To: 胡 玮文; Frank Schilder
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Increase number of objects in flight during recovery

Hi Frank,

these are the values we used to reduce the recovery impact before Luminous:

# reduce recovery impact
osd max backfills
osd recovery max active
osd recovery max single start
osd recovery op priority
osd recovery threads
osd backfill scan max
osd backfill scan min

I do not know how many OSDs and PGs you have in your cluster, but backfill performance depends on the OSDs, PGs and objects/PG.

Regards, Joachim

___________________________________
Clyso GmbH

On 03.12.2020 at 12:35, 胡 玮文 wrote:
> Hi,
>
> There is an “OSD recovery priority” dialog box in the web dashboard.
> The configuration options it changes include:
>
> osd_max_backfills
> osd_recovery_max_active
> osd_recovery_max_single_start
> osd_recovery_sleep
>
> Tuning these configs may help. “High” priority corresponds to 4, 4, 4, 0,
> respectively. Some of these also have an _ssd/_hdd variant.
>
>> On 3 Dec 2020, at 17:11, Frank Schilder <fr...@dtu.dk> wrote:
>>
>> Hi all,
>>
>> I have the opposite problem to the one discussed in "slow down keys/s in recovery":
>> I need to increase the number of objects in flight during rebalance. All
>> remapped PGs are already in state backfilling, but it looks like no more than
>> 8 objects/s are transferred per PG at a time. The pool sits on high-performance
>> SSDs and could easily handle 100 or more object transfers per second
>> simultaneously. Is there any way to increase the number of transfers per second
>> or of simultaneous transfers? Increasing the options osd_max_backfills and
>> osd_recovery_max_active has no effect.
>>
>> Background: the pool in question (con-fs2-meta2) is the default data pool of a
>> CephFS and stores exclusively the kind of metadata that goes into this pool.
>> Storage consumption is reported as 0, but the number of objects is huge:
>>
>> NAME            ID    USED       %USED    MAX AVAIL    OBJECTS
>> con-fs2-meta1   12    216 MiB     0.02    933 GiB       13311115
>> con-fs2-meta2   13    0 B         0       933 GiB      118389897
>> con-fs2-data    14    698 TiB    72.15    270 TiB      286826739
>>
>> Unfortunately, there were no recommendations on dimensioning the PG count for
>> this pool, so I used the same value for con-fs2-meta1 and con-fs2-meta2. In
>> hindsight, this was potentially a bad idea; the meta2 pool should have a much
>> higher PG count or a much more aggressive recovery policy.
>>
>> I now need to rebalance PGs on meta2 and it is going way too slowly compared
>> with the performance of the SSDs it sits on. In a way, I would like to keep the
>> PG count where it is, but increase the recovery rate for this pool by a factor
>> of 10. Please let me know what options I have.
>>
>> Best regards,
>> =================
>> Frank Schilder
>> AIT Risø Campus
>> Bygning 109, rum S14
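For readers who want to apply the dashboard's “high” recovery priority from the command line, here is a minimal sketch based on the values quoted above (4, 4, 4, 0). The exact values the dashboard applies may vary by release, so treat this as an illustration rather than a verified equivalent:

    # roughly what the dashboard's "high" recovery priority sets (4 / 4 / 4 / 0)
    ceph config set osd osd_max_backfills 4
    ceph config set osd osd_recovery_max_active 4
    ceph config set osd osd_recovery_max_single_start 4
    ceph config set osd osd_recovery_sleep 0

    # drop the overrides again once the rebalance has finished
    ceph config rm osd osd_max_backfills
    ceph config rm osd osd_recovery_max_active
    ceph config rm osd osd_recovery_max_single_start
    ceph config rm osd osd_recovery_sleep

As in Frank's update, the same options can also be scoped per device class with the osd/class:ssd and osd/class:hdd masks instead of the plain osd section.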