Hello all,

I am in the process of adding and removing a number of OSDs in my cluster and
I'm running into some issues where it would be good to be able to control the
system a bit better. I've tried the documentation and google-fu but have come
up short.

This is the background/scenario: I have a cluster that was working fine, in
HEALTH_OK. I've added a number of new OSDs to the cluster, which started a lot
of rebalancing. I also want to remove a number of OSDs from the cluster, and
some of these have already been marked out. The cluster has now been
rebalancing for more than two weeks and has been in HEALTH_WARN the whole
time.

Inter-related issue 1
While the cluster is rebalancing, I would like to prioritize migrating PGs off
the OSDs that have been marked out. Even though they are marked out, I can't
stop them (down) and remove them (destroy/purge), since they still have
remaining PGs. For instance, I've had about eight OSDs with between 3 and 7
PGs remaining (according to "ceph osd safe-to-destroy <osd-id>") for over a
week. As long as this handful of PGs is there, I can't remove those OSDs. I
have set osd_max_backfills, osd_recovery_max_active,
osd_recovery_max_single_start and osd_recovery_sleep on the particular OSDs
with no apparent effect, i.e. the PGs are still there.

Is there a way to prioritize particular OSDs/PGs for rebalancing?
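For concreteness, this is the kind of thing I was hoping would work (osd.12 is
just a placeholder id, and I'm not sure force-backfill actually reorders work
across OSDs rather than just within one):

```shell
# List the PGs still mapped to one of the drained OSDs and bump their
# backfill priority. "ceph pg force-backfill" exists since Luminous;
# osd.12 is a placeholder id, and the jq path assumes the current
# "ceph pg ls-by-osd" JSON layout with a top-level pg_stats array.
for pg in $(ceph pg ls-by-osd 12 -f json | jq -r '.pg_stats[].pgid'); do
    ceph pg force-backfill "$pg"
done
```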

Inter-related issue 2
An alternative would be to just destroy the almost-empty OSDs anyway, turning
the remaining rebalancing into recovery. However, it doesn't seem like
recovery activity is prioritized over rebalancing activity.

Is there a way to ensure recovery activities are prioritized over rebalancing
activities?
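These are the knobs I could find that touch recovery priority at all; the
values shown are the documented defaults, "mypool" is a placeholder, and as
far as I can tell none of them distinguishes recovery from rebalancing:

```shell
# Priority of recovery/backfill ops relative to client ops
# (client ops run at priority 63 by default):
ceph config set osd osd_recovery_op_priority 3

# Per-pool priority; pools with higher values should be recovered
# first, but this appears to order pools, not recovery vs. rebalance:
ceph osd pool set mypool recovery_priority 5
```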

Inter-related issue 3
I spun up another OSD; it came up and I marked it out. This caused many
additional PGs to become misplaced. Stopping and destroying the new, empty OSD
changed the number of misplaced PGs back (returning to the previous
amount/percentage).

Can I prevent this by reweighting the OSDs to 0 in addition to marking them
out, or is there any other way of preventing an OSD that is marked out from
impacting the balancing?
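What I'm considering is dropping the CRUSH weight, which (unlike the override
reweight that "out" toggles) should remove the OSD from placement
calculations entirely, and giving new OSDs a zero initial weight. I'd
appreciate confirmation that this is the right approach:

```shell
# osd.12 is a placeholder id. Setting the CRUSH weight to 0 is distinct
# from "ceph osd out" (which only sets the override reweight to 0) and
# should take the OSD out of placement calculations altogether:
ceph osd crush reweight osd.12 0

# For future additions: have new OSDs start with CRUSH weight 0 so they
# don't trigger data movement the moment they are created:
ceph config set osd osd_crush_initial_weight 0
```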


Inter-related issue 4
During the rebalancing, several smaller OSDs became nearfull. Then one became
full (>95%). This changed the cluster from HEALTH_WARN to HEALTH_ERR, stopping
client activities. Reweighting the full OSD and the nearfull OSDs did not
change the cluster status. In essence, as far as I have understood it, all the
data is there and available; the cluster is in the middle of a massive
rebalance, and the PGs on the full OSD were misplaced and supposed to be moved
elsewhere (in any case after the manual reweighting), so there should be no
reason for the cluster to go to ERR. Also, because the cluster had been
rebalancing for so long, the balancer module was prevented from reweighting
OSDs, which could otherwise have prevented the ERR state (if the reweighting
had had an impact). My solution, which had to be performed by manual
intervention, was to mark the full OSD as out. The cluster changed back to
HEALTH_WARN, client operations resumed and the rebalancing could continue in
the background.

Is there another way to handle a situation like this (an OSD becomes full, 
while having misplaced PGs on it, blocking the cluster)?
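For reference, the only workaround I'm aware of (besides marking the OSD out,
which is what I did) is to temporarily raise the full ratios so client IO
resumes while the misplaced PGs drain, and to restore them afterwards. I'd be
happy to hear of something cleaner:

```shell
# Temporarily raise the full/backfillfull ratios (defaults are 0.95 and
# 0.90) so HEALTH_ERR clears and backfill can drain the full OSD:
ceph osd set-full-ratio 0.97
ceph osd set-backfillfull-ratio 0.95

# Once the rebalance has finished, restore the defaults:
ceph osd set-full-ratio 0.95
ceph osd set-backfillfull-ratio 0.90
```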

Apologies for so many questions in the same email! They are all part of the 
same management activity for me.

Many thanks!

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io