On Wed, 24 Oct 2018 at 13:09, Florent B <> wrote:
> On a Luminous cluster having some misplaced and degraded objects after
> outage:
>
>   health: HEALTH_WARN
>           22100/2496241 objects misplaced (0.885%)
>           Degraded data redundancy: 964/2496241 objects degraded
>           (0.039%), 3 pgs degraded
>
> I can see that Ceph gives priority on replacing objects instead of repairing
> degraded ones.
>
> Number of misplaced objects is decreasing, while number of degraded
> objects does not decrease.
> Is it expected?

I think it is. It can even increase.

My theory is that you have a certain PG (or many) that became misplaced
during the outage, and the cluster runs on with the remaining replicas of
the PG taking reads and writes during recovery. As long as there are only
reads, the PG (and the % of objects it holds) will only be misplaced, and as
the cluster slowly gets data back to where it belongs (or makes a new copy
on a new OSD), the % misplaced will decrease.

This takes non-zero time, and if there are writes to the PG (or to other
queued PGs) while the move is running, Ceph will know that not only is this
PG lacking one or more replicas, but the recently written data also exists
in fewer copies than it should, which is what shows up as degraded.

I guess a PG has some kind of timestamp or version saying "last write was at
xyz", so when it recovers, a stream job makes a new empty PG, copies all
data up to xyz into it and, after that is done, checks whether the original
PG is still at version xyz. If it is, the new copy jumps into service
directly. If the PG is at version xyz+10, it asks for the last 10 changes
and repeats the check.

Since there is a queue which is limited by settings like
osd_recovery_max_active or osd_max_backfills, the longer the repair takes to
complete, the bigger the chance of seeing degraded as well as misplaced
objects. But as the number of misplaced objects goes down close to zero, the
degraded number will shrink really fast.

-- 
May the most significant bit of your life be positive.
_______________________________________________
ceph-users mailing list
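
To make the guessed catch-up loop from the reply concrete, here is a minimal toy sketch in Python. It is not how Ceph implements recovery or backfill; the names ToyPG, recover and writes_during_pass are hypothetical and exist only for illustration. It models a PG as an ordered log of writes, copies everything up to the version seen at the start of a pass, and repeats while client writes keep moving the source ahead.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPG:
    """Hypothetical stand-in for a placement group: an ordered log of writes."""
    log: list = field(default_factory=list)

    @property
    def version(self) -> int:
        # The "version" is simply the number of writes applied so far.
        return len(self.log)

    def write(self, entry: str) -> None:
        self.log.append(entry)


def recover(source: ToyPG, writes_during_pass: list) -> tuple:
    """Copy `source` into a fresh replica, repeating until versions match.

    writes_during_pass[i] lists the client writes that land on `source`
    while pass i is running (an assumption used to simulate load).
    Returns (replica, number_of_passes).
    """
    replica = ToyPG()
    passes = 0
    while True:
        # Remember the version at the start of the pass ("xyz").
        target = source.version
        delta = source.log[replica.version:target]

        # Simulate clients writing to the source while the copy is running.
        if passes < len(writes_during_pass):
            for entry in writes_during_pass[passes]:
                source.write(entry)

        # Apply everything up to the remembered version.
        replica.log.extend(delta)
        passes += 1

        # Still at the same version? Then the replica can go into service.
        if replica.version == source.version:
            return replica, passes
        # Otherwise the source moved ahead (e.g. "xyz+10"): fetch and repeat.


if __name__ == "__main__":
    pg = ToyPG(log=["w1", "w2", "w3"])
    replica, passes = recover(pg, [["w4", "w5"], ["w6"], []])
    print(passes, replica.log == pg.log)
```

With no writes during the copy the loop finishes in a single pass. Every pass that sees new writes adds another round, which matches the intuition that a busy cluster can show degraded objects for longer.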