Hi,

What is the reason for not allowing the balancer to optimize PGs while objects are inactive or misplaced, at least in Nautilus 14.2.2?

https://github.com/ceph/ceph/blob/master/src/pybind/mgr/balancer/module.py#L874

    if unknown > 0.0:
        detail = 'Some PGs (%f) are unknown; try again later' % unknown
        self.log.info(detail)
        return -errno.EAGAIN, detail
    elif degraded > 0.0:
        detail = 'Some objects (%f) are degraded; try again later' % degraded
        self.log.info(detail)
        return -errno.EAGAIN, detail
    elif inactive > 0.0:
        detail = 'Some PGs (%f) are inactive; try again later' % inactive
        self.log.info(detail)
        return -errno.EAGAIN, detail
    elif misplaced >= max_misplaced:
        detail = 'Too many objects (%f > %f) are misplaced; ' \
                 'try again later' % (misplaced, max_misplaced)
        self.log.info(detail)
        return -errno.EAGAIN, detail

Often objects are misplaced and degraded precisely because the balancer only runs during healthy periods. From my point of view, there are "misplaced" and "degraded" states in which the balancer becomes a must. Otherwise the Ceph admin ends up doing a manual ceph reweight to do the balancer's job, just to get the cluster healthy enough for the balancer to start working.

We can understand that the balancer cannot work with PGs in unknown or inactive states. But what about missing and misplaced objects?

I hope a developer can clarify this. These lines cause a lot of problems, at least in Nautilus 14.2.2.

Example case:

* A pool with size 1 is upgraded to size 2. The cluster goes into warning state with misplaced and degraded objects. Some objects never recover from the degraded state because of backfill_toofull: OSDs became full instead of the data being evenly distributed and balanced, because the balancer code excludes this situation.
* The solution is a manual reweight, but that makes no sense.

Regards,
Manuel
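P.S. To make the question concrete, here is a minimal standalone sketch of the check quoted above. The function name balancer_gate and the (0, '') return value for "go ahead" are my own assumptions for illustration; only the four conditions and their messages come from module.py. All arguments are ratios between 0 and 1.

```python
import errno


def balancer_gate(unknown, degraded, inactive, misplaced, max_misplaced):
    """Hypothetical standalone copy of the quoted checks.

    Returns (-errno.EAGAIN, detail) when the balancer would refuse to run,
    or (0, '') as an assumed stand-in for "continue with optimization".
    """
    if unknown > 0.0:
        return -errno.EAGAIN, 'Some PGs (%f) are unknown; try again later' % unknown
    elif degraded > 0.0:
        return -errno.EAGAIN, 'Some objects (%f) are degraded; try again later' % degraded
    elif inactive > 0.0:
        return -errno.EAGAIN, 'Some PGs (%f) are inactive; try again later' % inactive
    elif misplaced >= max_misplaced:
        detail = 'Too many objects (%f > %f) are misplaced; ' \
                 'try again later' % (misplaced, max_misplaced)
        return -errno.EAGAIN, detail
    return 0, ''


# The size 1 -> size 2 case: any degraded object at all blocks the balancer,
# no matter how small the misplaced ratio is.
code, detail = balancer_gate(unknown=0.0, degraded=0.02, inactive=0.0,
                             misplaced=0.01, max_misplaced=0.05)
print(code == -errno.EAGAIN, detail)
```

As the example shows, the degraded check has no threshold, so in a case like the one above the balancer stays blocked until recovery finishes, which is exactly what backfill_toofull prevents.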
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com