Hi All,

Is it possible to safely identify objects that should be purged from a CephFS pool, and can we purge them manually?
Background: ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e) luminous (stable)

We were running 2 MDS, 1 active and 1 standby-replay. A couple of months ago, after triggering an MDS failover, we hit a purge queue bug [1] which prevented either MDS from becoming active. We followed the steps in [2] to delete the purge queue metadata objects and bring both MDS back online.

Today it became clear that the usage of a CephFS data pool is much higher than what clients report: ls on a client shows ~5.2 TB used, while ceph fs status shows 146 TB used. After reading bug report [3] (which appears to be related to bug reports [1] and [4]), we set 'mds standby replay = false' and restarted both MDS. This appears to have stopped the steady climb in usage on the OSDs, but usage remains critically high on several OSDs (~89%).

So it looks like CephFS is not recovering space, and we therefore have a large number of objects that need to be purged. Is there any method to do this safely?

Also possibly relevant: I've been periodically running the following command throughout today:

  rados -p <metadata_pool> ls | grep "^500\."

It currently lists ~1670 metadata objects (500.XXXXXXXX), and the output is quite consistent between runs, i.e. about 1669 of the objects are the same each time.

[1] https://tracker.ceph.com/issues/21749
[2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021386.html
[3] https://tracker.ceph.com/issues/21551
[4] https://tracker.ceph.com/issues/19593

Regards,
Dylan
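P.S. In case it helps clarify what I mean by "identify": these are the rough checks I've been using to gauge the backlog. Pool and daemon names are placeholders, and I'm not certain the perf counter names are identical in 12.2.1, so please treat this as a sketch rather than a recipe:

  # Stray-related counters on the active MDS (via its admin socket);
  # counter names may differ slightly between releases:
  ceph daemon mds.<active_mds_id> perf dump | grep -i stray

  # Purge queue counters, if present in this release:
  ceph daemon mds.<active_mds_id> perf dump | grep -i pq

  # Pool-level usage and object counts, to compare against client-side numbers:
  ceph df detail
  rados df

  # Data pool objects are named <inode-hex>.<block-hex>, so counting unique
  # inode prefixes gives a rough idea of how many inodes still hold data:
  rados -p <data_pool> ls | cut -d. -f1 | sort -u | wc -l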