Hi Michael,

I also think it would be safe to delete. The object count might be an incorrect reference count of lost objects that didn't get decremented. This might be fixed by running a deep scrub over all PGs in that pool.
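Something like this should queue a deep scrub on every PG in that pool (an untested sketch; the pool name is taken from your earlier mails):

# queue a deep scrub for every PG belonging to the pool
# (PG ids are the first column of 'ceph pg ls-by-pool')
for pg in $(ceph pg ls-by-pool fs.data.archive.frames | awk '/^[0-9]+\./ {print $1}') ; do
    ceph pg deep-scrub "$pg"
done

If your release has it, 'ceph osd pool deep-scrub fs.data.archive.frames' does the same in one go. Deep scrubs are I/O heavy, so you may want to spread them out rather than queue everything at once.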
I don't know rados well enough to find out where such an object count comes from. However, ceph df is known to be imperfect. Maybe it's just an accounting bug there. I think there were a couple of cases where people deleted all objects in a pool and ceph df would still report non-zero usage.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: Michael Thomas <w...@caltech.edu>
Sent: 12 February 2021 22:35:25
To: Frank Schilder; ceph-users@ceph.io
Subject: Re: [ceph-users] Removing secondary data pool from mds

Hi Frank,

We're not using snapshots.

I was able to run:

ceph daemon mds.ceph1 dump cache /tmp/cache.txt

...and scan the dump for the stray objects to find the cap id that was accessing each object. I matched this with the entity name in:

ceph daemon mds.ceph1 session ls

...to determine the client host. The strays went away after I rebooted the offending client.

With all access to the objects now cleared, I ran:

ceph pg X.Y mark_unfound_lost delete

...on any remaining rados objects. At this point (at long last) the pool was able to return to the 'HEALTHY' status.

However, there is one remaining bit that I don't understand. 'ceph df' returns 355 objects for the pool (fs.data.archive.frames):

https://pastebin.com/vbZLhQmC

...but 'rados -p fs.data.archive.frames ls --all' returns no objects. So I'm not sure what these 355 objects were. Because of that, I haven't removed the pool from cephfs quite yet, even though I think it would be safe to do so.

--Mike

On 2/10/21 4:20 PM, Frank Schilder wrote:
> Hi Michael,
>
> out of curiosity, did the pool go away or did it put up a fight?
>
> I don't remember exactly, it's a long time ago, but I believe stray objects
> on fs pools come from files that were deleted on the fs level but are still
> referenced by snapshots. Such files are moved to special stray directories
> until the snapshot containing them is deleted as well. Not sure if this
> applies here though; there might be other occasions when objects go to stray.
>
> I updated the case concerning the underlying problem, but not too much
> progress either: https://tracker.ceph.com/issues/46847#change-184710 . I had
> PG degradation even using the recovery technique with before- and after-crush
> maps. I was just lucky that I lost only 1 shard per object and ordinary
> recovery could fix it.
>
> Best regards,
> =================
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> ________________________________________
> From: Michael Thomas <w...@caltech.edu>
> Sent: 21 December 2020 23:12:09
> To: ceph-users@ceph.io
> Subject: [ceph-users] Removing secondary data pool from mds
>
> I have a cephfs secondary (non-root) data pool with unfound and degraded
> objects that I have not been able to recover[1]. I created an additional
> data pool and used 'setfattr -n ceph.dir.layout.pool' and a very long rsync
> to move the files off of the degraded pool and onto the new pool. This has
> completed, and using find + 'getfattr -n ceph.file.layout.pool', I verified
> that no files are using the old pool anymore. No ceph.dir.layout.pool
> attributes point to the old pool either.
>
> However, the old pool still reports that there are objects in the old
> pool, likely the same ones that were unfound/degraded from before:
> https://pastebin.com/qzVA7eZr
>
> Based on an old message from the mailing list[2], I checked the MDS for
> stray objects (ceph daemon mds.ceph4 dump cache file.txt ; grep -i stray
> file.txt) and found 36 stray entries in the cache:
> https://pastebin.com/MHkpw3DV. However, I'm not certain how to map
> these stray cache objects to clients that may be accessing them.
>
> 'rados -p fs.data.archive.frames ls' shows 145 objects. Looking at the
> parent of each object shows 2 strays:
>
> for obj in $(cat rados.ls.txt) ; do echo $obj ; rados -p fs.data.archive.frames getxattr $obj parent | strings ; done
>
> [...]
> 10000020fa1.00000000
> 10000020fa1
> stray6
> 10000020fbc.00000000
> 10000020fbc
> stray6
> [...]
>
> ...before getting stuck on one object for over 5 minutes (then I gave up):
>
> 1000005b1af.00000083
>
> What can I do to make sure this pool is ready to be safely deleted from
> cephfs (ceph fs rm_data_pool archive fs.data.archive.frames)?
>
> --Mike
>
> [1] https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/QHFOGEKXK7VDNNSKR74BA6IIMGGIXBXA/#7YQ6SSTESM5LTFVLQK3FSYFW5FDXJ5CF
>
> [2] http://lists.ceph.com/pipermail/ceph-users-ceph.com/2015-October/005233.html
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
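A rough sketch of the cap-id lookup described above (the mds name and file locations are taken from the thread; the 'session ls' field names vary a bit between releases, so treat this as a starting point rather than an exact recipe):

# dump the MDS cache and collect the client ids that hold caps on stray inodes
ceph daemon mds.ceph1 dump cache /tmp/cache.txt
grep -i stray /tmp/cache.txt | grep -o 'client\.[0-9]*' | sort -u

# list the sessions and read off the hostname for each of those client ids
ceph daemon mds.ceph1 session ls | jq -r '.[] | "\(.id) \(.client_metadata.hostname)"'

Rebooting (or remounting on) the matching hosts releases the caps, after which the stray entries should drop out of the cache dump.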