> On 7 December 2016 at 20:54, John Spray <jsp...@redhat.com> wrote:
>
>
> On Wed, Dec 7, 2016 at 7:47 PM, Wido den Hollander <w...@42on.com> wrote:
> >
> >> On 7 December 2016 at 16:53, John Spray <jsp...@redhat.com> wrote:
> >>
> >>
> >> On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollander <w...@42on.com> wrote:
> >> >
> >> >> On 7 December 2016 at 16:38, John Spray <jsp...@redhat.com> wrote:
> >> >>
> >> >>
> >> >> On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollander <w...@42on.com>
> >> >> wrote:
> >> >> > (I think John knows the answer, but sending to ceph-users for
> >> >> > archival purposes)
> >> >> >
> >> >> > Hi John,
> >> >> >
> >> >> > A Ceph cluster lost a PG with CephFS metadata in it and is
> >> >> > currently going through the CephFS disaster recovery described here:
> >> >> > http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
> >> >>
> >> >> I wonder if this has any relation to your thread about size=2 pools ;-)
> >> >
> >> > Yes, it does!
> >> >
> >> >>
> >> >> > The data pool has 1.4B objects and currently has 16 concurrent
> >> >> > scan_extents scans running:
> >> >> >
> >> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 0 --worker_m 16 cephfs_metadata
> >> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 1 --worker_m 16 cephfs_metadata
> >> >> > ..
> >> >> > ..
> >> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 15 --worker_m 16 cephfs_metadata
> >> >> >
> >> >> > According to the source in DataScan.cc:
> >> >> > * worker_n: Worker number
> >> >> > * worker_m: Worker count
> >> >> >
> >> >> > So with the commands above I have 16 workers running, correct? For
> >> >> > scan_inodes I want to scale out to 32 workers to speed up the
> >> >> > process even more.
> >> >> >
> >> >> > Just to double-check before I send a new PR to update the docs: this
> >> >> > is the right way to run the tool, correct?
> >> >>
> >> >> It looks like you're targeting cephfs_metadata instead of your data
> >> >> pool.
> >> >>
> >> >> scan_extents and scan_inodes operate on data pools, even if your goal
> >> >> is to rebuild your metadata pool (the argument is what you are
> >> >> scanning, not what you are writing to).
> >> >
> >> > That was a typo on my part when writing this e-mail. It is scanning the
> >> > *data* pool at the moment.
> >> >
> >> > Can you confirm that the worker_n and worker_m arguments are the
> >> > correct ones?
> >>
> >> Yep, they look right to me.
> >
> > Ok, great. I pushed a PR to update the docs and help text. Care to review it?
> >
> > https://github.com/ceph/ceph/pull/12370
> >
> >>
> >> >>
> >> >> There is also a "scan_frags" command that operates on a metadata pool.
> >> >
> >> > Didn't know that. In this case the metadata pool is missing objects due
> >> > to that lost PG.
> >> >
> >> > I think scan_extents and scan_inodes on the *data* pool are the correct
> >> > way to rebuild the metadata pool if it is missing objects, right?
> >>
> >> In general you'd use both: scan_frags to re-link any directories that
> >> were orphaned because an ancestor dirfrag was in the lost PG, and then
> >> scan_extents+scan_inodes to re-link any files that were orphaned
> >> because their immediate parent dirfrag was in the lost PG.
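For the archives, a rough sketch of how the 16/32 workers described above
could be driven. This assumes the Jewel-era --worker_n/--worker_m flags
shown above and a data pool named "cephfs_data" (an assumption, substitute
the real pool name); in practice you would likely spread the invocations
over several hosts or screen sessions rather than backgrounding them all on
one box:

  # First phase: scan_extents with 16 parallel workers on the data pool.
  POOL=cephfs_data        # assumption: replace with the actual data pool
  WORKERS=16
  for n in $(seq 0 $((WORKERS - 1))); do
      cephfs-data-scan --debug-rados=10 scan_extents \
          --worker_n "$n" --worker_m "$WORKERS" "$POOL" &
  done
  wait  # every scan_extents worker must finish before scan_inodes starts

  # Second phase: scan_inodes, scaled out to 32 workers.
  WORKERS=32
  for n in $(seq 0 $((WORKERS - 1))); do
      cephfs-data-scan --debug-rados=10 scan_inodes \
          --worker_n "$n" --worker_m "$WORKERS" "$POOL" &
  done
  wait

The important bits are that worker_m is the same for all workers of a
phase, that worker_n covers 0..worker_m-1 exactly once, and that
scan_extents completes entirely before scan_inodes begins.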
> >>
> >> However, scan_extents+scan_inodes does the lion's share of the work:
> >> anything that scan_frags would have caught would probably also have
> >> appeared somewhere in a backtrace path and been linked in by
> >> scan_inodes as a result, so you can probably just skip scan_frags in
> >> this instance.
> >>
> >> BTW, you've probably already realised this, but be *very* cautious
> >> about using the recovered filesystem: our testing of these tools
> >> mostly verifies that after recovery we can see and read the files
> >> (i.e. well enough to extract them somewhere else), not that the
> >> filesystem necessarily works well for writes etc. after being
> >> recovered. If possible, it's always better to recover your files to a
> >> separate location and then rebuild your filesystem with fresh pools --
> >> that way you're not risking anything strange being left behind by the
> >> recovery process.
> >>
> >
> > I'm aware of this. Currently trying to make the best of this situation
> > and get the FS up and running.
> >
> > The MDS was running fine for about 24 hours, but started to assert on
> > missing RADOS objects in the metadata pool. So we had to resort to this
> > scan, which takes a long, very long time.
> >
> > 2016-12-07 08:29:58.852595 7f3d74c96700 -1 log_channel(cluster) log [ERR] :
> > dir 10011a4767b object missing on disk; some files may be lost
> > 2016-12-07 08:29:58.855070 7f3d74c96700 -1 mds/MDCache.cc: In function
> > 'virtual void C_MDC_OpenInoTraverseDir::finish(int)' thread 7f3d74c96700
> > time 2016-12-07 08:29:58.852637
> > mds/MDCache.cc: 8213: FAILED assert(r >= 0)
>
> Oops, that's a bug. These cases are supposed to make the MDS record
> damage and return EIO for requests to things beneath that path, not
> assert out. Could you open a ticket please?
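For the archives: when the MDS does handle such cases as damage rather
than asserting, the entries should be visible in its damage table via the
admin socket. A rough sketch, assuming the MDS daemon is named "a" and you
are on the host running it:

  ceph daemon mds.a damage ls          # list recorded metadata damage
  ceph daemon mds.a damage rm <id>     # clear an entry once it is repaired

As for the ticket: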
Done! http://tracker.ceph.com/issues/18179

Thanks again for the quick responses, very much appreciated!

Wido

> John
>
> >
> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >
> > Wido
> >
> >> John
> >>
> >> > Wido
> >> >
> >> >>
> >> >> John
> >> >>
> >> >> > If not, before sending the PR and starting scan_inodes on this
> >> >> > cluster, what is the correct way to invoke the tool?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > Wido

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com