On Wed, Dec 7, 2016 at 7:47 PM, Wido den Hollander <w...@42on.com> wrote:
>
>> On 7 December 2016 at 16:53, John Spray <jsp...@redhat.com> wrote:
>>
>>
>> On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollander <w...@42on.com> wrote:
>> >
>> >> On 7 December 2016 at 16:38, John Spray <jsp...@redhat.com> wrote:
>> >>
>> >>
>> >> On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollander <w...@42on.com> wrote:
>> >> > (I think John knows the answer, but I'm sending this to ceph-users
>> >> > for archival purposes.)
>> >> >
>> >> > Hi John,
>> >> >
>> >> > A Ceph cluster lost a PG containing CephFS metadata and is currently
>> >> > going through the CephFS disaster recovery described here:
>> >> > http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>> >>
>> >> I wonder if this has any relation to your thread about size=2 pools ;-)
>> >
>> > Yes, it does!
>> >
>> >>
>> >> > The data pool has 1.4B objects and currently has 16 concurrent
>> >> > scan_extents workers running:
>> >> >
>> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 0 --worker_m 16 cephfs_metadata
>> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 1 --worker_m 16 cephfs_metadata
>> >> > ..
>> >> > ..
>> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 15 --worker_m 16 cephfs_metadata
>> >> >
>> >> > According to the source in DataScan.cc:
>> >> > * worker_n: Worker number
>> >> > * worker_m: Worker count
>> >> >
>> >> > So with the commands above I have 16 workers running, correct? For
>> >> > scan_inodes I want to scale out to 32 workers to speed up the
>> >> > process even more.
>> >> >
>> >> > Just to double-check before I send a new PR to update the docs:
>> >> > this is the right way to run the tool, correct?
>> >>
>> >> It looks like you're targeting cephfs_metadata instead of your data
>> >> pool.
>> >>
>> >> scan_extents and scan_inodes operate on data pools, even if your goal
>> >> is to rebuild your metadata pool (the argument is what you are
>> >> scanning, not what you are writing to).
>> >
>> > That was a typo on my part when writing this e-mail. It is scanning
>> > the *data* pool at the moment.
>> >
>> > Can you confirm that the worker_n and worker_m arguments are the
>> > correct ones?
>>
>> Yep, they look right to me.
>
> Ok, great. I pushed a PR to update the docs and the help text. Care to
> review it?
>
> https://github.com/ceph/ceph/pull/12370
>
>>
>> >>
>> >> There is also a "scan_frags" command that operates on a metadata pool.
>> >
>> > Didn't know that. In this case the metadata pool is missing objects
>> > due to that lost PG.
>> >
>> > I think running scan_extents and scan_inodes on the *data* pool is the
>> > correct way to rebuild the metadata pool if it is missing objects,
>> > right?
>>
>> In general you'd use both scan_frags (to re-link any directories that
>> were orphaned because an ancestor dirfrag was in the lost PG) and then
>> scan_extents + scan_inodes (to re-link any files that were orphaned
>> because their immediate parent dirfrag was in the lost PG).
>>
>> However, scan_extents + scan_inodes generally does the lion's share of
>> the work, because anything scan_frags would have caught has probably
>> also appeared somewhere in a backtrace path and got linked in by
>> scan_inodes as a result, so you should probably just skip scan_frags
>> in this instance.
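
(For concreteness: the 32-worker scan_inodes run discussed above would
follow the same worker_n/worker_m pattern, and should only be started
after *all* scan_extents workers have finished. A sketch, assuming the
data pool is called cephfs_data -- the real pool name isn't given in
this thread:

# cephfs-data-scan scan_inodes --worker_n 0 --worker_m 32 cephfs_data
# cephfs-data-scan scan_inodes --worker_n 1 --worker_m 32 cephfs_data
..
# cephfs-data-scan scan_inodes --worker_n 31 --worker_m 32 cephfs_data

As with scan_extents, every worker must run to completion before moving
on to the next phase.)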
>>
>> BTW, you've probably already realised this, but be *very* cautious
>> about using the recovered filesystem: our testing of these tools
>> mostly verifies that after recovery we can see and read the files
>> (i.e. well enough to extract them somewhere else), not that the
>> filesystem necessarily works well for writes etc. after being
>> recovered. If it's possible, it's always better to recover your files
>> to a separate location and then rebuild your filesystem with fresh
>> pools -- that way you're not risking that there is anything strange
>> left behind by the recovery process.
>>
>
> I'm aware of this. Currently trying to make the best of this situation
> and get the FS up and running.
>
> The MDS was running fine for about 24 hours, but then started to
> assert on missing RADOS objects in the metadata pool. So we had to
> resort to this scan, which takes a very, very long time.
>
> 2016-12-07 08:29:58.852595 7f3d74c96700 -1 log_channel(cluster) log [ERR] : dir 10011a4767b object missing on disk; some files may be lost
> 2016-12-07 08:29:58.855070 7f3d74c96700 -1 mds/MDCache.cc: In function 'virtual void C_MDC_OpenInoTraverseDir::finish(int)' thread 7f3d74c96700 time 2016-12-07 08:29:58.852637
> mds/MDCache.cc: 8213: FAILED assert(r >= 0)
Oops, that's a bug. These cases are supposed to make the MDS record the
damage and return EIO on requests to things beneath that path, not
assert out. Could you open a ticket please?

John

>
> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
> Wido
>
>> John
>>
>> > Wido
>> >
>> >>
>> >> John
>> >>
>> >> > If not, before sending the PR and starting scan_inodes on this
>> >> > cluster, what is the correct way to invoke the tool?
>> >> >
>> >> > Thanks!
>> >> >
>> >> > Wido

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
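
(As a follow-up to the damage-table behaviour described above: once an
MDS records damage instead of asserting, the recorded entries can be
listed through the MDS admin interface. A sketch, assuming an MDS
daemon addressed as mds.0 -- the exact invocation may vary by Ceph
version:

# ceph tell mds.0 damage ls

Each listed entry carries an ID that can later be passed to "damage rm"
once the underlying objects have been repaired or the loss accepted.)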