> On 7 December 2016 at 20:54, John Spray <jsp...@redhat.com> wrote:
>
>
> On Wed, Dec 7, 2016 at 7:47 PM, Wido den Hollander <w...@42on.com> wrote:
> >
> >> On 7 December 2016 at 16:53, John Spray <jsp...@redhat.com> wrote:
> >>
> >>
> >> On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollander <w...@42on.com> wrote:
> >> >
> >> >> On 7 December 2016 at 16:38, John Spray <jsp...@redhat.com> wrote:
> >> >>
> >> >>
> >> >> On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollander <w...@42on.com>
> >> >> wrote:
> >> >> > (I think John knows the answer, but sending to ceph-users for
> >> >> > archival purposes)
> >> >> >
> >> >> > Hi John,
> >> >> >
> >> >> > A Ceph cluster lost a PG with CephFS metadata in it and is
> >> >> > currently going through the CephFS disaster recovery described here:
> >> >> > http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
> >> >>
> >> >> I wonder if this has any relation to your thread about size=2 pools ;-)
> >> >
> >> > Yes, it does!
> >> >
> >> >>
> >> >> > The data pool has 1.4B objects and currently has 16 concurrent
> >> >> > scan_extents scans running:
> >> >> >
> >> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 0 --worker_m 16 cephfs_metadata
> >> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 1 --worker_m 16 cephfs_metadata
> >> >> > ..
> >> >> > ..
> >> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 15 --worker_m 16 cephfs_metadata
> >> >> >
> >> >> > According to the source in DataScan.cc:
> >> >> > * worker_n: Worker number
> >> >> > * worker_m: Worker count
> >> >> >
> >> >> > So with the commands above I have 16 workers running, correct? For
> >> >> > scan_inodes I want to scale out to 32 workers to speed up the
> >> >> > process even more.
> >> >> >
> >> >> > Just to double-check before I send a new PR to update the docs: this
> >> >> > is the right way to run the tool, correct?
> >> >>
> >> >> It looks like you're targeting cephfs_metadata instead of your data
> >> >> pool.
> >> >>
> >> >> scan_extents and scan_inodes operate on data pools, even if your goal
> >> >> is to rebuild your metadata pool (the argument is what you are
> >> >> scanning, not what you are writing to).
> >> >
> >> > That was a typo on my part when writing this e-mail. It is scanning the
> >> > *data* pool at the moment.
> >> >
> >> > Can you confirm that the worker_n and worker_m arguments are the
> >> > correct ones?
> >>
> >> Yep, they look right to me.
> >
> > Ok, great. I pushed a PR to update the docs and help text. Care to review it?
> >
> > https://github.com/ceph/ceph/pull/12370
> >
> >>
> >> >>
> >> >> There is also a "scan_frags" command that operates on a metadata pool.
> >> >
> >> > Didn't know that. In this case the metadata pool is missing objects due
> >> > to that lost PG.
> >> >
> >> > I think scan_extents and scan_inodes on the *data* pool are the correct
> >> > way to rebuild the metadata pool if it is missing objects, right?
> >>
> >> In general you'd use both: scan_frags to re-link any directories that
> >> were orphaned because an ancestor dirfrag was in the lost PG, and then
> >> scan_extents+scan_inodes to re-link any files that were orphaned
> >> because their immediate parent dirfrag was in the lost PG.
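For the archives, a rough sketch of how the 16/32 workers described above
could be driven. This assumes the Jewel-era --worker_n/--worker_m flags
shown above and a data pool named "cephfs_data" (an assumption, substitute
the real pool name); in practice you would likely spread the invocations
over several hosts or screen sessions rather than backgrounding them all on
one box:

  # First phase: scan_extents with 16 parallel workers on the data pool.
  POOL=cephfs_data        # assumption: replace with the actual data pool
  WORKERS=16
  for n in $(seq 0 $((WORKERS - 1))); do
      cephfs-data-scan --debug-rados=10 scan_extents \
          --worker_n "$n" --worker_m "$WORKERS" "$POOL" &
  done
  wait  # every scan_extents worker must finish before scan_inodes starts

  # Second phase: scan_inodes, scaled out to 32 workers.
  WORKERS=32
  for n in $(seq 0 $((WORKERS - 1))); do
      cephfs-data-scan --debug-rados=10 scan_inodes \
          --worker_n "$n" --worker_m "$WORKERS" "$POOL" &
  done
  wait

The important bits are that worker_m is the same for all workers of a
phase, that worker_n covers 0..worker_m-1 exactly once, and that
scan_extents completes entirely before scan_inodes begins.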
> >>
> >> However, scan_extents+scan_inodes does the lion's share of the work:
> >> anything that scan_frags would have caught would probably also have
> >> appeared somewhere in a backtrace path and been linked in by
> >> scan_inodes as a result, so you can probably just skip scan_frags in
> >> this instance.
> >>
> >> BTW, you've probably already realised this, but be *very* cautious
> >> about using the recovered filesystem: our testing of these tools
> >> mostly verifies that after recovery we can see and read the files
> >> (i.e. well enough to extract them somewhere else), not that the
> >> filesystem necessarily works well for writes etc. after being
> >> recovered. If possible, it's always better to recover your files to a
> >> separate location and then rebuild your filesystem with fresh pools --
> >> that way you're not risking anything strange being left behind by the
> >> recovery process.
> >>
> >
> > I'm aware of this. Currently trying to make the best of this situation
> > and get the FS up and running.
> >
> > The MDS was running fine for about 24 hours, but started to assert on
> > missing RADOS objects in the metadata pool. So we had to resort to this
> > scan, which takes a long, very long time.
> >
> > 2016-12-07 08:29:58.852595 7f3d74c96700 -1 log_channel(cluster) log [ERR] :
> > dir 10011a4767b object missing on disk; some files may be lost
> > 2016-12-07 08:29:58.855070 7f3d74c96700 -1 mds/MDCache.cc: In function
> > 'virtual void C_MDC_OpenInoTraverseDir::finish(int)' thread 7f3d74c96700
> > time 2016-12-07 08:29:58.852637
> > mds/MDCache.cc: 8213: FAILED assert(r >= 0)
>
> Oops, that's a bug. These cases are supposed to make the MDS record
> damage and return EIO for requests to things beneath that path, not
> assert out. Could you open a ticket please?
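For the archives: when the MDS does handle such cases as damage rather
than asserting, the entries should be visible in its damage table via the
admin socket. A rough sketch, assuming the MDS daemon is named "a" and you
are on the host running it:

  ceph daemon mds.a damage ls          # list recorded metadata damage
  ceph daemon mds.a damage rm <id>     # clear an entry once it is repaired

As for the ticket: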
Done! http://tracker.ceph.com/issues/18179

Thanks again for the quick responses, very much appreciated!

Wido

> John
>
> >
> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >
> > Wido
> >
> >> John
> >>
> >> > Wido
> >> >
> >> >>
> >> >> John
> >> >>
> >> >> > If not, before sending the PR and starting scan_inodes on this
> >> >> > cluster, what is the correct way to invoke the tool?
> >> >> >
> >> >> > Thanks!
> >> >> >
> >> >> > Wido

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com