On Wed, Dec 7, 2016 at 7:47 PM, Wido den Hollander <w...@42on.com> wrote:
>
>> Op 7 december 2016 om 16:53 schreef John Spray <jsp...@redhat.com>:
>>
>>
>> On Wed, Dec 7, 2016 at 3:46 PM, Wido den Hollander <w...@42on.com> wrote:
>> >
>> >> Op 7 december 2016 om 16:38 schreef John Spray <jsp...@redhat.com>:
>> >>
>> >>
>> >> On Wed, Dec 7, 2016 at 3:28 PM, Wido den Hollander <w...@42on.com> wrote:
>> >> > (I think John knows the answer, but sending to ceph-users for archival 
>> >> > purposes)
>> >> >
>> >> > Hi John,
>> >> >
>> >> > A Ceph cluster lost a PG with CephFS metadata in there and it is 
>> >> > currently doing a CephFS disaster recovery as described here: 
>> >> > http://docs.ceph.com/docs/master/cephfs/disaster-recovery/
>> >>
>> >> I wonder if this has any relation to your thread about size=2 pools ;-)
>> >
>> > Yes, it does!
>> >
>> >>
>> >> > This data pool has 1.4B objects and currently has 16 concurrent 
>> >> > scan_extents scans running:
>> >> >
>> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 0 
>> >> > --worker_m 16 cephfs_metadata
>> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 1 
>> >> > --worker_m 16 cephfs_metadata
>> >> > ..
>> >> > ..
>> >> > # cephfs-data-scan --debug-rados=10 scan_extents --worker_n 15 
>> >> > --worker_m 16 cephfs_metadata
>> >> >
>> >> > According to the source in DataScan.cc:
>> >> > * worker_n: Worker number
>> >> > * worker_m: Worker count
>> >> >
>> >> > So with the commands above I have 16 workers running, correct? For the 
>> >> > scan_inodes I want to scale out to 32 workers to speed up the process 
>> >> > even more.
>> >> >
>> >> > Just to double-check before I send a new PR to update the docs, this is 
>> >> > the right way to run the tool, correct?
>> >>
>> >> It looks like you're targeting cephfs_metadata instead of your data pool.
>> >>
>> >> scan_extents and scan_inodes operate on data pools, even if your goal
>> >> is to rebuild your metadata pool (the argument is what you are
>> >> scanning, not what you are writing to).
>> >
>> > That was a typo on my part when writing this e-mail. It is scanning the
>> > *data* pool at the moment.
>> >
>> > Can you confirm that the worker_n and worker_m arguments are the correct 
>> > ones?
>>
>> Yep, they look right to me.
>
> Ok, great. I pushed a PR to update the docs and the command's help text. Care to review it?
>
> https://github.com/ceph/ceph/pull/12370
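>
> For the scan_inodes pass, a rough sketch of how all 32 workers could be
> launched in parallel (plain bash; the data pool name "cephfs_data" here is
> illustrative, substitute the real one):
>
> # launch 32 scan_inodes workers, each handling a disjoint slice of the pool
> for n in $(seq 0 31); do
>     cephfs-data-scan scan_inodes --worker_n $n --worker_m 32 cephfs_data &
> done
> wait   # block until every worker has finished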
>
>>
>> >>
>> >> There is also a "scan_frags" command that operates on a metadata pool.
>> >
>> > Didn't know that. In this case the metadata pool is missing objects due to 
>> > that lost PG.
>> >
>> > I think running scan_extents and scan_inodes on the *data* pool is the
>> > correct way to rebuild the metadata pool if it is missing objects, right?
>>
>> In general you'd use both scan_frags (to re-link any directories
>> orphaned because they had an ancestor dirfrag in the lost PG) and then
>> scan_extents+scan_inodes (to re-link any files orphaned because their
>> immediate parent dirfrag was in the lost PG).
>>
>> However, scan_extents+scan_inodes generally does the lion's share of
>> the work: anything that scan_frags would have caught would probably
>> also have appeared somewhere in a backtrace path and been linked in by
>> scan_inodes as a result, so you can probably just skip scan_frags in
>> this instance.
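>>
>> For clarity, a single-worker sketch of that scan_extents+scan_inodes
>> sequence (pool name is illustrative, and the scan_extents pass should
>> complete across all workers before scan_inodes starts):
>>
>> # pass 1: scan file data objects to recover size/mtime information
>> cephfs-data-scan scan_extents cephfs_data
>> # pass 2: read backtraces and re-link files into the metadata hierarchy
>> cephfs-data-scan scan_inodes cephfs_data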
>>
>> BTW, you've probably already realised this, but be *very* cautious
>> about using the recovered filesystem: our testing of these tools is
>> mostly verifying that after recovery we can see and read the files
>> (i.e. well enough to extract them somewhere else), not that the
>> filesystem is necessarily working well for writes etc after being
>> recovered.  If it's possible, then it's always better to recover your
>> files to a separate location, and then rebuild your filesystem with
>> fresh pools -- that way you're not risking that there is anything
>> strange left behind by the recovery process.
>>
>
> I'm aware of this. Currently trying to make the best of this situation
> and get the FS up and running.
>
> The MDS was running fine for about 24 hours, but then started to assert on
> missing RADOS objects in the metadata pool. So we had to resort to this
> scan, which takes a very, very long time.
>
> 2016-12-07 08:29:58.852595 7f3d74c96700 -1 log_channel(cluster) log [ERR] : 
> dir 10011a4767b object missing on disk; some files may be lost
> 2016-12-07 08:29:58.855070 7f3d74c96700 -1 mds/MDCache.cc: In function 
> 'virtual void C_MDC_OpenInoTraverseDir::finish(int)' thread 7f3d74c96700 time 
> 2016-12-07 08:29:58.852637
> mds/MDCache.cc: 8213: FAILED assert(r >= 0)

Oops, that's a bug.  These cases are supposed to make the MDS record
the damage and return EIO for requests on things beneath that path, not
assert out.  Could you open a ticket please?

John

>
> ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
>
> Wido
>
>> John
>>
>> > Wido
>> >
>> >>
>> >> John
>> >>
>> >> > If not, before sending the PR and starting scan_inodes on this cluster, 
>> >> > what is the correct way to invoke the tool?
>> >> >
>> >> > Thanks!
>> >> >
>> >> > Wido