Hi John,

Thanks for the pointers. I have extracted the omap keys and values
for an object I found in the metadata pool called '600.00000000' and
uploaded them to the location below:

https://www.dropbox.com/sh/wg6irrjg7kie95p/AABk38IB4PXsn2yINpNa9Js5a?dl=0

Could you explain how to identify the stray directory fragments?
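For anyone following along: as I understand it, the MDS keeps ten stray directories (inodes 0x600 through 0x609), so their fragment objects in the metadata pool are named 600.00000000 through 609.00000000. A rough sketch of inspecting them with the `rados` CLI is below; the pool name `cephfs_metadata` is an assumption, substitute your own metadata pool:

```shell
# Assumed metadata pool name; replace with your own.
POOL=cephfs_metadata

# Walk the ten stray directory objects (600. through 609.)
# and list the dentry keys stored in each one's omap.
for i in $(seq 0 9); do
    obj=$(printf '60%x.00000000' "$i")
    echo "== $obj =="
    rados -p "$POOL" listomapkeys "$obj"
done
```

Once a damaged dentry key has been identified (e.g. via `rados -p "$POOL" listomapvals 600.00000000`), it could in principle be removed with `rados -p "$POOL" rmomapkey 600.00000000 '<key>'` while the MDS is stopped. This is a sketch, not a tested recovery procedure, so I would only attempt it with a backup of the metadata pool in hand.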

Thanks

On Thu, Dec 8, 2016 at 6:30 PM, John Spray <jsp...@redhat.com> wrote:

> On Thu, Dec 8, 2016 at 3:45 PM, Sean Redmond <sean.redmo...@gmail.com>
> wrote:
> > Hi,
> >
> > We had no changes going on with the ceph pools or ceph servers at the
> time.
> >
> > We have however been hitting this in the last week and it may be related:
> >
> > http://tracker.ceph.com/issues/17177
>
> Oh, okay -- so you've got corruption in your metadata pool as a result
> of hitting that issue, presumably.
>
> I think in the past people have managed to get past this by taking
> their MDSs offline and manually removing the omap entries in their
> stray directory fragments (i.e. using the `rados` cli on the objects
> starting "600.").
>
> John
>
>
>
> > Thanks
> >
> > On Thu, Dec 8, 2016 at 3:34 PM, John Spray <jsp...@redhat.com> wrote:
> >>
> >> On Thu, Dec 8, 2016 at 3:11 PM, Sean Redmond <sean.redmo...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > I have a CephFS cluster that is currently unable to start the mds
> server
> >> > as
> >> > it is hitting an assert, the extract from the mds log is below, any
> >> > pointers
> >> > are welcome:
> >> >
> >> > ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >> >
> >> > 2016-12-08 14:50:18.577038 7f7d9faa3700  1 mds.0.47077 handle_mds_map
> >> > state
> >> > change up:rejoin --> up:active
> >> > 2016-12-08 14:50:18.577048 7f7d9faa3700  1 mds.0.47077 recovery_done
> --
> >> > successful recovery!
> >> > 2016-12-08 14:50:18.577166 7f7d9faa3700  1 mds.0.47077 active_start
> >> > 2016-12-08 14:50:19.460208 7f7d9faa3700  1 mds.0.47077 cluster
> >> > recovered.
> >> > 2016-12-08 14:50:19.495685 7f7d9abfc700 -1 mds/CDir.cc: In function
> >> > 'void
> >> > CDir::try_remove_dentries_for_stray()' thread 7f7d9abfc700 time
> >> > 2016-12-08
> >> > 14:50:19
> >> > .494508
> >> > mds/CDir.cc: 699: FAILED assert(dn->get_linkage()->is_null())
> >> >
> >> >  ceph version 10.2.3 (ecc23778eb545d8dd55e2e4735b53cc93f92e65b)
> >> >  1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
> >> > const*)+0x80) [0x55f0f789def0]
> >> >  2: (CDir::try_remove_dentries_for_stray()+0x1a0) [0x55f0f76666c0]
> >> >  3: (StrayManager::__eval_stray(CDentry*, bool)+0x8c9)
> [0x55f0f75e7799]
> >> >  4: (StrayManager::eval_stray(CDentry*, bool)+0x22) [0x55f0f75e7cf2]
> >> >  5: (MDCache::scan_stray_dir(dirfrag_t)+0x16d) [0x55f0f753b30d]
> >> >  6: (MDSInternalContextBase::complete(int)+0x18b) [0x55f0f76e93db]
> >> >  7: (MDSRank::_advance_queues()+0x6a7) [0x55f0f749bf27]
> >> >  8: (MDSRank::ProgressThread::entry()+0x4a) [0x55f0f749c45a]
> >> >  9: (()+0x770a) [0x7f7da6bdc70a]
> >> >  10: (clone()+0x6d) [0x7f7da509d82d]
> >> >  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
> >> > needed to
> >> > interpret this.
> >>
> >> Last time someone had this issue they had tried to create a filesystem
> >> using pools that had another filesystem's old objects in:
> >> http://tracker.ceph.com/issues/16829
> >>
> >> What was going on on your system before you hit this?
> >>
> >> John
> >>
> >> > Thanks
> >> >
> >> > _______________________________________________
> >> > ceph-users mailing list
> >> > ceph-users@lists.ceph.com
> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
> >> >
> >
> >
>