Bluestore.

On Fri, Jul 13, 2018, 05:56 Dan van der Ster <d...@vanderster.com> wrote:

> Hi Adam,
>
> Are your osds bluestore or filestore?
>
> -- dan
>
>
> On Fri, Jul 13, 2018 at 7:38 AM Adam Tygart <mo...@ksu.edu> wrote:
> >
> > I've hit this today with an upgrade to 12.2.6 on my backup cluster.
> > Unfortunately there were issues with the logs (in that the files
> > weren't writable) until after the issue struck.
> >
> > 2018-07-13 00:16:54.437051 7f5a0a672700 -1 log_channel(cluster) log
> > [ERR] : 5.255 full-object read crc 0x4e97b4e != expected 0x6cfe829d on
> > 5:aa448500:::500.00000000:head
> >
> > It is a backup cluster and I can keep it around or blow away the data
> > (in this instance) as needed for testing purposes.
> >
> > --
> > Adam
> >
> > On Thu, Jul 12, 2018 at 10:39 AM, Alessandro De Salvo
> > <alessandro.desa...@roma1.infn.it> wrote:
> > > Some progress, and more pain...
> > >
> > > I was able to recover the 200.00000000 object using the
> > > ceph-objectstore-tool for one of the OSDs (all copies are identical),
> > > but trying to re-inject it with a plain rados put gave no error while
> > > the subsequent get still returned the same I/O error. The solution was
> > > to rm the object and then put it again; that worked.
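> > > (For reference, roughly the sequence described above, sketched with
> > > illustrative paths and output file names -- the OSD must be stopped
> > > while ceph-objectstore-tool reads its store, and this assumes systemd
> > > units:)
> > >
> > > systemctl stop ceph-osd@23
> > > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
> > >     --pgid 10.14 200.00000000 get-bytes /tmp/200.00000000.bin
> > > systemctl start ceph-osd@23
> > > rados -p cephfs_metadata rm 200.00000000
> > > rados -p cephfs_metadata put 200.00000000 /tmp/200.00000000.bin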
> > >
> > > However, after restarting one of the MDSes and setting it to repaired,
> > > I've hit another, similar problem:
> > >
> > >
> > > 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log
> > > [ERR] : error reading table object 'mds0_inotable' -5 ((5) Input/output error)
> > >
> > >
> > > Can I safely try to do the same as for object 200.00000000? Should I
> > > check something before trying it? Again, checking the copies of the
> > > object, they have identical md5sums on all the replicas.
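> > > (A possible pre-check, sketched with an illustrative OSD ID and paths:
> > > extract the copy held by each acting OSD and compare checksums. The
> > > OSD must be stopped while ceph-objectstore-tool reads its store.)
> > >
> > > ceph osd map cephfs_metadata mds0_inotable   # find the acting set
> > > systemctl stop ceph-osd@23                   # repeat per acting OSD
> > > ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
> > >     mds0_inotable get-bytes /tmp/mds0_inotable.osd23
> > > systemctl start ceph-osd@23
> > > md5sum /tmp/mds0_inotable.osd*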
> > >
> > > Thanks,
> > >
> > >
> > >     Alessandro
> > >
> > >
> > > On 12/07/18 16:46, Alessandro De Salvo wrote:
> > >
> > > Unfortunately yes, all the OSDs were restarted a few times, but no
> > > change.
> > >
> > > Thanks,
> > >
> > >
> > >     Alessandro
> > >
> > >
> > > On 12/07/18 15:55, Paul Emmerich wrote:
> > >
> > > This might seem like a stupid suggestion, but: have you tried to
> > > restart the OSDs?
> > >
> > > I've also encountered some random CRC errors that only showed up when
> > > trying to read an object, but not on scrubbing, and that magically
> > > disappeared after restarting the OSD.
> > >
> > > However, in my case it was clearly related to
> > > https://tracker.ceph.com/issues/22464, which doesn't seem to be the
> > > issue here.
> > >
> > > Paul
> > >
> > > 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo
> > > <alessandro.desa...@roma1.infn.it>:
> > >>
> > >>
> > >> On 12/07/18 11:20, Alessandro De Salvo wrote:
> > >>
> > >>>
> > >>>
> > >>> On 12/07/18 10:58, Dan van der Ster wrote:
> > >>>>
> > >>>> On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum <gfar...@redhat.com>
> > >>>> wrote:
> > >>>>>
> > >>>>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
> > >>>>> <alessandro.desa...@roma1.infn.it> wrote:
> > >>>>>>
> > >>>>>> OK, I found where the object is:
> > >>>>>>
> > >>>>>>
> > >>>>>> ceph osd map cephfs_metadata 200.00000000
> > >>>>>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.00000000' -> pg
> > >>>>>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
> > >>>>>>
> > >>>>>>
> > >>>>>> So, looking at the osds 23, 35 and 18 logs in fact I see:
> > >>>>>>
> > >>>>>>
> > >>>>>> osd.23:
> > >>>>>>
> > >>>>>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
> > >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b
> > >>>>>> on 10:292cf221:::200.00000000:head
> > >>>>>>
> > >>>>>>
> > >>>>>> osd.35:
> > >>>>>>
> > >>>>>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
> > >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b
> > >>>>>> on 10:292cf221:::200.00000000:head
> > >>>>>>
> > >>>>>>
> > >>>>>> osd.18:
> > >>>>>>
> > >>>>>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
> > >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b
> > >>>>>> on 10:292cf221:::200.00000000:head
> > >>>>>>
> > >>>>>>
> > >>>>>> So, basically the same error everywhere.
> > >>>>>>
> > >>>>>> I'm trying to issue a repair of the pg 10.14, but I'm not sure if
> > >>>>>> it may help.
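> > >>>>>> (For reference, the commands would be along these lines; note that
> > >>>>>> list-inconsistent-obj only shows something after a scrub has
> > >>>>>> actually flagged the PG as inconsistent:)
> > >>>>>>
> > >>>>>> ceph pg deep-scrub 10.14
> > >>>>>> rados list-inconsistent-obj 10.14 --format=json-pretty
> > >>>>>> ceph pg repair 10.14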
> > >>>>>>
> > >>>>>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes),
> > >>>>>> and no disk problems anywhere. No relevant errors in syslogs; the
> > >>>>>> hosts are just fine. I cannot exclude an error on the RAID
> > >>>>>> controllers, but 2 of the OSDs with 10.14 are on one SAN system and
> > >>>>>> one is on a different one, so I would tend to exclude that they both
> > >>>>>> had (silent) errors at the same time.
> > >>>>>
> > >>>>>
> > >>>>> That's fairly distressing. At this point I'd probably try extracting
> > >>>>> the object using ceph-objectstore-tool and seeing if it decodes
> > >>>>> properly as an mds journal. If it does, you might risk just putting
> > >>>>> it back in place to overwrite the crc.
> > >>>>>
> > >>>> Wouldn't it be easier to scrub repair the PG to fix the crc?
> > >>>
> > >>>
> > >>> This is what I already instructed the cluster to do (a deep scrub),
> > >>> but I'm not sure it can repair anything when all replicas are bad, as
> > >>> seems to be the case here.
> > >>
> > >>
> > >> I finally managed (with the help of Dan) to perform the deep-scrub on
> > >> pg 10.14, but the deep scrub did not detect anything wrong. Trying to
> > >> repair 10.14 also has no effect.
> > >> Still, when trying to access the object I get this in the OSDs:
> > >>
> > >> 2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR]
> > >> : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> > >> 10:292cf221:::200.00000000:head
> > >>
> > >> Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds
> > >> like a bug.
> > >> Can I force the repair somehow?
> > >> Thanks,
> > >>
> > >>    Alessandro
> > >>
> > >>>
> > >>>>
> > >>>> Alessandro, did you already try a deep-scrub on pg 10.14?
> > >>>
> > >>>
> > >>> I'm waiting for the cluster to do that; I issued it earlier this
> > >>> morning.
> > >>>
> > >>>>   I expect
> > >>>> it'll show an inconsistent object. Though, I'm unsure if repair will
> > >>>> correct the crc given that in this case *all* replicas have a bad crc.
> > >>>
> > >>>
> > >>> Exactly, this is what I wonder too.
> > >>> Cheers,
> > >>>
> > >>>     Alessandro
> > >>>
> > >>>>
> > >>>> --Dan
> > >>>>
> > >>>>> However, I'm also quite curious how it ended up that way, with a
> > >>>>> checksum mismatch but identical data (and identical checksums!)
> > >>>>> across the three replicas. Have you previously done some kind of
> > >>>>> scrub repair on the metadata pool? Did the PG perhaps get backfilled
> > >>>>> due to cluster changes?
> > >>>>> -Greg
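> > >>>>> (The second question can be checked from the PG's own state, e.g.
> > >>>>> the recovery/peering history in its query output:)
> > >>>>>
> > >>>>> ceph pg 10.14 query | less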
> > >>>>>
> > >>>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>>
> > >>>>>>       Alessandro
> > >>>>>>
> > >>>>>>
> > >>>>>>
> > >>>>>> On 11/07/18 18:56, John Spray wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
> > >>>>>>> <alessandro.desa...@roma1.infn.it> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi John,
> > >>>>>>>>
> > >>>>>>>> in fact I get an I/O error by hand too:
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> rados get -p cephfs_metadata 200.00000000 200.00000000
> > >>>>>>>> error getting cephfs_metadata/200.00000000: (5) Input/output error
> > >>>>>>>
> > >>>>>>> Next step would be to go look for corresponding errors on your OSD
> > >>>>>>> logs, system logs, and possibly also check things like the SMART
> > >>>>>>> counters on your hard drives for possible root causes.
> > >>>>>>>
> > >>>>>>> John
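> > >>>>>>> (Concretely, something along these lines on each host holding a
> > >>>>>>> replica -- log paths and the device name are illustrative:)
> > >>>>>>>
> > >>>>>>> grep 200.00000000 /var/log/ceph/ceph-osd.*.log
> > >>>>>>> journalctl -k | grep -iE 'i/o error|sector|scsi'
> > >>>>>>> smartctl -a /dev/sdX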
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>
> > >>>>>>>> Can this be recovered somehow?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>>        Alessandro
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> On 11/07/18 18:33, John Spray wrote:
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
> > >>>>>>>>> <alessandro.desa...@roma1.infn.it> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> after the upgrade to luminous 12.2.6 today, all our MDSes have
> > >>>>>>>>>> been marked as damaged. Trying to restart the instances only
> > >>>>>>>>>> results in standby MDSes. We currently have 2 active filesystems
> > >>>>>>>>>> with 2 MDSes each.
> > >>>>>>>>>>
> > >>>>>>>>>> I found the following error messages in the mon:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> mds.0 <node1_IP>:6800/2412911269 down:damaged
> > >>>>>>>>>> mds.1 <node2_IP>:6800/830539001 down:damaged
> > >>>>>>>>>> mds.0 <node3_IP>:6800/4080298733 down:damaged
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Whenever I try to force the repaired state with ceph mds repaired
> > >>>>>>>>>> <fs_name>:<rank> I get something like this in the MDS logs:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> 2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
> > >>>>>>>>>> error getting journal off disk
> > >>>>>>>>>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
> > >>>>>>>>>> [ERR] : Error recovering journal 0x201: (5) Input/output error
> > >>>>>>>>>
> > >>>>>>>>> An EIO reading the journal header is pretty scary. The MDS itself
> > >>>>>>>>> probably can't tell you much more about this: you need to dig down
> > >>>>>>>>> into the RADOS layer.  Try reading the 200.00000000 object (that
> > >>>>>>>>> happens to be the rank 0 journal header; every CephFS filesystem
> > >>>>>>>>> should have one) using the `rados` command line tool.
> > >>>>>>>>>
> > >>>>>>>>> John
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>
> > >>>>>>>>>> Any attempt at running the journal export results in errors, like
> > >>>>>>>>>> this one:
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
> > >>>>>>>>>> Error ((5) Input/output error)2018-07-11 17:01:30.631571
> > >>>>>>>>>> 7f94354fff00 -1
> > >>>>>>>>>> Header 200.00000000 is unreadable
> > >>>>>>>>>>
> > >>>>>>>>>> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal
> > >>>>>>>>>> not readable, attempt object-by-object dump with `rados`
> > >>>>>>>>>>
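> > >>>>>>>>>> (The object-by-object fallback would look roughly like this: the
> > >>>>>>>>>> rank 0 journal lives in objects named 200.*, so list them and
> > >>>>>>>>>> fetch each one individually; the /tmp paths are illustrative:)
> > >>>>>>>>>>
> > >>>>>>>>>> mkdir -p /tmp/journal
> > >>>>>>>>>> rados -p cephfs_metadata ls | grep '^200\.' | sort | while read obj; do
> > >>>>>>>>>>     rados -p cephfs_metadata get "$obj" /tmp/journal/"$obj"
> > >>>>>>>>>> done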
> > >>>>>>>>>>
> > >>>>>>>>>> Same happens for recover_dentries
> > >>>>>>>>>>
> > >>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> > >>>>>>>>>> Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header
> > >>>>>>>>>> 200.00000000 is unreadable
> > >>>>>>>>>> Errors:
> > >>>>>>>>>> 0
> > >>>>>>>>>>
> > >>>>>>>>>> Is there something I could try to do to have the cluster back?
> > >>>>>>>>>>
> > >>>>>>>>>> I was able to dump the contents of the metadata pool with rados
> > >>>>>>>>>> export -p cephfs_metadata <filename> and I'm currently trying the
> > >>>>>>>>>> procedure described in
> > >>>>>>>>>>
> > >>>>>>>>>> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
> > >>>>>>>>>>
> > >>>>>>>>>> but I'm not sure if it will work as it's apparently doing nothing
> > >>>>>>>>>> at the moment (maybe it's just very slow).
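> > >>>>>>>>>> (One way to tell whether the scan is progressing at all is to
> > >>>>>>>>>> watch the object count of the alternate metadata pool grow --
> > >>>>>>>>>> "recovery" here is just the example pool name from the docs:)
> > >>>>>>>>>>
> > >>>>>>>>>> watch -n 10 'rados df | grep -E "recovery|cephfs_metadata"'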
> > >>>>>>>>>>
> > >>>>>>>>>> Any help is appreciated, thanks!
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>         Alessandro
> > >>>>>>>>>>
> > >
> > >
> > >
> > >
> > > --
> > > Paul Emmerich
> > >
> > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > >
> > > croit GmbH
> > > Freseniusstr. 31h
> > > 81247 München
> > > www.croit.io
> > > Tel: +49 89 1896585 90
> > >
> > >
> > >
> > >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
