On Thu, Jul 12, 2018 at 11:39 PM Alessandro De Salvo <alessandro.desa...@roma1.infn.it> wrote:
>
> Some progress, and more pain...
>
> I was able to recover the 200.00000000 object using the ceph-objectstore-tool for one of the OSDs (all identical copies), but trying to re-inject it just with rados put gave no error while the get was still returning the same I/O error. So the solution was to rm the object and then put it again; that worked.
>
> However, after restarting one of the MDSes and setting it to repaired, I've hit another, similar problem:
>
> 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] : error reading table object 'mds0_inotable' -5 ((5) Input/output error)
>
> Can I safely try to do the same as for object 200.00000000? Should I check something before trying it? Again, checking the copies of the object, they have identical md5sums on all the replicas.
>
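For reference, the extract / rm / put cycle described above, applied to mds0_inotable, would look roughly like the sketch below. The OSD id, data path and temporary file names are placeholders, the OSD has to be stopped before ceph-objectstore-tool can open its store, and the exact ceph-objectstore-tool invocation may vary slightly between versions:

# find the pg holding the object (it will generally differ from 10.14)
ceph osd map cephfs_metadata mds0_inotable

# extract one copy from a stopped OSD that hosts that pg
systemctl stop ceph-osd@23
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 --pgid <pgid> mds0_inotable get-bytes /tmp/mds0_inotable
systemctl start ceph-osd@23

# remove the unreadable object, then write the recovered copy back
rados -p cephfs_metadata rm mds0_inotable
rados -p cephfs_metadata put mds0_inotable /tmp/mds0_inotable

# verify the object is readable again and matches what was injected
rados -p cephfs_metadata get mds0_inotable /tmp/mds0_inotable.check
md5sum /tmp/mds0_inotable /tmp/mds0_inotable.check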
Yes, it should be safe. You also need to do the same for several other objects. The full object list is:

200.00000000
mds0_inotable
100.00000000.inode
mds_snaptable
1.00000000.inode

The first three objects are per-MDS-rank. If you have enabled multi-active MDS, you also need to update the objects of the other ranks. For mds.1, the object names are 201.00000000, mds1_inotable and 101.00000000.inode.

> Thanks,
>
> Alessandro
>
> On 12/07/18 16:46, Alessandro De Salvo wrote:
>
> Unfortunately yes, all the OSDs were restarted a few times, but no change.
>
> Thanks,
>
> Alessandro
>
> On 12/07/18 15:55, Paul Emmerich wrote:
>
> This might seem like a stupid suggestion, but: have you tried to restart the OSDs?
>
> I've also encountered some random CRC errors that only showed up when trying to read an object, but not on scrubbing, and that magically disappeared after restarting the OSD.
>
> However, in my case it was clearly related to https://tracker.ceph.com/issues/22464 which doesn't seem to be the issue here.
>
> Paul
>
> 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo <alessandro.desa...@roma1.infn.it>:
>>
>> On 12/07/18 11:20, Alessandro De Salvo wrote:
>>>
>>> On 12/07/18 10:58, Dan van der Ster wrote:
>>>>
>>>> On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum <gfar...@redhat.com> wrote:
>>>>>
>>>>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo <alessandro.desa...@roma1.infn.it> wrote:
>>>>>>
>>>>>> OK, I found where the object is:
>>>>>>
>>>>>> ceph osd map cephfs_metadata 200.00000000
>>>>>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.00000000' -> pg 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
>>>>>>
>>>>>> So, looking at the logs of osds 23, 35 and 18, I in fact see:
>>>>>>
>>>>>> osd.23:
>>>>>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.00000000:head
>>>>>>
>>>>>> osd.35:
>>>>>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.00000000:head
>>>>>>
>>>>>> osd.18:
>>>>>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.00000000:head
>>>>>>
>>>>>> So, basically the same error everywhere.
>>>>>>
>>>>>> I'm trying to issue a repair of pg 10.14, but I'm not sure if it will help.
>>>>>>
>>>>>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and no disk problems anywhere. No relevant errors in the syslogs; the hosts are just fine. I cannot exclude an error on the RAID controllers, but 2 of the OSDs with 10.14 are on one SAN system and one is on a different one, so I would tend to exclude that they both had (silent) errors at the same time.
>>>>>
>>>>> That's fairly distressing. At this point I'd probably try extracting the object using ceph-objectstore-tool and seeing if it decodes properly as an mds journal. If it does, you might risk just putting it back in place to overwrite the crc.
>>>>>
>>>> Wouldn't it be easier to scrub repair the PG to fix the crc?
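The scrub and repair commands being discussed here are, roughly, the following (pg id 10.14 taken from the osd map output above; list-inconsistent-obj only shows what a completed scrub has recorded):

ceph pg deep-scrub 10.14
ceph pg repair 10.14

# check what scrub recorded, if anything
ceph health detail
rados list-inconsistent-obj 10.14 --format=json-pretty

As the rest of the thread discusses, repair may not help when all replicas carry the same bad full-object crc, since there is no good copy left to recover from.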
>>>
>>>
>>> this is what I already instructed the cluster to do, a deep scrub, but I'm not sure it can repair in case all replicas are bad, as seems to be the case.
>>
>> I finally managed (with the help of Dan) to perform the deep-scrub on pg 10.14, but the deep scrub did not detect anything wrong. Also, trying to repair 10.14 has no effect.
>> Still, when trying to access the object I get this in the OSDs:
>>
>> 2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on 10:292cf221:::200.00000000:head
>>
>> Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds like a bug.
>> Can I force the repair somehow?
>> Thanks,
>>
>> Alessandro
>>
>>>
>>>> Alessandro, did you already try a deep-scrub on pg 10.14?
>>>
>>> I'm waiting for the cluster to do that, I sent the request earlier this morning.
>>>
>>>> I expect it'll show an inconsistent object. Though, I'm unsure if repair will correct the crc given that in this case *all* replicas have a bad crc.
>>>
>>> Exactly, this is what I wonder too.
>>> Cheers,
>>>
>>> Alessandro
>>>
>>>> --Dan
>>>>
>>>>> However, I'm also quite curious how it ended up that way, with a checksum mismatch but identical data (and identical checksums!) across the three replicas. Have you previously done some kind of scrub repair on the metadata pool? Did the PG perhaps get backfilled due to cluster changes?
>>>>> -Greg
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Alessandro
>>>>>>
>>>>>> On 11/07/18 18:56, John Spray wrote:
>>>>>>>
>>>>>>> On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo <alessandro.desa...@roma1.infn.it> wrote:
>>>>>>>>
>>>>>>>> Hi John,
>>>>>>>>
>>>>>>>> in fact I get an I/O error by hand too:
>>>>>>>>
>>>>>>>> rados get -p cephfs_metadata 200.00000000 200.00000000
>>>>>>>> error getting cephfs_metadata/200.00000000: (5) Input/output error
>>>>>>>
>>>>>>> Next step would be to go look for corresponding errors in your OSD logs, system logs, and possibly also check things like the SMART counters on your hard drives for possible root causes.
>>>>>>>
>>>>>>> John
>>>>>>>
>>>>>>>> Can this be recovered somehow?
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>> Alessandro
>>>>>>>>
>>>>>>>> On 11/07/18 18:33, John Spray wrote:
>>>>>>>>>
>>>>>>>>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo <alessandro.desa...@roma1.infn.it> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>>
>>>>>>>>>> after the upgrade to luminous 12.2.6 today, all our MDSes have been marked as damaged. Trying to restart the instances only results in standby MDSes. We currently have 2 filesystems active and 2 MDSes each.
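For context, the "damaged" state described above can be inspected with standard commands along these lines (the filesystem name cephfs is assumed, matching the commands quoted elsewhere in this thread):

ceph fs status
ceph fs get cephfs      # the MDSMap output lists the ranks currently flagged as damaged
ceph health detail

# once the underlying metadata objects are readable again, the damaged flag
# is cleared per rank, as attempted below in this thread:
ceph mds repaired cephfs:0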
>>>>>>>>>>
>>>>>>>>>> I found the following error messages in the mon:
>>>>>>>>>>
>>>>>>>>>> mds.0 <node1_IP>:6800/2412911269 down:damaged
>>>>>>>>>> mds.1 <node2_IP>:6800/830539001 down:damaged
>>>>>>>>>> mds.0 <node3_IP>:6800/4080298733 down:damaged
>>>>>>>>>>
>>>>>>>>>> Whenever I try to force the repaired state with ceph mds repaired <fs_name>:<rank> I get something like this in the MDS logs:
>>>>>>>>>>
>>>>>>>>>> 2018-07-11 13:20:41.597970 7ff7e010e700 0 mds.1.journaler.mdlog(ro) error getting journal off disk
>>>>>>>>>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log [ERR] : Error recovering journal 0x201: (5) Input/output error
>>>>>>>>>
>>>>>>>>> An EIO reading the journal header is pretty scary. The MDS itself probably can't tell you much more about this: you need to dig down into the RADOS layer. Try reading the 200.00000000 object (that happens to be the rank 0 journal header, every CephFS filesystem should have one) using the `rados` command line tool.
>>>>>>>>>
>>>>>>>>> John
>>>>>>>>>
>>>>>>>>>> Any attempt at running the journal export results in errors, like this one:
>>>>>>>>>>
>>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
>>>>>>>>>> Error ((5) Input/output error)2018-07-11 17:01:30.631571 7f94354fff00 -1 Header 200.00000000 is unreadable
>>>>>>>>>> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal not readable, attempt object-by-object dump with `rados`
>>>>>>>>>>
>>>>>>>>>> The same happens for recover_dentries:
>>>>>>>>>>
>>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
>>>>>>>>>> Events by type:2018-07-11 17:04:19.770779 7f05429fef00 -1 Header 200.00000000 is unreadable
>>>>>>>>>> Errors: 0
>>>>>>>>>>
>>>>>>>>>> Is there something I could try to do to get the cluster back?
>>>>>>>>>>
>>>>>>>>>> I was able to dump the contents of the metadata pool with rados export -p cephfs_metadata <filename> and I'm currently trying the procedure described in http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery but I'm not sure if it will work, as it's apparently doing nothing at the moment (maybe it's just very slow).
>>>>>>>>>>
>>>>>>>>>> Any help is appreciated, thanks!
>>>>>>>>>>
>>>>>>>>>> Alessandro
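As a rough checklist for the journal side: once the 200.00000000 header object is readable again, the same tools quoted above can be re-run to confirm the journal is intact before setting the rank back to repaired (the rank and backup file name simply mirror the commands quoted above):

cephfs-journal-tool --rank=cephfs:0 journal inspect
cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary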
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com