Bluestore.
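For anyone who needs to answer the same question for their own cluster, the backend each OSD uses is reported in its metadata; a quick, generic way to look it up (the OSD id below is only an example):

    # List the objectstore backend reported by every OSD
    ceph osd metadata | grep '"osd_objectstore"'

    # Or query a single OSD, e.g. osd.23
    ceph osd metadata 23 | grep osd_objectstore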
On Fri, Jul 13, 2018, 05:56 Dan van der Ster <d...@vanderster.com> wrote:
> Hi Adam,
>
> Are your osds bluestore or filestore?
>
> -- dan
>
>
> On Fri, Jul 13, 2018 at 7:38 AM Adam Tygart <mo...@ksu.edu> wrote:
> >
> > I've hit this today with an upgrade to 12.2.6 on my backup cluster.
> > Unfortunately there were issues with the logs (in that the files
> > weren't writable) until after the issue struck.
> >
> > 2018-07-13 00:16:54.437051 7f5a0a672700 -1 log_channel(cluster) log
> > [ERR] : 5.255 full-object read crc 0x4e97b4e != expected 0x6cfe829d on
> > 5:aa448500:::500.00000000:head
> >
> > It is a backup cluster and I can keep it around or blow away the data
> > (in this instance) as needed for testing purposes.
> >
> > --
> > Adam
> >
> > On Thu, Jul 12, 2018 at 10:39 AM, Alessandro De Salvo
> > <alessandro.desa...@roma1.infn.it> wrote:
> > > Some progress, and more pain...
> > >
> > > I was able to recover the 200.00000000 object using ceph-objectstore-tool
> > > on one of the OSDs (all identical copies), but trying to re-inject it just
> > > with rados put gave no error while the get was still returning the same
> > > I/O error. The solution was to rm the object and then put it again; that
> > > worked.
> > >
> > > However, after restarting one of the MDSes and setting it to repaired,
> > > I've hit another, similar problem:
> > >
> > > 2018-07-12 17:04:41.999136 7f54c3f4e700 -1 log_channel(cluster) log [ERR] :
> > > error reading table object 'mds0_inotable' -5 ((5) Input/output error)
> > >
> > > Can I safely try to do the same as for object 200.00000000? Should I
> > > check something before trying it? Again, checking the copies of the
> > > object, they have identical md5sums on all the replicas.
> > >
> > > Thanks,
> > >
> > > Alessandro
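For readers following along, a minimal sketch of the extract-and-reinject workflow described above, assuming the object is first exported from a stopped OSD with ceph-objectstore-tool. The OSD id, data path and file names are illustrative, not taken from the thread, and stopping an OSD should be done with the usual precautions (e.g. noout):

    # Prevent rebalancing while an OSD is briefly down, then stop one OSD
    # from the PG's acting set and export the object from its local store
    ceph osd set noout
    systemctl stop ceph-osd@23
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-23 \
        --pgid 10.14 200.00000000 get-bytes > /tmp/200.00000000.bin
    systemctl start ceph-osd@23
    ceph osd unset noout

    # A plain "put" over the damaged object did not clear the bad full-object
    # CRC, so remove it first and then write it back
    rados -p cephfs_metadata rm 200.00000000
    rados -p cephfs_metadata put 200.00000000 /tmp/200.00000000.bin

    # Verify the read no longer returns EIO
    rados -p cephfs_metadata get 200.00000000 /tmp/200.00000000.check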
> > >
> > > On 12/07/18 16:46, Alessandro De Salvo wrote:
> > >
> > > Unfortunately yes, all the OSDs were restarted a few times, but no change.
> > >
> > > Thanks,
> > >
> > > Alessandro
> > >
> > > On 12/07/18 15:55, Paul Emmerich wrote:
> > >
> > > This might seem like a stupid suggestion, but: have you tried to restart
> > > the OSDs?
> > >
> > > I've also encountered some random CRC errors that only showed up when
> > > trying to read an object, but not on scrubbing, and that magically
> > > disappeared after restarting the OSD.
> > >
> > > However, in my case it was clearly related to
> > > https://tracker.ceph.com/issues/22464, which doesn't seem to be the issue
> > > here.
> > >
> > > Paul
> > >
> > > 2018-07-12 13:53 GMT+02:00 Alessandro De Salvo
> > > <alessandro.desa...@roma1.infn.it>:
> > >>
> > >> On 12/07/18 11:20, Alessandro De Salvo wrote:
> > >>
> > >>> On 12/07/18 10:58, Dan van der Ster wrote:
> > >>>>
> > >>>> On Wed, Jul 11, 2018 at 10:25 PM Gregory Farnum <gfar...@redhat.com>
> > >>>> wrote:
> > >>>>>
> > >>>>> On Wed, Jul 11, 2018 at 9:23 AM Alessandro De Salvo
> > >>>>> <alessandro.desa...@roma1.infn.it> wrote:
> > >>>>>>
> > >>>>>> OK, I found where the object is:
> > >>>>>>
> > >>>>>> ceph osd map cephfs_metadata 200.00000000
> > >>>>>> osdmap e632418 pool 'cephfs_metadata' (10) object '200.00000000' -> pg
> > >>>>>> 10.844f3494 (10.14) -> up ([23,35,18], p23) acting ([23,35,18], p23)
> > >>>>>>
> > >>>>>> So, looking at the logs of osds 23, 35 and 18, I see:
> > >>>>>>
> > >>>>>> osd.23:
> > >>>>>> 2018-07-11 15:49:14.913771 7efbee672700 -1 log_channel(cluster) log
> > >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> > >>>>>> 10:292cf221:::200.00000000:head
> > >>>>>>
> > >>>>>> osd.35:
> > >>>>>> 2018-07-11 18:01:19.989345 7f760291a700 -1 log_channel(cluster) log
> > >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> > >>>>>> 10:292cf221:::200.00000000:head
> > >>>>>>
> > >>>>>> osd.18:
> > >>>>>> 2018-07-11 18:18:06.214933 7fabaf5c1700 -1 log_channel(cluster) log
> > >>>>>> [ERR] : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> > >>>>>> 10:292cf221:::200.00000000:head
> > >>>>>>
> > >>>>>> So, basically the same error everywhere.
> > >>>>>>
> > >>>>>> I'm trying to issue a repair of pg 10.14, but I'm not sure if it will
> > >>>>>> help.
> > >>>>>>
> > >>>>>> No SMART errors (the fileservers are SANs, in RAID6 + LVM volumes), and
> > >>>>>> no disk problems anywhere. No relevant errors in the syslogs, the hosts
> > >>>>>> are just fine. I cannot exclude an error on the RAID controllers, but 2
> > >>>>>> of the OSDs with 10.14 are on one SAN system and one is on a different
> > >>>>>> one, so I would tend to exclude that they both had (silent) errors at
> > >>>>>> the same time.
> > >>>>>
> > >>>>> That's fairly distressing. At this point I'd probably try extracting
> > >>>>> the object using ceph-objectstore-tool and seeing if it decodes properly
> > >>>>> as an mds journal. If it does, you might risk just putting it back in
> > >>>>> place to overwrite the crc.
> > >>>>>
> > >>>> Wouldn't it be easier to scrub repair the PG to fix the crc?
> > >>>
> > >>> This is what I already instructed the cluster to do, a deep scrub, but
> > >>> I'm not sure it can repair if all replicas are bad, as seems to be the
> > >>> case here.
> > >>
> > >> I finally managed (with the help of Dan) to perform the deep-scrub on pg
> > >> 10.14, but the deep scrub did not detect anything wrong. Also, trying to
> > >> repair 10.14 has no effect.
> > >> Still, trying to access the object I get this in the OSDs:
> > >>
> > >> 2018-07-12 13:40:32.711732 7efbee672700 -1 log_channel(cluster) log [ERR]
> > >> : 10.14 full-object read crc 0x976aefc5 != expected 0x9ef2b41b on
> > >> 10:292cf221:::200.00000000:head
> > >>
> > >> Was deep-scrub supposed to detect the wrong crc? If yes, then it sounds
> > >> like a bug.
> > >> Can I force the repair somehow?
> > >> Thanks,
> > >>
> > >> Alessandro
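For reference, the scrub and repair attempts discussed in this sub-thread map onto commands along the following lines (pool, PG id and object name are the ones quoted above); note that rados list-inconsistent-obj only reports something once a scrub has actually flagged the PG as inconsistent:

    # Locate the object and its acting set
    ceph osd map cephfs_metadata 200.00000000

    # Ask for a deep scrub of the PG and, if inconsistencies are reported,
    # a repair
    ceph pg deep-scrub 10.14
    ceph pg repair 10.14

    # Inspect what (if anything) the scrub flagged
    rados list-inconsistent-obj 10.14 --format=json-pretty
    ceph health detail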
> > >>>
> > >>>> Alessandro, did you already try a deep-scrub on pg 10.14?
> > >>>
> > >>> I'm waiting for the cluster to do that, I've sent it earlier this
> > >>> morning.
> > >>>
> > >>>> I expect it'll show an inconsistent object. Though, I'm unsure if repair
> > >>>> will correct the crc given that in this case *all* replicas have a bad
> > >>>> crc.
> > >>>
> > >>> Exactly, this is what I wonder too.
> > >>> Cheers,
> > >>>
> > >>> Alessandro
> > >>>
> > >>>> --Dan
> > >>>>
> > >>>>> However, I'm also quite curious how it ended up that way, with a
> > >>>>> checksum mismatch but identical data (and identical checksums!) across
> > >>>>> the three replicas. Have you previously done some kind of scrub repair
> > >>>>> on the metadata pool? Did the PG perhaps get backfilled due to cluster
> > >>>>> changes?
> > >>>>> -Greg
> > >>>>>
> > >>>>>> Thanks,
> > >>>>>>
> > >>>>>> Alessandro
> > >>>>>>
> > >>>>>> On 11/07/18 18:56, John Spray wrote:
> > >>>>>>>
> > >>>>>>> On Wed, Jul 11, 2018 at 4:49 PM Alessandro De Salvo
> > >>>>>>> <alessandro.desa...@roma1.infn.it> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi John,
> > >>>>>>>>
> > >>>>>>>> in fact I get an I/O error by hand too:
> > >>>>>>>>
> > >>>>>>>> rados get -p cephfs_metadata 200.00000000 200.00000000
> > >>>>>>>> error getting cephfs_metadata/200.00000000: (5) Input/output error
> > >>>>>>>
> > >>>>>>> Next step would be to go look for corresponding errors on your OSD
> > >>>>>>> logs, system logs, and possibly also check things like the SMART
> > >>>>>>> counters on your hard drives for possible root causes.
> > >>>>>>>
> > >>>>>>> John
> > >>>>>>>
> > >>>>>>>> Can this be recovered somehow?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>>
> > >>>>>>>> Alessandro
> > >>>>>>>>
> > >>>>>>>> On 11/07/18 18:33, John Spray wrote:
> > >>>>>>>>>
> > >>>>>>>>> On Wed, Jul 11, 2018 at 4:10 PM Alessandro De Salvo
> > >>>>>>>>> <alessandro.desa...@roma1.infn.it> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Hi,
> > >>>>>>>>>>
> > >>>>>>>>>> after the upgrade to luminous 12.2.6 today, all our MDSes have been
> > >>>>>>>>>> marked as damaged. Trying to restart the instances only results in
> > >>>>>>>>>> standby MDSes. We currently have 2 active filesystems, with 2 MDSes
> > >>>>>>>>>> each.
> > >>>>>>>>>>
> > >>>>>>>>>> I found the following error messages in the mon:
> > >>>>>>>>>>
> > >>>>>>>>>> mds.0 <node1_IP>:6800/2412911269 down:damaged
> > >>>>>>>>>> mds.1 <node2_IP>:6800/830539001 down:damaged
> > >>>>>>>>>> mds.0 <node3_IP>:6800/4080298733 down:damaged
> > >>>>>>>>>>
> > >>>>>>>>>> Whenever I try to force the repaired state with ceph mds repaired
> > >>>>>>>>>> <fs_name>:<rank> I get something like this in the MDS logs:
> > >>>>>>>>>>
> > >>>>>>>>>> 2018-07-11 13:20:41.597970 7ff7e010e700  0 mds.1.journaler.mdlog(ro)
> > >>>>>>>>>> error getting journal off disk
> > >>>>>>>>>> 2018-07-11 13:20:41.598173 7ff7df90d700 -1 log_channel(cluster) log
> > >>>>>>>>>> [ERR] : Error recovering journal 0x201: (5) Input/output error
> > >>>>>>>>>
> > >>>>>>>>> An EIO reading the journal header is pretty scary. The MDS itself
> > >>>>>>>>> probably can't tell you much more about this: you need to dig down
> > >>>>>>>>> into the RADOS layer. Try reading the 200.00000000 object (that
> > >>>>>>>>> happens to be the rank 0 journal header; every CephFS filesystem
> > >>>>>>>>> should have one) using the `rados` command line tool.
> > >>>>>>>>>
> > >>>>>>>>> John
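As background to John's suggestion: the journal header object name is derived from the rank (journal inode 0x200 + rank), so rank 0 uses 200.00000000 and rank 1 uses 201.00000000, which matches the "journal 0x201" error for mds.1 quoted above. A minimal check from the RADOS side might look like this (pool name as in the thread, file paths are examples):

    # Confirm the journal header objects exist in the metadata pool
    rados -p cephfs_metadata ls | grep -E '^20[01]\.00000000$'

    # Try to read the rank-0 and rank-1 journal headers; an EIO here means
    # the problem is below the MDS, in RADOS/the OSDs, not in the MDS itself
    rados -p cephfs_metadata stat 200.00000000
    rados -p cephfs_metadata get 200.00000000 /tmp/200.00000000.header
    rados -p cephfs_metadata get 201.00000000 /tmp/201.00000000.header

    # Map each header object to its PG and acting OSDs for log inspection
    ceph osd map cephfs_metadata 200.00000000
    ceph osd map cephfs_metadata 201.00000000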
> > >>>>>>>>>>
> > >>>>>>>>>> Any attempt at running the journal export results in errors, like
> > >>>>>>>>>> this one:
> > >>>>>>>>>>
> > >>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 journal export backup.bin
> > >>>>>>>>>> Error ((5) Input/output error)
> > >>>>>>>>>> 2018-07-11 17:01:30.631571 7f94354fff00 -1 Header 200.00000000 is
> > >>>>>>>>>> unreadable
> > >>>>>>>>>> 2018-07-11 17:01:30.631584 7f94354fff00 -1 journal_export: Journal
> > >>>>>>>>>> not readable, attempt object-by-object dump with `rados`
> > >>>>>>>>>>
> > >>>>>>>>>> The same happens for recover_dentries:
> > >>>>>>>>>>
> > >>>>>>>>>> cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary
> > >>>>>>>>>> Events by type:
> > >>>>>>>>>> 2018-07-11 17:04:19.770779 7f05429fef00 -1 Header 200.00000000 is
> > >>>>>>>>>> unreadable
> > >>>>>>>>>> Errors: 0
> > >>>>>>>>>>
> > >>>>>>>>>> Is there something I could try to do to get the cluster back?
> > >>>>>>>>>>
> > >>>>>>>>>> I was able to dump the contents of the metadata pool with rados
> > >>>>>>>>>> export -p cephfs_metadata <filename> and I'm currently trying the
> > >>>>>>>>>> procedure described in
> > >>>>>>>>>> http://docs.ceph.com/docs/master/cephfs/disaster-recovery-experts/#using-an-alternate-metadata-pool-for-recovery
> > >>>>>>>>>> but I'm not sure if it will work, as it's apparently doing nothing
> > >>>>>>>>>> at the moment (maybe it's just very slow).
> > >>>>>>>>>>
> > >>>>>>>>>> Any help is appreciated, thanks!
> > >>>>>>>>>>
> > >>>>>>>>>> Alessandro
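For completeness, the journal-tool steps quoted above follow this general pattern once the header object is readable again; the rank and file names follow the thread, and the disaster-recovery document linked above remains the authoritative procedure for anything more invasive:

    # Check the journal integrity of rank 0 of filesystem "cephfs"
    cephfs-journal-tool --rank=cephfs:0 journal inspect

    # Take a backup of the journal before attempting any recovery
    cephfs-journal-tool --rank=cephfs:0 journal export backup.bin

    # Try to replay recoverable metadata from the journal into the
    # metadata store
    cephfs-journal-tool --rank=cephfs:0 event recover_dentries summary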
> > > --
> > > Paul Emmerich
> > >
> > > Looking for help with your Ceph cluster? Contact us at https://croit.io
> > >
> > > croit GmbH
> > > Freseniusstr. 31h
> > > 81247 München
> > > www.croit.io
> > > Tel: +49 89 1896585 90
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com