Hi Mark,

I doubt read-only mode would help here.
Log replay is required to build a consistent store state, and it can't be bypassed. And it looks like your drive/controller is still detecting some errors while reading.
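Before going further it might be worth checking how bad the damage actually is. Something along these lines should work with the ceph-bluestore-tool that ships with Luminous (paths and flags below are from memory - double-check them against the tool's usage output):

  # stop the OSD first, then check the store without modifying it
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-74

  # deep mode also verifies object data checksums (much slower)
  ceph-bluestore-tool fsck --deep --path /var/lib/ceph/osd/ceph-74

  # for the OSDs hitting the csum error, bluefs itself may still mount,
  # so the RocksDB files can be dumped for offline inspection
  ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-74 \
      --out-dir /tmp/osd-74-bluefs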
For the second issue, this PR might help once it's merged (you'll be able to disable csum verification, and hopefully the OSD will start): https://github.com/ceph/ceph/pull/26247. For now, IMO the only way to go for you is a custom build with this patch applied manually.
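A rough sketch of what that custom build might look like - note the PR targets master, so it may not apply cleanly to your Luminous branch and you may have to resolve conflicts by hand:

  # grab the sources matching your running release
  git clone https://github.com/ceph/ceph.git
  cd ceph
  git checkout v12.2.12        # substitute your exact Luminous version
  git submodule update --init --recursive

  # apply the PR on top (github serves a patch file for every PR)
  curl -L https://github.com/ceph/ceph/pull/26247.patch | git am -3

  # build just the OSD binary
  ./install-deps.sh
  ./do_cmake.sh
  cd build && make -j$(nproc) ceph-osd

  # then enable the new option for the affected OSDs in ceph.conf;
  # the option name below is my guess - verify against the final patch:
  #   [osd]
  #   bluestore ignore data csum = true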
Thanks,
Igor

On 7/9/2019 5:02 PM, Mark Lehrer wrote:
My main question is this - is there a way to stop any replay or journaling during OSD startup and bring up the pool/fs in read-only mode?

Here is a description of what I'm seeing. I have a Luminous cluster with CephFS and 16x 8TB SSDs, using size=3. I had a problem with one of my SAS controllers, and now I have at least 3 OSDs that refuse to start. The hardware appears to be fine now. I have my essential data backed up, but there are a few files that I wouldn't mind saving, so I want to use this as disaster recovery practice.

The two problems I am seeing are:

1) On two of the OSDs, there is a startup replay error after successfully replaying quite a few blocks:

2019-07-06 16:08:05.281063 7f6baec66e40 10 bluefs _replay 0x1543000: stop: uuid c366a2d6-e221-98b3-59fe-0f324c9dac8e != super.uuid 263428d5-8963-4339-8815-92ab6067e7a4
2019-07-06 16:08:05.281064 7f6baec66e40 10 bluefs _replay log file size was 0x1543000
2019-07-06 16:08:05.281085 7f6baec66e40 -1 bluefs _replay file with link count 0: file(ino 1485 size 0x15f4c43 mtime 2019-07-04 20:39:39.387601 bdev 1 allocated 1600000 extents [1:0x35771500000+100000,1:0x35771600000+100000,1:0x35771700000+100000,1:0x35771c00000+100000,1:0x35771d00000+100000,1:0x35772200000+100000,1:0x35772300000+100000,1:0x35772800000+100000,1:0x35772900000+100000,1:0x35772a00000+100000,1:0x35772b00000+100000,1:0x35772c00000+100000,1:0x35772d00000+100000,1:0x35772e00000+100000,1:0x35773300000+100000,1:0x35773400000+100000,1:0x35773500000+100000,1:0x35773600000+100000,1:0x35773700000+100000,1:0x35773800000+100000,1:0x35773900000+100000,1:0x35773a00000+100000])
2019-07-06 16:08:05.281093 7f6baec66e40 -1 bluefs mount failed to replay log: (5) Input/output error

2) The following error happens on at least two OSDs:

2019-07-06 15:58:46.621008 7fdcee030e40 -1 bluestore(/var/lib/ceph/osd/ceph-74) _verify_csum bad crc32c/0x1000 checksum at blob offset 0x0, got 0x147db0c5, expected 0x8f052c9, device location [0x10000~1000], logical extent 0x0~1000, object #-1:7b3f43c4:::osd_superblock:0#

The system was archiving some unimportant files when it went down, so I really don't care about any of the recent writes. What are my recovery options here? I was thinking that turning off replaying and running in read-only mode would be feasible, but maybe there are better options?

Thanks,
Mark
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com