[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

Sebastian Mazza Tue, 21 Dec 2021 15:10:18 -0800

Hi Mehmet,

thank you for your suggestion. I have now checked the kernel log, but I didn’t 
see anything interesting. However, I copied the parts that seem to be related 
to the SATA disks of the failed OSDs. Maybe you see more than I do.


[    1.815801] ata7: SATA link down (SStatus 0 SControl 300)
[    1.815829] ata5: SATA link down (SStatus 0 SControl 300)
[    1.815857] ata6: SATA link down (SStatus 0 SControl 300)
[    1.815898] ata1: SATA link down (SStatus 0 SControl 300)
[    1.815924] ata3: SATA link down (SStatus 0 SControl 300)
[    1.816082] ata8: SATA link down (SStatus 0 SControl 300)
[    1.826475] ata10: SATA link down (SStatus 0 SControl 300)
[    1.827513] ata9: SATA link down (SStatus 0 SControl 300)
[    1.827588] ata15: SATA link down (SStatus 0 SControl 300)
[    1.827611] ata14: SATA link down (SStatus 0 SControl 300)
[    1.827633] ata16: SATA link down (SStatus 0 SControl 300)
[    1.827656] ata12: SATA link down (SStatus 0 SControl 300)
[    1.827723] ata13: SATA link down (SStatus 0 SControl 300)
[    1.827749] ata11: SATA link down (SStatus 0 SControl 300)
[    1.881883] sfc 0000:41:00.1 enp65s0f1np1: renamed from eth3
[    1.905500] sfc 0000:42:00.1 enp66s0f1np1: renamed from eth5
[    1.965505] sfc 0000:41:00.0 enp65s0f0np0: renamed from eth2
[    1.969431] usb 7-1: new high-speed USB device number 2 using xhci_hcd
[    1.981407] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.981728] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.996617] ata4.00: ATA-11: ST12000NM0008-2H3101, SN02, max UDMA/133
[    1.996620] ata4.00: 23437770752 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    2.001799] sfc 0000:42:00.0 enp66s0f0np0: renamed from eth4
[    2.011544] ata4.00: configured for UDMA/133
[    2.030004] ata2.00: ATA-11: ST12000VN0008-2PH103, SC61, max UDMA/133
[    2.030007] ata2.00: 23437770752 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    2.063203] ata2.00: configured for UDMA/133
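To make the link lines above easier to scan, here is a minimal sketch that summarizes the SATA link state per port from dmesg text. The function name `sata_link_states` and the regular expression are illustrative assumptions; they only cover the line format shown in this log.

```python
import re

# Matches kernel lines in the format pasted above, for example:
#   [    1.981407] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
#   [    1.815801] ata7: SATA link down (SStatus 0 SControl 300)
LINK_RE = re.compile(r"\b(ata\d+): SATA link (up|down)(?: ([\d.]+ Gbps))?")


def sata_link_states(dmesg_text):
    """Return {port: (state, speed_or_None)} for every SATA link line found."""
    states = {}
    for line in dmesg_text.splitlines():
        m = LINK_RE.search(line)
        if m:
            states[m.group(1)] = (m.group(2), m.group(3))
    return states


# Three lines copied from the log above.
sample = """\
[    1.815801] ata7: SATA link down (SStatus 0 SControl 300)
[    1.981407] ata4: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.981728] ata2: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
"""
states = sata_link_states(sample)
up_ports = sorted(port for port, (state, _) in states.items() if state == "up")
print(up_ports)  # ['ata2', 'ata4']
```

In this excerpt only ata2 and ata4 report a link, which matches the two disks (sda, sdb) that show up further down.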

[    2.406013] scsi 1:0:0:0: Direct-Access     ATA      ST12000VN0008-2P SC61 PQ: 0 ANSI: 5
[    2.406220] sd 1:0:0:0: Attached scsi generic sg0 type 0
[    2.406266] sd 1:0:0:0: [sda] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
[    2.406270] sd 1:0:0:0: [sda] 4096-byte physical blocks
[    2.406280] sd 1:0:0:0: [sda] Write Protect is off
[    2.406282] sd 1:0:0:0: [sda] Mode Sense: 00 3a 00 00
[    2.406295] sd 1:0:0:0: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.406506] scsi 3:0:0:0: Direct-Access     ATA      ST12000NM0008-2H SN02 PQ: 0 ANSI: 5
[    2.406683] sd 3:0:0:0: Attached scsi generic sg1 type 0
[    2.406689] sd 3:0:0:0: [sdb] 23437770752 512-byte logical blocks: (12.0 TB/10.9 TiB)
[    2.406691] sd 3:0:0:0: [sdb] 4096-byte physical blocks
[    2.406695] sd 3:0:0:0: [sdb] Write Protect is off
[    2.406696] sd 3:0:0:0: [sdb] Mode Sense: 00 3a 00 00
[    2.406704] sd 3:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    2.465415] usb 7-1.1: new full-speed USB device number 3 using xhci_hcd
[    2.485406] sd 3:0:0:0: [sdb] Attached SCSI removable disk
[    2.493432] sd 1:0:0:0: [sda] Attached SCSI removable disk
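As a quick sanity check on the sizes the kernel reports, the following sketch redoes the arithmetic from the sector count and compares it with the block device size that BlueStore prints in the quoted log further down (12000134430720 bytes). The variable names are illustrative, and no claim is made here about why the two sizes differ.

```python
# Values taken from the logs in this mail.
sectors = 23437770752             # "23437770752 512-byte logical blocks"
logical_block = 512               # bytes per logical block
bluestore_size = 12000134430720   # "open size 12000134430720" in the quoted OSD log

size_bytes = sectors * logical_block
size_tb = round(size_bytes / 10**12, 1)   # decimal terabytes
size_tib = round(size_bytes / 2**40, 1)   # binary tebibytes
difference = size_bytes - bluestore_size  # raw disk minus BlueStore block device

print(size_bytes, size_tb, size_tib)  # 12000138625024 12.0 10.9
print(difference)                     # 4194304
```

The raw disk size agrees with the "12.0 TB/10.9 TiB" the kernel prints, and the BlueStore block device is exactly 4 MiB smaller than the raw disk.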

[    3.965442] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit 
configured to use KillMode=none. This is unsafe, as it disables systemd's 
process lifecycle management for the service. Please update your service to use 
a safer KillMode=, such as 'mixed' or 'control-group'. Support for 
KillMode=none is deprecated and will eventually be removed.
[    3.965726] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit 
configured to use KillMode=none. This is unsafe, as it disables systemd's 
process lifecycle management for the service. Please update your service to use 
a safer KillMode=, such as 'mixed' or 'control-group'. Support for 
KillMode=none is deprecated and will eventually be removed.
[    3.965919] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit 
configured to use KillMode=none. This is unsafe, as it disables systemd's 
process lifecycle management for the service. Please update your service to use 
a safer KillMode=, such as 'mixed' or 'control-group'. Support for 
KillMode=none is deprecated and will eventually be removed.
[    3.966134] systemd[1]: /lib/systemd/system/ceph-volume@.service:8: Unit 
configured to use KillMode=none. This is unsafe, as it disables systemd's 
process lifecycle management for the service. Please update your service to use 
a safer KillMode=, such as 'mixed' or 'control-group'. Support for 
KillMode=none is deprecated and will eventually be removed.
[    3.968007] systemd[1]: Queued start job for default target Graphical 
Interface.
[    4.002962] systemd[1]: Created slice system-ceph\x2dvolume.slice.

[    4.004015] systemd[1]: Reached target ceph target allowing to start/stop 
all ceph-fuse@.service instances at once.
[    4.004024] systemd[1]: Reached target ceph target allowing to start/stop 
all ceph-mon@.service instances at once.
[    4.004031] systemd[1]: Reached target ceph target allowing to start/stop 
all ceph-mds@.service instances at once.
[    4.004037] systemd[1]: Reached target ceph target allowing to start/stop 
all ceph-mgr@.service instances at once.
[    4.004041] systemd[1]: Reached target ceph target allowing to start/stop 
all ceph-osd@.service instances at once.


I tried to execute `ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7`, 
which again results in 
        fsck failed: (5) Input/output error
but this does not produce a single line in dmesg.
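For context on the rocksdb message in the quoted log below: the expected value 9863518390377041911 is 0x88e241b785f4cff7 in hex, the magic number that RocksDB stores in the last 8 bytes of a block-based table (.sst) file, so "found 0" means those trailing bytes were read back as zeros. The following is a minimal sketch of that check on an ordinary file. It assumes you have a regular-file copy of the .sst (for example one produced by `ceph-bluestore-tool bluefs-export`; whether that works on this OSD is untested), and the helper names `read_trailing_magic` and `make_demo_file` are hypothetical. The demo uses temporary files rather than real OSD data.

```python
import os
import struct
import tempfile

# Decimal value from the "Bad table magic number" message.
EXPECTED_MAGIC = 9863518390377041911
print(hex(EXPECTED_MAGIC))  # 0x88e241b785f4cff7


def read_trailing_magic(path):
    """Return the last 8 bytes of a file as a little-endian unsigned 64-bit int.

    Raises OSError if the file is shorter than 8 bytes.
    """
    with open(path, "rb") as f:
        f.seek(-8, os.SEEK_END)
        return struct.unpack("<Q", f.read(8))[0]


def make_demo_file(tail):
    """Create a throwaway file (NOT a real SST) ending in the given 8 bytes."""
    fd, path = tempfile.mkstemp(suffix=".sst")
    with os.fdopen(fd, "wb") as f:
        f.write(b"\x00" * 64 + tail)
    return path


good = make_demo_file(struct.pack("<Q", EXPECTED_MAGIC))
zeroed = make_demo_file(b"\x00" * 8)
good_magic = read_trailing_magic(good)
zeroed_magic = read_trailing_magic(zeroed)
os.remove(good)
os.remove(zeroed)

print(good_magic == EXPECTED_MAGIC)  # True
print(zeroed_magic)                  # 0, the value rocksdb reports as "found"
```

A zeroed tail by itself does not say whether the whole file is zeros or only the end of it; inspecting more of the file would be needed to tell.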


Thanks,
Sebastian

> On 21.12.2021, at 19:29, c...@elchaka.de wrote:
> 
> Hi,
> This
> > fsck failed: (5) Input/output error
> 
> Sounds like a hardware issue.
> Did you have a look at "dmesg"?
> 
> Hth
> Mehmet
> 
> On 21 December 2021 17:47:35 CET, Sebastian Mazza 
> <sebast...@macforce.at> wrote:
> Hi all,
> 
> After a reboot of a cluster, 3 OSDs cannot be started. The OSDs exit with 
> the following error message:
>       2021-12-21T01:01:02.209+0100 7fd368cebf00  4 rocksdb: 
> [db_impl/db_impl.cc:396] Shutdown: canceling all background work
>       2021-12-21T01:01:02.209+0100 7fd368cebf00  4 rocksdb: 
> [db_impl/db_impl.cc:573] Shutdown complete
>       2021-12-21T01:01:02.209+0100 7fd368cebf00 -1 rocksdb: Corruption: Bad 
> table magic number: expected 9863518390377041911, found 0 in db/002182.sst
>       2021-12-21T01:01:02.213+0100 7fd368cebf00 -1 
> bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db: 
>       2021-12-21T01:01:02.213+0100 7fd368cebf00  1 bluefs umount
>       2021-12-21T01:01:02.213+0100 7fd368cebf00  1 bdev(0x559bbe0ea800 
> /var/lib/ceph/osd/ceph-7/block) close
>       2021-12-21T01:01:02.293+0100 7fd368cebf00  1 bdev(0x559bbe0ea400 
> /var/lib/ceph/osd/ceph-7/block) close
>       2021-12-21T01:01:02.537+0100 7fd368cebf00 -1 osd.7 0 OSD:init: unable 
> to mount object store
>       2021-12-21T01:01:02.537+0100 7fd368cebf00 -1  ** ERROR: osd init 
> failed: (5) Input/output error
> 
> 
> I found a similar problem on this mailing list: 
> https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/MJLVS7UPJ5AZKOYN3K2VQW7WIOEQGC5V/#MABLFA4FHG6SX7YN4S6BGSCP6DOAX6UE
> 
> In this thread, Francois was able to successfully repair his OSD data with 
> `ceph-bluestore-tool fsck`. I tried to run: 
> `ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-7 -l 
> /var/log/ceph/bluestore-tool-fsck-osd-7.log --log-level 20  > 
> /var/log/ceph/bluestore-tool-fsck-osd-7.out  2>&1`
> But that results in:
>       2021-12-21T16:44:18.455+0100 7fc54ef7a240 -1 rocksdb: Corruption: Bad 
> table magic number: expected 9863518390377041911, found 0 in db/002182.sst
>       2021-12-21T16:44:18.455+0100 7fc54ef7a240 -1 
> bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db: 
>       fsck failed: (5) Input/output error
> 
> I also tried to run `ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-7 
> repair`. But that also fails with:
>       2021-12-21T17:34:06.780+0100 7f35765f7240  0 
> bluestore(/var/lib/ceph/osd/ceph-7) _open_db_and_around read-only:0 repair:0
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bdev(0x55fce5a1a800 
> /var/lib/ceph/osd/ceph-7/block) open path /var/lib/ceph/osd/ceph-7/block
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bdev(0x55fce5a1a800 
> /var/lib/ceph/osd/ceph-7/block) open size 12000134430720 (0xae9ffc00000, 11 
> TiB) 
>               block_size 4096 (4 KiB) rotational discard not supported
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 
> bluestore(/var/lib/ceph/osd/ceph-7) _set_cache_sizes cache_size 1073741824 
> meta 0.45 kv 0.45 data 0.06
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bdev(0x55fce5a1ac00 
> /var/lib/ceph/osd/ceph-7/block) open path /var/lib/ceph/osd/ceph-7/block
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bdev(0x55fce5a1ac00 
> /var/lib/ceph/osd/ceph-7/block) open size 12000134430720 (0xae9ffc00000, 11 
> TiB) 
>               block_size 4096 (4 KiB) rotational discard not supported
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bluefs add_block_device 
> bdev 1 path /var/lib/ceph/osd/ceph-7/block size 11 TiB
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bluefs mount
>       2021-12-21T17:34:06.780+0100 7f35765f7240  1 bluefs _init_alloc shared, 
> id 1, capacity 0xae9ffc00000, block size 0x10000
>       2021-12-21T17:34:06.904+0100 7f35765f7240  1 bluefs mount 
> shared_bdev_used = 0
>       2021-12-21T17:34:06.904+0100 7f35765f7240  1 
> bluestore(/var/lib/ceph/osd/ceph-7) _prepare_db_environment set db_paths to 
> db,11400127709184 db.slow,11400127709184
>       2021-12-21T17:34:06.908+0100 7f35765f7240 -1 rocksdb: Corruption: Bad 
> table magic number: expected 9863518390377041911, found 0 in db/002182.sst
>       2021-12-21T17:34:06.908+0100 7f35765f7240 -1 
> bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db: 
>       2021-12-21T17:34:06.908+0100 7f35765f7240  1 bluefs umount
>       2021-12-21T17:34:06.908+0100 7f35765f7240  1 bdev(0x55fce5a1ac00 
> /var/lib/ceph/osd/ceph-7/block) close
>       2021-12-21T17:34:07.072+0100 7f35765f7240  1 bdev(0x55fce5a1a800 
> /var/lib/ceph/osd/ceph-7/block) close
> 
> 
> The cluster is not in production, therefore, I can remove all corrupt pools 
> and delete the OSDs. However, I would like to understand what was going on, 
> in order to be able to avoid such a situation in the future.
> 
> I will provide the OSD logs from the time around the server reboot at the 
> following link: https://we.tl/t-fArHXTmSM7
> 
> Ceph version: 16.2.6
> 
> 
> Thanks,
> Sebastian
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

