Hi,

the RocksDB inside a BlueStore OSD has to be opened with the "bluestore-kv"
backend of ceph-kvstore-tool:

  ceph-kvstore-tool bluestore-kv <osd path> <command> [args...]

instead of just "rocksdb", which expects a RocksDB database stored on a
regular file system.
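
For example, to list the keys of osd.3 (the OSD daemon must not be running
while you do this), using the path from your output below:

  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-3 list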

Paul

-- 
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Sun, Jun 30, 2019 at 2:49 PM Christian Wahl <w...@teco.edu> wrote:

> Hi all,
>
> we are running a pretty small Ceph instance (v13.2.6) with 1 host and 8
> OSDs and are planning to expand to a more standard setup with 3 hosts and
> more OSDs.
>
> However, tonight one of our redundant PSUs died. The failover worked, but
> it looks like this has corrupted 3 out of 8 OSDs.
> The pools all have a replication level of 2.
>
> All OSDs are BlueStore with RocksDB; there is no external journal or WAL.
>
> 2 of them report a missing rocksdb:
>
> Jun 30 01:33:32 tecoceph systemd[1]: Starting Ceph object storage daemon
> osd.3...
> Jun 30 01:33:32 tecoceph systemd[1]: Started Ceph object storage daemon
> osd.3.
> Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.242
> 7f2666a75d80 -1 Public network was set, but cluster network was not set
> Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.242
> 7f2666a75d80 -1     Using public network also for cluster network
> Jun 30 01:33:32 tecoceph ceph-osd[11431]: starting osd.3 at - osd_data
> /var/lib/ceph/osd/ceph-3 /var/lib/ceph/osd/ceph-3/journal
> Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.898
> 7f2666a75d80 -1 rocksdb: NotFound:
> Jun 30 01:33:32 tecoceph ceph-osd[11431]: 2019-06-30 01:33:32.898
> 7f2666a75d80 -1 bluestore(/var/lib/ceph/osd/ceph-3) _open_db erroring
> opening db:
> Jun 30 01:33:33 tecoceph ceph-osd[11431]: 2019-06-30 01:33:33.267
> 7f2666a75d80 -1 osd.3 0 OSD:init: unable to mount object store
> Jun 30 01:33:33 tecoceph ceph-osd[11431]: 2019-06-30 01:33:33.267
> 7f2666a75d80 -1  ** ERROR: osd init failed: (5) Input/output error
> Jun 30 01:33:33 tecoceph systemd[1]: ceph-osd@3.service: main process
> exited, code=exited, status=1/FAILURE
>
> So I tried working with ceph-bluestore-tool:
>
> [root@tecoceph osd]# ceph-bluestore-tool show-label --path
> /var/lib/ceph/osd/ceph-3/
> inferring bluefs devices from bluestore path
> {
>     "/var/lib/ceph/osd/ceph-3//block": {
>         "osd_uuid": "c28c092c-00aa-4db0-9925-642bf99f0662",
>         "size": 8001561821184,
>         "btime": "2018-05-28 22:44:58.712336",
>         "description": "main",
>         "bluefs": "1",
>         "ceph_fsid": "a9493143-3e4e-450e-b3b8-28508d48d412",
>         "kv_backend": "rocksdb",
>         "magic": "ceph osd volume v026",
>         "mkfs_done": "yes",
>         "osd_key": "AQBG************************",
>         "ready": "ready",
>         "whoami": "3"
>     }
> }
>
> [root@tecoceph osd]# ceph-bluestore-tool fsck --deep yes --path
> /var/lib/ceph/osd/ceph-3/
> 2019-06-30 14:38:35.998 7f9947432940 -1 rocksdb: NotFound:
> 2019-06-30 14:38:35.998 7f9947432940 -1
> bluestore(/var/lib/ceph/osd/ceph-3/) _open_db erroring opening db:
> error from fsck: (5) Input/output error
>
> Trying to access the RocksDB with ceph-kvstore-tool fails as well:
> [root@tecoceph osd]# ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-3
> list
> 2019-06-30 14:39:36.021 7faa747e8a80  1 rocksdb: do_open column families:
> []
> failed to open type 2019-06-30 14:39:36.022 7faa747e8a80 -1 rocksdb:
> Invalid argument: /var/lib/ceph/osd/ceph-3: does not exist
> (create_if_missing is false)
> rocksdb path /var/lib/ceph/osd/ceph-3: (22) Invalid argument
>
> Repairing it with ceph-kvstore-tool results in a segmentation fault…
> [root@tecoceph osd]# ceph-kvstore-tool rocksdb /var/lib/ceph/osd/ceph-3
> repair
> *** Caught signal (Segmentation fault) **
>  in thread 7ff8fde03a80 thread_name:ceph-kvstore-to
>  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> (stable)
>  1: (()+0xf5d0) [0x7ff8f23925d0]
>  2: (main()+0x2c4) [0x55ae6dadb4e4]
>  3: (__libc_start_main()+0xf5) [0x7ff8f0d673d5]
>  4: (()+0x21dde0) [0x55ae6dbafde0]
> 2019-06-30 14:39:15.785 7ff8fde03a80 -1 *** Caught signal (Segmentation
> fault) **
>  in thread 7ff8fde03a80 thread_name:ceph-kvstore-to
>
>  ceph version 13.2.6 (7b695f835b03642f85998b2ae7b6dd093d9fbce4) mimic
> (stable)
>  1: (()+0xf5d0) [0x7ff8f23925d0]
>  2: (main()+0x2c4) [0x55ae6dadb4e4]
>  3: (__libc_start_main()+0xf5) [0x7ff8f0d673d5]
>  4: (()+0x21dde0) [0x55ae6dbafde0]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed
> to interpret this.
>
> The third one crashes with a segfault, and so does every tool I run
> against it, because of a bad table magic number:
> Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:27.805
> 7fa9bd453d80 -1 Public network was set, but cluster network was not set
> Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:27.805
> 7fa9bd453d80 -1     Using public network also for cluster network
> Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.771
> 7fa9bd453d80 -1 abort: Corruption: Bad table magic number: expected
> 9863518390377041911, found 15656361161312523986 in db/002923.sst
> Jun 30 01:32:29 tecoceph ceph-osd[8661]: -324> 2019-06-30 01:32:29.831
> 7fa9bd453d80 -1 *** Caught signal (Aborted) **
>
> Is there any way to recover any of these OSDs?
>
> Karlsruhe Institute of Technology (KIT)
> Pervasive Computing Systems – TECO
> Prof. Dr. Michael Beigl
> IT
> Christian Wahl
>
> Vincenz-Prießnitz-Str. 1
> Building 07.07., 2nd floor
> 76131 Karlsruhe, Germany
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
