So I did manage to bring my OSDs back to life: the pools are there
(.mgr and my test pool), and my few test bytes are available as well.
But it's a lot of work, and this is just a single host with only three
OSDs. At least I can confirm that the procedure to rebuild the mon
store from the OSDs still works. Just keep in mind that the documented
steps are written for non-cephadm clusters, so you'll need to run the
ceph-objectstore-tool and ceph-monstore-tool steps from within the
cephadm shell.
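As a rough illustration of the two tool steps mentioned above, here is a
minimal sketch that only prints the commands for review instead of
running them. The flags follow the "recovery using OSDs" procedure in
the Ceph documentation; the store path, the keyring path, the OSD IDs
and the /var/lib/ceph/osd/ceph-<id> data path are assumptions and need
to be adapted to what you see inside your cephadm shell.

```shell
#!/bin/sh
# Sketch only: prints the mon-store rebuild commands, it does not run them.
# store, keyring and the OSD IDs are placeholders (assumptions).

print_monstore_rebuild() {
    store="$1"; keyring="$2"; shift 2
    # One update-mon-db pass per OSD; every OSD has to be included,
    # otherwise the rebuilt osdmap will be incomplete.
    for id in "$@"; do
        echo "ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-$id --no-mon-config --op update-mon-db --mon-store-path $store"
    done
    # Final step: rebuild the mon store from the collected data.
    echo "ceph-monstore-tool $store rebuild -- --keyring $keyring"
}

print_monstore_rebuild /root/mon-store /path/to/admin.keyring 0 1 2
```

The OSDs have to be stopped while ceph-objectstore-tool works on them.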
The osdmap was healthy (containing my three OSDs), and as soon as I
had injected the rebuilt monmap into the mon, 'ceph status' showed
three OSDs again. But bringing the OSDs back up wasn't that easy. I
had to fiddle around a lot (this isn't exactly a regular procedure for
us).
Among (a lot of) other steps, you need to:
- rebuild mon store from *ALL* OSDs (that needs to work, otherwise the
osdmap won't be complete)
- bring at least one mgr back up (import the mgr keyring with 'ceph
auth import', because the mon store rebuild results in some config
data loss)
- enable cephadm and set orchestrator backend
- add host(s) to orch
- set the correct container image (otherwise it will try to use the
image "docker.io/ceph/daemon-base:latest-master-devel")
- recreate osd directories with correct contents (type, whoami, fsid
etc.), some info is available with 'cephadm ceph-volume lvm list'
- ensure the unit.run files have the correct content
- and so on...
This is just a brief overview, but it should be possible to rescue
your data. It will just take quite some time and some Ceph experience.
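To make the orchestrator-related items from the list a bit more
concrete, here is a minimal sketch that prints the corresponding
commands without executing them. The host name, the keyring path and
the image tag are placeholders (assumptions), and the OSD directory and
unit.run steps are not covered because they depend on the individual
host.

```shell
#!/bin/sh
# Sketch only: prints the orchestrator-related recovery commands for review.
# host, image and keyring are placeholders, not values from a real cluster.

print_orch_recovery_steps() {
    host="$1"; image="$2"; keyring="$3"
    # Re-import the mgr key that got lost with the mon store rebuild.
    echo "ceph auth import -i $keyring"
    # Enable cephadm and make it the orchestrator backend.
    echo "ceph mgr module enable cephadm"
    echo "ceph orch set backend cephadm"
    # Register the host with the orchestrator.
    echo "ceph orch host add $host"
    # Pin the container image so cephadm does not fall back to its default.
    echo "ceph config set global container_image $image"
}

print_orch_recovery_steps host1 quay.io/ceph/ceph:v19 /path/to/mgr.keyring
```

Printing the commands first makes it easier to check each one against
your own cluster before running anything.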
Regards,
Eugen
Quoting Eugen Block <[email protected]>:
Again, please don't drop the list from your responses.
I played around a bit with a single-host cluster, created three
OSDs, added a pool and copied a few bytes of data into it. Then I
attached the drives to a different host where I bootstrapped a fresh
cluster with the previous FSID. But the OSDs can't be activated,
unfortunately. I think the main problem is this:
orchestrator._interface.OrchestratorError: osd.0 not in osdmap
I'm not sure if the procedure to recover from OSDs will help here.
If I have the time, I'm gonna try.
Or another approach could be to create new OSDs in the new cluster
you built (if you have the resources) and then use
ceph-objectstore-tool export/import to get the data onto the new
OSDs. But that's also an advanced procedure that requires experience.
I still think the easiest approach would have been to reduce the
monmap to 1 and start from there, but it's too late for that now.
Quoting Jacek Rużyczka <[email protected]>:
But then I'd have to destroy the old Ceph cluster, right?
On Sat, 30 May 2026 at 11:15, Eugen Block via ceph-users <
[email protected]> wrote:
I haven't tried it myself yet, but you could bootstrap a fresh cluster
with the flag "--fsid", excerpt from the 'cephadm bootstrap -h' command:
--fsid FSID cluster FSID
and set the FSID your OSDs show. But again, I have no idea if that's
gonna work.
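A minimal sketch of that idea, which only prints the bootstrap command
after checking that the FSID looks like a UUID. The mon IP is a
placeholder (assumption); the FSID is the one the OSDs in this thread
report. Note that a follow-up in this thread found that the OSDs still
could not be activated this way ("osd.0 not in osdmap").

```shell
#!/bin/sh
# Sketch only: builds a `cephadm bootstrap` command that reuses an existing FSID.
# mon_ip is a placeholder; nothing is executed against a cluster.

bootstrap_cmd() {
    fsid="$1"; mon_ip="$2"
    # Refuse anything that does not look like a lowercase UUID.
    if ! printf '%s\n' "$fsid" | grep -Eq '^[0-9a-f]{8}(-[0-9a-f]{4}){3}-[0-9a-f]{12}$'; then
        echo "not a valid FSID: $fsid" >&2
        return 1
    fi
    echo "cephadm bootstrap --mon-ip $mon_ip --fsid $fsid"
}

bootstrap_cmd 8aad3073-39a1-11f1-bf6e-f2704a1efa9b 192.0.2.10
```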
Quoting Eugen Block <[email protected]>:
You're not gonna be able to reactivate those OSDs because they were
built with a different cluster FSID:
Your OSDs:
cluster fsid 8aad3073-39a1-11f1-bf6e-f2704a1efa9b
Your new cluster:
Inferring fsid 98e04296-5b5a-11f1-84cf-ceccf52b4a0f
And if you have a cephadm-managed cluster, you shouldn't use
ceph-volume directly but via cephadm [0]:
ceph cephadm osd activate <node>
But this won't help you if the cluster's fsid differs as mentioned
above. I think you might be able to recover if you recreate the
monmap by using the OSDs [1]. But this would mean you'd need to tear
down your MONs. It's a difficult situation, and I can't tell if it's
gonna work that way. The procedure itself has been verified several
times on this list, but again, it's hard to tell.
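The FSID comparison described above can be scripted. This is a minimal
sketch that parses 'ceph-volume lvm list' style output from stdin; the
function names are made up for illustration, and the sample text is
taken from the output posted in this thread.

```shell
#!/bin/sh
# Sketch only: checks whether the OSDs' cluster FSID matches the running cluster.
# osd_cluster_fsids and fsids_match are hypothetical helper names.

# Extract the unique "cluster fsid" values from the listing on stdin.
osd_cluster_fsids() {
    awk '$1 == "cluster" && $2 == "fsid" { print $3 }' | sort -u
}

# Succeed only if at least one OSD is listed and all OSDs report the expected FSID.
fsids_match() {
    expected="$1"
    found="$(osd_cluster_fsids)"
    [ -n "$found" ] && [ "$found" = "$expected" ]
}

sample='      cluster fsid              8aad3073-39a1-11f1-bf6e-f2704a1efa9b
      cluster name              ceph'

if printf '%s\n' "$sample" | fsids_match 98e04296-5b5a-11f1-84cf-ceccf52b4a0f; then
    echo "match"
else
    echo "mismatch"
fi
```

On a real host the input could come from something like
'cephadm ceph-volume lvm list' and the expected value from 'ceph fsid'.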
[0] https://docs.ceph.com/en/latest/cephadm/services/osd/#activate-existing-osds
[1] https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#recovery-using-osds
Quoting Jacek Rużyczka via ceph-users <[email protected]>:
Hi,
After being forced to reinstall the O/S on all my Ceph nodes, I'm
trying to reactivate the OSDs, whose data lie on separate drives. The
OSDs are recognised:
mixtile@blade3n3:~$ sudo sudo ceph-volume lvm list
[sudo] password for mixtile:
====== osd.1 =======
  [block]       /dev/ceph-5b8338a0-9246-48f5-9cfd-e1bbfa1cc199/osd-block-ccba3dd2-0503-440f-b7c2-47e8d8e85253

      block device              /dev/ceph-5b8338a0-9246-48f5-9cfd-e1bbfa1cc199/osd-block-ccba3dd2-0503-440f-b7c2-47e8d8e85253
      block uuid                21ftde-FdBB-s4DM-Qeq7-TU4R-vjY7-6VXLoW
      cephx lockbox secret
      cluster fsid              8aad3073-39a1-11f1-bf6e-f2704a1efa9b
      cluster name              ceph
      crush device class
      encrypted                 0
      osd fsid                  ccba3dd2-0503-440f-b7c2-47e8d8e85253
      osd id                    1
      osdspec affinity          all-available-devices
      type                      block
      vdo                       0
      with tpm                  0
      devices                   /dev/nvme0n1
But: when trying to link the OSD to the all-new Ceph installation
(I've made several Google searches with contradictory results; this is
the last one), I get an error:
mixtile@blade3n3:~$ sudo cephadm shell
Inferring fsid 98e04296-5b5a-11f1-84cf-ceccf52b4a0f
Inferring config /var/lib/ceph/98e04296-5b5a-11f1-84cf-ceccf52b4a0f/mon.blade3n3/config
Not using image 'sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee' as it's not in list of non-dangling images with ceph=True label
Unable to find image 'quay.io/ceph/ceph:v19' locally
v19: Pulling from ceph/ceph
Digest: sha256:af0c5903e901e329adabe219dfc8d0c3efc1f05102a753902f33ee16c26b6cee
Status: Downloaded newer image for quay.io/ceph/ceph:v19
root@blade3n3:/usr/bin# cd
root@blade3n3:~# ceph-volume lvm activate --all
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph-authtool --gen-print-key
--> Activating OSD ID 1 FSID ccba3dd2-0503-440f-b7c2-47e8d8e85253
Running command: /usr/bin/mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-1
Running command: /usr/bin/chown -R ceph:ceph /var/lib/ceph/osd/ceph-1
Running command: /usr/bin/ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev /dev/ceph-5b8338a0-9246-48f5-9cfd-e1bbfa1cc199/osd-block-ccba3dd2-0503-440f-b7c2-47e8d8e85253 --path /var/lib/ceph/osd/ceph-1 --no-mon-config
stderr: 2026-05-29T17:28:36.854+0000 ffffbd2de040 -1 bdev(0xaaab21623800 /dev/ceph-5b8338a0-9246-48f5-9cfd-e1bbfa1cc199/osd-block-ccba3dd2-0503-440f-b7c2-47e8d8e85253) open stat got: (1) Operation not permitted
stderr: failed to read label for /dev/ceph-5b8338a0-9246-48f5-9cfd-e1bbfa1cc199/osd-block-ccba3dd2-0503-440f-b7c2-47e8d8e85253: (1) Operation not permitted
--> RuntimeError: command returned non-zero exit status: 1
This puzzles me: I'm already root, yet the BlueStore tool won't let me
reactivate the OSD. The old OSD is full of valuable production data.
And: Isn't it possible to use ceph orch to perform this operation?
Regards
Jacek Rużyczka
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]