[ceph-users] Re: Cluster recovery: DC power failure killed OSD node BlueStore block.DB devices

Igor Fedotov Wed, 23 Apr 2025 04:50:48 -0700

Paul,

unfortunately there is no way to recover an OSD that has lost its DB volume. All the metadata is kept on this volume, so it's impossible to recover the object structure after such a failure. Some pieces of user data could be found on the main device, but they're completely unstructured. Not to mention that some Ceph apps (e.g. RGW or CephFS) depend heavily on OMAP metadata, which is kept only in the DB.

And yes, adding a new DB volume assumes the existing DB data is still accessible, e.g. when the original DB is colocated with user data on the main device. But once you had a dedicated DB volume and it's lost, that's no longer the case.
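A minimal Python sketch of that distinction, as a triage aid for working out which OSDs are beyond recovery. It assumes the JSON printed by "ceph-volume lvm list --format json" (e.g. run inside "cephadm shell" on each OSD host) maps each OSD id to a list of LV entries, each with a "type" ("block", "db" or "wal") and a "devices" list of underlying physical devices. That layout, and the helper names db_dependencies and lost_osds, are assumptions for illustration, not a Ceph tool; check them against your own output. An OSD with no "db" entry keeps its DB on the main device; an OSD whose dedicated DB sat on a dead device cannot be recovered in place.

```python
import json

# Hypothetical helpers (not part of Ceph). They ASSUME the JSON layout of
# "ceph-volume lvm list --format json": {osd_id: [lv_entry, ...]}, where each
# lv_entry has a "type" ("block", "db" or "wal") and a "devices" list.


def db_dependencies(lvm_list_json: str) -> dict[str, list[str]]:
    """Map each OSD id to the physical devices backing its dedicated DB LV.

    OSDs with no "db" entry keep their DB colocated on the main (block)
    device and map to an empty list.
    """
    report = json.loads(lvm_list_json)
    deps: dict[str, list[str]] = {}
    for osd_id, lvs in report.items():
        db_devices: set[str] = set()
        for lv in lvs:
            if lv.get("type") == "db":
                db_devices.update(lv.get("devices", []))
        deps[osd_id] = sorted(db_devices)
    return deps


def lost_osds(lvm_list_json: str, dead_devices: set[str]) -> list[str]:
    """Return ids of OSDs whose dedicated DB lives on any dead device.

    Per Igor's explanation, such OSDs cannot be recovered in place.
    """
    deps = db_dependencies(lvm_list_json)
    return sorted(
        (osd_id for osd_id, devs in deps.items() if dead_devices.intersection(devs)),
        key=int,
    )


# Usage with a hand-made sample (not real cluster output):
sample = json.dumps({
    "0": [{"type": "block", "devices": ["/dev/sda"]},
          {"type": "db", "devices": ["/dev/nvme0n1"]}],
    "1": [{"type": "block", "devices": ["/dev/sdb"]}],
    "2": [{"type": "block", "devices": ["/dev/sdc"]},
          {"type": "db", "devices": ["/dev/nvme1n1"]}],
})
print(lost_osds(sample, {"/dev/nvme0n1"}))  # ['0']
```

The "devices" entries may name partitions rather than whole disks, so the dead-device names passed in should match exactly what the listing reports.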



Thanks,

Igor


On 23.04.2025 14:29, Paul Browne wrote:

Hi Igor,
Ah, OK. So in the failure mode of, for instance, one failed dedicated SSD/NVMe DB device serving one OSD, is there really no way to replace that failed DB device alone, without also replacing/wiping the OSD and then having it backfill objects again from cluster replicas?
I had been looking into e.g. "ceph-bluestore-tool bluefs-bdev-new-db" (https://docs.ceph.com/en/latest/man/8/ceph-bluestore-tool/), which seemed to offer some way forward around this.
But perhaps this only offers a way to move from an older/smaller/slower but still working DB device to a larger/faster replacement? And if the old DB device has actually failed and is unusable, does the whole DB+OSD need replacing?
Here we've lost too many DB devices, and by extension too many blocked OSDs, to ever recover this Ceph cluster, I fear...
Thanks,
Paul

*******************
Paul Browne
Research Computing Platforms
University Information Services
Roger Needham Building
JJ Thompson Avenue
University of Cambridge
Cambridge
United Kingdom
E-Mail: pf...@cam.ac.uk <mailto:pf...@cam.ac.uk>
Tel: 0044-1223-746548
*******************
------------------------------------------------------------------------
*From:* Igor Fedotov <igor.fedo...@croit.io>
*Sent:* 23 April 2025 09:35
*To:* Paul Browne <pf...@cam.ac.uk>; ceph-users@ceph.io
*Subject:* Re: [ceph-users] Cluster recovery: DC power failure killed OSD node BlueStore block.DB devices
Hi Paul,

have I understood you correctly: you're trying to attach new, empty DB
volumes to existing OSDs in an attempt to recover these OSDs? And the
original SSD drives which held the DB volumes are physically dead?

If so, then IMO it's a road to nowhere: "recovered" OSDs wouldn't run
without the original metadata. This is really a waste of time.


Thanks,

Igor

On 23.04.2025 0:11, Paul Browne wrote:
> Hi ceph-users,
>
> We recently suffered a total power failure at our main DC. Fortunately, our production Ceph cluster emerged unscathed, but a smaller Ceph cluster came back with the majority of its dedicated SSD BlueStore block.DB devices dead (though its HDD OSD devices were unharmed). This cluster underpinned a small OpenStack cloud, so it would be preferable to recover it rather than write it off.
>
> In terms of deployment tooling, this ailing Ceph cluster is a fairly standard Red Hat Ceph Storage 7 (so Reef) cephadm-deployed cluster, the main wrinkle being that both the DB and HDD OSD devices use the cephadm-supported LVM->LUKS layering underneath BlueStore.
>
> The dead BlueStore block.DB devices are of course preventing the surviving HDD OSD daemons (in cephadm-deployed containers) from coming up cleanly, so the Ceph cluster status is currently very degraded (see the attached status for the ugly picture).
>
> I've kicked around some ideas for recovering the dead DB devices and restarting the down+out OSDs by:
>
>   * Manually partitioning replacement SSDs into new DB device partitions+LVs
>   * Installing on them the same LUKS keys, retrieved from the Ceph config DB, matching up which OSD is on which OSD host
>   * Manually changing over the OSDs' device links to their new DB devices with "ceph-bluestore-tool bluefs-bdev-new-db" or similar
>   * Restarting the OSDs with updated links to the new LVM->LUKS->block.DB devices
>
> This approach seems very messy, and depends on extracting a lot of information error-free from "ceph-volume lvm list" dumps in order to exactly match existing OSD UUIDs to newly created DB device-mapper devices.
>
> Is there some smarter/better/faster way to non-destructively recover these intact HDD OSDs, which have links to dead block.DB devices, using native cephadm tooling rather than getting as low-level as all the above?
>
> Many thanks for any advice,
>
> *******************
> Paul Browne
> Research Computing Platforms
> University Information Services
> Roger Needham Building
> JJ Thompson Avenue
> University of Cambridge
> Cambridge
> United Kingdom
> E-Mail: pf...@cam.ac.uk <mailto:pf...@cam.ac.uk>
> Tel: 0044-1223-746548
> *******************
> _______________________________________________
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
